Data Preparation
Set Up
Importing packages for analysis:
Import the dataset:
Basic Inspection:
Data Cleaning - Part 1
Removal of excess columns and null value handling
Removal of columns not needed for analysis.
Rows with a null 'price' are removed: without a price, the potential value of a listing cannot be calculated accurately, and NaN values are infrequent.
A missing neighbourhood cannot be reliably inferred from its neighbourhood group, so the 15 records with a null neighbourhood are dropped from the analysis.
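A minimal sketch of dropping the null-price rows while keeping a count for the later "Total Records Dropped" step. The toy frame and the column names (`id`, `price`) are assumptions for illustration, not the notebook's actual data.

```python
import pandas as pd

# Toy stand-in for the listings table; column names are assumed.
listings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "price": ["$100", None, "$250", "$80"],
})

rows_before = len(listings)

# Drop only the rows where price is missing; other columns are untouched.
listings = listings.dropna(subset=["price"]).reset_index(drop=True)

dropped_price = rows_before - len(listings)
print(f"Dropped {dropped_price} of {rows_before} rows with a null price")
```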
Null 'Neighbourhood Group' handling
Neighbourhood Group Formatting & Null Handling:
Left Join to replace values:
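One way this formatting and left-join step might look. The column names, the toy rows, and the choice of the most frequent group per neighbourhood as the lookup value are all assumptions made for this sketch.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
listings = pd.DataFrame({
    "neighbourhood": ["Harlem", "Harlem", "Astoria", "Astoria"],
    "neighbourhood_group": ["Manhattan", None, "Queens", " queens"],
})

# Normalise formatting: trim whitespace and use title case.
listings["neighbourhood_group"] = (
    listings["neighbourhood_group"].str.strip().str.title()
)

# Build a lookup of neighbourhood -> group from rows where the group is known,
# taking the most frequent group for each neighbourhood.
lookup = (
    listings.dropna(subset=["neighbourhood_group"])
    .groupby("neighbourhood")["neighbourhood_group"]
    .agg(lambda s: s.mode().iloc[0])
    .rename("group_lookup")
    .reset_index()
)

# The left join keeps every listing; missing groups are filled from the lookup.
listings = listings.merge(lookup, on="neighbourhood", how="left")
listings["neighbourhood_group"] = listings["neighbourhood_group"].fillna(
    listings["group_lookup"]
)
listings = listings.drop(columns="group_lookup")
print(listings)
```

Listings whose neighbourhood never appears with a known group would still be null after this step and would need separate handling.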
Data Cleaning - Part 2
Remove Duplicates
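A short sketch of counting and removing fully duplicated rows. The toy frame is hypothetical, and treating a row as a duplicate only when every column matches is an assumption.

```python
import pandas as pd

# Toy data with one exact duplicate row (values are hypothetical).
listings = pd.DataFrame({
    "id": [101, 102, 102, 103],
    "name": ["Loft", "Studio", "Studio", "Garden flat"],
    "price": [150, 90, 90, 120],
})

# Count rows that repeat an earlier row across all columns.
n_duplicates = int(listings.duplicated().sum())

# Keep the first occurrence of each row.
deduped = listings.drop_duplicates().reset_index(drop=True)
print(f"Removed {n_duplicates} duplicate row(s); {len(deduped)} remain")
```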
Remove Outliers
Checking the min/max of availability365 for invalid integers:
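A possible version of this range check. The column name `availability_365` and the decision to drop out-of-range rows (rather than clip them) are assumptions; the valid range of 0 to 365 days follows from the field's meaning.

```python
import pandas as pd

# Toy data including two invalid values (hypothetical).
listings = pd.DataFrame({"availability_365": [0, 120, 365, -10, 400]})

# Inspect the extremes first.
print(listings["availability_365"].agg(["min", "max"]))

# Keep only rows whose availability falls within a calendar year.
valid = listings["availability_365"].between(0, 365)
n_invalid = int((~valid).sum())
listings = listings[valid].reset_index(drop=True)
print(f"Removed {n_invalid} row(s) with invalid availability")
```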
Data Prep for Analysis
Data Type Conversion
Price needs to be converted to numeric before Revenue is created. To do this, the US dollar sign and commas must first be removed from Price.
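The conversion could be sketched as follows, assuming prices are stored as strings such as "$1,050". The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical raw price strings, including stray whitespace and a missing value.
price = pd.Series(["$1,050", "$142", " $620 ", None])

# Strip the dollar sign and thousands separators, then trim whitespace.
cleaned = price.str.replace(r"[$,]", "", regex=True).str.strip()

# Convert to numbers; anything that still cannot be parsed becomes NaN.
numeric_price = pd.to_numeric(cleaned, errors="coerce")
print(numeric_price)
```

Using `errors="coerce"` keeps the conversion from failing on unexpected strings, at the cost of silently producing NaN, so a null count afterwards is worth checking.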
Adding Revenue
Adding Customer Review rating
Total Records Dropped
Data Analysis
Neighbourhoods
Manhattan, Brooklyn, and Queens account for 95.5% of the listings. Manhattan neighbourhoods have the largest presence in the top 15, followed by Brooklyn, with one neighbourhood from Queens. These neighbourhoods and groups represent the top areas for current Airbnb listings.
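A share-of-listings figure like the one above can be computed with `value_counts(normalize=True)`. The toy data below is hypothetical and does not reproduce the 95.5% result; the column name is an assumption.

```python
import pandas as pd

# Toy data: 10 listings spread over four groups (hypothetical).
listings = pd.DataFrame({
    "neighbourhood_group": (
        ["Manhattan"] * 4 + ["Brooklyn"] * 3 + ["Queens"] * 2 + ["Bronx"]
    )
})

# Percentage of listings per group, largest first.
share = (
    listings["neighbourhood_group"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)

# Combined share of the three largest groups.
top3_share = share.head(3).sum()
print(share)
print(f"Top 3 groups: {top3_share:.1f}% of listings")
```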
Potential Revenue
What is the average listing price for a neighbourhood?
Brooklyn does not appear in the top 20 neighbourhoods by average revenue per listing. Its large presence in the market and low variability in average revenue indicate that its pricing per listing is lower. A closer look at availability in these markets will show where more listings are needed. Recommendations will likely include a portfolio of high and low revenue-generating properties based on demand in particular neighbourhoods.
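A ranking of neighbourhoods by average revenue might be built along these lines. The `revenue` column, the other column names, and the toy values are assumptions; the notebook's own revenue definition is not shown here.

```python
import pandas as pd

# Toy data (hypothetical values and column names).
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Queens"],
    "neighbourhood": ["Tribeca", "Tribeca", "Harlem",
                      "Williamsburg", "Williamsburg", "Astoria"],
    "revenue": [900, 1100, 400, 500, 700, 300],
})

# Average revenue per neighbourhood, highest first.
avg_revenue = (
    listings.groupby(["neighbourhood_group", "neighbourhood"], as_index=False)["revenue"]
    .mean()
    .rename(columns={"revenue": "avg_revenue"})
    .sort_values("avg_revenue", ascending=False)
    .reset_index(drop=True)
)

# Top N neighbourhoods; N = 2 here only because the toy data is small.
top_n = avg_revenue.head(2)
print(top_n)
```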
Market Saturation
A neighbourhood's average availability can provide insight into which areas have low availability and can accommodate more listings for best performance. Compare each neighbourhood's average availability to its average revenue.
Brooklyn and Manhattan have the lowest avg availability. This could make them potential investments. Brooklyn has demand since it is ranked 2nd by count of listings and it has a low average availability. New listings in this market would be on the lower end of price but likely to perform well in the market. Manhattan has low availability as well with the most listings in the market.
Manhattan, Bronx, and Queens have promising regression lines that indicate higher-revenue listings have less availability.
To do: compute the regression coefficients for the subsets above to support this claim.
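The coefficients for each neighbourhood group could be obtained with a per-group least-squares line fit, for example with `numpy.polyfit`. This is a sketch under assumptions: the helper `slope_by_group`, the column names, the toy values, and the choice of revenue as x and availability as y are all hypothetical.

```python
import numpy as np
import pandas as pd

def slope_by_group(df, group_col, x_col, y_col):
    """Fit y = slope * x + intercept within each group (hypothetical helper)."""
    rows = []
    for name, g in df.groupby(group_col):
        x = g[x_col].to_numpy(dtype=float)
        y = g[y_col].to_numpy(dtype=float)
        slope, intercept = np.polyfit(x, y, deg=1)
        rows.append({group_col: name, "slope": slope, "intercept": intercept})
    return pd.DataFrame(rows)

# Toy data: availability falls with revenue in one group and rises in the other.
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 3 + ["Queens"] * 3,
    "revenue": [100, 200, 300, 100, 200, 300],
    "availability_365": [300, 200, 100, 50, 100, 150],
})

coefs = slope_by_group(listings, "neighbourhood_group", "revenue", "availability_365")
print(coefs)
```

A negative slope means availability falls as revenue rises, which is the pattern the regression lines are read as showing.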
Listing Type
Room Type
Private room and entire home/apt are the most popular listing types. Investment should likely focus on these two types.
Price Variability
There is huge variability in pricing but consistent trends across IQR. Median for Shared Rooms is slightly higher in price than the median of others.
Filtering for Manhattan:
Hotel rooms are the most expensive but not likely the type of investment being investigated. Surprisingly, shared rooms have the next highest average revenue potential.
Customer Reviews
Are there any themes with the top rated listings? What gets the most ratings?
Insights
Revenue is fairly evenly distributed among the neighbourhood groups
Invest in Manhattan and Queens
Popular Neighbourhoods
Pricing of Neighbourhoods
Looking at only Manhattan & Queens, I've created a top 20 neighbourhood list by avg revenue.
Average Availability
Lower availability neighbourhoods will allow for more listings to enter the market.
Joining lists to find neighbourhoods that are both low in availability and high in revenue.
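An inner merge is one way to keep only the neighbourhoods that appear in both lists. The two toy frames, their column names, and their values are hypothetical.

```python
import pandas as pd

# Hypothetical shortlist of neighbourhoods with high average revenue.
top_revenue = pd.DataFrame({
    "neighbourhood": ["Tribeca", "SoHo", "Astoria"],
    "avg_revenue": [1000.0, 950.0, 600.0],
})

# Hypothetical shortlist of neighbourhoods with low average availability.
low_availability = pd.DataFrame({
    "neighbourhood": ["SoHo", "Astoria", "Harlem"],
    "avg_availability": [80.0, 95.0, 110.0],
})

# The inner join keeps only neighbourhoods present in both shortlists.
investment_neighbourhoods = top_revenue.merge(
    low_availability, on="neighbourhood", how="inner"
)
print(investment_neighbourhoods)
```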
Room Type: Price & Availability
Creating smaller df with investment neighbourhoods
Room type visual for investment neighbourhoods (low availability, high revenue):
Entire home/apt makes up three quarters of the market, followed by Private Room.
Instant Bookable
Recommendations
Investing in Manhattan & Queens
Neighbourhood groups: (1) Manhattan & (2) Queens
Neighbourhoods:
Room Type: Entire home/apt (Manhattan) and Private Room (Queens) are the most suitable for investment.
Instant Bookable: Does not appear to matter to the consumer. The market leans slightly towards not instant bookable.
Approximately how many listings will be needed to achieve a potential $2M in annual revenue?
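One rough way to approach the $2M question is to divide the target by the average annual revenue per listing in the chosen neighbourhoods and round up. The function name and the per-listing figures below are placeholders, not results from this analysis.

```python
import math

def listings_needed(target_revenue, avg_revenue_per_listing):
    """Smallest whole number of listings whose combined average revenue meets the target."""
    if avg_revenue_per_listing <= 0:
        raise ValueError("average revenue per listing must be positive")
    return math.ceil(target_revenue / avg_revenue_per_listing)

# Placeholder inputs: a $2M target and an assumed $40,000 average per listing.
estimate = listings_needed(2_000_000, 40_000)
print(f"Listings needed: {estimate}")
```

This treats every listing as earning the average, so it is only a first approximation; a mix of room types and neighbourhoods would change the count.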
Export to csv for further analysis.
Constraint: Cost to buy the listing cannot be known, therefore actual profits and cost to execute cannot be determined.
The project's goal is to recommend the best type of listings for good market performance.