Final Project: Looking for Airbnbs in Los Angeles?
By: Krtin Juneja and Elizabeth Veilleux
Abstract:
This project examines Airbnb listings in Los Angeles. We both love to travel, and Los Angeles is the next place we want to visit, so we are looking for an Airbnb to rent there. As college students on a limited income, we want a listing that is relatively cheap. That led us to ask which factors drive Airbnb prices up. Is it the number of bedrooms? The average rent in the area? Or simply the neighborhood or room type? We want to answer this with the data currently available to us. To do so, we run a multiple linear regression to see what contributes to the price of an Airbnb in Los Angeles, and we build an Airbnb price predictor with machine learning. Additionally, we test whether certain neighborhoods tend to have significantly higher prices.
About the Datasets:
We work with two datasets. The listings dataset covers Los Angeles Airbnb rentals and gives various details about each rental along with host information. The rent prices dataset covers LA neighborhoods from 2010-2016 (we focus on just the 2016 data). It is useful because it provides rent amounts for LA neighborhoods, which we can use as a predictor since they give some insight into typical real estate costs in each area. However, it is important to note that the Airbnb data is from 2020-2021, so there will be some discrepancies: rents were likely higher in 2020-2021 due to factors like inflation.
GOAL: To determine the best predictors of Airbnb rental prices in Los Angeles and whether certain neighborhoods are pricier.
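As a rough illustration of how the 2016 rent figures could be attached to each listing, here is a minimal sketch. The two small DataFrames are made-up stand-ins, and the column names (`neighbourhood`, `Neighborhood`, `Year`) are assumptions for illustration; only `Amount` appears in the regression output of this report.

```python
import pandas as pd

# Hypothetical stand-ins for the two datasets; values and column names are assumed.
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood": ["Venice", "Hollywood", "Venice"],
    "price": [150.0, 90.0, 210.0],
})
rent = pd.DataFrame({
    "Neighborhood": ["Venice", "Venice", "Hollywood", "Hollywood"],
    "Year": [2015, 2016, 2015, 2016],
    "Amount": [1900.0, 2000.0, 1400.0, 1500.0],
})

# Keep only the 2016 rent figures, one row per neighborhood.
rent_2016 = rent.loc[rent["Year"] == 2016, ["Neighborhood", "Amount"]]

# Left join so every listing is kept; a listing whose neighborhood has no
# rent figure would end up with NaN in "Amount".
merged = listings.merge(
    rent_2016, how="left", left_on="neighbourhood", right_on="Neighborhood"
).drop(columns="Neighborhood")

print(merged)
```

A left join is used here so that unmatched listings stay visible as missing values instead of silently disappearing, which makes it easier to see how many neighborhoods fail to match between the two sources.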
Loading in the Data:
Cleaning the data:
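The cleaning step is not shown in this report, so the following is only a sketch of what it might involve: converting price strings to numbers, dropping rows with missing values, and turning the room type into indicator columns. The sample rows, the `$1,250.00` price format, and the room type labels are assumptions, not taken from the actual data.

```python
import pandas as pd

# Made-up sample rows; the string price format and labels are assumptions.
raw = pd.DataFrame({
    "price": ["$1,250.00", "$85.00", None, "$40.00"],
    "room_type": ["Entire home/apt", "Private room", "Hotel room", "Shared room"],
    "bedrooms": [3.0, 1.0, 1.0, None],
})

# Work on an explicit copy so later assignments act on an independent frame.
clean = raw.copy()

# Strip "$" and "," and convert to a number; unparseable values become NaN.
clean["price"] = pd.to_numeric(
    clean["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

# Drop rows that are missing the target or a predictor.
clean = clean.dropna(subset=["price", "bedrooms"])

# One indicator (0/1) column per room type that remains after filtering.
dummies = pd.get_dummies(clean["room_type"], dtype=int)
clean = pd.concat([clean.drop(columns="room_type"), dummies], axis=1)

print(clean)
```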
Multiple Linear Regression:
Intercept:
-382.5153626993822
Coefficients:
[ 3.77936532e+01 2.17147009e+02 -2.92019654e+01 -1.72431912e+02
-1.09265314e+02 3.60870110e+02 -7.91728846e+01 2.15727008e-01]
                            OLS Regression Results
==============================================================================
Dep. Variable:                  price   R-squared:                       0.239
Model:                            OLS   Adj. R-squared:                  0.239
Method:                 Least Squares   F-statistic:                     1219.
Date:                Tue, 29 Jun 2021   Prob (F-statistic):               0.00
Time:                        14:36:22   Log-Likelihood:            -2.1001e+05
No. Observations:               27155   AIC:                         4.200e+05
Df Residuals:                   27147   BIC:                         4.201e+05
Df Model:                           7
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const            -306.0123     14.345    -21.332      0.000    -334.130    -277.894
accommodates       37.7937      2.577     14.667      0.000      32.743      42.844
bedrooms          217.1470      5.371     40.430      0.000     206.620     227.674
beds              -29.2020      3.445     -8.478      0.000     -35.953     -22.451
Entire Property  -248.9350     12.571    -19.803      0.000    -273.574    -224.296
Private Room     -185.7684     12.380    -15.006      0.000    -210.034    -161.503
Hotel Room        284.3670     42.000      6.771      0.000     202.045     366.689
Shared Room      -155.6760     18.272     -8.520      0.000    -191.490    -119.862
Amount              0.2157      0.008     28.718      0.000       0.201       0.230
==============================================================================
Omnibus:                    48273.835   Durbin-Watson:                   1.764
Prob(Omnibus):                  0.000   Jarque-Bera (JB):         72218492.949
Skew:                          12.808   Prob(JB):                         0.00
Kurtosis:                     254.340   Cond. No.                     2.01e+18
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.77e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
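Note [2] and the condition number of 2.01e+18 are what one would expect when a constant is fitted alongside indicator columns for all four room types, because the four indicators always sum to the constant column. This reading is consistent with Df Model being 7 although eight predictors are listed, and with the first intercept and coefficients differing from the OLS table only by a shift of about 76.5 between the intercept and the four room-type coefficients (the other coefficients are identical). The sketch below uses synthetic data, not the project data, to show the rank deficiency and one common remedy: dropping one room type as the baseline with `drop_first=True`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 200 rows cycling through the four room types.
n = 200
room_types = ["Entire Property", "Private Room", "Hotel Room", "Shared Room"]
room = pd.Series(np.tile(room_types, n // 4))
bedrooms = np.arange(n) % 3 + 1  # varies within every room type

# Constant + all four indicators: the indicators sum to the constant column,
# so one column is a linear combination of the others.
full = pd.get_dummies(room, dtype=float)
X_full = np.column_stack([np.ones(n), bedrooms, full.to_numpy()])

# Constant + three indicators: the dropped level becomes the baseline.
reduced = pd.get_dummies(room, dtype=float, drop_first=True)
X_reduced = np.column_stack([np.ones(n), bedrooms, reduced.to_numpy()])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

print("full design:   ", X_full.shape[1], "columns, rank", rank_full)
print("reduced design:", X_reduced.shape[1], "columns, rank", rank_reduced)
```

With a baseline level dropped, each remaining room-type coefficient reads as the price difference relative to that baseline, which is usually easier to interpret than coefficients from a singular design.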
Predicting Airbnb Price with Machine Learning
accommodates 113.975227
bedrooms 113.975227
beds 113.975227
Entire Property 113.975227
Private Room 113.975227
Hotel Room 113.975227
Shared Room 113.975227
Amount 113.975227
dtype: float64
The proportion of the predictions that were correct is: 0.0032091750841750843
The value of F1 is: 0.41
accommodates 113.975227
bedrooms 113.975227
beds 113.975227
Entire Property 113.975227
Private Room 113.975227
Hotel Room 113.975227
Shared Room 113.975227
Amount 113.975227
dtype: float64
The proportion of the predictions that were correct is: 0.00319135878237388
The value of F1 is: 0.42
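A proportion of correct predictions near 0.003 would be consistent with scoring a continuous target such as price by exact match, where a prediction almost never equals the true value. As a hedged alternative (not the method used in this report), the sketch below scores a linear model with mean absolute error and R-squared on a held-out split. The data is synthetic, the variable names are illustrative, and NumPy's least squares stands in for whatever model the notebook actually uses.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic listings: price depends on bedrooms and accommodates plus noise.
n = 500
bedrooms = rng.integers(1, 6, size=n).astype(float)
accommodates = bedrooms * 2 + rng.integers(0, 3, size=n)
price = 40.0 + 60.0 * bedrooms + 10.0 * accommodates + rng.normal(0, 25, size=n)

# Design matrix with an explicit constant column.
X = np.column_stack([np.ones(n), bedrooms, accommodates])

# Shuffled 80/20 train/test split.
idx = rng.permutation(n)
train, test = idx[:400], idx[400:]

# Ordinary least squares fit on the training rows only.
coef, *_ = np.linalg.lstsq(X[train], price[train], rcond=None)
pred = X[test] @ coef

# Exact-match "accuracy" after rounding to whole dollars: near zero by design.
exact_match = np.mean(np.round(pred) == np.round(price[test]))

# Error-based metrics that suit a continuous target.
mae = np.mean(np.abs(pred - price[test]))
ss_res = np.sum((price[test] - pred) ** 2)
ss_tot = np.sum((price[test] - price[test].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"exact-match proportion: {exact_match:.3f}")
print(f"mean absolute error:    {mae:.2f}")
print(f"R-squared (test):       {r2:.3f}")
```

Mean absolute error is in the same units as price, so it answers the practical question of how many dollars off a typical prediction is.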
Determining if certain LA neighborhoods have significantly higher prices
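One way to check whether a neighborhood is pricier than the rest is a permutation test on the difference in mean price. The sketch below is an illustration on made-up data, not the test used in this report; the column names and the `perm_pvalue` helper are hypothetical. Neighborhoods with a single listing are filtered out first, since a sample variance cannot be computed for them and NumPy reports "Degrees of freedom <= 0" in that situation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Made-up data: A is deliberately pricier, C has only one listing.
df = pd.DataFrame({
    "neighbourhood": ["A"] * 40 + ["B"] * 40 + ["C"] * 1,
    "price": np.concatenate([
        rng.normal(300, 40, 40),
        rng.normal(120, 40, 40),
        [500.0],
    ]),
})

# Keep only neighborhoods with at least two listings.
counts = df["neighbourhood"].value_counts()
keep = counts[counts >= 2].index
df = df[df["neighbourhood"].isin(keep)].reset_index(drop=True)


def perm_pvalue(prices, in_group, n_perm=2000, seed=0):
    """One-sided permutation p-value that the group's mean exceeds the rest's."""
    perm_rng = np.random.default_rng(seed)
    observed = prices[in_group].mean() - prices[~in_group].mean()
    hits = 0
    for _ in range(n_perm):
        # Shuffle group membership while keeping the group size fixed.
        shuffled = perm_rng.permutation(in_group)
        diff = prices[shuffled].mean() - prices[~shuffled].mean()
        hits += int(diff >= observed)
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


prices = df["price"].to_numpy()
results = {
    name: perm_pvalue(prices, (df["neighbourhood"] == name).to_numpy())
    for name in keep
}
print(results)
```

When many neighborhoods are tested at once, some small p-values will appear by chance alone, so a multiple-comparison correction such as Bonferroni would be needed before calling any single neighborhood significantly more expensive.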