Apartment Prices in Bogotá (Price - Size, Location, and Neighborhood)
In this project I build a model that predicts apartment prices in Bogotá from the size, the location, and the neighborhood of each apartment.
First I am going to explore the data.
Import the libraries.
Read the data.
The data was obtained from the Properati website.
Only listings that meet all of the following conditions are used: the property is an apartment, its covered surface is greater than 0, it is located in Bogotá, the currency is COP, the operation type is Venta, and the price is not zero. To remove outliers, only the properties between the 0.1 and 0.9 quantiles are kept. Duplicates are dropped.
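The filters above can be sketched as a single cleaning function. This is a minimal illustration, not the notebook's own code: the column names (`property_type`, `surface_covered`, `city`, `currency`, `operation_type`, `price`) and the label `"Apartamento"` are assumptions about the dataset, the quantile trim is assumed to apply to `price`, and the rows are made-up toy values.

```python
import pandas as pd


def wrangle(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listing filters, trim outliers, and drop duplicates."""
    # Keep only rows that satisfy every condition (column names are assumed).
    mask = (
        (df["property_type"] == "Apartamento")
        & (df["surface_covered"] > 0)
        & (df["city"] == "Bogotá")
        & (df["currency"] == "COP")
        & (df["operation_type"] == "Venta")
        & (df["price"] != 0)
    )
    df = df[mask]

    # Trim outliers: keep prices between the 0.1 and 0.9 quantiles
    # (assumption: the trim is applied to the price column).
    low, high = df["price"].quantile([0.1, 0.9])
    df = df[df["price"].between(low, high)]

    # Drop exact duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)


# Toy rows for illustration only; these are not values from the real dataset.
raw = pd.DataFrame(
    {
        "property_type": ["Apartamento"] * 7 + ["Casa"],
        "surface_covered": [50, 60, 70, 70, 80, 90, 0, 100],
        "city": ["Bogotá"] * 8,
        "currency": ["COP"] * 8,
        "operation_type": ["Venta"] * 8,
        "price": [100, 200, 300, 300, 400, 500, 300, 300],
    }
)

clean = wrangle(raw)
print(clean)
```

Putting every filter in one function keeps the cleaning step repeatable if the raw file is reloaded later.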
Explore
We inspect the summary information of the DataFrame.
We draw a scatter mapbox plot to see where the apartments are located. Some zones have visibly higher prices than others.
We calculate the correlation matrix and then draw a heatmap to see which variables are correlated.
There is no strong correlation between the variables.
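As a rough sketch of the correlation step, the matrix can be computed with pandas' `DataFrame.corr`. The column names and the toy values below are assumptions for illustration, and which columns the notebook actually includes in the matrix is not stated.

```python
import pandas as pd

# Toy numeric columns for illustration only (not the real dataset).
df = pd.DataFrame(
    {
        "surface_covered": [45, 60, 72, 85, 100, 120],
        "lat": [4.60, 4.65, 4.70, 4.62, 4.68, 4.72],
        "lon": [-74.08, -74.05, -74.10, -74.07, -74.03, -74.06],
    }
)

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()
print(corr.round(2))
```

The resulting square matrix is what a plotting function such as `seaborn.heatmap` would take as input.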
Model
First the data is separated into the feature matrix X (covered surface, lat, and lon) and the target y (price).
Then X and y are split into X train and y train, and X test and y test.
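The split can be sketched with scikit-learn's `train_test_split`. The 80/20 ratio, the `random_state`, the column names, and the toy values are all assumptions; the text does not say which split the notebook uses.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data for illustration only (not the real dataset).
df = pd.DataFrame(
    {
        "surface_covered": [45, 50, 60, 65, 72, 80, 85, 95, 100, 120],
        "lat": [4.60, 4.61, 4.65, 4.66, 4.70, 4.62, 4.63, 4.68, 4.69, 4.72],
        "lon": [-74.08, -74.09, -74.05, -74.04, -74.10, -74.07, -74.06, -74.03, -74.02, -74.06],
        "price": [180, 200, 240, 260, 290, 320, 340, 380, 400, 480],
    }
)

# Features and target.
X = df[["surface_covered", "lat", "lon"]]
y = df["price"]

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```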
Build the model
First the baseline is calculated.
Next, the mean absolute error of the baseline is calculated.
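A common choice of baseline for regression, assumed here, is to predict the mean of the training target for every row. The sketch below uses made-up target values to show how its mean absolute error would be computed.

```python
from sklearn.metrics import mean_absolute_error

# Toy training targets for illustration only.
y_train = [200, 300, 400, 500]

# Baseline: always predict the mean training price.
y_mean = sum(y_train) / len(y_train)
y_pred_baseline = [y_mean] * len(y_train)

# Average absolute distance between the true prices and the baseline.
mae_baseline = mean_absolute_error(y_train, y_pred_baseline)
print("Mean price:", y_mean)
print("Baseline MAE:", mae_baseline)
```

Any model worth keeping should reach a lower mean absolute error than this constant prediction.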
Iterate
We build a pipeline with a OneHotEncoder, a SimpleImputer, and a Ridge regressor.
We fit the model.
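One way such a pipeline could look in scikit-learn is sketched below. It is an assumption that the encoder is scikit-learn's `OneHotEncoder` wrapped in a `ColumnTransformer` (the notebook may use a different encoder library), and the `neighborhood` column and all values are hypothetical toy data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data for illustration only; None marks a missing value.
X_train = pd.DataFrame(
    {
        "surface_covered": [50.0, 60.0, None, 80.0, 90.0, 100.0],
        "lat": [4.60, 4.65, 4.70, 4.62, 4.68, 4.72],
        "lon": [-74.08, -74.05, -74.10, -74.07, -74.03, -74.06],
        "neighborhood": ["Chapinero", "Suba", "Usaquén", "Suba", "Chapinero", "Usaquén"],
    }
)
y_train = pd.Series([200, 230, 300, 310, 360, 420])

# One-hot encode the categorical column and pass the numeric ones through.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

# Encoder -> imputer (fills missing values with the column mean) -> Ridge.
model = make_pipeline(encoder, SimpleImputer(), Ridge())
model.fit(X_train, y_train)

y_pred_training = model.predict(X_train)
print(y_pred_training.round(1))
```

Chaining the steps in one pipeline means the same encoding and imputation are applied automatically at prediction time.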
Evaluate
First we generate predictions for the training features (X train).
The mean absolute error on the training data is calculated.
The model beats the baseline: its training mean absolute error is lower by 183441221 COP.
Finally, we evaluate the model with the test data.
The mean absolute error on the test data is calculated.
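The training and test evaluation can be sketched as follows. To keep the example short, it fits a plain Ridge model on a single synthetic feature; the data, the split ratio, and the variable names are assumptions and do not reproduce the notebook's numbers.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: price grows roughly linearly with size.
rng = np.random.default_rng(0)
X = rng.uniform(30, 150, size=(40, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 10, size=40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = Ridge().fit(X_train, y_train)

# Compare the error on data the model has seen with the error on held-out data.
mae_train = mean_absolute_error(y_train, model.predict(X_train))
mae_test = mean_absolute_error(y_test, model.predict(X_test))
print("Training MAE:", round(mae_train, 2))
print("Test MAE:", round(mae_test, 2))
```

A test error far above the training error would suggest that the model does not generalize well.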
Communicate Results
Finally, the results are communicated. We write a function that makes a prediction from data supplied by the user.
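A prediction helper of this kind might look like the sketch below. The function name `make_prediction`, its parameters, and the output format are hypothetical, and the model is a small Ridge regressor trained on toy values only so that the example runs on its own.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Toy training data for illustration only (not the real dataset).
X_train = pd.DataFrame(
    {
        "surface_covered": [50, 60, 70, 80, 90, 100],
        "lat": [4.60, 4.65, 4.70, 4.62, 4.68, 4.72],
        "lon": [-74.08, -74.05, -74.10, -74.07, -74.03, -74.06],
    }
)
y_train = [200, 240, 280, 320, 360, 400]
model = Ridge().fit(X_train, y_train)


def make_prediction(model, surface_covered, lat, lon):
    """Return a formatted price prediction for one apartment (hypothetical helper)."""
    # Build a one-row frame with the same columns the model was trained on.
    row = pd.DataFrame(
        {"surface_covered": [surface_covered], "lat": [lat], "lon": [lon]}
    )
    price = float(model.predict(row)[0])
    return f"Predicted apartment price: {price:,.2f} COP"


result = make_prediction(model, surface_covered=75, lat=4.65, lon=-74.06)
print(result)
```

Taking the model as an argument keeps the helper reusable with any fitted estimator that accepts the same columns.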