Apartment Prices in Bogotá
In this project I build a model that predicts apartment prices in Bogotá from the size of the apartment.
First I am going to explore the data.
Import the libraries.
Read the data.
The data was obtained from the Properati website.
Only properties that meet all of the following conditions are kept: the property is an apartment, its covered surface is greater than 0, it is located in Bogotá, the currency is COP, the operation type is Venta (sale), and the price is not zero. Duplicates are dropped.
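The filtering rules above could be sketched as a small pandas function. The column names (`property_type`, `place_with_parent_names`, `currency`, `operation`, `price`) and the toy rows are assumptions for illustration, not the confirmed schema of the dataset.

```python
import pandas as pd


def filter_apartments(df: pd.DataFrame) -> pd.DataFrame:
    """Keep apartments for sale in Bogotá, priced in COP, with valid size and price.

    Column names are assumed for illustration; adjust them to the real dataset.
    """
    mask = (
        (df["property_type"] == "apartment")
        & (df["surface_covered_in_m2"] > 0)
        & df["place_with_parent_names"].str.contains("Bogotá", na=False)
        & (df["currency"] == "COP")
        & (df["operation"] == "Venta")
        & (df["price"] != 0)
    )
    # Drop exact duplicate rows after filtering.
    return df[mask].drop_duplicates().reset_index(drop=True)


# Toy rows (made up) to exercise each rule.
raw = pd.DataFrame(
    {
        "property_type": ["apartment", "house", "apartment", "apartment", "apartment"],
        "surface_covered_in_m2": [60.0, 120.0, 0.0, 60.0, 45.0],
        "place_with_parent_names": [
            "|Colombia|Bogotá D.C|",
            "|Colombia|Bogotá D.C|",
            "|Colombia|Bogotá D.C|",
            "|Colombia|Bogotá D.C|",
            "|Colombia|Medellín|",
        ],
        "currency": ["COP"] * 5,
        "operation": ["Venta"] * 5,
        "price": [3.0e8, 9.0e8, 2.0e8, 3.0e8, 2.5e8],
    }
)
clean = filter_apartments(raw)
print(len(clean), "row(s) kept out of", len(raw))
```

In this toy frame only the first row survives: the others are a house, a zero-size listing, an exact duplicate, and a listing outside Bogotá.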
The first rows of the data frame (listing titles):
Apartamento En Venta En Bogota Cedritos Cod. VINH2865
Apartamento En Venta En Bogota Santa Teresa Cod. VMIS1052002
Apartaestudio En Venta En Bogota Samper Mendoza Cod. VREI-16820
Apartaestudio En Venta En Bogota Chapinero Central Cod. VPRE17172
Apartamento En Venta En Bogota Cedritos Cod. VINH2867
The box plot of the covered surface shows many outliers, which need to be removed.
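One common way to remove such outliers is to keep only the rows whose covered surface lies between two quantiles. The sketch below assumes a 10% to 90% cut; the thresholds actually used in this project are not stated, and the helper name `trim_outliers` is hypothetical.

```python
import pandas as pd


def trim_outliers(
    df: pd.DataFrame, column: str, low_q: float = 0.1, high_q: float = 0.9
) -> pd.DataFrame:
    """Keep rows whose `column` value lies between the two quantiles (inclusive)."""
    low, high = df[column].quantile([low_q, high_q])
    return df[df[column].between(low, high)]


# Made-up sizes in square metres; 1 and 5000 are obvious outliers.
sizes = pd.DataFrame(
    {"surface_covered_in_m2": [1, 40, 50, 55, 60, 65, 70, 80, 90, 100, 5000]}
)
trimmed = trim_outliers(sizes, "surface_covered_in_m2")
kept = trimmed["surface_covered_in_m2"].tolist()
print(kept)
```

A quantile cut is simple but blunt: it always discards a fixed share of the data, so the cut-offs are worth checking against the box plot.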
A histogram of the covered surface shows the distribution of apartment sizes.
A scatter plot of price versus area shows whether area and price are correlated.
According to the plot, price and area are correlated. The correlation coefficient is 0.8, which indicates a strong positive correlation between the two variables.
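As a minimal sketch, the Pearson correlation coefficient can be computed with pandas' `Series.corr`. The numbers below are made up for illustration and do not reproduce the 0.8 reported above.

```python
import pandas as pd

# Made-up sizes (m2) and prices (COP), purely for illustration.
df = pd.DataFrame(
    {
        "surface_covered_in_m2": [40, 55, 60, 75, 90, 120],
        "price": [1.8e8, 2.6e8, 2.4e8, 4.1e8, 5.0e8, 8.2e8],
    }
)

# Pearson correlation coefficient between area and price (ranges from -1 to 1).
r = df["surface_covered_in_m2"].corr(df["price"])
print(f"Correlation: {r:.2f}")
```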
First the data is separated into the feature X, the covered surface, and the target y, the price.
Then X and y are split into X train, y train, X test, and y test.
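The split could look like the following sketch, which uses scikit-learn's `train_test_split`. The 80/20 ratio, the random seed, and the toy data are assumptions; the source does not state how the split was done.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data; the feature column name follows the one used in this project.
df = pd.DataFrame(
    {
        "surface_covered_in_m2": [40, 45, 50, 55, 60, 70, 80, 90, 100, 120],
        "price": [2.0e8, 2.2e8, 2.6e8, 2.9e8, 3.1e8, 3.8e8, 4.5e8, 5.2e8, 6.0e8, 7.5e8],
    }
)

# X must be two-dimensional (a DataFrame), y one-dimensional (a Series).
X = df[["surface_covered_in_m2"]]
y = df["price"]

# Hold out 20% of the rows for testing; the ratio and seed are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```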
Build the model
First the baseline is calculated.
The baseline prediction is drawn on the plot of price vs. area.
Now the mean absolute error of the baseline is calculated.
Baseline MAE: 353776409.62
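A common baseline for regression is to predict the mean training price for every apartment. The source does not say which baseline was used, so the mean is an assumption in this sketch, and the prices are made up.

```python
from sklearn.metrics import mean_absolute_error

# Made-up training prices in COP.
y_train = [2.0e8, 3.0e8, 4.0e8, 7.0e8]

# Baseline (assumed): always predict the mean training price.
y_mean = sum(y_train) / len(y_train)
y_pred_baseline = [y_mean] * len(y_train)

mae_baseline = mean_absolute_error(y_train, y_pred_baseline)
print(f"Baseline MAE: {mae_baseline:.2f}")
```

Any model worth keeping should have a lower MAE than this constant prediction.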
The model is a linear regression. We instantiate it and fit it to the training data.
First we predict on the training data, X train.
Then the mean absolute error on the training data is calculated.
Training MAE: 178709833.34
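The fit and the training error could be sketched as follows with scikit-learn's `LinearRegression`. The toy points lie exactly on a line, so the training MAE here is essentially zero, unlike the real data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Made-up training data lying exactly on price = 1e8 + 5e6 * area.
X_train = pd.DataFrame({"surface_covered_in_m2": [40, 60, 80, 100]})
y_train = pd.Series([3.0e8, 4.0e8, 5.0e8, 6.0e8], name="price")

# Instantiate the model and fit it to the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the same data the model was trained on.
y_pred_training = model.predict(X_train)
mae_training = mean_absolute_error(y_train, y_pred_training)
print(f"Training MAE: {mae_training:.2f}")
```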
We can see that the model beats the baseline by about 175066576 COP on the training data.
Next, we evaluate the model with the test data.
The mean absolute error on the test data is calculated.
Test MAE: 169316334.44
We can see that on the test data the model beats the baseline by about 184460075 COP.
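The comparison between test error and baseline could be sketched like this. All numbers are made up, and the baseline is assumed to be the mean training price.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Made-up training and test sets.
X_train = pd.DataFrame({"surface_covered_in_m2": [40, 60, 80, 100]})
y_train = pd.Series([3.0e8, 4.0e8, 5.0e8, 6.0e8])
X_test = pd.DataFrame({"surface_covered_in_m2": [50, 90]})
y_test = pd.Series([3.6e8, 5.4e8])

model = LinearRegression().fit(X_train, y_train)

# Error of the fitted line on data it has not seen.
mae_test = mean_absolute_error(y_test, model.predict(X_test))

# Error of the assumed baseline (mean training price) on the same test rows.
baseline_pred = [y_train.mean()] * len(y_test)
mae_baseline = mean_absolute_error(y_test, baseline_pred)

improvement = mae_baseline - mae_test
print(f"Test MAE: {mae_test:.2f}")
print(f"Improvement over baseline: {improvement:.2f}")
```

A test MAE close to the training MAE, as in the results above, suggests the model is not overfitting.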
Finally, we communicate the results by extracting the model's intercept and coefficient.
Model Intercept: -188181644.07
Model coefficient for "surface_covered_in_m2": 8023314.5
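With scikit-learn, these two values are available as the `intercept_` and `coef_` attributes of a fitted `LinearRegression`. The sketch below uses made-up data, so the printed numbers differ from the ones above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training data lying exactly on price = 1e8 + 5e6 * area.
X_train = pd.DataFrame({"surface_covered_in_m2": [40, 60, 80, 100]})
y_train = pd.Series([3.0e8, 4.0e8, 5.0e8, 6.0e8])
model = LinearRegression().fit(X_train, y_train)

# intercept_ is a scalar; coef_ holds one value per feature column.
intercept = round(float(model.intercept_), 2)
coefficient = round(float(model.coef_[0]), 2)
print("Model Intercept:", intercept)
print('Model coefficient for "surface_covered_in_m2":', coefficient)
```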
The formula for the apartment price is:
apt_price = -188181644.07 + 8023314.5 * surface_covered
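As a worked example, the formula can be wrapped in a small function using the intercept and coefficient reported above. The function name `predict_price` is hypothetical.

```python
INTERCEPT = -188181644.07  # model intercept reported above, in COP
COEFFICIENT = 8023314.5  # COP per square metre of covered surface


def predict_price(surface_covered: float) -> float:
    """Apply the fitted line: price = intercept + coefficient * surface."""
    return INTERCEPT + COEFFICIENT * surface_covered


price = predict_price(100)
print(f"Predicted price for 100 m2: {price:,.2f} COP")
```

For a 100 m2 apartment this gives about 614 million COP. Because the intercept is negative, the formula returns negative prices below roughly 23.5 m2, so it is best applied only within the range of sizes seen in training.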
We draw the line given by the formula on the plot of price vs. surface.
We can see that our model can predict apartment prices in Bogotá from their size, with a test MAE of about 169 million COP, roughly half the baseline error.