# Apartment Prices in Bogotá

In this project I build a model that predicts apartment prices in Bogotá from the size of the apartment.

First I am going to explore the data.

Import the libraries.

Read the data.

The data was obtained from the Properati web page.

Only listings that meet all of the following conditions are kept: the property is an apartment, the covered surface is greater than 0, the location is Bogotá, the currency is COP, the operation type is Venta (sale), and the price is nonzero. Duplicates are dropped.
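A minimal sketch of this filtering step with pandas, on a toy data frame. Only `surface_covered_in_m2` is a column name that appears in this write-up; `property_type`, `city`, `currency`, `operation` and `price` are hypothetical names, and the function `filter_listings` is only an illustration.

```python
import pandas as pd

# Toy stand-in for the raw listings (hypothetical values and column names).
raw = pd.DataFrame({
    "property_type": ["Apartamento", "Casa", "Apartamento", "Apartamento", "Apartamento"],
    "city": ["Bogotá", "Bogotá", "Medellín", "Bogotá", "Bogotá"],
    "currency": ["COP"] * 5,
    "operation": ["Venta"] * 5,
    "surface_covered_in_m2": [60.0, 120.0, 80.0, 60.0, 0.0],
    "price": [300e6, 900e6, 400e6, 300e6, 250e6],
})


def filter_listings(df):
    """Keep apartments for sale in Bogotá, priced in COP, with valid size and price."""
    mask = (
        (df["property_type"] == "Apartamento")
        & (df["city"] == "Bogotá")
        & (df["currency"] == "COP")
        & (df["operation"] == "Venta")
        & (df["surface_covered_in_m2"] > 0)
        & (df["price"] != 0)
    )
    # Drop exact duplicate rows and renumber the index.
    return df[mask].drop_duplicates().reset_index(drop=True)


clean = filter_listings(raw)
print(clean)
```

In the toy data the house, the Medellín listing, the duplicated row and the zero-surface row are all removed.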

The first rows of the data frame:

| Index | Property type | Title |
|---|---|---|
| 0 | Apartamento | Apartamento En Venta En Bogota Cedritos Cod. VINH2865 |
| 1 | Apartamento | Apartamento En Venta En Bogota Santa Teresa Cod. VMIS1052002 |
| 2 | Apartamento | Apartaestudio En Venta En Bogota Samper Mendoza Cod. VREI-16820 |
| 3 | Apartamento | Apartaestudio En Venta En Bogota Chapinero Central Cod. VPRE17172 |
| 4 | Apartamento | Apartamento En Venta En Bogota Cedritos Cod. VINH2867 |

## Explore

The box plot of the covered surface shows many outliers, so they need to be removed.
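The write-up does not say how the outliers are removed. One common option, shown here only as an assumption, is to keep the rows whose covered surface lies between two quantiles. The 10% and 90% cut-offs and the helper name `trim_outliers` are illustrative choices.

```python
import pandas as pd


def trim_outliers(df, column="surface_covered_in_m2", low=0.1, high=0.9):
    """Keep rows whose value in `column` lies between the two quantiles (inclusive)."""
    low_value, high_value = df[column].quantile([low, high])
    return df[df[column].between(low_value, high_value)]


# Toy data with one obviously wrong size (5000 m2).
df = pd.DataFrame(
    {"surface_covered_in_m2": [30, 45, 50, 55, 60, 65, 70, 80, 90, 5000]}
)
trimmed = trim_outliers(df)
print(trimmed["surface_covered_in_m2"].tolist())
```

Quantile trimming also drops the smallest values, so the cut-offs should be chosen after looking at the box plot.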

A histogram of the covered surface shows the distribution of apartment sizes.

A scatter plot of price versus area shows whether area and price are correlated.

According to the plot, price and area are correlated. The correlation coefficient is 0.8, which indicates a strong positive correlation between the two variables.
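As a sketch of how such a coefficient can be obtained, the Pearson correlation between two columns is available through `Series.corr` in pandas. The values below are toy data, not the project's data set, and the column name `price` is an assumption.

```python
import pandas as pd

# Toy data: price grows with area, but not perfectly linearly.
df = pd.DataFrame({
    "surface_covered_in_m2": [40.0, 60.0, 80.0, 100.0],
    "price": [160e6, 280e6, 440e6, 560e6],
})

# Pearson correlation coefficient between area and price.
r = df["surface_covered_in_m2"].corr(df["price"])
print(f"Correlation: {r:.3f}")
```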

## Model

First the data is separated into the feature matrix X (covered surface) and the target y (price).

Then X and y are split into X train and y train, and X test and y test.
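A minimal sketch of both steps, assuming scikit-learn's `train_test_split`. The 80/20 ratio, the fixed `random_state` and the column name `price` are assumptions; the write-up does not state how the split is made.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical values).
df = pd.DataFrame({
    "surface_covered_in_m2": [35, 40, 48, 55, 60, 72, 80, 95, 110, 130],
    "price": [120e6, 150e6, 190e6, 240e6, 290e6, 380e6, 430e6, 560e6, 690e6, 850e6],
})

# Feature matrix (2-D, one column) and target vector (1-D).
X = df[["surface_covered_in_m2"]]
y = df["price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

The double brackets keep `X` as a data frame, which is the two-dimensional shape scikit-learn estimators expect for features.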

## Build the model

First the baseline is calculated.

The baseline prediction (y pred baseline) is drawn on the price vs. area plot.

Next, the mean absolute error (MAE) of the baseline is calculated.

```
Baseline MAE: 353776409.62
```
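The baseline is not defined explicitly in this write-up. A usual choice, assumed here, is to predict the mean training price for every apartment. The sketch below uses toy prices and a small hand-written MAE helper; `sklearn.metrics.mean_absolute_error` computes the same quantity.

```python
# Toy training prices in COP (hypothetical values).
y_train = [200e6, 300e6, 400e6, 700e6]

# Assumed baseline: always predict the mean training price.
y_mean = sum(y_train) / len(y_train)
y_pred_baseline = [y_mean] * len(y_train)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


baseline_mae = mean_absolute_error(y_train, y_pred_baseline)
print(f"Mean apt price: {y_mean:.2f}")
print(f"Baseline MAE: {baseline_mae:.2f}")
```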

## Iterate

The model is a linear regression. We instantiate it and fit it to the training data.
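A minimal sketch of these two steps, assuming scikit-learn's `LinearRegression`. The training data is a toy example that lies exactly on the line price = -130e6 + 7e6 * area, so the fitted parameters are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data (hypothetical): price = -130e6 + 7e6 * area.
X_train = np.array([[40.0], [60.0], [80.0], [100.0]])
y_train = np.array([150e6, 290e6, 430e6, 570e6])

# Instantiate the model, then fit it to the training data.
model = LinearRegression()
model.fit(X_train, y_train)

print("Intercept:", round(model.intercept_, 2))
print("Coefficient:", round(model.coef_[0], 2))
```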

## Evaluate

First we predict with X train.

Then the mean absolute error on the training data is calculated.

```
Training MAE: 178709833.34
```
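The evaluation on the training data can be sketched as follows, assuming scikit-learn's `mean_absolute_error`. The data is a toy example, so the printed value is not comparable with the result reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Toy training data (hypothetical values, not the project's data set).
X_train = np.array([[40.0], [60.0], [80.0], [100.0]])
y_train = np.array([160e6, 280e6, 440e6, 560e6])

model = LinearRegression().fit(X_train, y_train)

# Predict on the same data the model was fitted on, then measure the error.
y_pred_training = model.predict(X_train)
training_mae = mean_absolute_error(y_train, y_pred_training)
print(f"Training MAE: {training_mae:.2f}")
```

The same two calls with X test and y test give the test MAE.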

The model beats the baseline by 175066576 COP.

Finally, we evaluate the model with the test data.

Then the mean absolute error on the test data is calculated.

```
Test MAE: 169316334.44
```

The model beats the baseline by 184460075 COP.
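The improvements over the baseline follow directly from the three MAE values reported in this write-up. A short check of the arithmetic (the variable names are illustrative):

```python
# MAE values reported in this write-up, in COP.
baseline_mae = 353776409.62
training_mae = 178709833.34
test_mae = 169316334.44

# Improvement over the baseline on each data set.
train_gain = baseline_mae - training_mae
test_gain = baseline_mae - test_mae

# Test error as a fraction of the baseline error.
test_ratio = test_mae / baseline_mae

print(f"Training improvement: {train_gain:.0f}")
print(f"Test improvement: {test_gain:.0f}")
print(f"Test MAE / baseline MAE: {test_ratio:.2f}")
```

The test error is roughly 0.48 of the baseline error, and it is slightly lower than the training error, so there is no sign of overfitting in these numbers.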

## Communicate Results

Finally, the results are communicated. We obtain the model intercept and coefficient.

```
Model Intercept: -188181644.07
```

```
Model coefficient for "surface_covered_in_m2": 8023314.5
```

The formula for the apartment price is:

```
apt_price = -188181644.07 + 8023314.5 * surface_covered
```
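As a worked example, the formula can be wrapped in a small function and applied to a given size. The function name `apt_price` follows the formula above; the 100 m² apartment is an arbitrary illustration.

```python
def apt_price(surface_covered):
    """Predicted price in COP for a covered surface in m2, using the fitted line."""
    return -188181644.07 + 8023314.5 * surface_covered


price_100 = apt_price(100)
print(f"Predicted price for 100 m2: {price_100:,.2f} COP")
```

Because the intercept is negative, the formula returns negative prices for very small surfaces (below about 23.5 m²), so it should only be used within the range of sizes seen in the data.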

We draw the line given by the formula on the price vs. surface plot.

The model predicts apartment prices in Bogotá from the covered surface with a test MAE of about 169 million COP, less than half the baseline MAE.