Basic Statistics in Python with Pandas

1. Reading a dataset

# In this cell we only import some useful libraries
import pandas as pd  # Data analysis and manipulation tool (uses DataFrames)
import seaborn as sns  # Package for visualization

# Here we read our cars.csv file
df = pd.read_csv('cars.csv')

2. Exploring column types

df.describe()
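By default, describe() summarizes only the numeric columns when the DataFrame mixes types, so it helps to check the type of each column first. The following is a minimal sketch on a hypothetical miniature of the cars data: the column names follow the ones used in this notebook, but the rows are invented for illustration and are not taken from cars.csv.

```python
import pandas as pd

# Hypothetical miniature of the cars data; the values are invented
toy_cars = pd.DataFrame({
    "manufacturer_name": ["Audi", "Ford", "Audi"],
    "engine_type": ["gasoline", "diesel", "diesel"],
    "year_produced": [2010, 2005, 2015],
    "price_usd": [10900.0, 5000.0, 21500.0],
})

# One dtype per column
print(toy_cars.dtypes)

# Split the column names into numeric and non-numeric groups
numeric_cols = toy_cars.select_dtypes(include="number").columns.tolist()
other_cols = toy_cars.select_dtypes(exclude="number").columns.tolist()
print("numeric:", numeric_cols)
print("non-numeric:", other_cols)

# include="all" asks describe() to summarize the non-numeric columns too
toy_summary = toy_cars.describe(include="all")
print(toy_summary)
```

The same calls should work on df once cars.csv is loaded.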

df["price_usd"].mean()

df["price_usd"].median()

df["price_usd"].plot.hist(bins = 20)

sns.displot(df, x = 'price_usd', hue = 'engine_type', multiple = 'stack')

df.groupby('engine_type').count()

df.value_counts()

df_audi_q7 = df[(df['manufacturer_name'] == 'Audi') & (df['model_name'] == 'Q7')]
sns.histplot(df_audi_q7, x = 'price_usd', hue = 'year_produced')

3. Standard deviation and quantiles

If we take the mean of a list of values and average the squared differences between each value and that mean, we get the "variance". The square root of the variance is called the "standard deviation".
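The definition can be checked by hand on a small example. This is a sketch on a hypothetical toy Series (the numbers are not from cars.csv), and the variable names are made up for illustration. Note that pandas divides by n - 1 by default (the sample estimate), so ddof=0 is needed to match the plain average of squared differences.

```python
import pandas as pd

# Hypothetical toy prices, not taken from cars.csv
toy_prices = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

toy_mean = toy_prices.mean()
squared_diffs = (toy_prices - toy_mean) ** 2  # squared distance of each value from the mean
variance_pop = squared_diffs.mean()           # population variance: divide by n
std_pop = variance_pop ** 0.5                 # population standard deviation

print("by hand:", variance_pop, std_pop)

# ddof=0 divides by n (matches the hand computation);
# the default ddof=1 divides by n - 1 and gives a slightly larger value
print("pandas, ddof=0:", toy_prices.std(ddof=0))
print("pandas, default:", toy_prices.std())
```

On a dataset with many rows the two versions are nearly identical, so the default df['price_usd'].std() is usually fine.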

# Standard deviation (pandas uses the sample formula, dividing by n - 1, by default)
df['price_usd'].std()

# Range is the maximum value minus the minimum value
max_val = df['price_usd'].max()
min_val = df['price_usd'].min()
rg = max_val - min_val
print(f'Max = {max_val}, Min = {min_val}, Range = {rg}')

# Quantiles: remember that the second quartile (Q2) is the median
median = df['price_usd'].median()
Q1 = df['price_usd'].quantile(0.25)
Q3 = df['price_usd'].quantile(0.75)
min_val = df['price_usd'].quantile(0.0)
max_val = df['price_usd'].quantile(1.0)
print(min_val, Q1, median, Q3, max_val)

# Interquartile range: the range that contains the middle 50% of the values
iqr = Q3 - Q1
iqr
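To see what the quartiles and the interquartile range mean, it can help to compute them on a list small enough to verify by eye. This is a sketch on a hypothetical toy Series, with made-up variable names chosen so they do not overwrite Q1, Q3, or iqr from the cells above.

```python
import pandas as pd

# Hypothetical toy values, already sorted so the quartiles are easy to spot
toy_values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

toy_q1 = toy_values.quantile(0.25)
toy_q2 = toy_values.quantile(0.50)  # second quartile, same as the median
toy_q3 = toy_values.quantile(0.75)
toy_iqr = toy_q3 - toy_q1

print(toy_q1, toy_q2, toy_q3, toy_iqr)

# Values that fall between Q1 and Q3 (inclusive): roughly the middle half
toy_middle = toy_values[(toy_values >= toy_q1) & (toy_values <= toy_q3)]
print(toy_middle.tolist())
```

With so few points the share between Q1 and Q3 is only approximately 50%. On a large column such as price_usd it gets much closer to one half.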

sns.histplot(df, x = 'price_usd')

# sns.boxplot(df['price_usd'])
sns.boxplot(data = df, x = 'price_usd')

sns.boxplot(data = df, x = 'engine_fuel', y = 'price_usd')

sns.displot(df, x = 'engine_fuel', y = 'price_usd')

sns.histplot(df, hue = 'engine_fuel', x = 'price_usd')
