What is descriptive statistics for?
Descriptive vs. inferential statistics
These 2 branches of statistics differ in:
Can you lie with statistics?
Why learn statistics?
Course plan
Descriptive statistics for analytics
Data types
Data falls into 2 fundamental categories:
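import pandas as pd
# Show floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format
# Load the used-cars dataset
df = pd.read_csv('../datasets/cars.csv')
df
# Column names, non-null counts, and dtypes
df.info()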
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38531 entries, 0 to 38530
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 manufacturer_name 38531 non-null object
1 model_name 38531 non-null object
2 transmission 38531 non-null object
3 color 38531 non-null object
4 odometer_value 38531 non-null int64
5 year_produced 38531 non-null int64
6 engine_fuel 38531 non-null object
7 engine_has_gas 38531 non-null bool
8 engine_type 38531 non-null object
9 engine_capacity 38521 non-null float64
10 body_type 38531 non-null object
11 has_warranty 38531 non-null bool
12 state 38531 non-null object
13 drivetrain 38531 non-null object
14 price_usd 38531 non-null float64
15 is_exchangeable 38531 non-null bool
16 location_region 38531 non-null object
17 number_of_photos 38531 non-null int64
18 up_counter 38531 non-null int64
19 feature_0 38531 non-null bool
20 feature_1 38531 non-null bool
21 feature_2 38531 non-null bool
22 feature_3 38531 non-null bool
23 feature_4 38531 non-null bool
24 feature_5 38531 non-null bool
25 feature_6 38531 non-null bool
26 feature_7 38531 non-null bool
27 feature_8 38531 non-null bool
28 feature_9 38531 non-null bool
29 duration_listed 38531 non-null int64
dtypes: bool(13), float64(2), int64(5), object(10)
memory usage: 5.5+ MB
df.dtypes
df.describe()
df.describe(include='all')
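The dtypes reported by `df.info()` suggest one way to separate columns by kind. The sketch below is only an illustration: it assumes the two categories are numeric and non-numeric (categorical) data, and it uses a small hypothetical `sample` frame in place of the real `cars.csv`.

```python
import pandas as pd

# Hypothetical miniature of the cars dataset (values are made up)
sample = pd.DataFrame({
    'manufacturer_name': ['Audi', 'Ford', 'Audi'],
    'has_warranty': [False, True, False],
    'year_produced': [2010, 2015, 2012],
    'price_usd': [10900.0, 5000.0, 13500.0],
})

# Integer and float columns count as numeric; bool and text columns do not
numeric_cols = sample.select_dtypes(include='number').columns.tolist()
categorical_cols = sample.select_dtypes(exclude='number').columns.tolist()

print('numeric:', numeric_cols)
print('categorical:', categorical_cols)
```

Note that a dtype is only a hint: a column such as `year_produced` is stored as an integer but may be better treated as a category, depending on the analysis.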
Measures of central tendency
The Bill Gates in a bar metaphor
Measures of central tendency in Python
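The metaphor is usually told like this: when a billionaire walks into a bar, the average income of the patrons skyrockets while the median barely moves. A minimal sketch of that effect, using the standard-library `statistics` module and entirely hypothetical income figures:

```python
from statistics import mean, median

# Hypothetical yearly incomes (USD) of five bar patrons
patrons = [30_000, 35_000, 40_000, 45_000, 50_000]
# Illustrative figure for the newcomer, not a real income
gates = 1_000_000_000

before_mean, before_median = mean(patrons), median(patrons)

# The billionaire walks in
after = patrons + [gates]
after_mean, after_median = mean(after), median(after)

print('mean:  ', before_mean, '->', after_mean)
print('median:', before_median, '->', after_median)
```

The mean is pulled toward the single extreme value, while the median only shifts to the midpoint of the two middle incomes. This is why the median is often preferred for skewed data such as prices.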
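# Mean and median of the listing price
df['price_usd'].mean()
df['price_usd'].median()
# Histogram of prices in 20 bins
df['price_usd'].plot.hist(bins=20)
import seaborn as sns
# Price distribution colored by manufacturer
sns.displot(df, x='price_usd', hue='manufacturer_name')
# Price distribution colored by engine type, overlaid and then stacked
sns.displot(df, x='price_usd', hue='engine_type')
sns.displot(df, x='price_usd', hue='engine_type', multiple='stack')
# Non-null counts per column for each engine type
df.groupby('engine_type').count()
# Narrow the data to a single model: Audi Q7
Q7_df = df[(df['manufacturer_name'] == 'Audi') & (df['model_name'] == 'Q7')]
Q7_df
# Price distribution of the Q7, colored by production year
sns.histplot(Q7_df, x='price_usd', hue='year_produced')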
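The grouped plots can also be summarized numerically by computing the mean and median per group. The sketch below is an assumption about how one might do that, not part of the original notebook. It uses a small hypothetical `sample` frame with made-up prices.

```python
import pandas as pd

# Hypothetical listings (values are made up for illustration)
sample = pd.DataFrame({
    'engine_type': ['gasoline', 'gasoline', 'gasoline', 'diesel', 'diesel'],
    'price_usd': [1000.0, 2000.0, 9000.0, 3000.0, 5000.0],
})

# Mean and median price for each engine type
summary = sample.groupby('engine_type')['price_usd'].agg(['mean', 'median'])
print(summary)
```

In this toy data both groups share the same mean, but the gasoline median is lower because one expensive listing pulls its mean up.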