Final Integrative Assignment
Exploratory Analysis
   id          gender
0  10-5894942  male
1  41-1676468  female
2  64-6396924  male
3  35-2426788  male
4  60-9387304  male
5  67-3666190  female
6  27-7702214  female
7  46-2257650  male
8  40-1499649  male
9  67-7378468  male
      id          gender
0     10-5894942  male
1     41-1676468  female
2     64-6396924  male
3     35-2426788  male
4     60-9387304  male
...   ...         ...
1013  82-7312119  male
1014  45-3445439  male
1015  02-3651562  male
1016  05-5203587  female
1017  13-3347050  male
count  1011.0              1011.0
mean   66.48071216617211   69.06330365974283
std    15.326879704379337  14.694107007851635
min    13.0                27.0
25%    56.0                60.0
50%    67.0                70.0
75%    77.0                79.0
max    100.0               100.0
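Both columns in the summary above report a count of 1011 against 1018 rows, and `describe()` excludes missing values from `count`, so each column appears to contain 7 missing values. A minimal sketch of how this first look (first and last rows, numeric summary, missing-value counts) can be produced with pandas, assuming a DataFrame named `df`; the toy data below is invented for illustration:

```python
import numpy as np
import pandas as pd

# Invented toy frame standing in for the real dataset.
df = pd.DataFrame({
    "id": ["a-1", "b-2", "c-3", "d-4"],
    "gender": ["male", "female", "male", "male"],
    "physics score": [70.0, np.nan, 55.0, 81.0],
})

preview = pd.concat([df.head(2), df.tail(2)])  # first and last rows, as in the listing
summary = df.describe()                        # numeric columns only by default
missing = df.isna().sum()                      # per-column count of missing values

print(preview)
print(summary)
print(missing)
```

Comparing `summary.loc["count"]` with `len(df)` is a quick way to spot columns with gaps before deciding how to handle them.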
Original: 1018 rows
Number of duplicate rows: 18
Final: 1000 rows
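The row counts above (1018 rows, 18 duplicates, 1000 remaining) are consistent with counting and dropping exact duplicate rows. A minimal sketch of that step, assuming pandas' `duplicated` and `drop_duplicates` were used; the toy data is invented:

```python
import pandas as pd

# Invented toy data: the third row is an exact copy of the second.
df = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "math score": [67, 80, 80, 55],
})

original_rows = len(df)
n_duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
df = df.drop_duplicates()                  # keeps the first occurrence of each row

print(f"Original: {original_rows} rows")
print(f"Number of duplicate rows: {n_duplicates}")
print(f"Final: {len(df)} rows")
```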
Index(['id', 'gender', 'race/ethnicity', 'parental level of education',
'lunch', 'employed', 'test preparation course', 'math score',
'physics score', 'chemistry score', 'algebra_score'],
dtype='object')
Index(['gender', 'race/ethnicity', 'parental level of education', 'lunch',
'employed', 'test preparation course', 'math score', 'physics score',
'chemistry score', 'algebra_score'],
dtype='object')
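The two column listings above differ only by `id`, so the identifier column was dropped between them, presumably because a per-student ID carries no information for the analysis. A minimal sketch, assuming `DataFrame.drop` was used; the toy data is invented:

```python
import pandas as pd

# Invented toy data with an identifier column.
df = pd.DataFrame({
    "id": ["a-1", "b-2"],
    "gender": ["male", "female"],
    "math score": [70, 80],
})

df = df.drop(columns=["id"])  # remove the identifier; all other columns are kept
print(df.columns)
```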
Gender 0
Ethnicity 0
Parental level of education 0
Lunch 0
Employed 0
Test preparation course 0
Math score 0
Physics score 0
Chemistry score 0
Algebra_score 0
dtype: int64
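The column names in the null-count listing match the original names passed through `str.capitalize`, except that `race/ethnicity` became `Ethnicity`; the row count also falls from 1000 to 993 before the outlier step, consistent with dropping rows that had missing values. A minimal sketch of those steps, assuming that is what the notebook did (the toy data is invented):

```python
import numpy as np
import pandas as pd

# Invented toy data with one missing score and one missing category.
df = pd.DataFrame({
    "gender": ["male", "female", "male"],
    "race/ethnicity": ["group A", "group D", None],
    "math score": [67.0, np.nan, 55.0],
})

# Capitalise every name, then shorten 'Race/ethnicity' (assumed mapping).
df = df.rename(columns=str.capitalize).rename(columns={"Race/ethnicity": "Ethnicity"})

df = df.dropna()          # drop every row with at least one missing value
print(df.isnull().sum())  # every column should now report 0
```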
Before: 993 rows
Math score 21.0
Physics score 19.0
Chemistry score 21.0
Algebra_score 19.0
dtype: float64
After: 984 rows
/tmp/ipykernel_73/1937933279.py:6: FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version. Do `left, right = left.align(right, axis=1, copy=False)` before e.g. `left == right`
df = df[~((df < (Q1-1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]
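The FutureWarning above arises because the whole DataFrame, text columns included, is compared against `Q1`, `Q3` and `IQR`, which are Series indexed only by the numeric columns, so pandas has to reindex implicitly. A minimal sketch of the same 1.5 × IQR rule that selects the numeric columns first, so both sides share the same columns; the `scores` and `is_outlier` names and the toy data are assumptions for illustration:

```python
import pandas as pd

# Invented toy data; 5 is an obvious low outlier in 'Math score'.
df = pd.DataFrame({
    "Gender": ["male", "female", "male", "female", "male"],
    "Math score": [60, 65, 70, 75, 5],
    "Physics score": [62, 66, 71, 74, 70],
})

scores = df.select_dtypes(include="number")  # compare numeric columns only
Q1 = scores.quantile(0.25)
Q3 = scores.quantile(0.75)
IQR = Q3 - Q1
print(IQR)

# A row is an outlier if any score lies outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
# `scores` and the bounds have identical columns, so no implicit reindexing occurs.
is_outlier = ((scores < Q1 - 1.5 * IQR) | (scores > Q3 + 1.5 * IQR)).any(axis=1)

print("Before:", len(df), "rows")
df = df[~is_outlier]
print("After:", len(df), "rows")
```

Filtering with the boolean mask still keeps the text columns in `df`, since only the comparison is restricted to `scores`.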
                 Math score  Physics score  Chemistry score  Algebra_score
Math score         1.000000       0.812055         0.798312       0.916674
Physics score      0.812055       1.000000         0.951536       0.968358
Chemistry score    0.798312       0.951536         1.000000       0.964652
Algebra_score      0.916674       0.968358         0.964652       1.000000
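The matrix above holds pairwise correlations between the four scores (pandas' `DataFrame.corr` uses Pearson by default). A minimal sketch, assuming `corr` was called on the cleaned frame; passing `numeric_only=True` skips text columns such as `Gender`, which recent pandas versions otherwise refuse to convert. The toy data is invented:

```python
import pandas as pd

# Invented toy data with one text column and two score columns.
df = pd.DataFrame({
    "Gender": ["male", "female", "male", "female"],
    "Math score": [50, 60, 70, 80],
    "Physics score": [55, 65, 70, 90],
})

corr = df.corr(numeric_only=True)  # Pearson correlation between numeric columns
print(corr)
```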
Answering questions
Example: Is there any relationship between the average score and having taken the test preparation course?
/tmp/ipykernel_73/1476807842.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['Average Score'] = df.mean(axis = 1)
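The SettingWithCopyWarning above appears because `df` was produced by boolean filtering (the outlier step), so pandas cannot tell whether adding a column modifies a view or a copy. A minimal sketch that takes an explicit `.copy()` after filtering and averages only numeric columns with `numeric_only=True`; the `raw` name and the toy data are assumptions for illustration:

```python
import pandas as pd

# Invented toy data.
raw = pd.DataFrame({
    "Test preparation course": ["none", "completed", "none"],
    "Math score": [60, 80, 70],
    "Physics score": [70, 90, 50],
})

# .copy() makes the filtered frame independent of `raw`, so no warning is raised.
df = raw[raw["Math score"] > 50].copy()

# Row-wise mean over the score columns only; the text column is skipped.
df["Average Score"] = df.mean(axis=1, numeric_only=True)
print(df)
```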
   Gender  Ethnicity
0  male    group A
1  female  group D
2  male    group E
3  male    group B
4  male    group E
5  female  group D
6  female  group A
7  male    group E
8  male    group D
9  male    group C
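The question posed above compares group sizes and mean scores between students who did and did not take the preparation course. A minimal sketch with `groupby`, assuming the `Test preparation course` and `Average Score` columns from the earlier steps and the category labels `none` and `completed` (both assumptions); the toy values are invented:

```python
import pandas as pd

# Invented toy data.
df = pd.DataFrame({
    "Test preparation course": ["none", "none", "completed", "none", "completed", "none"],
    "Average Score": [64.0, 70.0, 72.0, 66.0, 68.0, 60.0],
})

# Number of students and mean average score in each group.
summary = df.groupby("Test preparation course")["Average Score"].agg(["count", "mean"])
print(summary)
```

A gap in means on its own says nothing about statistical significance; a formal test would be needed to support that wording.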
Conclusions: Although nearly twice as many students did not take the preparation course as completed it, this imbalance does not translate into a marked difference between the two groups' average scores.
We recommend auditing the course content in order to improve academic performance and increase student interest.