Integrative Practical Assignment
Exploratory analysis
   id          gender
0  10-5894942  male
1  41-1676468  female
2  64-6396924  male
3  35-2426788  male
4  60-9387304  male
5  67-3666190  female
6  27-7702214  female
7  46-2257650  male
8  40-1499649  male
9  67-7378468  male
      id          gender
0     10-5894942  male
1     41-1676468  female
2     64-6396924  male
3     35-2426788  male
4     60-9387304  male
...   ...         ...
1013  82-7312119  male
1014  45-3445439  male
1015  02-3651562  male
1016  05-5203587  female
1017  13-3347050  male
count  1011.0              1011.0
mean   66.48071216617211   69.06330365974283
std    15.326879704379337  14.694107007851635
min    13.0                27.0
25%    56.0                60.0
50%    67.0                70.0
75%    77.0                79.0
max    100.0               100.0
Original: 1018 rows
Number of duplicate rows: 18
Final: 1000 rows
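A minimal pandas sketch of a deduplication step that prints the same three counts, assuming a duplicate is a row identical to an earlier one in every column. The notebook's own code is not shown; `drop_duplicate_rows` and the toy data are hypothetical.

```python
import pandas as pd


def drop_duplicate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Print row counts and return df without fully duplicated rows."""
    n_dup = int(df.duplicated().sum())  # rows identical to an earlier row
    print(f"Original: {len(df)} rows")
    print(f"Number of duplicate rows: {n_dup}")
    deduped = df.drop_duplicates()  # keeps the first occurrence by default
    print(f"Final: {len(deduped)} rows")
    return deduped


# Hypothetical toy data: the third row repeats the first one
toy = pd.DataFrame({
    "id": ["10-5894942", "41-1676468", "10-5894942"],
    "gender": ["male", "female", "male"],
})
clean = drop_duplicate_rows(toy)
# prints: Original: 3 rows / Number of duplicate rows: 1 / Final: 2 rows
```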
Index(['id', 'gender', 'race/ethnicity', 'parental level of education',
'lunch', 'employed', 'test preparation course', 'math score',
'physics score', 'chemistry score', 'algebra_score'],
dtype='object')
Index(['gender', 'race/ethnicity', 'parental level of education', 'lunch',
'employed', 'test preparation course', 'math score', 'physics score',
'chemistry score', 'algebra_score'],
dtype='object')
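The two column listings show `id` being dropped, and the missing-value counts that follow use capitalized labels (`race/ethnicity` becomes `Ethnicity`, `algebra_score` becomes `Algebra score`). One way to do both steps is sketched below; the mapping is inferred from those printouts rather than taken from the notebook, and `tidy_columns` plus the toy row are hypothetical.

```python
import pandas as pd

# Inferred from the printed column names; not the notebook's actual code
RENAME_MAP = {
    "gender": "Gender",
    "race/ethnicity": "Ethnicity",
    "parental level of education": "Parental level of education",
    "lunch": "Lunch",
    "employed": "Employed",
    "test preparation course": "Test preparation course",
    "math score": "Math score",
    "physics score": "Physics score",
    "chemistry score": "Chemistry score",
    "algebra_score": "Algebra score",
}


def tidy_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the student identifier and give every column a readable label."""
    return df.drop(columns=["id"]).rename(columns=RENAME_MAP)


# Hypothetical toy row using the original column names (values are made up)
raw = pd.DataFrame(
    [["10-5894942", "male", "group A", "some college", "standard", "no",
      "none", 70, 72, 68, 71]],
    columns=["id", *RENAME_MAP.keys()],
)
tidy = tidy_columns(raw)
print(list(tidy.columns))
```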
Gender 0
Ethnicity 0
Parental level of education 0
Lunch 0
Employed 0
Test preparation course 0
Math score 7
Physics score 7
Chemistry score 7
Algebra score 7
dtype: int64
Gender 0
Ethnicity 0
Parental level of education 0
Lunch 0
Employed 0
Test preparation course 0
Math score 0
Physics score 0
Chemistry score 0
Algebra score 0
dtype: int64
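Each score column has 7 missing values, and the row count goes from 1000 after deduplication to 993 before outlier removal, which suggests the gaps fall on the same 7 students. A sketch of the count-then-drop step with `isnull().sum()` and `dropna()`; the helper name and toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_missing_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Print missing values per column before and after dropping incomplete rows."""
    print(df.isnull().sum())
    cleaned = df.dropna()  # removes every row with at least one NaN
    print(cleaned.isnull().sum())
    return cleaned


# Hypothetical toy data: the second student has no scores at all
toy = pd.DataFrame({
    "Gender": ["male", "female", "male"],
    "Math score": [70.0, np.nan, 80.0],
    "Physics score": [65.0, np.nan, 90.0],
})
complete = drop_missing_rows(toy)
```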
Before: 993 rows
Math score 21.0
Physics score 19.0
Chemistry score 21.0
Algebra score 19.0
dtype: float64
After: 984 rows
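The outlier filter keeps only rows whose scores all lie within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. Comparing a whole DataFrame, text columns included, against the per-column quantile Series forces pandas to align mismatched labels, which in recent pandas versions emits a FutureWarning about automatic reindexing on DataFrame vs Series comparisons. The sketch below restricts the comparison to numeric columns so the labels match exactly; `remove_iqr_outliers` and the toy data are hypothetical.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop rows where any numeric column lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    num = df.select_dtypes(include="number")  # compare numeric columns only
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    iqr = q3 - q1
    print(iqr)
    # The bounds are Series indexed by exactly num's column labels,
    # so lt/gt line them up without any implicit reindexing.
    outside = num.lt(q1 - k * iqr, axis="columns") | num.gt(q3 + k * iqr, axis="columns")
    print(f"Before: {len(df)} rows")
    kept = df[~outside.any(axis=1)]  # text columns are carried along unchanged
    print(f"After: {len(kept)} rows")
    return kept


# Hypothetical toy data: a math score of 5 is far below the rest
toy = pd.DataFrame({
    "Gender": ["male", "female", "male", "female", "male", "female"],
    "Math score": [60, 62, 64, 66, 68, 5],
    "Physics score": [70, 71, 72, 73, 74, 75],
})
filtered = remove_iqr_outliers(toy)
```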
                 Math score  Physics score  Chemistry score  Algebra score
Math score         1.000000       0.812055         0.798312       0.916674
Physics score      0.812055       1.000000         0.951536       0.968358
Chemistry score    0.798312       0.951536         1.000000       0.964652
Algebra score      0.916674       0.968358         0.964652       1.000000
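The matrix above looks like a pairwise correlation over the four score columns, most likely produced with `DataFrame.corr()`, whose default method is Pearson. A minimal sketch on hypothetical toy scores, where Algebra is Math shifted by one point so that pair correlates perfectly:

```python
import pandas as pd

SCORE_COLS = ["Math score", "Physics score", "Chemistry score", "Algebra score"]


def score_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation (the pandas default) between every pair of score columns."""
    return df[SCORE_COLS].corr()


# Hypothetical toy scores
toy = pd.DataFrame({
    "Math score": [50, 60, 70, 80],
    "Physics score": [55, 58, 72, 79],
    "Chemistry score": [52, 65, 69, 85],
    "Algebra score": [51, 61, 71, 81],
})
corr = score_correlations(toy)
print(corr.round(3))
```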
Answering questions
Example: Is there a relationship between students' average score and whether they took the preparation course?
   Gender  Ethnicity
0  male    group A
1  female  group D
2  male    group E
3  male    group B
4  male    group E
5  female  group D
6  female  group A
7  male    group E
8  male    group D
9  male    group C
Took the course: 332
Did not take the course: 652
Conclusion: Although almost twice as many students skipped the course (652) as completed it (332), the two groups' average scores do not differ noticeably.
We recommend auditing the course content in order to improve academic performance and increase student interest.
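The conclusion rests on comparing average scores between students who did and did not take the preparation course. A minimal sketch of that comparison, assuming the average is the per-student mean of the four score columns and that the course column holds the labels `completed` and `none` (both are assumptions; the notebook's code and values are not shown). `compare_by_course` and the toy data are hypothetical.

```python
import pandas as pd

SCORE_COLS = ["Math score", "Physics score", "Chemistry score", "Algebra score"]


def compare_by_course(df: pd.DataFrame, course_col: str = "Test preparation course"):
    """Return (students per group, mean of per-student averages per group)."""
    student_avg = df[SCORE_COLS].mean(axis=1)  # one average per student
    counts = df[course_col].value_counts()
    group_means = student_avg.groupby(df[course_col]).mean()
    return counts, group_means


# Hypothetical toy data with assumed course labels
toy = pd.DataFrame({
    "Test preparation course": ["completed", "none", "none"],
    "Math score": [80, 60, 70],
    "Physics score": [80, 60, 70],
    "Chemistry score": [80, 60, 70],
    "Algebra score": [80, 60, 70],
})
counts, group_means = compare_by_course(toy)
print(counts)
print(group_means)
```

Averaging per student first, then per group, weights every student equally regardless of which subjects they excel in.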