Demographic data analyzer

Created by Darío López Díaz. Work in progress.

The idea of this project is to analyze demographic data, which consists of education, race, income, and weekly working hours for people from different countries. We will compute several values of interest from the data. The dataset comes from: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

The data looks like this:

df = pd.read_csv('adult.data.csv')
df.head()

print(df.groupby('race').size())
sns.countplot(data=df, x='race')
plt.xticks(rotation=45)
plt.show()

df[['age','sex']].set_index('sex').drop(index='Female').mean()

df.filter(items=['education']).value_counts()['Bachelors'] / df.filter(items=['education']).value_counts().sum() * 100

Salary_Degree = df.filter(items=['education','salary']).value_counts()['Bachelors']+df.filter(items=['education','salary']).value_counts()['Masters']+df.filter(items=['education','salary']).value_counts()['Doctorate']
NumOf_Degree = df.filter(items=['education']).value_counts()['Bachelors']+df.filter(items=['education']).value_counts()['Masters']+df.filter(items=['education']).value_counts()['Doctorate']
((Salary_Degree / NumOf_Degree)*100)['>50K']

Non_Advance = df.filter(items=['education','salary']).value_counts().drop(labels=['Bachelors','Masters','Doctorate'])
Non_Advance_More50 = Non_Advance.drop(labels=['<=50K'],level=1).sum()
Non_Advance_Less50 = Non_Advance.drop(labels=['>50K'],level=1).sum()
Non_Advance_More50 / (Non_Advance_More50 + Non_Advance_Less50) * 100
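The two percentages above (share of people earning more than 50K with and without a Bachelors, Masters or Doctorate) can also be expressed with boolean masks. The following is only a minimal sketch: the rows in `toy` are invented for illustration and are not taken from the dataset, and `pct_over_50k` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd

# Toy rows invented for illustration only; column names follow the dataset.
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "HS-grad", "Doctorate", "Some-college"],
    "salary": [">50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})

def pct_over_50k(frame: pd.DataFrame) -> float:
    """Hypothetical helper: percentage of rows whose salary is '>50K'."""
    # The mean of a boolean Series is the fraction of True values.
    return (frame["salary"] == ">50K").mean() * 100

# True for rows with an advanced degree, False otherwise.
advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])

higher_pct = pct_over_50k(toy[advanced])   # people with an advanced degree
lower_pct = pct_over_50k(toy[~advanced])   # everyone else
print(round(higher_pct, 1), round(lower_pct, 1))
```

On the real data, the same mask could be built from `df` instead of `toy`; the mask keeps both groups defined by a single condition, so they cannot drift apart.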

df['hours-per-week'].min()

Min_Hours_More50 = df[df['hours-per-week']==1]['salary'].value_counts()['>50K']
Min_Hours_Total = df[df['hours-per-week']==1]['salary'].value_counts().sum()
Min_Hours_More50 / Min_Hours_Total *100

People_Over50byCountry = df[['native-country','salary']].set_index('salary').drop(labels='<=50K').groupby('native-country').size()
print(People_Over50byCountry.idxmax())
People_Over50byCountry.max() / People_Over50byCountry.sum() * 100

People_More50 = df[['occupation','native-country','salary']].set_index('salary').drop(labels='<=50K')
People_More50_India = People_More50.set_index('native-country').loc['India']
People_More50_India.groupby('occupation').size().idxmax()
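The lookup of the most common occupation among people in India earning more than 50K can also be written as a single filter followed by `value_counts()`. This is a sketch under stated assumptions: the rows in `toy` are made up for illustration, and only the column names match the dataset.

```python
import pandas as pd

# Toy rows invented for illustration only; column names follow the dataset.
toy = pd.DataFrame({
    "native-country": ["India", "India", "India", "United-States", "India"],
    "occupation": ["Prof-specialty", "Prof-specialty", "Sales", "Sales", "Sales"],
    "salary": [">50K", ">50K", ">50K", ">50K", "<=50K"],
})

# Keep only rows that satisfy both conditions at once.
rich_india = toy[(toy["native-country"] == "India") & (toy["salary"] == ">50K")]

# Count occupations in the filtered rows and take the most frequent label.
top_occupation = rich_india["occupation"].value_counts().idxmax()
print(top_occupation)
```

Combining the two conditions in one mask avoids re-indexing the frame twice, at the cost of a slightly longer filter expression.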
