Analyzing salaries dataset with Deepnote

In this article, We will explore the SF Salaries dataset using Pandas in Deepnote. This dataset contains information about salaries and job titles of San Francisco employees from 2011 to 2014. Let’s walk through the tasks we performed:

Importing and loading data

We imported the Pandas library and loaded the dataset Salaries.csv into a DataFrame called sal.

Which you can find here.

import pandas as pd
sal = pd.read_csv('Salaries.csv')

Exploring the dataset

We started by checking the first few rows of the dataset using .head() and also used .info() to get an overview of the DataFrame including data types and the number of entries.

sal.head()
sal.info()

The dataset consists of 148,654 entries with various columns including employee names, job titles, base pay, overtime pay, benefits, and more.

Data analysis tasks

Average base pay:

sal['BasePay'].mean()

Highest overtime pay:

sal['OvertimePay'].max()

Job title of JOSEPH DRISCOLL:

sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['JobTitle']

Total pay benefits of JOSEPH DRISCOLL:

sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['TotalPayBenefits']

Highest paid person (including benefits):

sal.loc[sal['TotalPayBenefits'].idxmax()]

Lowest paid person (including benefits):

sal.loc[sal['TotalPayBenefits'].idxmin()]

Average base pay per year:

sal.groupby('Year').mean()['BasePay']

Number of unique job titles:

sal['JobTitle'].nunique()

Top 5 most common jobs:

sal['JobTitle'].value_counts().head(5)

Job titles with only one person in 2013:

sum(sal[sal['Year'] == 2013]['JobTitle'].value_counts() == 1)

Number of job titles containing ‘Chief’:

def chief_string(title):
    if 'chief' in title.lower():
        return True
    else:
        return False

sum(sal['JobTitle'].apply(lambda x: chief_string(x)))

Bonus: Correlation between job title length and salary:

sal['title_len'] = sal['JobTitle'].apply(len)
sal[['title_len', 'TotalPayBenefits']].corr()

Example notebook

Example app

Conclusion

We explored and analyzed the SF Salaries dataset using Pandas in Deepnote. Each task provided insights into different aspects of the dataset, showcasing the versatility and power of Pandas for data analysis tasks. If you encounter any issues, please get in touch with our support. Happy dataset analyzing in Deepnote!