Analyzing SF Salaries Dataset with Deepnote
In this exercise, we explored the SF Salaries dataset using Pandas in Deepnote. The dataset contains salary and job title information for San Francisco employees from 2011 to 2014. Let’s walk through the tasks we performed:
Importing and Loading Data
First, we imported the Pandas library and loaded the dataset Salaries.csv into a DataFrame called sal.
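As a rough sketch of the loading step: in the notebook the call would simply be pd.read_csv("Salaries.csv"). So that the snippet runs anywhere, a tiny in-memory CSV with made-up rows stands in for the file here; the column names are assumed to match the public SF Salaries file.

```python
import io

import pandas as pd

# In the notebook: sal = pd.read_csv("Salaries.csv")
# Stand-in: a two-row CSV with made-up values (not real records).
csv_text = io.StringIO(
    "Id,EmployeeName,JobTitle,BasePay,Year\n"
    "1,PERSON A,CLERK,50000.0,2011\n"
    "2,PERSON B,ENGINEER,90000.0,2012\n"
)
sal = pd.read_csv(csv_text)
print(sal.shape)
```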
Exploring the Dataset
We started by checking the first few rows of the dataset with .head(), then used .info() to get an overview of the DataFrame, including the data types and the number of entries.
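A minimal illustration of both calls, using a six-row toy DataFrame with invented values in place of the real sal:

```python
import pandas as pd

# Toy stand-in for sal; values are made up for illustration.
sal = pd.DataFrame({
    "EmployeeName": ["PERSON A", "PERSON B", "PERSON C",
                     "PERSON D", "PERSON E", "PERSON F"],
    "BasePay": [50000.0, 60000.0, 70000.0, 80000.0, 90000.0, 100000.0],
})

first_rows = sal.head()  # returns the first five rows by default
print(first_rows)

sal.info()  # prints column dtypes, non-null counts, and memory usage
```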
The dataset consists of 148,654 entries with various columns including employee names, job titles, base pay, overtime pay, benefits, and more.
Data Analysis Tasks
Average BasePay:
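One way to compute this is with .mean() on the BasePay column. The sketch below uses invented numbers, so its result says nothing about the real dataset.

```python
import pandas as pd

# Toy stand-in for sal with made-up pay values.
sal = pd.DataFrame({"BasePay": [50000.0, 70000.0, 90000.0]})

avg_base_pay = sal["BasePay"].mean()  # missing values are skipped by default
print(avg_base_pay)
```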
Highest OvertimePay:
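The same pattern works with .max() on OvertimePay. Again the values are placeholders.

```python
import pandas as pd

# Toy stand-in for sal with made-up overtime values.
sal = pd.DataFrame({"OvertimePay": [0.0, 1200.5, 300.0]})

max_overtime = sal["OvertimePay"].max()
print(max_overtime)
```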
Job Title of JOSEPH DRISCOLL:
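A lookup like this can be done with a boolean mask on the name column. The column names EmployeeName and JobTitle are assumed, and the job titles below are deliberate placeholders rather than the real answer.

```python
import pandas as pd

# Toy stand-in for sal; the titles are placeholders, not real data.
sal = pd.DataFrame({
    "EmployeeName": ["JOSEPH DRISCOLL", "PERSON B"],
    "JobTitle": ["PLACEHOLDER TITLE A", "PLACEHOLDER TITLE B"],
})

# Select the JobTitle of every row whose name matches exactly.
driscoll_title = sal.loc[sal["EmployeeName"] == "JOSEPH DRISCOLL", "JobTitle"]
print(driscoll_title)
```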
TotalPayBenefits of JOSEPH DRISCOLL:
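Assuming the total including benefits lives in a column named TotalPayBenefits, the previous mask can be reused with a different target column. The amounts here are invented.

```python
import pandas as pd

# Toy stand-in for sal; amounts are made up.
sal = pd.DataFrame({
    "EmployeeName": ["JOSEPH DRISCOLL", "PERSON B"],
    "TotalPayBenefits": [1000.0, 2000.0],
})

is_driscoll = sal["EmployeeName"] == "JOSEPH DRISCOLL"
driscoll_total = sal.loc[is_driscoll, "TotalPayBenefits"]
print(driscoll_total)
```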
Highest Paid Person (including benefits):
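A possible approach is idxmax(), which returns the index label of the largest value, combined with .loc to pull out the full row. The names and amounts are hypothetical.

```python
import pandas as pd

# Toy stand-in for sal; names and amounts are made up.
sal = pd.DataFrame({
    "EmployeeName": ["PERSON A", "PERSON B", "PERSON C"],
    "TotalPayBenefits": [80000.0, 150000.0, 95000.0],
})

# idxmax gives the row label of the maximum; .loc fetches that row.
top_earner = sal.loc[sal["TotalPayBenefits"].idxmax()]
print(top_earner)
```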
Lowest Paid Person (including benefits):
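The mirror image uses idxmin(). This sketch runs on placeholder rows only.

```python
import pandas as pd

# Toy stand-in for sal; names and amounts are made up.
sal = pd.DataFrame({
    "EmployeeName": ["PERSON A", "PERSON B", "PERSON C"],
    "TotalPayBenefits": [80000.0, 150000.0, 95000.0],
})

# idxmin gives the row label of the minimum; .loc fetches that row.
lowest_earner = sal.loc[sal["TotalPayBenefits"].idxmin()]
print(lowest_earner)
```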
Average BasePay per Year:
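Grouping by year before averaging could look like the following, assuming the year is stored in a column named Year. The figures are illustrative only.

```python
import pandas as pd

# Toy stand-in for sal; pay values are made up.
sal = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "BasePay": [40000.0, 60000.0, 50000.0, 70000.0],
})

# One mean per distinct Year value, returned as a Series indexed by Year.
base_pay_per_year = sal.groupby("Year")["BasePay"].mean()
print(base_pay_per_year)
```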
Number of Unique Job Titles:
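Counting distinct values is what nunique() does. A small sketch on invented titles:

```python
import pandas as pd

# Toy stand-in for sal; titles are made up.
sal = pd.DataFrame({"JobTitle": ["CLERK", "CLERK", "ENGINEER", "NURSE"]})

unique_titles = sal["JobTitle"].nunique()  # distinct non-null values
print(unique_titles)
```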
Top 5 Most Common Jobs:
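value_counts() sorts titles by frequency in descending order, so taking the first five entries gives the most common jobs. The toy titles below are hypothetical.

```python
import pandas as pd

# Toy stand-in for sal; titles are made up.
sal = pd.DataFrame({
    "JobTitle": ["CLERK", "CLERK", "CLERK", "NURSE", "NURSE",
                 "ENGINEER", "PILOT", "COOK", "DRIVER"],
})

# Frequencies in descending order, truncated to the top five.
top_jobs = sal["JobTitle"].value_counts().head(5)
print(top_jobs)
```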
Job Titles with only One Person in 2013:
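One way to approach this: filter to 2013, count how often each title occurs, then count the titles whose frequency is exactly one. Column names are assumed and the rows are invented.

```python
import pandas as pd

# Toy stand-in for sal; rows are made up.
sal = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2013, 2012],
    "JobTitle": ["CLERK", "CLERK", "NURSE", "ENGINEER", "PILOT"],
})

# Frequency of each title within 2013 only.
titles_2013 = sal.loc[sal["Year"] == 2013, "JobTitle"].value_counts()

# Number of titles held by exactly one person that year.
single_person_titles = int((titles_2013 == 1).sum())
print(single_person_titles)
```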
Number of Job Titles containing ‘Chief’:
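The task wording leaves open whether to count rows or distinct titles, and whether the match is case-sensitive. The sketch below assumes a case-insensitive match and shows both counts on made-up titles.

```python
import pandas as pd

# Toy stand-in for sal; titles are made up.
sal = pd.DataFrame({
    "JobTitle": ["CHIEF OF POLICE", "Deputy Chief", "CLERK", "CHIEF OF POLICE"],
})

# True where the title contains "chief" in any letter case; NaN counts as False.
has_chief = sal["JobTitle"].str.contains("chief", case=False, na=False)

rows_with_chief = int(has_chief.sum())                       # one per employee row
titles_with_chief = sal.loc[has_chief, "JobTitle"].nunique()  # distinct titles
print(rows_with_chief, titles_with_chief)
```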
Bonus: Correlation between Job Title Length and Salary:
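A possible sketch, assuming "salary" means the TotalPayBenefits column and using a helper column named title_len (a name introduced here for illustration). With made-up rows the resulting coefficient is meaningless; only the mechanics carry over.

```python
import pandas as pd

# Toy stand-in for sal; titles and amounts are made up.
sal = pd.DataFrame({
    "JobTitle": ["CLERK", "ENGINEER", "NURSE MANAGER"],
    "TotalPayBenefits": [60000.0, 95000.0, 80000.0],
})

# Character count of each title, stored in a hypothetical helper column.
sal["title_len"] = sal["JobTitle"].str.len()

# Pairwise Pearson correlation between the two numeric columns.
corr = sal[["title_len", "TotalPayBenefits"]].corr()
print(corr)
```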
Conclusion
We successfully explored and analyzed the SF Salaries dataset using Pandas in Deepnote. Each task provided insights into different aspects of the dataset, showcasing the versatility and power of Pandas for data analysis tasks. Well done!