Analyzing SF Salaries Dataset with Deepnote
In this exercise, we explored the SF Salaries dataset using Pandas in Deepnote. The dataset contains the salaries and job titles of San Francisco employees from 2011 to 2014. Let’s walk through the tasks we performed:
Importing and Loading Data
First, we imported the Pandas library and loaded the dataset Salaries.csv into a DataFrame called sal.
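A minimal sketch of the loading step. So that it runs without the real file, it reads a tiny made-up CSV from memory; the column names and all values here are assumptions for illustration, not rows from Salaries.csv. In the notebook the call would simply be pd.read_csv("Salaries.csv").

```python
import io

import pandas as pd

# Made-up stand-in for Salaries.csv (column names and values are assumed).
csv_text = """Id,EmployeeName,JobTitle,BasePay,OvertimePay,Benefits,TotalPayBenefits,Year
1,JOSEPH DRISCOLL,PLACEHOLDER TITLE,140000.0,95000.0,,235000.0,2011
2,EMPLOYEE B,ANOTHER TITLE,65000.0,12000.0,30000.0,107000.0,2013
"""

# With the real file on disk this would be: sal = pd.read_csv("Salaries.csv")
sal = pd.read_csv(io.StringIO(csv_text))
print(sal.shape)
```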
Exploring the Dataset
We started by checking the first few rows of the dataset with .head(), then used .info() to get an overview of the DataFrame, including data types and the number of entries.
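The two calls can be sketched as follows, using a small made-up DataFrame in place of the real sal (the rows and column names are assumptions):

```python
import pandas as pd

# Made-up stand-in for the real DataFrame.
sal = pd.DataFrame({
    "EmployeeName": ["EMPLOYEE A", "EMPLOYEE B", "EMPLOYEE C"],
    "JobTitle": ["TITLE ONE", "TITLE TWO", "TITLE THREE"],
    "BasePay": [100000.0, 65000.0, None],  # None becomes NaN
})

first_rows = sal.head(2)  # head() defaults to 5 rows when no argument is given
print(first_rows)

# info() prints the column dtypes and non-null counts.
sal.info()
```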
The dataset consists of 148,654 entries with various columns including employee names, job titles, base pay, overtime pay, benefits, and more.
Data Analysis Tasks
Average BasePay:
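One way this could be computed is with .mean() on the column. The sketch below uses made-up values; note that .mean() skips missing values by default.

```python
import pandas as pd

# Made-up values; the real column comes from Salaries.csv.
sal = pd.DataFrame({"BasePay": [100000.0, 60000.0, None, 80000.0]})

# NaN entries are ignored, so the mean is taken over three values.
avg_base_pay = sal["BasePay"].mean()
print(avg_base_pay)
```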
Highest OvertimePay:
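A sketch using .max() on the column, again with made-up values:

```python
import pandas as pd

# Made-up values for illustration only.
sal = pd.DataFrame({"OvertimePay": [0.0, 12000.5, 95000.0]})

highest_overtime = sal["OvertimePay"].max()
print(highest_overtime)
```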
Job Title of JOSEPH DRISCOLL:
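A lookup like this is typically done with a boolean mask. The sketch assumes the columns are named EmployeeName and JobTitle, and the job title shown is a placeholder, not the real value.

```python
import pandas as pd

# Made-up rows; the job titles are placeholders.
sal = pd.DataFrame({
    "EmployeeName": ["EMPLOYEE A", "JOSEPH DRISCOLL", "EMPLOYEE C"],
    "JobTitle": ["TITLE ONE", "PLACEHOLDER TITLE", "TITLE THREE"],
})

# Select the JobTitle column for rows whose name matches exactly.
job_title = sal.loc[sal["EmployeeName"] == "JOSEPH DRISCOLL", "JobTitle"]
print(job_title)
```

The result is a Series, since more than one row could match the name.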
Total Pay Benefits of JOSEPH DRISCOLL:
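The same masking approach works for pay. The sketch assumes the column is named TotalPayBenefits and uses a made-up amount.

```python
import pandas as pd

# Made-up rows; the amounts are not real figures.
sal = pd.DataFrame({
    "EmployeeName": ["EMPLOYEE A", "JOSEPH DRISCOLL"],
    "TotalPayBenefits": [107000.0, 235000.0],
})

is_driscoll = sal["EmployeeName"] == "JOSEPH DRISCOLL"

# .iloc[0] takes the first match; this assumes the name appears at least once.
total_pay_benefits = sal.loc[is_driscoll, "TotalPayBenefits"].iloc[0]
print(total_pay_benefits)
```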
Highest Paid Person (including benefits):
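One possible approach is .idxmax(), which returns the index label of the largest value so the full row can be fetched with .loc. Column names and values below are assumptions.

```python
import pandas as pd

# Made-up rows for illustration.
sal = pd.DataFrame({
    "EmployeeName": ["EMPLOYEE A", "EMPLOYEE B", "EMPLOYEE C"],
    "TotalPayBenefits": [107000.0, 235000.0, 54000.0],
})

# idxmax() gives the row label of the maximum; .loc retrieves that row.
highest_paid = sal.loc[sal["TotalPayBenefits"].idxmax()]
print(highest_paid["EmployeeName"])
```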
Lowest Paid Person (including benefits):
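The mirror-image sketch uses .idxmin(). The made-up data includes a negative amount only to show that the method makes no assumption about the sign of the values.

```python
import pandas as pd

# Made-up rows; the negative amount is purely illustrative.
sal = pd.DataFrame({
    "EmployeeName": ["EMPLOYEE A", "EMPLOYEE B", "EMPLOYEE C"],
    "TotalPayBenefits": [107000.0, 235000.0, -50.0],
})

# idxmin() gives the row label of the minimum; .loc retrieves that row.
lowest_paid = sal.loc[sal["TotalPayBenefits"].idxmin()]
print(lowest_paid["EmployeeName"])
```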
Average BasePay per Year:
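A per-year average is a natural fit for groupby. The sketch assumes a Year column and uses made-up values.

```python
import pandas as pd

# Made-up rows for two years.
sal = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "BasePay": [60000.0, 80000.0, 70000.0, 90000.0],
})

# Group rows by year, then average BasePay within each group.
avg_by_year = sal.groupby("Year")["BasePay"].mean()
print(avg_by_year)
```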
Number of Unique Job Titles:
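This can be sketched with .nunique(). The comparison is case-sensitive, so the made-up data below deliberately includes the same title in two different casings.

```python
import pandas as pd

# Made-up titles; note the two spellings of the same job.
sal = pd.DataFrame({
    "JobTitle": [
        "Transit Operator",
        "Registered Nurse",
        "Transit Operator",
        "TRANSIT OPERATOR",  # counted separately because of the casing
    ]
})

n_unique_titles = sal["JobTitle"].nunique()
print(n_unique_titles)
```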
Top 5 Most Common Jobs:
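A sketch using .value_counts(), which sorts by frequency in descending order, followed by .head(5). The titles and counts are made up.

```python
import pandas as pd

# Made-up titles with six distinct values so head(5) actually trims the result.
titles = (
    ["Transit Operator"] * 4
    + ["Registered Nurse"] * 3
    + ["Police Officer"] * 2
    + ["Custodian", "Clerk", "Gardener"]
)
sal = pd.DataFrame({"JobTitle": titles})

# value_counts() returns counts sorted from most to least common.
top5 = sal["JobTitle"].value_counts().head(5)
print(top5)
```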
Job Titles with only One Person in 2013:
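One way to approach this is to filter to 2013, count each title, and keep the titles whose count is exactly one. The sketch assumes Year and JobTitle columns and uses made-up rows.

```python
import pandas as pd

# Made-up rows; only the 2013 rows should be counted.
sal = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2013, 2012],
    "JobTitle": ["Transit Operator", "Transit Operator", "Clerk", "Gardener", "Custodian"],
})

counts_2013 = sal.loc[sal["Year"] == 2013, "JobTitle"].value_counts()

# Keep only the titles held by exactly one person in 2013.
single_titles = counts_2013[counts_2013 == 1]
n_single = len(single_titles)
print(n_single)
```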
Number of Job Titles containing ‘Chief’:
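A sketch using .str.contains() with case-insensitive matching. Whether the intended answer is the number of rows or the number of distinct titles is an assumption either way, so both are shown. The data is made up.

```python
import pandas as pd

# Made-up titles, including a repeated title and a missing value.
sal = pd.DataFrame({
    "JobTitle": ["CHIEF OF POLICE", "Deputy Chief", "Transit Operator", "Deputy Chief", None]
})

# case=False matches "Chief" and "CHIEF"; na=False treats missing titles as non-matches.
has_chief = sal["JobTitle"].str.contains("chief", case=False, na=False)

n_rows_with_chief = int(has_chief.sum())
n_titles_with_chief = sal.loc[has_chief, "JobTitle"].nunique()
print(n_rows_with_chief, n_titles_with_chief)
```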
Bonus: Correlation between Job Title Length and Salary:
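A possible sketch: add a column with the title length, then call .corr() on the two columns. Using TotalPayBenefits as the salary measure is an assumption, title_len is a hypothetical column name, and the values are made up, so the resulting coefficient says nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration.
sal = pd.DataFrame({
    "JobTitle": ["Clerk", "Transit Operator", "Chief of Police", "Registered Nurse"],
    "TotalPayBenefits": [50000.0, 107000.0, 300000.0, 150000.0],
})

# Length of each title in characters (spaces included).
sal["title_len"] = sal["JobTitle"].str.len()

# Pearson correlation matrix for the two numeric columns.
corr_matrix = sal[["title_len", "TotalPayBenefits"]].corr()
r = corr_matrix.loc["title_len", "TotalPayBenefits"]
print(corr_matrix)
```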
Conclusion
We explored and analyzed the SF Salaries dataset using Pandas in Deepnote. The tasks covered summary statistics, lookups for a single employee, averages per year, and job-title counts, showing how versatile Pandas is for data analysis. Well done!