450+ Practice Questions From Pandas, NumPy, and SQL.
Introduction
This notebook lets you practice three of the most common tools used in building machine learning and data science applications: Pandas, NumPy, and SQL.
The practice questions are intended for anyone who wants to get familiar with the most commonly used functions in these tools.
Each question comes with a short description to help you work through the exercise. Any dataset that needs to be loaded into the Python environment is also provided; you can find it in the Files section of the right panel. Do NOT delete any of the files or folders listed there.
The whole exercise has been divided into nine separate notebooks. Below are the links to all the other notebooks for you to jump from one notebook to another:
How to use this notebook?
At the top right corner, you will find a Duplicate button. It creates your own copy of this notebook, where you can practice and write solutions to the questions listed here.
Let's begin 🚀!
SQL Notebook 1
Sample Query
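If you want to try queries outside this notebook, here is a minimal sketch that uses Python's built-in sqlite3 module. The table name `employees`, the helper `run_query`, and every row below are assumptions made up for illustration; only the column names are taken from the questions in this notebook, and the real dataset may store its values differently.

```python
import sqlite3

# Hypothetical setup: the table name `employees` and all rows are invented
# for illustration. Only the column names come from the questions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        Company_Name TEXT,
        Employee_Job_Title TEXT,
        Employee_City TEXT,
        Employee_Country TEXT,
        Employee_Salary INTEGER,
        Employment_Status TEXT,
        Employee_Rating REAL
    )
    """
)
toy_rows = [
    ("Scott Inc", "Production engineer", "Wardfort", "Country A", 700000, "Full Time", 4.7),
    ("Scott Inc", "Data analyst", "Wardfort", "Country A", 500000, "Intern", 4.2),
    ("James and Sons", "Production engineer", "New Russellton", "Country B", 650000, "Full Time", 4.6),
    ("James and Sons", "Data analyst", "Wardfort", "Country A", 300000, "Contractor", 3.9),
    ("Baker, Allen and Edwards", "Production engineer", "New Russellton", "Country B", 800000, "Full Time", 4.4),
    ("Scott Inc", "Production engineer", "Wardfort", "Country A", 620000, "Full Time", 4.8),
]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?, ?)", toy_rows)


def run_query(sql, params=()):
    """Run a query against the toy table and return all result rows."""
    return conn.execute(sql, params).fetchall()


# First five rows of the table.
first_five = run_query("SELECT * FROM employees LIMIT 5")

# Total number of rows.
row_count = run_query("SELECT COUNT(*) FROM employees")[0][0]

# Number of rows matching two conditions at once; the thresholds are
# passed as parameters instead of being pasted into the SQL string.
high_rated_high_paid = run_query(
    "SELECT COUNT(*) FROM employees "
    "WHERE Employee_Salary > ? AND Employee_Rating > ?",
    (600000, 4.5),
)[0][0]

print(first_five)
print(row_count, high_rated_high_paid)
```

The same three patterns (LIMIT, COUNT(*), and WHERE with AND) cover most of the early questions; swap in the real table name once you know it.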
1. Print the first five rows
2. Print the number of rows
3. Print the first five rows of the Company_Name and Employment_Status Column
4. Print the first five rows where the Employee_Rating > 4.5
5. Print the number of rows having Employee_Salary > 600000
6. Print the number of rows with Employee_Salary > 600000 and Employee_Rating > 4.5
7. Print the first five rows with Employee_Salary > 600000 and Employee_Rating > 4.5
8. Print all the distinct companies in the dataset
9. Print the number of distinct companies in the dataset
10. Print all the distinct (company, city) pairs in the dataset
11. Print the number of distinct (company, city) pairs in the dataset
12. Print the number of Full time employees in the dataset.
13. Print the number of employees with job title either 'Production engineer' or 'New Russellton'.
14. Print the number of employees with job title 'Production engineer' and company name 'Scott Inc'
15. Print the number of employees with job title either 'Production engineer' or 'New Russellton' and company name either 'Scott Inc' or 'Baker, Allen and Edwards'.
16. Print the number of distinct cities with employees having job title either 'Production engineer' or 'New Russellton' and company name either 'Scott Inc' or 'Baker, Allen and Edwards'.
17. Print the number of Intern employees in the dataset.
18. Print the number of employees with first name 'Matthew'.
19. Print the first five rows corresponding to the employees with the highest salary
20. Print the first five rows corresponding to the employees with the highest salary in 'James and Sons' company
21. Print the first five rows corresponding to the employees with the highest salary working either in 'James and Sons' company or living in 'Wardfort' city
22. Print the total number of distinct records in the data.
23. Print the mean salary of all the employees in the data
24. Print the mean rating of all the employees in the data
25-27. Print the maximum, minimum and median Employee_Salary.
1. maximum salary
2. minimum salary
3. median of salary
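The median needs a little more care than the other two. MAX and MIN are standard aggregates, but a median aggregate is not available in every SQL dialect; SQLite's default build, for example, does not ship one. The sketch below shows one common workaround: sort the values, skip to the middle with OFFSET, and average one or two rows depending on whether the row count is odd or even. The table name `employees` and the salary values are made up for illustration.

```python
import sqlite3

# Hypothetical toy table: the name `employees` and the salaries are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Employee_Salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [(300000,), (500000,), (620000,), (650000,), (700000,), (800000,)],
)

# Maximum and minimum are plain aggregate functions.
max_salary, min_salary = conn.execute(
    "SELECT MAX(Employee_Salary), MIN(Employee_Salary) FROM employees"
).fetchone()

# Median without a MEDIAN aggregate:
#   - OFFSET (n - 1) / 2 skips to the lower middle row (integer division),
#   - LIMIT 2 - n % 2 keeps one row when n is odd and two when n is even,
#   - AVG averages whatever was kept.
median_sql = """
    SELECT AVG(Employee_Salary)
    FROM (
        SELECT Employee_Salary
        FROM employees
        ORDER BY Employee_Salary
        LIMIT 2 - (SELECT COUNT(*) FROM employees) % 2
        OFFSET (SELECT (COUNT(*) - 1) / 2 FROM employees)
    )
"""
median_salary = conn.execute(median_sql).fetchone()[0]

print(max_salary, min_salary, median_salary)
```

With six rows the query averages the third and fourth sorted values. If your database does provide a median or percentile function, that is usually the simpler choice.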
28-32. Print the distribution (the frequency of each distinct value) of the following columns:
1. Company_Name
2. Employee_Job_Title
3. Employee_City
4. Employee_Country
5. Employment_Status
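A frequency distribution like the ones asked for in 28-32 is typically a GROUP BY with COUNT(*). Here is a minimal sketch, again with sqlite3; the table name `employees`, the helper `value_counts`, and the rows are hypothetical. Column names cannot be passed as query parameters, so the helper checks them against a fixed set before building the SQL string.

```python
import sqlite3

# Hypothetical toy table: the name `employees` and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Company_Name TEXT, Employment_Status TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Scott Inc", "Full Time"),
        ("Scott Inc", "Intern"),
        ("Scott Inc", "Full Time"),
        ("James and Sons", "Full Time"),
        ("James and Sons", "Contractor"),
        ("Baker, Allen and Edwards", "Full Time"),
    ],
)

# Identifiers cannot be bound with "?", so only known columns are allowed.
ALLOWED_COLUMNS = {"Company_Name", "Employment_Status"}


def value_counts(column):
    """Return (value, frequency) pairs for one column, most frequent first."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) AS frequency "
        f"FROM employees "
        f"GROUP BY {column} "
        f"ORDER BY frequency DESC, {column}"
    )
    return conn.execute(sql).fetchall()


company_counts = value_counts("Company_Name")
status_counts = value_counts("Employment_Status")

print(company_counts)
print(status_counts)
```

Ties in frequency are broken alphabetically by the second ORDER BY key, which keeps the output stable from run to run.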