450+ Practice Questions From Pandas, NumPy, and SQL.

Introduction

This notebook lets you practice three of the most common tools used in building machine learning and data science applications: Pandas, NumPy, and SQL.

The practice questions are a resource for anyone looking to become familiar with the most common functions in these tools.

Every question in this exercise comes with a description to help you work through it. Any dataset that needs to be loaded into the Python environment is also provided; you can find it in the Files section of the right panel. Do NOT delete any of the files or folders listed there.

The whole exercise has been divided into nine separate notebooks. Below are the links to all the other notebooks for you to jump from one notebook to another:

How to use this notebook?

At the top right corner, you will find a Duplicate button. It lets you create your own copy of the notebook, where you can practice and write solutions to the questions listed here.

Let's begin 🚀!

SQL Notebook 1

import pandas as pd

data = pd.read_csv("employee_dataset.csv")
data.head()

Sample Query

select * from data; -- select all rows and columns from the data.
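The SQL cells in this notebook query the DataFrame `data` by name, which depends on the hosting notebook environment. To try the same queries elsewhere, one option is to load rows into an in-memory SQLite table. The sketch below is only an illustration: it assumes a table named `data` with a few invented rows and a subset of the columns, so the real `employee_dataset.csv` will differ.

```python
import sqlite3

# Hypothetical sample rows; the real dataset has more columns and rows.
rows = [
    ("Matthew Reed", "Scott Inc", "Production engineer", 650000, 4.7),
    ("Vera Lang", "James and Sons", "Analyst", 420000, 3.9),
    ("Matthew Ross", "Scott Inc", "Analyst", 710000, 4.8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (Name TEXT, Company_Name TEXT, "
    "Employee_Job_Title TEXT, Employee_Salary INTEGER, Employee_Rating REAL)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)

# Same shape as the sample query: every row and every column.
all_rows = conn.execute("SELECT * FROM data").fetchall()
print(len(all_rows))
```

SQLite is used here only because it ships with Python; its SQL dialect may differ in places from the engine behind the notebook.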

1. Print the first five rows

SELECT * FROM data LIMIT 5;

2. Print the number of rows

select count(*) as row_count from data;

3. Print the first five rows of the Company_Name and Employment_Status Column

select Company_Name,Employment_Status from data LIMIT 5;

4. Print the first five rows where the Employee_Rating > 4.5

SELECT * FROM data where Employee_Rating>4.5 LIMIT 5;

5. Print the number of rows having Employee_Salary > 600000

select count(*) as row_count from data where Employee_Salary>600000;

6. Print the number of rows with Employee_Salary > 600000 and Employee_Rating > 4.5

select count(*) as row_count from data where Employee_Salary>600000 and Employee_Rating>4.5;

7. Print the first five rows with Employee_Salary > 600000 and Employee_Rating > 4.5

select * from data where Employee_Salary>600000 and Employee_Rating>4.5 LIMIT 5;

8. Print all the distinct companies in the dataset

select distinct Company_Name from data;

9. Print the number of distinct companies in the dataset

select count(distinct Company_Name) as num_dis_comp from data;

10. Print all the distinct companies, city pairs in the dataset

select DISTINCT Company_Name,Employee_City from data;

11. Print the number of distinct companies, city pairs in the dataset

SELECT count(DISTINCT (Company_Name,Employee_City)) as cd from data;
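The row-value form `count(DISTINCT (a, b))` is not accepted by every SQL engine; SQLite, for one, is expected to reject it. A more portable sketch counts the rows of a `SELECT DISTINCT` subquery instead. The table contents below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Company_Name TEXT, Employee_City TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        ("Scott Inc", "Wardfort"),
        ("Scott Inc", "Wardfort"),       # duplicate pair, counted once
        ("Scott Inc", "Kristaburgh"),
        ("James and Sons", "Wardfort"),
    ],
)

# Deduplicate the pairs first, then count the remaining rows.
pair_count = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT Company_Name, Employee_City FROM data) AS pairs"
).fetchone()[0]
print(pair_count)  # 3
```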

12. Print the number of Full time employees in the dataset.

select count(*) as num_full_time_emp from data where Employment_Status='Full Time';

13. Print the number of employees with job title either 'Production engineer' or 'New Russellton'.

SELECT count(*) as num_of_emp from data where Employee_Job_Title='Production engineer' or Employee_Job_Title='New Russellton';

14. Print the number of employees with job title 'Production engineer' and company name 'Scott Inc'

SELECT count(*) as num_of_emp_job from data where Employee_Job_Title='Production engineer' and Company_Name='Scott Inc';

15. Print the number of employees with job title either 'Production engineer' or 'New Russellton' and company name either 'Scott Inc' or 'Baker, Allen and Edwards'.

SELECT count(*) as count from data where (Employee_Job_Title='Production engineer' or Employee_Job_Title='New Russellton') and (Company_Name='Scott Inc' or Company_Name='Baker, Allen and Edwards');
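In SQL, `AND` binds more tightly than `OR`, so a condition that mixes both usually needs parentheses to match the wording of a question like this one. The sketch below runs the same condition with and without parentheses on a few invented rows ('Other Co' and 'Analyst' are made-up values) to show that the counts can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Employee_Job_Title TEXT, Company_Name TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        ("Production engineer", "Scott Inc"),
        ("Production engineer", "Other Co"),
        ("New Russellton", "Scott Inc"),
        ("Analyst", "Baker, Allen and Edwards"),
        ("Analyst", "Other Co"),
    ],
)

# Without parentheses this is parsed as: A OR (B AND C) OR D.
loose = conn.execute(
    "SELECT COUNT(*) FROM data WHERE "
    "Employee_Job_Title='Production engineer' OR "
    "Employee_Job_Title='New Russellton' AND "
    "Company_Name='Scott Inc' OR "
    "Company_Name='Baker, Allen and Edwards'"
).fetchone()[0]

# With parentheses: (A OR B) AND (C OR D).
strict = conn.execute(
    "SELECT COUNT(*) FROM data WHERE "
    "(Employee_Job_Title='Production engineer' OR "
    "Employee_Job_Title='New Russellton') AND "
    "(Company_Name='Scott Inc' OR "
    "Company_Name='Baker, Allen and Edwards')"
).fetchone()[0]

print(loose, strict)  # 4 2
```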

16. Print the number of distinct cities with employees having job title either 'Production engineer' or 'New Russellton' and company name either 'Scott Inc' or 'Baker, Allen and Edwards'.

SELECT count(distinct Employee_City) as num_cities from data where (Employee_Job_Title='Production engineer' or Employee_Job_Title='New Russellton') and (Company_Name='Scott Inc' or Company_Name='Baker, Allen and Edwards');

17. Print the number of Intern employees in the dataset.

SELECT count(*) as ci from data where Employee_Job_Title='Intern';

18. Print the number of employees with first name 'Matthew'.

SELECT COUNT(*) AS num_matthews FROM data WHERE SUBSTRING(name FROM 1 FOR POSITION(' ' IN name) - 1) = 'Matthew';
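An alternative sketch for the first-name match uses `LIKE` with a trailing space in the pattern, so that a name such as 'Matthewson' is not counted. It assumes, as the query above does, that the name column holds the first name followed by a space. The rows are invented, and SQLite's `substr`/`instr` stand in for `SUBSTRING`/`POSITION`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Name TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [("Matthew Reed",), ("Matthew Ross",), ("Matthewson Lee",), ("Ann Matthew",)],
)

# Pattern match: 'Matthew' followed by a space, then anything.
like_count = conn.execute(
    "SELECT COUNT(*) FROM data WHERE Name LIKE 'Matthew %'"
).fetchone()[0]

# Explicit extraction of the text before the first space.
substr_count = conn.execute(
    "SELECT COUNT(*) FROM data "
    "WHERE substr(Name, 1, instr(Name, ' ') - 1) = 'Matthew'"
).fetchone()[0]

print(like_count, substr_count)  # 2 2
```

Note that `LIKE` is case-insensitive for ASCII text in SQLite by default; other engines may treat case differently.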

19. Print the first five rows corresponding to the employees with highest salary

SELECT * FROM data ORDER BY Employee_Salary DESC LIMIT 5;

20. Print the first five rows corresponding to the employees with the highest salary in 'James and Sons' company

SELECT DISTINCT Name,Company_Name,Employee_Salary from data where data.Company_Name='James and Sons' ORDER BY Employee_Salary DESC limit 5;

21. Print the first five rows corresponding to the employees with the highest salary working either in 'James and Sons' company or living in 'Wardfort' city

select DISTINCT Name,Company_Name,Employee_Salary,Employee_City from data where data.Company_Name='James and Sons' OR data.Employee_City='Wardfort' ORDER BY Employee_Salary DESC LIMIT 5;

22. Print the total number of distinct records in the data.

SELECT COUNT(*) AS total_distinct_records FROM ( SELECT DISTINCT * FROM data ) AS distinct_rows;

23. Print the mean salary of all the employees in the data

SELECT AVG(Employee_Salary) AS mean_salary from data;

24. Print the mean rating of all the employees in the data

select avg(Employee_Rating) as mean_rating from data;

25-27. Print the maximum, minimum and median Employee_Salary.

1. maximum salary

SELECT max(Employee_Salary) as max_sal from data;

2. minimum salary

SELECT min(Employee_Salary) as min_sal from data;

3. median of salary
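Not every SQL engine provides a built-in median function. One common workaround, sketched here for SQLite with invented salaries, sorts the values and averages the middle one or two rows. Engines that do offer a median or percentile function would make this simpler.

```python
import sqlite3
import statistics

salaries = [300000, 450000, 600000, 720000]  # hypothetical values

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Employee_Salary INTEGER)")
conn.executemany("INSERT INTO data VALUES (?)", [(s,) for s in salaries])

median_sql = """
SELECT AVG(Employee_Salary) FROM (
    SELECT Employee_Salary FROM data
    ORDER BY Employee_Salary
    -- take 1 row when the row count is odd, 2 rows when it is even
    LIMIT 2 - (SELECT COUNT(*) FROM data) % 2
    -- skip to the middle of the sorted list
    OFFSET (SELECT (COUNT(*) - 1) / 2 FROM data)
)
"""
median_salary = conn.execute(median_sql).fetchone()[0]
print(median_salary)  # 525000.0
```

The result can be cross-checked against `statistics.median(salaries)` from the Python standard library.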

28-32. Print the distribution of the following columns: (the frequency of individual entries).

1. Company_Name

2. Employee_Job_Title

3. Employee_City

4. Employee_Country

5. Employment_Status
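A frequency distribution of a column can be written with `GROUP BY` and `COUNT(*)`. The sketch below does this for `Employment_Status` on invented rows (the values 'Contract' and 'Intern' are assumptions, not taken from the dataset); the same pattern would apply to the other columns listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Employment_Status TEXT)")
statuses = ["Full Time", "Contract", "Full Time", "Intern", "Contract", "Full Time"]
conn.executemany("INSERT INTO data VALUES (?)", [(s,) for s in statuses])

# One output row per distinct value, with the number of rows holding it.
distribution = conn.execute(
    "SELECT Employment_Status, COUNT(*) AS freq "
    "FROM data GROUP BY Employment_Status "
    "ORDER BY freq DESC, Employment_Status"
).fetchall()
print(distribution)
```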

33. Print the company with the largest number of employees.

34. Print the number of employees in the above company.

35. Print the company with the smallest number of employees.

36. Print the number of employees in the above company.
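Questions 33 to 36 can be approached by grouping on `Company_Name`, counting, and sorting by the count. The following is a minimal sketch on invented rows; if several companies tie, `LIMIT 1` returns only one of them, so a real solution may need an explicit tie-break rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Name TEXT, Company_Name TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        ("A", "Scott Inc"), ("B", "Scott Inc"), ("C", "Scott Inc"),
        ("D", "James and Sons"), ("E", "James and Sons"),
        ("F", "Baker, Allen and Edwards"),
    ],
)

# Largest group first: the company and its employee count.
most = conn.execute(
    "SELECT Company_Name, COUNT(*) AS n FROM data "
    "GROUP BY Company_Name ORDER BY n DESC LIMIT 1"
).fetchone()

# Smallest group first.
least = conn.execute(
    "SELECT Company_Name, COUNT(*) AS n FROM data "
    "GROUP BY Company_Name ORDER BY n ASC LIMIT 1"
).fetchone()

print(most, least)
```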

37. Print the employee details with the maximum salary

38. Print the employee details with the maximum rating

39. Print the Company_Name with the largest number of employees in 'Wardfort' city.

40. Print 'Employee_Salary' column as string.

41. Print the Employee_City with the largest number of employees with job title 'Production engineer'.

42. Print the Company_Name with the largest number of Full-time Employees.

43. Print the Company_Name with the highest average 'Employee_Rating'.

44. Print the number of employees working in the 'Ricardomouth' and 'Kristaburgh' locations combined.

45. Print the distinct Company_Name corresponding to the 5 highest paid employees in the dataset.

46. Check if any of the columns has NULL values.
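`COUNT(column)` skips NULLs while `COUNT(*)` does not, so their difference gives the number of NULL values in a column. The sketch below applies this to two of the dataset's columns, using invented rows that contain NULLs on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Employee_Salary INTEGER, Employee_Rating REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(700000, 4.5), (None, 3.2), (500000, None), (None, 4.0)],  # None -> NULL
)

# One NULL count per column; a value above 0 means the column has NULLs.
null_salary, null_rating = conn.execute(
    "SELECT COUNT(*) - COUNT(Employee_Salary), "
    "       COUNT(*) - COUNT(Employee_Rating) FROM data"
).fetchone()
print(null_salary, null_rating)  # 2 1
```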

47. Print the data type of every column in the data.

48. Print the number of employees with Employee_Rating greater than the average Employee_Rating
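Comparing each row against an aggregate of the whole table is usually done with a scalar subquery. A small sketch with invented ratings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Employee_Rating REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?)", [(2.0,), (3.0,), (4.0,), (5.0,)]
)

# The subquery computes the average (3.5 here) once; the outer query
# counts the rows that lie strictly above it.
above_avg = conn.execute(
    "SELECT COUNT(*) FROM data "
    "WHERE Employee_Rating > (SELECT AVG(Employee_Rating) FROM data)"
).fetchone()[0]
print(above_avg)  # 2
```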

49. Find the employee who has the maximum salary among those with the minimum Employee_Rating
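One way to read question 49 is as two nested steps: first restrict to the rows whose rating equals the overall minimum, then take the highest salary among them. A sketch on invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (Name TEXT, Employee_Salary INTEGER, Employee_Rating REAL)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("Ann Lee", 500000, 2.0),
        ("Bob Ray", 650000, 2.0),   # lowest rating, higher salary
        ("Cy Dow", 900000, 4.9),    # highest salary overall, but not lowest rating
    ],
)

# Filter to the minimum rating, then sort that subset by salary.
top_of_lowest = conn.execute(
    "SELECT Name, Employee_Salary FROM data "
    "WHERE Employee_Rating = (SELECT MIN(Employee_Rating) FROM data) "
    "ORDER BY Employee_Salary DESC LIMIT 1"
).fetchone()
print(top_of_lowest)  # ('Bob Ray', 650000)
```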

50. Sort the table in ascending order of Employee_Salary

51. Sort the table in descending order of Employee_Rating

52. Print the name of 100th employee after sorting on Name column
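Picking the n-th row of a sorted result is commonly done with `LIMIT 1 OFFSET n-1`. The sketch below generates 150 hypothetical names so that the expected answer is easy to check.

```python
import sqlite3

# Hypothetical names, inserted in reverse order so that sorting matters.
names = [f"Employee {i:03d}" for i in range(150, 0, -1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Name TEXT)")
conn.executemany("INSERT INTO data VALUES (?)", [(n,) for n in names])

# OFFSET 99 skips the first 99 sorted rows; LIMIT 1 keeps the 100th.
hundredth = conn.execute(
    "SELECT Name FROM data ORDER BY Name LIMIT 1 OFFSET 99"
).fetchone()[0]
print(hundredth)  # Employee 100
```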

53. Print the first 5 rows of the first 5 columns.

54. Print the number of employees whose first name starts with the letter 'V'.

55. Print the number of employees whose last name starts with the letter 'R'.
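For questions 54 and 55, prefix matches can be sketched with `LIKE`. The example assumes names are stored as "First Last" with a single space, which may not hold for every row of the real dataset; the names below are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Name TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [("Vera Lang",), ("Victor Reed",), ("Ann Ross",), ("Rita Vance",)],
)

# First name starts with 'V': the whole string starts with 'V'.
first_v = conn.execute(
    "SELECT COUNT(*) FROM data WHERE Name LIKE 'V%'"
).fetchone()[0]

# Last name starts with 'R': an 'R' directly after the separating space.
last_r = conn.execute(
    "SELECT COUNT(*) FROM data WHERE Name LIKE '% R%'"
).fetchone()[0]

print(first_v, last_r)  # 2 2
```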

56. Select the rows 2 to 7 and the columns 3 to 7 (both included)
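SQL has no direct positional slicing, so rows 2 to 7 are typically expressed with `LIMIT 6 OFFSET 1`, and the columns are listed by name. The sketch uses a hypothetical eight-column table with placeholder column names `c1` to `c8`; in the real dataset the third to seventh column names would be substituted. Without an `ORDER BY`, the row order returned by an engine is not guaranteed, so one is included here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = [f"c{i}" for i in range(1, 9)]  # placeholder column names c1..c8
conn.execute(f"CREATE TABLE data ({', '.join(c + ' INTEGER' for c in cols)})")

# Row r holds the values r*10+1 .. r*10+8, for r = 1..10.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [tuple(r * 10 + c for c in range(1, 9)) for r in range(1, 11)],
)

# OFFSET 1 skips row 1; LIMIT 6 keeps rows 2 through 7.
subset = conn.execute(
    "SELECT c3, c4, c5, c6, c7 FROM data ORDER BY c1 LIMIT 6 OFFSET 1"
).fetchall()
print(subset[0], subset[-1])
```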
