Homework 2: Data Cleaning Basics
In this homework, we will go over a couple of concepts we learned in Lecture 2! Everything in this homework was covered in lecture, so feel free to reference the slides, and remember that you can also use Google!
Lecture slides: https://docs.google.com/presentation/d/1TIML5REJKThU4XVHe28m2-zy-NGByi-9Y8upwGGPGrs/edit#slide=id.g215ffca71a4_0_1
0  0  4.8
1  1  6.4
2  2  5.2
3  3  8.1
4  4  4.6
5  5  5.4
6  6  nan
7  7  6.5
8  8  6.9
9  9  4.2
Question 1:
0   0   4.8
1   1   6.4
2   2   5.2
7   7   6.5
8   8   6.9
13  13  6.6
14  14  7.1
15  15  6.5
17  17  6.4
21  21  5.2
Question 2:
0   0   4.8
1   1   6.4
2   2   5.2
7   7   6.5
8   8   6.9
13  13  6.6
14  14  7.1
15  15  6.5
17  17  6.4
21  21  5.2
Question 3:
1   1   6.4
2   2   5.2
7   7   6.5
8   8   6.9
13  13  6.6
14  14  7.1
15  15  6.5
17  17  6.4
21  21  5.2
25  25  5.1
Question 4:
Remove duplicate values and irrelevant data/observations.
Fix data types by using type() and astype().
Filter any unwanted outliers, which are values above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR.
Handle all missing data: if the dataset is large, drop the affected observations; if it is not large, impute values using the mean, median, linear regression, or another chosen value.
Validate the cleaned data.
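The steps above can be sketched in pandas roughly like this. It is only a minimal sketch under a few assumptions: the column names `id` and `score`, the function name `clean_scores`, and the `large_threshold` cutoff for what counts as a "large" dataset are all made up for illustration, and the median is just one of the imputation choices listed above.

```python
import pandas as pd


def clean_scores(df, column="score", large_threshold=1000):
    """Apply the cleaning steps to one numeric column (hypothetical helper)."""
    # 1. Remove duplicate rows.
    out = df.drop_duplicates().copy()

    # 2. Fix data types: parse to numbers (bad entries become NaN), then
    #    make the float dtype explicit with astype().
    out[column] = pd.to_numeric(out[column], errors="coerce").astype(float)

    # 3. Filter outliers outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
    #    Missing values are kept here so that step 4 can deal with them.
    q1 = out[column].quantile(0.25)
    q3 = out[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    keep = out[column].between(lower, upper) | out[column].isna()
    out = out[keep].copy()

    # 4. Handle missing data: drop rows if the dataset is large,
    #    otherwise impute (here with the median, an assumed choice).
    if len(out) >= large_threshold:  # assumed cutoff for "large"
        out = out.dropna(subset=[column])
    else:
        out[column] = out[column].fillna(out[column].median())

    # 5. Validate: no missing values and nothing outside the IQR bounds.
    assert not out[column].isna().any()
    assert out[column].between(lower, upper).all()
    return out.reset_index(drop=True)


# Made-up example data: one duplicate row, one outlier, one missing value.
raw = pd.DataFrame({
    "id": [0, 1, 2, 3, 3, 4, 5],
    "score": ["4.8", "6.4", "5.2", "50.0", "50.0", None, "6.5"],
})
cleaned = clean_scores(raw)
print(cleaned)
```

The outlier bounds are computed before imputing, so the filled-in values do not affect Q1 and Q3. Swapping `median()` for `mean()` or a regression-based estimate would follow the same pattern.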
CONGRATS! You've finished the coding part of your homework. For the last part of your homework, include a summary of 5 things you learned from this week's DSS article:
Ultimately, we can't prevent students from using A.I. chatbots, as students will always find a way to take advantage of them.
ChatGPT can actually be used as a teaching aid, much like a calculator, and can help students in a beneficial way.
People can accept ChatGPT with open arms, as a chatbot has academic uses while learning.
It can be important to give students exposure to AI, as they are almost guaranteed to encounter it in the real world.
Students can often teach themselves by using ChatGPT.