Module 1 Homework
Charles Gagnon, U00316286
- Download Anaconda for Python or R
- Copy the data in Problem 2.8 into a CSV file
- Import your CSV file and create a dataframe
- Describe your dataframe (statistically and in words)
- Normalize your dataframe
- Create a scatter plot of your data. Include both the interpolated and regular plots.
- Split your data into 60% training and 40% validation
- Upload your Jupyter notebook or simply a text file of your code.
Next, we import "prob_2.8.csv" into the *income* dataframe. The data in the CSV come from Table 2.18 (Shmueli).
Field | Description |
---|---|
AGE | Age in years |
INCOME | Income in US dollars (USD) |
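
The import, normalize, and split steps from the task list could be sketched as follows. This is a minimal sketch, not the notebook's actual code: it assumes pandas is installed, that "prob_2.8.csv" sits in the working directory with `AGE` and `INCOME` header columns, and that "normalize" means z-score standardization (min-max scaling would be an equally valid reading). The helper names `load_income`, `normalize`, and `split_60_40` and the seed value are hypothetical.

```python
import pandas as pd


def load_income(path="prob_2.8.csv"):
    """Read the CSV into a dataframe and check the expected columns exist."""
    income = pd.read_csv(path)
    missing = {"AGE", "INCOME"} - set(income.columns)
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return income


def normalize(df):
    """Z-score each column: subtract the mean, divide by the sample std.

    Assumption: the assignment does not say which normalization to use.
    """
    return (df - df.mean()) / df.std()


def split_60_40(df, seed=1):
    """Randomly assign 60% of rows to training and the rest to validation."""
    train = df.sample(frac=0.6, random_state=seed)  # seed is arbitrary
    valid = df.drop(train.index)  # relies on a unique row index
    return train, valid
```

With the file in place, `income = load_income()` followed by `income.describe()` gives the summary statistics, and `train, valid = split_60_40(normalize(income))` produces the two partitions.
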
References:
Shmueli, Galit; Bruce, Peter C.; Gedeck, Peter; Patel, Nitin R. Data Mining for Business Analytics (Kindle Locations 1595-1597). Wiley. Kindle Edition.