🏭 The Engineering Project with Python
Project Description
In this project, I'll pretend I've recently been hired as a data analyst at a manufacturing/engineering/science company. More specifically, the company is a mining company called Metals R' Us, & I've been given data from their flotation plant.
Topics/Skills covered
✅ Variables ✅ Print statements ✅ Mathematical Operations ✅ Functions ✅ Loops ✅ IDEs ✅ Libraries ✅ Read in data w/ Pandas ✅ Descriptive analytics w/ Pandas ✅ Filtering w/ Pandas ✅ Data visualization
💾 Dataset
You can find more information about the dataset and download it using this link (https://www.kaggle.com/datasets/edumagalhaes/quality-prediction-in-a-mining-process?resource=download).
1. Import the libraries and the dataset.
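A minimal sketch of this step, assuming the Kaggle CSV writes decimals with commas (e.g. "55,2"), so pandas needs `decimal=","` to read them as numbers. The two sample rows are made up so the snippet runs on its own, & the file name in the comment is an assumption about the download.

```python
import io

import pandas as pd

# Tiny made-up sample in the same layout as the Kaggle CSV (comma decimals).
# With the real download, pass the file path instead, e.g.
# "MiningProcess_Flotation_Plant_Database.csv" (assumed file name).
sample_csv = io.StringIO(
    'date,% Iron Feed,% Iron Concentrate\n'
    '2017-03-10 01:00:00,"55,2","66,91"\n'
    '2017-03-10 01:00:00,"55,2","67,06"\n'
)

# decimal="," tells pandas to read "55,2" as the float 55.2
df = pd.read_csv(sample_csv, decimal=",")

print(df.head())
print(df.dtypes)
```

Without `decimal=","`, those columns would load as text & the later statistics would not work.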
3. Counting the number of rows and Columns
Our dataset has 737,453 rows and 24 columns (attributes).
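Counts like these come from the DataFrame's `shape` attribute. Here is a small sketch on a made-up three-row frame; on the real data the same two lines would report the figures above.

```python
import pandas as pd

# Made-up stand-in for the flotation data: 3 rows, 2 columns
df = pd.DataFrame({
    "% Iron Concentrate": [66.91, 67.06, 66.97],
    "% Silica Concentrate": [1.31, 1.11, 1.27],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(f"{n_rows:,} rows and {n_cols} columns")
```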
4. Indexing the DataFrame (Rows & Columns)
For example, if I needed only rows 100-104 & all the columns, I'd select them by row position.
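One way to make that selection is positional slicing with `iloc`. This is a sketch on a made-up 200-row frame, not necessarily the exact call used in the original notebook.

```python
import numpy as np
import pandas as pd

# Made-up frame with 200 rows so that rows 100-104 exist
df = pd.DataFrame({
    "% Iron Concentrate": np.linspace(64.0, 68.0, 200),
    "Ore Pulp pH": np.linspace(9.0, 10.5, 200),
})

# iloc slices by position; the end is exclusive, so 100:105 gives rows 100..104.
# The bare ":" keeps every column.
subset = df.iloc[100:105, :]
print(subset)
```

Note that `df.loc[100:104, :]` would give the same rows here, because label-based slices include the end label.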
4. Working with Dates
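A sketch of what this step might involve, assuming the timestamp column is called `date` & arrives as text: convert it to a real datetime type, then check the range it covers. The three rows are made up.

```python
import pandas as pd

# Made-up rows; the "date" column name is an assumption about the dataset
df = pd.DataFrame({
    "date": ["2017-03-10 01:00:00", "2017-06-01 12:00:00", "2017-09-09 23:00:00"],
    "% Iron Concentrate": [66.91, 65.05, 64.27],
})

# Convert text to datetime64 so comparisons and min/max work on real dates
df["date"] = pd.to_datetime(df["date"])

print(df["date"].dtype)
print("first:", df["date"].min(), "| last:", df["date"].max())
```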
5. 📊 Descriptive Analytics
My boss has asked me to give some summary statistics for each column.
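In pandas, `describe()` produces these statistics (count, mean, standard deviation, min, quartiles, max) for every numeric column in one call. A sketch on made-up values:

```python
import pandas as pd

# Made-up values standing in for two of the plant's columns
df = pd.DataFrame({
    "% Iron Concentrate": [66.0, 67.0, 65.0, 66.0],
    "Ore Pulp pH": [9.5, 10.0, 9.0, 9.5],
})

# One row per statistic, one column per numeric variable
summary = df.describe()
print(summary)

# Individual statistics can be pulled out by label
print("Mean iron:", summary.loc["mean", "% Iron Concentrate"])
```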
The % Iron Concentrate is the most important variable. But my engineer peer tells me that the % Silica Concentrate, Ore Pulp pH, & Flotation Column 05 Level are all really important as well. My boss says something weird happened on June 1, 2017, & wants me to investigate.
I need to pare the data down to only these columns & the rows that fall between the start & end of that day.
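A sketch of that filtering, assuming the timestamp column is named `date` & taking the window to be the whole of June 1, 2017. The four rows are made up; the column names follow the ones mentioned above.

```python
import pandas as pd

# Made-up rows around the day of interest
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00:00", "2017-06-01 00:00:00",
        "2017-06-01 13:00:00", "2017-06-02 00:00:00",
    ]),
    "% Iron Feed": [55.2, 55.2, 56.1, 56.1],
    "% Iron Concentrate": [66.9, 65.1, 64.8, 66.2],
    "% Silica Concentrate": [1.3, 2.4, 2.9, 1.5],
    "Ore Pulp pH": [9.8, 9.6, 9.5, 9.9],
    "Flotation Column 05 Level": [450.0, 430.0, 425.0, 455.0],
})

important_cols = [
    "date", "% Iron Concentrate", "% Silica Concentrate",
    "Ore Pulp pH", "Flotation Column 05 Level",
]

# Half-open window: from midnight on June 1 up to, but not including, June 2
start, end = pd.Timestamp("2017-06-01"), pd.Timestamp("2017-06-02")
mask = (df["date"] >= start) & (df["date"] < end)

# Apply the row filter and the column selection in one step
df_june1 = df.loc[mask, important_cols]
print(df_june1)
```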
6. Pair Plot (Multiple Scatters) & Correlation
I want to see if there are any relationships among the important variables. The pair plot doesn't seem to show any strong correlation among them.
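A grid of pairwise scatter plots like this can be sketched with `pandas.plotting.scatter_matrix`, used here as a pandas-only stand-in (seaborn's `pairplot` is a common alternative). The data is random & made up, so the picture will not match the real plant data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; drop it in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up, independent random columns standing in for the important variables
rng = np.random.default_rng(0)
df_important = pd.DataFrame({
    "% Iron Concentrate": rng.normal(65.0, 1.0, 50),
    "% Silica Concentrate": rng.normal(2.3, 0.5, 50),
    "Ore Pulp pH": rng.normal(9.8, 0.3, 50),
    "Flotation Column 05 Level": rng.normal(425.0, 40.0, 50),
})

# One scatter plot per pair of columns, histograms on the diagonal
axes = scatter_matrix(df_important, figsize=(8, 8), alpha=0.6)
print(axes.shape)
```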
This can be confirmed with a correlation matrix: all the correlation values are low.
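A correlation matrix comes from `DataFrame.corr()`, which computes the Pearson correlation for every pair of numeric columns. The sketch below uses made-up random data, so its numbers say nothing about the real plant.

```python
import numpy as np
import pandas as pd

# Made-up, independent random columns standing in for the important variables
rng = np.random.default_rng(1)
df_important = pd.DataFrame({
    "% Iron Concentrate": rng.normal(65.0, 1.0, 200),
    "% Silica Concentrate": rng.normal(2.3, 0.5, 200),
    "Ore Pulp pH": rng.normal(9.8, 0.3, 200),
    "Flotation Column 05 Level": rng.normal(425.0, 40.0, 200),
})

# Pearson correlation by default; values range from -1 to 1
corr = df_important.corr()
print(corr.round(2))
```

Values near 0 mean little linear relationship; the diagonal is always 1 because each column correlates perfectly with itself.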
7. 📈 Line Charts
My boss is a bit confused & wants to see the data to help him understand it better. He wants to see how the % Iron Concentrate changes throughout that day. In this case, I will use a line chart to visualize the data.
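A minimal sketch of such a line chart with matplotlib, assuming a `date` column & one reading per hour. The hourly values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; drop it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up hourly readings for June 1, 2017
df_june1 = pd.DataFrame({
    "date": pd.to_datetime([f"2017-06-01 {h:02d}:00:00" for h in range(24)]),
    "% Iron Concentrate": [65.0 + 0.1 * (h % 6) for h in range(24)],
})

fig, ax = plt.subplots(figsize=(10, 4))

# Time on the x-axis, the variable of interest on the y-axis
ax.plot(df_june1["date"], df_june1["% Iron Concentrate"])
ax.set_title("% Iron Concentrate on June 1, 2017")
ax.set_xlabel("Time")
ax.set_ylabel("% Iron Concentrate")

# Tilt the time labels so they don't overlap
fig.autofmt_xdate()
```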