Overview
This is a simple exploratory data analysis (EDA). The dataset comes from the Onkod petroleum company and was collected manually by the author. The main focus of this EDA is data visualization. Data cleaning was minimal because the dataset is small and contains no missing or dirty data.
Exploratory Data Analysis
The daily price per unit of both gas and petrol over time, colored by pump number. The upper line is petrol and the lower one is gas.
This is the total liters sold of both gas and petrol, grouped by day and sorted from the highest value to the lowest.
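The grouping and sorting described above can be sketched in pandas as follows. This is a minimal illustration, not the author's original code: the rows are made up, and the column names `date`, `pms`, and `ago` are assumptions about how the dataset is laid out.

```python
import pandas as pd

# Made-up sample rows; the column names are assumptions for illustration.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03"],
    "pms": [120.0, 90.0, 150.0, 110.0, 80.0],  # petrol liters (hypothetical values)
    "ago": [40.0, 30.0, 60.0, 55.0, 20.0],     # gas liters (hypothetical values)
})

# Total liters per row, summed per day, then sorted from highest to lowest.
df["total_liters"] = df["pms"] + df["ago"]
daily_totals = (
    df.groupby("date")["total_liters"]
      .sum()
      .sort_values(ascending=False)
)
print(daily_totals)
```

Sorting the grouped totals in descending order puts the busiest days at the top, which makes the table easy to scan before plotting it.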
A line chart of the above data. There is not much information we can extract from this line graph because of the limited amount of data we are using. We will expand the dataset and see whether we can spot any meaningful patterns.
The chart below shows the distribution of petrol liters sold daily, colored by pump number. In the p1 pump, 80-160 liters were the most frequent values in the dataset. In the pone pump, 200-240 liters were the most frequent. None of the pumps reached 300 liters in this dataset.
The chart below shows the distribution of total revenue calculated directly from the dataset (not the daily revenue reported by the pump admins), colored by the reporting pump number. The average calculated daily revenue is between 1.5 million and 2 million. Amounts are in the local currency.
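As a rough sketch of what "calculated directly from the dataset" could look like, the snippet below multiplies liters by unit price for each product and compares the result with a reported figure. The column names (`pms_price`, `ago_price`, `reported_revenue`) and all values are hypothetical and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical columns and made-up values, for illustration only.
df = pd.DataFrame({
    "pump": ["p1", "p2", "p1", "p2"],
    "pms": [100.0, 120.0, 90.0, 150.0],           # petrol liters
    "pms_price": [9000.0, 9000.0, 9000.0, 9000.0],
    "ago": [50.0, 60.0, 40.0, 70.0],              # gas liters
    "ago_price": [8000.0, 8000.0, 8000.0, 8000.0],
    "reported_revenue": [1300000.0, 1550000.0, 1130000.0, 1910000.0],
})

# Revenue computed from liters and unit prices rather than from the reported total.
df["calc_revenue"] = df["pms"] * df["pms_price"] + df["ago"] * df["ago_price"]

# A non-zero difference flags rows where the reported figure disagrees.
df["difference"] = df["reported_revenue"] - df["calc_revenue"]
print(df[["pump", "calc_revenue", "reported_revenue", "difference"]])
```

Keeping both the calculated and the reported figure side by side makes it possible to check the pump admins' reports against the raw liters and prices.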
The chart below shows the distribution of gas liters sold daily, colored by shift: '0' is the night shift and '1' is the day shift. It looks like 50 liters is the average in the night shifts and 50-100 liters is the most frequent range in the day shift. There are also some outliers in the day shift: only five instances are larger than 150 liters.
The chart below shows the distribution of petrol liters sold daily, colored by shift: '0' is the night shift and '1' is the day shift. It looks like 80-120 liters is the average in the night shifts and 140-180 liters is the most frequent range in the day shift. There are also 10 instances in the day shifts where the number of liters was larger than 200.
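Readings taken by eye from a histogram, such as the averages and most frequent ranges above, can be cross-checked numerically. The sketch below uses made-up values and assumed column names (`shift`, `ago`) and a 50-liter bin width chosen only for illustration. It computes the mean per shift, the most frequent bin per shift, and the number of records above a threshold.

```python
import pandas as pd

# Made-up values; "shift" is 0 for night and 1 for day, "ago" is gas liters.
df = pd.DataFrame({
    "shift": [0, 0, 0, 1, 1, 1, 1],
    "ago": [40.0, 45.0, 55.0, 60.0, 90.0, 95.0, 160.0],
})

# Mean liters per shift (the arithmetic average, not the tallest histogram bar).
mean_by_shift = df.groupby("shift")["ago"].mean()

# Assign each record to a 50-liter bin, labeled by the bin's lower edge.
df["bin_start"] = (df["ago"] // 50 * 50).astype(int)

# Most frequent bin per shift.
modal_bin = df.groupby("shift")["bin_start"].agg(lambda s: s.value_counts().idxmax())

# Number of records above 150 liters.
large = int((df["ago"] > 150).sum())

print(mean_by_shift)
print(modal_bin)
print(large)
```

The mean and the most frequent bin can differ noticeably when a few large values are present, so it helps to report both.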
Relationships Between Our Features
In this section, we will investigate the relationships that exist among our features and plot the ones that are strong and interesting.
Let's look at the correlation coefficients of all features in the dataset using the corr function, which uses the Pearson coefficient by default.
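For reference, here is a minimal sketch of that call on a small made-up frame. The column names `shift`, `pms`, and `pms-sales` follow the names mentioned in this write-up, but the values and the unit price of 9000 are invented for illustration.

```python
import pandas as pd

# Made-up rows; only the column names echo the write-up.
df = pd.DataFrame({
    "shift": [0, 1, 0, 1, 0, 1],
    "pms": [90.0, 160.0, 100.0, 170.0, 110.0, 180.0],
})
df["pms-sales"] = df["pms"] * 9000.0  # hypothetical unit price

# DataFrame.corr computes pairwise correlations; method="pearson" is the default.
corr = df.corr()
print(corr)
```

Because `pms-sales` here is an exact multiple of `pms`, their coefficient comes out at essentially 1, which is the "obvious" kind of relationship that is not worth plotting separately.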
Let's visualize the above data as a correlation matrix table. This is much easier to read and understand, and it displays the correlation coefficients for all the variables. From here we can easily pick any pair of variables of interest and investigate their relationship separately. We do not need some of these pairs because the relationship is obvious. For example, the more petrol liters we sell, the more money we make, and vice versa. That is why we see a strong positive correlation between the pms (petrol) and pms-sales columns.
This is a scatter plot of the total number of liters sold per day against the total cash reported for that day. The correlation is almost perfectly positive, which is understandable: the more liters we sell, the more cash we generate.
This is a visualization of the correlation between pms (petrol) and the daily total sales revenue, which also includes ago (diesel). The correlation is strongly positive. It also shows that we sell far more liters of petrol in the day shifts than in the night shifts.
This is a visualization of the correlation between ago (diesel) and the daily total sales revenue, which also includes pms (petrol). The correlation is strongly positive and stronger than the one above. It also shows that we sell more diesel in the day shifts than we do in the night shifts.
In the correlation matrix table, the correlation between the shift (0 for night and 1 for day) and the number of petrol liters we sell per shift is 0.809, a very strong positive correlation. The scatter plot below suggests that we sell more liters of petrol in the day shifts than we do in the night shifts. I also checked the correlation between the shift and the number of liters of ago (diesel) we sell per shift, and it is also positive, at about 0.49.
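When one variable is a 0/1 flag like the shift, a positive Pearson coefficient simply means the group coded 1 has the higher mean. The sketch below shows this on made-up numbers, with column names assumed as `shift` and `pms`. It is not meant to reproduce the 0.809 figure above.

```python
import pandas as pd

# Made-up values; "shift" is 0 for night and 1 for day, "pms" is petrol liters.
df = pd.DataFrame({
    "shift": [0, 0, 0, 1, 1, 1],
    "pms": [90.0, 100.0, 110.0, 160.0, 170.0, 180.0],
})

# Pearson correlation between the binary shift flag and liters sold.
r = df["shift"].corr(df["pms"])

# Mean liters per shift; the sign of r matches the direction of this gap.
means = df.groupby("shift")["pms"].mean()

print(round(r, 3))
print(means)
```

Comparing the group means alongside the coefficient makes the result easier to interpret than the coefficient alone.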