How does the employment rate in the agricultural sector impact different crop productions in different countries?
by Shan Zhou (szhou2@falcon.bentley.edu)
Welcome to the online repository of my final project for MA705. This codebase explores the research question: "How does the employment rate in the agricultural sector impact different crop productions in different countries?" It includes data preprocessing, merging techniques, and statistical analysis methods to examine the relationship between agricultural employment and crop yields across various nations. The datasets used are sourced from the OECD. The code is organized into modules for data cleaning, merging, analysis, and visualization, which makes it easy to navigate and follow. This repository not only demonstrates my analytical capabilities but also serves as a resource for those interested in the intersection of labor economics and agricultural productivity.
Import Libraries and Datasets
First, import the necessary libraries. We primarily use Pandas for data manipulation, along with NumPy and Stats for mathematical and statistical calculations, Matplotlib for plotting figures, and Sklearn for modeling.
Next, import the necessary datasets: the Employment by Activity and Crop Production statistics, both retrieved from the OECD.
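A minimal sketch of loading one of the OECD exports with pandas follows. The file contents are a made-up excerpt read from a string so the example runs on its own, and the source column names (`LOCATION`, `SUBJECT`, `TIME`, `Value`) are assumptions about the export format; the `column_map` argument should be adjusted to whatever headers the downloaded files actually contain. `load_oecd_csv` is a hypothetical helper, not part of the original code.

```python
import io

import pandas as pd

# Made-up excerpt standing in for an OECD CSV export. The column names
# (LOCATION, SUBJECT, TIME, Value) are assumptions; real exports may differ.
RAW_CSV = """LOCATION,SUBJECT,TIME,Value
AAA,RICE,2020,1.0
AAA,RICE,2021,2.0
BBB,MAIZE,2020,3.0
"""


def load_oecd_csv(source, column_map):
    """Read an OECD-style CSV, rename columns, and keep only the mapped ones."""
    df = pd.read_csv(source).rename(columns=column_map)
    # Keep just the renamed columns and make sure the year is an integer.
    return df[list(column_map.values())].astype({"year": int})


crop = load_oecd_csv(
    io.StringIO(RAW_CSV),
    {"LOCATION": "country", "SUBJECT": "crop", "TIME": "year", "Value": "production"},
)
print(crop)
```

The employment file can be loaded with the same helper and a different mapping (for example, renaming its value column to `employees`).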
Data Cleaning
Data Merging
Merging the two datasets means aligning them on their common attributes, in this case country and year. The 'crop_production.csv' file spans 1990 to 2030 and covers 38 countries, while 'employment_by_activity.csv' spans 1955 to 2022 and covers 44 countries. An inner merge keeps only the country-year combinations present in both files, so the combined dataset focuses on the countries and years common to both.
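The inner merge on country and year can be sketched with `pandas.DataFrame.merge`. The toy frames and the column names `country`, `year`, `crop`, `production`, and `employees` are illustrative assumptions, not the project's actual schema. Passing `validate="many_to_one"` makes pandas raise an error if a country-year appears more than once in the employment table, which would otherwise silently duplicate crop rows.

```python
import pandas as pd

# Made-up toy frames standing in for the cleaned OECD tables.
crop = pd.DataFrame({
    "country": ["USA", "USA", "KOR", "AUS"],
    "year": [1990, 2025, 1990, 1990],
    "crop": ["MAIZE", "MAIZE", "RICE", "WHEAT"],
    "production": [1.0, 2.0, 3.0, 4.0],
})
employment = pd.DataFrame({
    "country": ["USA", "KOR", "DEU"],
    "year": [1990, 1990, 1990],
    "employees": [10.0, 20.0, 30.0],
})

# Inner merge: keep only (country, year) pairs present in BOTH tables.
# Several crops can share one country-year, so the relation is many-to-one.
merged = crop.merge(
    employment, on=["country", "year"], how="inner", validate="many_to_one"
)
print(merged)
```

Only the USA and KOR rows for 1990 survive: the 2025 crop row and the AUS and DEU rows have no partner in the other table.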
After the merge, the consolidated dataset includes 16 countries. However, the first year of crop production data varies noticeably across these countries. To keep the dataset uniform, only countries with complete data for every year from 1990 to 2022 are retained. This aligns all countries on the same period and yields a consistent, comparable dataset for analysis.
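One way to keep only countries with a complete 1990 to 2022 record is to trim the panel to that window and count the distinct years observed per country. This is a sketch assuming the merged frame has `country` and `year` columns; `keep_complete_countries` is a hypothetical helper and the toy panel is made up.

```python
import pandas as pd


def keep_complete_countries(df, start=1990, end=2022):
    """Trim to [start, end] and keep only countries observed in every year of that window."""
    window = df[df["year"].between(start, end)]
    n_years = end - start + 1
    # Count distinct years per country; a complete record has all n_years of them.
    years_seen = window.groupby("country")["year"].nunique()
    complete = years_seen[years_seen == n_years].index
    return window[window["country"].isin(complete)]


# Made-up toy panel: AAA covers 1990-2023, BBB only starts in 2000.
rows = [("AAA", y, 1.0) for y in range(1990, 2024)]
rows += [("BBB", y, 1.0) for y in range(2000, 2023)]
panel = pd.DataFrame(rows, columns=["country", "year", "production"])

balanced = keep_complete_countries(panel)
print(balanced["country"].unique(), balanced["year"].min(), balanced["year"].max())
```

Counting distinct years (rather than rows) keeps the check valid when a country has several crop rows per year.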
Exploratory Data Analysis (EDA)
Time Series Analysis: This will involve examining how crop production and the number of employees in the agricultural sector have changed over time in each country.
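Because crop production and employee counts are on very different scales, a per-country time series is easier to read with two y-axes. This is a minimal matplotlib sketch assuming the merged frame has `country`, `crop`, `year`, `production`, and `employees` columns; `plot_country_crop` is a hypothetical helper and the values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_country_crop(df, country, crop):
    """Plot production (left axis) and agricultural employees (right axis) over time."""
    sub = df[(df["country"] == country) & (df["crop"] == crop)].sort_values("year")
    fig, ax_prod = plt.subplots(figsize=(8, 4))
    ax_prod.plot(sub["year"], sub["production"], color="tab:green", marker="o")
    ax_prod.set_xlabel("Year")
    ax_prod.set_ylabel(f"{crop} production", color="tab:green")
    # Second y-axis sharing the same years, since the two series have different units.
    ax_emp = ax_prod.twinx()
    ax_emp.plot(sub["year"], sub["employees"], color="tab:blue", marker="s")
    ax_emp.set_ylabel("Agricultural employees", color="tab:blue")
    ax_prod.set_title(f"{country}: {crop} production vs. agricultural employment")
    fig.tight_layout()
    return fig


# Made-up toy values for a placeholder country code.
toy = pd.DataFrame({
    "country": ["AAA"] * 5,
    "crop": ["MAIZE"] * 5,
    "year": [1990, 1991, 1992, 1993, 1994],
    "production": [5.0, 4.8, 4.5, 4.4, 4.1],
    "employees": [9.0, 8.7, 8.5, 8.1, 7.9],
})
fig = plot_country_crop(toy, "AAA", "MAIZE")
```

Looping over the unique (country, crop) pairs gives one figure per pair; saving each with `fig.savefig(...)` and closing it with `plt.close(fig)` keeps memory bounded.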
Interpretation:
1. Inverse relationship for wheat: Across several countries, wheat production and the number of employees in the agricultural sector move in opposite directions. As wheat production increases, the number of employees tends to decrease.
2. Trends in the United States: This inverse relationship between production and employment holds across the different crop types in the United States, suggesting a broader trend within the country's agricultural sector.
3. Pronounced fluctuations in Australia (AUS): Australian data shows large fluctuations in both production and employment, more pronounced than in the other countries.
4. Steady decline in Japan (JPN): In Japan, both maize production and the number of employees in agriculture have declined steadily over time.
To go beyond what the exploratory data analysis (EDA) revealed, a statistical analysis was conducted to test the relationships suggested by the initial findings and to address the research question more directly.
Correlation Analysis
When analyzing the relationship between employment rates and crop production, a correlation analysis is an essential first step. It quantifies the strength and direction of the association between the two variables. By calculating the Pearson correlation coefficient, we can determine whether a change in the number of employees is associated with an increase or a decrease in crop production, and how strong that association is. The p-value obtained alongside the coefficient indicates whether the observed correlation is statistically significant. This analysis also shows whether a more detailed exploration, such as regression analysis, is warranted. It helps identify initial patterns or trends in the data, guiding further, more complex analyses and decision-making in agricultural planning and policy formulation.
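The per-pair correlation step can be sketched with `scipy.stats.pearsonr`, which returns the coefficient and a two-sided p-value. This assumes the Stats library mentioned earlier is SciPy's `stats` module and reuses the assumed column names; `correlation_table` and the toy data are illustrative only.

```python
import pandas as pd
from scipy import stats


def correlation_table(df):
    """Pearson r and two-sided p-value of employees vs. production for each (country, crop) pair."""
    records = []
    for (country, crop), grp in df.groupby(["country", "crop"]):
        # pearsonr needs at least two observations and warns on constant input.
        r, p = stats.pearsonr(grp["employees"], grp["production"])
        records.append({"country": country, "crop": crop, "r": r, "p_value": p})
    # Strongest positive correlations first, strongest negative last.
    return pd.DataFrame(records).sort_values("r", ascending=False, ignore_index=True)


# Made-up toy panel: one rising and one falling pair for a placeholder country.
employees = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy = pd.DataFrame({
    "country": ["AAA"] * 12,
    "crop": ["UP"] * 6 + ["DOWN"] * 6,
    "employees": employees * 2,
    "production": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 10.0, 8.2, 5.9, 4.1, 2.2, 0.1],
})
corr = correlation_table(toy)
print(corr)
```

With only a few dozen yearly observations per pair, the p-value is sensitive to the sample size, so it is worth reporting the number of years alongside each coefficient.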
A bar graph of the correlation coefficients provides a clearer visual comparison of the results.
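A horizontal bar chart of the coefficients, colored by sign, is one possible layout for such a graph. This sketch assumes a correlation table with `country`, `crop`, and `r` columns; `plot_correlations` is a hypothetical helper and the values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlations(corr):
    """Horizontal bar chart of Pearson r per country-crop pair, colored by sign."""
    labels = corr["country"] + "-" + corr["crop"]
    colors = ["tab:blue" if r >= 0 else "tab:red" for r in corr["r"]]
    fig, ax = plt.subplots(figsize=(6, 0.4 * len(corr) + 1))
    ax.barh(labels, corr["r"], color=colors)
    ax.axvline(0, color="black", linewidth=0.8)  # zero line separates signs
    ax.set_xlim(-1, 1)                           # Pearson r is bounded by [-1, 1]
    ax.set_xlabel("Pearson correlation (employees vs. production)")
    ax.invert_yaxis()  # list pairs top to bottom in table order
    fig.tight_layout()
    return fig


# Made-up toy correlation table.
corr = pd.DataFrame({"country": ["AAA", "AAA"], "crop": ["UP", "DOWN"], "r": [0.5, -0.4]})
fig = plot_correlations(corr)
```

Fixing the x-axis to [-1, 1] keeps bar lengths comparable across figures, which a data-driven axis range would not.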
Interpretation: JPN-MAIZE, KOR-RICE, and KOR-SOYBEAN show strong positive correlation coefficients of 0.9211, 0.8540, and 0.7667, respectively. With p-values close to 0, these indicate statistically significant positive relationships between the number of employees and crop production in those countries. On the other hand, KOR-WHEAT and USA-MAIZE show strong negative correlation coefficients of -0.7888 and -0.6733, respectively. With p-values close to 0, these indicate statistically significant negative relationships between the number of employees and wheat production in Korea and maize production in the United States.
Regression Analysis
Based on the correlation results, five pairs stand out, either because of the strength of their correlations or their significance levels: JPN-MAIZE, KOR-RICE, KOR-SOYBEAN, KOR-WHEAT, and USA-MAIZE. These relationships appear robust enough to explore further through regression analysis. In particular, pairs with strong correlations (above 0.7 or below -0.7) are prime candidates, because the number of agricultural employees is more likely to be a strong predictor of crop production for these crops in these countries.
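Selecting the strongly correlated pairs and fitting a separate simple linear regression to each can be sketched with scikit-learn's `LinearRegression`. The 0.7 threshold follows the text, but the helpers `strong_pairs` and `fit_pair_regressions`, the column names, and the toy data are assumptions. USA-MAIZE (r = -0.6733) would not pass a pure 0.7 cut-off, so the selection above also weighs significance; the `threshold` argument can be lowered accordingly.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def strong_pairs(corr, threshold=0.7):
    """Return (country, crop) tuples whose |r| exceeds the threshold."""
    mask = corr["r"].abs() > threshold
    return list(corr.loc[mask, ["country", "crop"]].itertuples(index=False, name=None))


def fit_pair_regressions(df, pairs):
    """Fit production ~ employees separately for each (country, crop) pair."""
    results = []
    for country, crop in pairs:
        sub = df[(df["country"] == country) & (df["crop"] == crop)]
        X = sub[["employees"]].to_numpy()  # scikit-learn expects a 2-D feature matrix
        y = sub["production"].to_numpy()
        model = LinearRegression().fit(X, y)
        results.append({
            "country": country,
            "crop": crop,
            "slope": model.coef_[0],     # change in production per extra employee unit
            "intercept": model.intercept_,
            "r2": model.score(X, y),     # in-sample R^2
        })
    return pd.DataFrame(results)


# Made-up toy data: AAA-UP follows production = 3 * employees + 2 exactly.
toy = pd.DataFrame({
    "country": ["AAA"] * 5 + ["BBB"] * 5,
    "crop": ["UP"] * 5 + ["FLAT"] * 5,
    "employees": [1.0, 2.0, 3.0, 4.0, 5.0] * 2,
    "production": [5.0, 8.0, 11.0, 14.0, 17.0, 4.0, 6.0, 5.0, 7.0, 5.0],
})
corr = pd.DataFrame({"country": ["AAA", "BBB"], "crop": ["UP", "FLAT"], "r": [0.95, 0.2]})

pairs = strong_pairs(corr)
results = fit_pair_regressions(toy, pairs)
print(results)
```

Fitting each pair separately mirrors the per-pair correlation step; a pooled model with country and crop effects would be the natural next step if the pairs should share information.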
Interpretation: The regression results show different dynamics for different country-crop pairs. In Japan, maize production increases slightly with additional labor, and the high R² suggests that labor is a strong predictor of production. In Korea, labor has a substantial positive impact on rice production, with a sizable R² indicating its importance. For soybean production in Korea, however, the impact of labor is less pronounced and other factors also matter. Surprisingly, increased labor appears to have a negative effect on wheat production in Korea, suggesting possible inefficiencies. In the US, maize production has a strongly negative coefficient, suggesting that production decreases with each additional worker, possibly due to overstaffing or technological progress. The relatively low R² for the US suggests that factors other than labor have a significant impact on production levels. These results underscore the complexity of the relationship between labor and agricultural production, which varies considerably across crops and countries.
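The regressions can be written as one simple linear model per country c and crop k, where P (production), E (agricultural employees), and the β coefficients are notation introduced here for illustration. Assuming each fit is ordinary least squares with an intercept on the same yearly observations used in the correlation analysis, the R² of this single-predictor model equals the square of the Pearson coefficient, which gives a quick consistency check between the two analyses:

```latex
P_{c,k,t} = \beta_0 + \beta_1 E_{c,t} + \varepsilon_{c,k,t}

R^2 = 1 - \frac{\sum_t \left(P_{c,k,t} - \hat{P}_{c,k,t}\right)^2}
               {\sum_t \left(P_{c,k,t} - \bar{P}_{c,k}\right)^2}
    = r_{c,k}^2
```

Under that assumption, JPN-MAIZE (r = 0.9211) would give R² ≈ 0.848 and USA-MAIZE (r = -0.6733) would give R² ≈ 0.453, in line with the high R² for Japan and the relatively low R² for the US described above. The slope β₁ is the change in expected production per one-unit increase in employment, in whatever units the OECD tables report.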