Employee churn prediction
Analyze employee churn. Find out why employees are leaving the company, and learn to predict who will leave the company.
In the past, most of the focus was on ‘rates’ such as the attrition rate and the retention rate. HR managers compute past rates and try to predict future rates using data warehousing tools. These rates show the aggregate impact of churn, but that is only half the picture. Another approach is to focus on individual records in addition to the aggregate.
In customer churn analysis, you predict who will stop buying and when. Employee churn is similar to customer churn, except that it focuses on employees rather than customers: you predict who will leave and when. Employee churn is expensive, so even incremental improvements can give significant results. Predicting it helps in designing better retention plans and improving employee satisfaction.
What is Employee Churn?
Employee churn can be defined as a leak or departure of an intellectual asset from a company or organization. In simpler words, churn is when employees leave the organization. More generally, churn occurs when a member of a population leaves that population.
Research has found that employee churn is affected by age, tenure, pay, job satisfaction, working conditions, growth potential, and employees' perceptions of fairness. Other variables, such as gender, ethnicity, education, and marital status, have also been found to be essential factors in predicting employee churn. In some cases, such as employees with niche skills, departing employees are harder to replace. Churn also affects the ongoing work and productivity of the remaining employees. Acquiring replacements has its own costs, such as hiring and training costs, and a new employee needs time to reach the same level of technical or business expertise as the person they replace. Organizations tackle this problem by applying machine learning techniques to predict employee churn, which helps them take the necessary actions.
The following points help you understand employee and customer churn better:
A business chooses which employees to hire, while in marketing you don’t get to choose your customers.
Employees are the face of your company, and collectively, the employees produce everything your company does.
Losing a customer affects revenue and brand image. Acquiring new customers is difficult and costly compared to retaining existing ones. Employee churn is also painful for companies and organizations: it takes time and effort to find and train a replacement.
Employee churn has unique dynamics compared to customer churn. Predicting it helps in designing better employee retention plans and improving employee satisfaction. Data science algorithms can predict future churn.
Importing modules
Importing dataset
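The code for importing modules and the dataset is not reproduced here. A minimal sketch, assuming pandas is used to load the data: the real file name is not given in this text, so the example reads three made-up rows from an in-memory CSV built with io.StringIO, using the ten columns described in the next section. With the actual file, you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical, made-up rows using the column names described in this article.
# Replace io.StringIO(csv_text) with the path to the real CSV file.
csv_text = """satisfaction_level,last_evaluation,number_projects,average_monthly_hours,time_spent_company,work_accident,promotion_last_5years,Departments,salary,left
0.45,0.52,2,150,3,0,0,sales,low,1
0.82,0.91,4,210,2,1,0,IT,medium,0
0.11,0.88,6,280,4,0,0,support,high,1
"""

data = pd.read_csv(io.StringIO(csv_text))

# Inspect the first rows and the (rows, columns) shape.
print(data.head())
print(data.shape)
```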
About Dataset
This dataset has 14,999 employees and 10 attributes (6 integer, 2 float, and 2 object). No column has null or missing values.
Description of each column:
satisfaction_level: Employee satisfaction level, ranging from 0 to 1.
last_evaluation: Performance evaluation score given by the employer, also ranging from 0 to 1.
number_projects: Number of projects assigned to the employee.
average_monthly_hours: Average number of hours the employee worked per month.
time_spent_company: Employee experience, i.e., the number of years the employee has spent in the company.
work_accident: Whether the employee has had a work accident.
promotion_last_5years: Whether the employee has been promoted in the last 5 years.
Departments: The employee's department/division.
Salary: Salary level of the employee (low, medium, or high).
left: Whether the employee has left the company.
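To check the dataset summary above (dtype counts and missing values) after loading, pandas provides the dtypes attribute and the isnull() method. A minimal sketch on a hypothetical three-row DataFrame built with the column names described above; the exact spellings in the real file may differ.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the description above.
df = pd.DataFrame({
    "satisfaction_level": [0.45, 0.82, 0.11],
    "last_evaluation": [0.52, 0.91, 0.88],
    "number_projects": [2, 4, 6],
    "average_monthly_hours": [150, 210, 280],
    "time_spent_company": [3, 2, 4],
    "work_accident": [0, 1, 0],
    "promotion_last_5years": [0, 0, 0],
    "Departments": ["sales", "IT", "support"],
    "salary": ["low", "medium", "low"],
    "left": [1, 0, 1],
})

# Number of columns per dtype: 6 integer, 2 float, and 2 text columns
# (shown as object or a string dtype, depending on the pandas version).
print(df.dtypes.value_counts())

# Missing values per column; all zeros means no null/missing values.
print(df.isnull().sum())
```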
Data Insights
In the given dataset, there are two types of employees: those who stayed and those who left the company. So, you can divide the data into two groups and compare their characteristics. Here, you can find the average of each group using the groupby() and mean() functions.
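A minimal sketch of that comparison on hypothetical toy rows. The numeric_only=True argument is an addition here: recent pandas versions raise an error when averaging text columns such as Departments or salary, so the flag restricts the mean to numeric columns.

```python
import pandas as pd

# Hypothetical toy data; left = 1 means the employee left, 0 means stayed.
df = pd.DataFrame({
    "satisfaction_level": [0.9, 0.7, 0.2, 0.4],
    "average_monthly_hours": [160, 180, 250, 270],
    "promotion_last_5years": [1, 0, 0, 0],
    "salary": ["low", "high", "low", "medium"],
    "left": [0, 0, 1, 1],
})

# Split rows by the value of "left" and average each numeric column per group.
# The text column "salary" is skipped because of numeric_only=True.
group_means = df.groupby("left").mean(numeric_only=True)
print(group_means)
```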
Employees who left the company had a lower satisfaction level, a lower promotion rate, and lower salaries, and worked more hours compared to those who stayed.
Data Visualization
Employees Left
Let's check how many employees left.
Here, you can see that out of 14,999 employees, 3,571 left and 11,428 stayed. Employees who left make up about 23.8% of the total.
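The percentage follows directly from the two counts. In pandas, counts like these typically come from df["left"].value_counts(); the sketch below only checks the arithmetic using the figures reported above.

```python
# Counts reported above for the full dataset.
left_count = 3571
stayed_count = 11428

total = left_count + stayed_count   # all employees in the dataset
churn_rate = left_count / total     # share of employees who left

print(f"total={total}, churn rate={churn_rate:.1%}")  # total=14999, churn rate=23.8%
```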
Number of projects
Most employees work on 3 to 5 projects.
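A minimal sketch of quantifying a claim like this with pandas, on hypothetical toy values: value_counts(normalize=True) gives the share of employees per project count, and between(3, 5), which is inclusive on both ends by default, picks out the 3-to-5 range.

```python
import pandas as pd

# Hypothetical toy values for the number_projects column.
projects = pd.Series([2, 3, 3, 4, 4, 4, 5, 5, 6, 7], name="number_projects")

# Share of employees for each project count, ordered by project count.
shares = projects.value_counts(normalize=True).sort_index()
print(shares)

# Fraction of employees working on 3 to 5 projects (inclusive).
share_3_to_5 = projects.between(3, 5).mean()
print(f"{share_3_to_5:.0%} of employees work on 3-5 projects")  # 70% ...
```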
Time spent in company
Most employees have between 2 and 4 years of experience. There is also a large gap in headcount between employees with 3 years and those with 4 years of experience.
Subplots using seaborn (univariate analysis)
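The plotting code is not reproduced in this text. A minimal sketch of a grid of seaborn count plots, one per discrete feature, on hypothetical toy data; the choice of features, the 2 by 3 layout, and the figure size are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with some discrete/categorical features.
df = pd.DataFrame({
    "number_projects": [2, 3, 3, 4, 5, 6],
    "time_spent_company": [3, 2, 3, 4, 3, 5],
    "work_accident": [0, 0, 1, 0, 0, 1],
    "promotion_last_5years": [0, 0, 0, 1, 0, 0],
    "Departments": ["sales", "IT", "sales", "support", "IT", "sales"],
    "salary": ["low", "low", "medium", "high", "medium", "low"],
})

features = list(df.columns)
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(15, 8))

# One count plot per feature: number of employees in each category.
for ax, feature in zip(axes.flat, features):
    sns.countplot(x=feature, data=df, ax=ax)
    ax.set_title(feature)

fig.tight_layout()
```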
You can observe the following points in the above visualization:
Comparing all the features against the target variable ("left")
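The comparison here is visual. A numeric counterpart, swapped in for illustration rather than taken from the original, is a cross-tabulation of a feature against left, normalized per row so that column 1 reads as the churn rate for each category. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: salary level and whether the employee left.
df = pd.DataFrame({
    "salary": ["low", "low", "low", "low", "medium", "medium", "high", "high"],
    "left":   [1, 1, 1, 0, 1, 0, 0, 0],
})

# Rows: salary level; columns: left (0/1).
# normalize="index" makes each row sum to 1, so column 1 is the churn rate.
churn_by_salary = pd.crosstab(df["salary"], df["left"], normalize="index")
print(churn_by_salary)
```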
You can observe the following points in the above visualization:
Data Analysis and Visualization summary
The following features most influence a person's decision to leave the company:
Cluster Analysis
Let's find the groups of employees who left. You can observe that the most important factors in whether an employee stays or leaves are satisfaction and performance in the company. So let's group the employees who left according to these two features using cluster analysis.
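The clustering algorithm is not named in this text; the sketch below assumes k-means from scikit-learn, with 3 clusters to match the three groups described next, applied to satisfaction_level and last_evaluation. The toy rows are hypothetical; with the real data you would first keep only employees with left == 1.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data for employees who left.
left_emp = pd.DataFrame({
    "satisfaction_level": [0.10, 0.12, 0.09, 0.40, 0.42, 0.38, 0.80, 0.85, 0.82],
    "last_evaluation":    [0.85, 0.90, 0.88, 0.50, 0.52, 0.48, 0.95, 0.90, 0.92],
})

# Group employees into 3 clusters using the two features.
# n_init and random_state are set explicitly for reproducible results.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
left_emp["cluster"] = kmeans.fit_predict(
    left_emp[["satisfaction_level", "last_evaluation"]]
)

# Each centre is the mean (satisfaction, evaluation) of one group.
print(kmeans.cluster_centers_)
print(left_emp)
```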
Here, employees who left the company can be grouped into 3 types:
Prediction Model
Encoding categorical data
The salary column in the dataset takes the categorical values low, medium, and high.
Many machine learning algorithms require numerical input data, so you need to represent categorical columns as numerical columns.
To encode this data, I map each value to a number, e.g., the salary column's values can be represented as low:0, medium:1, and high:2.
This process is known as label encoding, and sklearn's LabelEncoder conveniently does it for you. Note, however, that LabelEncoder assigns codes in sorted (alphabetical) order, so it maps high:0, low:1, and medium:2 rather than following the low < medium < high order.
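A short sketch contrasting LabelEncoder with an explicit ordered mapping via pandas Series.map; the salary_order dictionary is illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

salary = pd.Series(["low", "medium", "high", "low", "medium"], name="salary")

# LabelEncoder assigns integer codes in sorted (alphabetical) label order.
le = LabelEncoder()
codes = le.fit_transform(salary)
print({str(label): code for code, label in enumerate(le.classes_)})  # {'high': 0, 'low': 1, 'medium': 2}
print(codes.tolist())  # [1, 2, 0, 1, 2]

# An explicit mapping keeps the natural order low < medium < high.
salary_order = {"low": 0, "medium": 1, "high": 2}
ordered_codes = salary.map(salary_order)
print(ordered_codes.tolist())  # [0, 1, 2, 0, 1]
```

When the categories have a natural order, as salary levels do, the explicit mapping keeps that order in the numeric codes.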
Splitting train and test data
To evaluate model performance, divide the dataset into a training set and a test set.
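A minimal sketch with scikit-learn's train_test_split on hypothetical toy arrays. The 30% test share, random_state=42, and stratify=y are assumptions, not values from the original analysis; stratifying keeps the proportion of leavers similar in both splits, which matters when only about a quarter of employees left.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy features (X) and labels (y: 1 = left, 0 = stayed).
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

# Hold out 30% of the rows for testing, preserving the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```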
Model Building (Employee churn prediction model)
Using Gradient Boosting Classifier
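A minimal sketch of fitting scikit-learn's GradientBoostingClassifier. The data is a synthetic stand-in from make_classification (the encoded employee features would take its place), and the classifier keeps its default hyperparameters; whether the original analysis tuned them is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded employee features (hypothetical data).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Defaults: 100 trees, learning_rate=0.1, max_depth=3.
# random_state only makes the fit reproducible.
gb = GradientBoostingClassifier(random_state=0)
gb.fit(X_train, y_train)

# Predicted class for each test row (0 = stays, 1 = leaves).
y_pred = gb.predict(X_test)
print(y_pred[:10])
```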
Evaluating Model Performance
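The metrics discussed in the conclusion can be computed with sklearn.metrics. A sketch on hypothetical labels (1 = left, 0 = stayed), chosen so the numbers are easy to check by hand:

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true labels and model predictions (1 = left, 0 = stayed).
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

acc = accuracy_score(y_test, y_pred)     # 7 of 10 correct -> 0.7
prec = precision_score(y_test, y_pred)   # 3 of 5 predicted leavers left -> 0.6
rec = recall_score(y_test, y_pred)       # 3 of 4 actual leavers found -> 0.75
cm = confusion_matrix(y_test, y_pred)    # [[TN FP] [FN TP]] = [[4 2] [1 3]]

print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", rec)
print(cm)
```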
Conclusion
Well, you got a classification rate (accuracy) of 97%, which is good accuracy. For context, about 76% of the employees in the dataset stayed, so a model that always predicted "stays" would already be right about 76% of the time; 97% is well above that baseline.
Precision: Precision measures how precise your model is: when the model predicts that an employee will leave, how often is it correct? In this case, when your Gradient Boosting model predicted that an employee was going to leave, that employee actually left 95% of the time.
Recall: Of the employees in the test set who actually left, your Gradient Boosting model identified 92%.
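Both metrics come from counting true positives (TP), false positives (FP), and false negatives (FN), treating "left" as the positive class: precision = TP / (TP + FP) and recall = TP / (TP + FN). A from-scratch sketch on hypothetical labels; precision_recall is a helper written for illustration, not a library function.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the `positive` class from scratch."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of the employees predicted to leave, the share who actually left.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the employees who actually left, the share the model flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels (1 = left, 0 = stayed).
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
print(precision_recall(y_true, y_pred))  # (0.75, 0.6)
```

Returning 0.0 when a denominator is zero avoids a division error when the model never predicts "left" or the test set contains no leavers.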