Introduction.
Employee retention strategies for some employers around the world have so far fallen short, irrespective of employee head count or company size. Thankfully, presently available data goes some way towards mitigating high staff attrition rates and helps focus on future attrition, resulting in happier, more effective staff members who will remain with the company for longer. The data collected for this project will help me focus on the overall satisfaction level of the staff by gaining deeper insights into their last evaluation scores, their number of projects, their daily hours, monthly hours, time spent at the company, whether or not they have experienced an accident at work, whether or not they have had a promotion in the last five years, as well as the department they work in and their annual salary. This data tells only part of the story, but enough to make a positive difference to the company and the staff members within it.
The data.
14,999 non-null records across 10 columns, with only some light column label treatment required.
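As a minimal sketch of what "light column label treatment" could look like, assuming the data sits in a pandas DataFrame: the untidy labels and the helper name `tidy_labels` below are hypothetical, and the rows are made up for illustration.

```python
import pandas as pd

# Toy frame with made-up rows; the untidy labels are illustrative only.
raw = pd.DataFrame({
    "Satisfaction Level ": [0.38, 0.80],
    "Work_accident": [0, 1],
    "Department": ["sales", "hr"],
})

def tidy_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with stripped, lower-case, underscore-separated labels."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"\s+", "_", regex=True)
    )
    return out

tidy = tidy_labels(raw)
null_counts = tidy.isna().sum()  # per-column count of missing values
```

A zero in every entry of `null_counts` is what "non-null" means in the summary above.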
Outliers.
There are no data points that could be reasonably classed as outliers.
Statistics and correlations.
The satisfaction level looks like a grade from 0 to 1 with a mean of 0.61, so satisfaction sits slightly above the midpoint of the scale.
The maximum number of projects is 7, with a mean of 3.8 and a low standard deviation, meaning most employees' project counts sit reasonably close to that mean.
Work accident, left and promotion_last_years all appear to be binary columns (with no meaningful 25%, 50% or 75% figures).
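One hedged way to confirm which columns are binary, rather than reading it off the summary statistics, is to check that a column holds nothing but 0 and 1. The column labels and rows below are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; only the idea of the check matters here.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "work_accident": [0, 0, 1, 0],
    "left": [1, 0, 1, 0],
})

# A column counts as binary when its distinct values are a subset of {0, 1}.
binary_cols = [
    col for col in df.columns
    if set(df[col].dropna().unique()) <= {0, 1}
]
```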
Correlation heatmap.
I will look at "left" as the target variable for this project as employee retention is the most important factor for me. Satisfied employees do still leave and any strategy a company can implement to lessen employee turnover is ultimately a good one.
So, the three strongest positive correlations to the "left" variable are:
• 1: "time_spend_company".
• 2: "average_monthly_hours".
• 3: "number_project".
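The ranking above can be sketched as follows, assuming a pandas DataFrame with numeric columns. The rows are made up, so the resulting numbers say nothing about the real dataset. The sketch shows both a signed ranking and a ranking by magnitude, since the two can disagree when a feature is strongly negatively correlated with the target.

```python
import pandas as pd

# Made-up rows purely for illustration; they are not taken from the dataset.
df = pd.DataFrame({
    "left": [1, 1, 1, 0, 0, 0],
    "time_spend_company": [5, 4, 5, 2, 3, 2],
    "number_project": [2, 6, 4, 3, 4, 5],
    "satisfaction_level": [0.1, 0.2, 0.1, 0.8, 0.9, 0.9],
})

# Pearson correlation of every numeric column with the target.
corr_with_left = df.corr()["left"].drop("left")

# A signed ranking puts the strongest positive relationships first ...
strongest_positive = corr_with_left.sort_values(ascending=False)
# ... while ranking by magnitude also surfaces strong negative ones.
strongest_by_magnitude = corr_with_left.abs().sort_values(ascending=False)
```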
Analysis.
Data volume / distribution.
• Satisfaction level is quite evenly spread with a spike in the 0.1-0.11 region making up for the trough it precedes. That is a spike of 693 employees experiencing low satisfaction so that will be important.
• There is a dollop of data in the last evaluation column below 0.6, but like the satisfaction level, things look otherwise even and it doesn't look as though any ML model will struggle to parse this information too much so far.
• Project counts of four, three, five, two, six and seven, in that order, are the most common.
• Three years is the most common length of time for employees in this dataset to have remained at the company. From there we see two years, then four years onwards, decreasing in volume as the years progress to the penultimate value, nine years. A small spike of 214 employees remain at ten years, making that timeframe the sixth-most common tenure, following the six year value.
• The majority of employees here did not experience an accident at work.
• 3,571 employees left the company, 11,428 did not.
• 319 employees had a promotion in the last five years.
Pair plot of all values against the target variable, 'left'.
With the orange hues representative of employees who left the company, we see:
• Lower evaluation scores, plus a relatively large amount of high evaluation scores.
• A spike in number of projects == 2 followed by an immediate dip and rise up to 5-6 projects.
• A peak in the low average monthly hours.
• Peaks in the lower ranges of time spent at the company, diminishing completely in the upper ranges.
• No work accidents.
• No promotion in the last 5 years and a low salary.
Satisfaction level vs. left.
Probably as one would expect, the median satisfaction figure for those who left is much lower than for those who remained: 0.41 compared to a median of 0.69 for those who remained at the business. The Q1 is also quite a low 0.13 for those who left, compared to a figure of 0.54 for those who remained in employment.
With that in mind, the spike of 888 employees leaving in the low satisfaction range of 0.08 - 0.11 is tall and sharp.
There is a cluster of 1178 employees leaving at satisfaction levels 0.36 - 0.45, so getting to grips with what other variables might shed further light on why staff with satisfaction levels in the middle-range leave will be interesting, if that information is interpretable.
There is another, wider spread of around 900 employees between satisfaction levels 0.72 - 0.93 who also left. Staff leaving due to low satisfaction is logical and their reasons are normally pretty common across the majority of unsatisfied staff in most industries, but comparatively satisfied staff leaving the business raises questions of a different type.
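The median and Q1 comparison, and the head counts inside a satisfaction band, can be reproduced along these lines. This is a sketch on made-up rows with assumed column labels, not the original notebook code.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "left": [1, 1, 1, 0, 0, 0],
    "satisfaction_level": [0.10, 0.40, 0.80, 0.50, 0.70, 0.90],
})

# Q1 and median satisfaction for stayers (0) and leavers (1).
quartiles = (
    df.groupby("left")["satisfaction_level"]
    .quantile([0.25, 0.5])
    .unstack()
)
quartiles.columns = ["q1", "median"]

# Count leavers inside a satisfaction band (both bounds inclusive).
leavers = df[df["left"] == 1]
in_band = int(leavers["satisfaction_level"].between(0.36, 0.45).sum())
```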
Last evaluation vs. left.
Here we see a bit of a slump in employees leaving in the middle (0.6 - 0.8) evaluation groups. This could be for many different reasons, as ever, but in my experience it could possibly be due to this being "The comfort zone", where the feedback for some staff members was good, but not exceptional. In some cases, the pressure stemming from good evaluation scores to continually perform (sometimes with the addition of extra work once the employee has proved reliable) can have an adverse effect on an employee's mental health and, consequently, their performance. We don't know if that is the case here (this is an area I will look into further vs. projects & time spent working), but it has happened in my experience and is at least one concept worthy of a little consideration.
Number of projects vs. left.
The values for 'Left' (in blue):
• Represent the largest volume for those with two projects.
• Dissipate for those with three projects.
• Begin showing more volume consecutively for those with four projects, five projects, and again for those with six projects.
• Although the bar for those with seven projects looks quite small, there is no accompanying bar for employees who remained, meaning nobody with seven projects stayed at the company. The figure of zero for those with eight projects exists due to that project amount not being recorded.
Number of projects and average monthly hours.
There are max and min outliers for those who left working certain project amounts over the month:
• Those with 2 projects working hours above 200 PCM.
• Those with four, five, six and seven projects working under 200 hours PCM.
• And a couple of staff members with five projects working over 300 hours PCM.
Evaluation score vs. average monthly hours.
• The cluster of employees on the bottom-left in yellow who have a last evaluation score of 0.4 - 0.6 have between 125 and 160 monthly hours to their names.
• The second cluster of yellow markers in the top-right mostly have very good evaluation scores (above 0.7) and worked over 200 hours.
There are some employees with low evaluation scores working long hours, but the majority of the employees who left were high-performers working long hours and those with relatively low evaluation scores with low working hour figures.
Time spent at the company & average monthly hours vs. left.
• The largest number of employees, a total of 1586, left the company at the three year mark.
• This is followed by the four year mark, at a total of 890.
• Then we see the five year mark with 833 employees leaving.
• The count for two years is then only 53, with years seven and over seeing no employees leave.
3,518 employees left between years three and six (inclusive).
Time spent at the company vs. left, average.
This equates to the highest average rate of employees leaving at the five year mark, with years four, three, six and two following in that order.
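The difference between the totals and the averages above is the difference between summing and averaging the binary "left" column per tenure. A minimal sketch on made-up rows (column labels assumed) shows how the year with the most leavers need not be the year with the highest leaving rate.

```python
import pandas as pd

# Made-up rows: six employees at 3 years, two at 5 years, two at 2 years.
df = pd.DataFrame({
    "time_spend_company": [3] * 6 + [5] * 2 + [2] * 2,
    "left": [1, 1, 1, 0, 0, 0] + [1, 1] + [0, 0],
})

# Sum of a 0/1 column = number of leavers; mean = share of that group who left.
by_tenure = df.groupby("time_spend_company")["left"].agg(
    leavers="sum", rate="mean"
)

most_leavers_year = by_tenure["leavers"].idxmax()
highest_rate_year = by_tenure["rate"].idxmax()
```

In this toy frame the three year group has the most leavers in absolute terms, while the five year group loses the largest share of its members.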
Time spent at the company vs. promotion in the last 5 years.
• The employees who have been at the company for 7 years have the highest average promotion figure, followed by those who have been at the company for 10 years, then 8 years.
• With the exception of year three, there appears to be a good, continuous scale from year two upwards to year seven representing the promotion average rising in unison as the years progress.
Work accident vs. left.
The figures below show that employees who did not have a work accident left at the higher rate (27%).
Work accident vs. employee satisfaction.
The figures are reversed for employee satisfaction, with the majority of employees experiencing low satisfaction having also experienced an accident at work. As with any data, this may not be a causal relationship, but those employees would be worthy of a conversation re: how things are going within the company and whether or not their satisfaction level is related to their accident.
Promotion last 5 years and satisfaction level vs. left.
• Of the employees who had a promotion in the last five years, 6% left.
• Of the employees who had no promotion in the last five years, 24% left.
• The highest attrition figure across these variables is 56%, for employees who received no promotion in the last five years and had a below-average satisfaction level.
• The next-highest figure is an average of 23%, for employees who received a promotion in the last five years but had above-average satisfaction.
• The absolute lowest attrition rate is that of 2%, for employees who had a promotion in the last five years and above-average satisfaction levels.
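Attrition rates for each promotion and satisfaction combination can be sketched by flagging satisfaction against its mean and averaging "left" within each group. The label `promotion_last_5years`, the helper column `satisfaction_band` and the rows are all assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "promotion_last_5years": [0, 0, 0, 0, 1, 1, 0, 0],
    "satisfaction_level": [0.1, 0.2, 0.3, 0.9, 0.8, 0.9, 0.8, 0.2],
    "left": [1, 1, 0, 0, 0, 0, 0, 1],
})

# Label each employee as above or below the mean satisfaction level.
above = df["satisfaction_level"] > df["satisfaction_level"].mean()
df["satisfaction_band"] = above.map({True: "above_avg", False: "below_avg"})

# Mean of the 0/1 target per group = attrition rate for that group.
rates = df.groupby(["promotion_last_5years", "satisfaction_band"])["left"].mean()
```

Each entry of `rates` answers "of the employees in this group, what share left?", which is the reading the percentages above rely on.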
Department and salary range vs. average of left.
• The above-average figures for employees who left are in the low salary bracket in the Sales, Technical, Support, Management, Marketing and IT departments.
• There are above-average figures for employees leaving in the medium salary bracket in the Accounting and Human resources departments. Product management and R&D both see an almost equal amount of employees leaving in the medium and low salary brackets.
• There aren't any surprising figures for employees with high salaries leaving, but the highest figures for employees leaving within this salary bracket belong in the HR, Technical and Marketing departments.
Satisfaction level, department and salary.
• The highest salary values exist in HR, Management, Sales, IT & Support and these departments also - whether a matter of cause & effect or not - reflect the greatest satisfaction scores overall.
• There are lighter satisfaction hues reflecting good satisfaction levels for employees in the medium salary range in the Support, Sales, IT and Marketing departments.
• With the exception of R&D (and its satisfaction score of 0.63 for the lowest earners), most of the lowest salaries see the lowest satisfaction scores.
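A department-by-salary grid of mean satisfaction, the kind of table a heatmap like this is usually drawn from, can be sketched with a pivot table. The column labels and rows are assumed, and this is not necessarily how the original chart was built.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "department": ["RandD", "RandD", "sales", "sales", "sales"],
    "salary": ["low", "high", "low", "high", "low"],
    "satisfaction_level": [0.6, 0.7, 0.4, 0.8, 0.6],
})

# Rows = departments, columns = salary bands, cells = mean satisfaction.
grid = df.pivot_table(
    index="department",
    columns="salary",
    values="satisfaction_level",
    aggfunc="mean",
)
```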
Satisfaction level by average monthly hours and time spent at company.
Clusters of leavers in the average monthly hours > 200 range at satisfaction level < 0.2.
(The top-left yellow scatterplot markers along a vertical axis)
• These employees had a high average last evaluation score (0.86) and an almost-average time spent at company figure (4.1).
The last evaluation average for this group:
The time spent at company average for this group:
Clusters of leavers in the average monthly hours > 100 range at satisfaction level > 0.35.
(The bottom-middle-left yellow scatterplot markers)
• These employees had an average last evaluation score (0.51) and a below-average time spent at company figure (3.0).
Last evaluation average:
The time spent at company average for this group:
Clusters of leavers in the average monthly hours > 200 range at satisfaction level > 0.7.
(The top-right yellow scatterplot markers)
• And these employees had the highest average last evaluation score (0.91) and a slightly above-average time spent at company figure (5.1).
Last evaluation average:
The time spent at company average for this group:
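The per-cluster averages quoted above come down to filtering leavers by an hours threshold and a satisfaction window, then averaging two columns. A minimal sketch follows, where the helper name `cluster_means`, the column labels and the rows are all hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "left": [1, 1, 1, 0],
    "average_monthly_hours": [250, 260, 140, 255],
    "satisfaction_level": [0.10, 0.11, 0.40, 0.15],
    "last_evaluation": [0.90, 0.80, 0.50, 0.70],
    "time_spend_company": [4, 4, 3, 6],
})

def cluster_means(frame, min_hours, sat_low, sat_high):
    """Mean evaluation and tenure for leavers inside the given window."""
    mask = (
        (frame["left"] == 1)
        & (frame["average_monthly_hours"] > min_hours)
        & (frame["satisfaction_level"] > sat_low)
        & (frame["satisfaction_level"] < sat_high)
    )
    return frame.loc[mask, ["last_evaluation", "time_spend_company"]].mean()

# Leavers with > 200 monthly hours and satisfaction below 0.2.
top_left = cluster_means(df, min_hours=200, sat_low=0.0, sat_high=0.2)
```

The same helper covers the other two clusters by changing the hours threshold and the satisfaction window.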
Salary ranges for the employees in the above scatterplot.
For the left-hand clusters of leavers in the scatterplot (low satisfaction, high average monthly hours), the highest-paid employees were in the R&D and HR depts while the employees with the lowest salaries were in the Support, Marketing and Technical departments:
For the middle cluster (med-low satisfaction, low average monthly hours), the highest-paid employees were in the HR and R&D depts while the employees with the lowest salaries were in the Support, Marketing and Sales departments:
And re: the right-hand cluster (high satisfaction, high average monthly hours), the highest-paid leavers were in the Technical and Accounting depts while those with the lowest salaries were in the Marketing, Sales and IT departments:
Satisfaction level by last evaluation score.
From the statistical description we can see that the satisfaction level 0 - 0.2 has the highest mean and the highest 25%, 50% and 75% evaluation figures, alongside quite a tight standard deviation. So it's debatable whether these employees were graded higher as a means of motivation (unlikely), or whether they were well-graded in their last evaluation and that has led to them taking on more work which, as raised as a possibility earlier in the EDA, has added a bit of extra pressure to perform and resulted in lower satisfaction levels.
The next-highest evaluation figures belong to the highest-rated satisfaction levels, 0.7 - 1. It's there we see a mean of 0.76, a 25% of 0.62, a 50% of 0.77 and a 75% of 0.9.
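A statistical description per satisfaction bin can be sketched by binning with `pd.cut` and describing the evaluation score within each bin. Only the 0 - 0.2 and 0.7 - 1 bins are named in the text, so the middle bin edge here is an assumption, as are the column labels and rows.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "satisfaction_level": [0.10, 0.15, 0.50, 0.80, 0.90],
    "last_evaluation": [0.90, 0.80, 0.60, 0.70, 0.80],
})

# Bin edges: the outer bins follow the text, the middle bin is assumed.
edges = [0, 0.2, 0.7, 1.0]
labels = ["0 - 0.2", "0.2 - 0.7", "0.7 - 1"]
df["satisfaction_bin"] = pd.cut(
    df["satisfaction_level"], bins=edges, labels=labels, include_lowest=True
)

# count, mean, std, min, 25%, 50%, 75% and max of the evaluation per bin.
stats = df.groupby("satisfaction_bin", observed=True)["last_evaluation"].describe()
```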
Satisfaction level by department and last evaluation.
A visual representation of the above information in a sunburst chart shows the lighter yellow hues in the 0 - 0.2 satisfaction range. This is surrounded by more yellow hues in its related pie segments, equating to high last evaluation figures. Those evaluation figures were led by the Support, Management and IT departments in that order, followed by the Technical department.
In the highest of the satisfaction level bins (0.7 - 1), we see the Marketing department hold the highest last evaluation score followed by Product management, Accounting and Technical.
Average monthly hours per department.
• The sales department works the longest average monthly hours, followed by Technical, Support, IT, Product management, Marketing, R&D, Accounting, HR and Management.
Promotion last 5 years per department.
• Once more we see the Sales department as the most common dept., with 31% of the promotions going to this department. Sales is followed by Management (21.6%) and Marketing as the next most common departments for promotions.
Time spent at the company per department.
• It is a similar story for the time spent at the company, again due to the Sales department being the most heavily staffed of all departments: the Sales dept. has the largest percentage (27.9), followed by Technical (17.7) and then Support (14.4).
• HR is the department with employees spending the least time with the company.
Last evaluation per department.
• Of the department evaluations, the Support dept. has the highest median (0.74), followed by Technical and Accounting (0.73).
• Technical and Product management have the highest Q3 figures (0.88), followed by Support and Marketing (0.87).
Last evaluation per department vs. left.
• Of the employees who left each department, Product management, R&D, IT, Support and Technical had the highest (0.8 - 0.825) median evaluation scores.
• Marketing and HR had the lowest median evaluation scores.
• The highest evaluation Q3 of the leavers (0.9 - 0.91) belongs to the Technical and R&D departments, followed by the Product management, IT, Support and Sales depts.
Sum of projects per employee.
Total salary per project.
Projects per department.
• The Sales dept. sees the largest share of projects, followed by Technical, Support, and IT.
Modeling.
As this is an easy dataset for almost any tree-based algorithm to parse, I will opt for LightGBM, and although some cross-validation runs will be required, my gut says default parameters will be fine. To go a step further in the reliability stakes, I will use XGB with ten rounds of early stopping and a couple of other tools such as Shap analysis, as usual.
Dummifying salary and department features.
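For illustration, the dummifying step can be sketched with `pandas.get_dummies`. This is a swapped-in, simpler technique than an encoder inside a scikit-learn column transformer (which the feature-name prefixes mentioned further on suggest was used), and the column labels and rows are assumptions.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "left": [1, 0, 0],
    "salary": ["low", "high", "medium"],
    "department": ["sales", "hr", "sales"],
})

# One 0/1 column per category; all other columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["salary", "department"], dtype=int)
```

Tree-based models do not need one category dropped to avoid collinearity, so every category keeps its own column in this sketch.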
LightGBM classifier with default params.
LightGBM score.
LightGBM classification report.
LightGBM confusion matrix.
LightGBM Mean Absolute Error.
Train auc vs. test auc with LightGBM.
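To make the train vs. test comparison concrete, here is a small pure-Python sketch of what AUC measures: the share of (leaver, stayer) pairs in which the leaver receives the higher predicted score, with ties counting as half. The scores are made up and the function name `roc_auc` is hypothetical; in practice a library routine would normally be used.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # A pair counts fully if the positive outranks the negative, half on ties.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up predicted probabilities for a train and a test split.
train_auc = roc_auc([0, 0, 1, 1], [0.10, 0.20, 0.80, 0.90])
test_auc = roc_auc([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])
gap = train_auc - test_auc
```

A train AUC far above the test AUC hints at overfitting, while two close values suggest the model generalises about as well as it fits.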
Removing the prefix and trailing double underscore ('remainder__' and 'onehotencoder__') added to the beginning of each feature name by OHE.
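A minimal sketch of the prefix clean-up, using plain string handling. The helper name `strip_transformer_prefix` and the example feature names are hypothetical; only the two prefixes come from the text.

```python
def strip_transformer_prefix(name: str) -> str:
    """Drop everything up to and including the first double underscore."""
    _prefix, separator, rest = name.partition("__")
    # Names without a double underscore are returned unchanged.
    return rest if separator else name

feature_names = [
    "onehotencoder__salary_low",
    "remainder__satisfaction_level",
    "left",
]
clean_names = [strip_transformer_prefix(name) for name in feature_names]
```

Recent scikit-learn versions also let a ColumnTransformer skip these prefixes via `verbose_feature_names_out=False`, which may avoid the clean-up altogether.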
LightGBM feature importance and Shap results.
• The most important feature of all is obviously employee satisfaction. This could be a result of many other problems not included in this dataset, so it would be unfair for me to make assumptions on what the other possible causes may be, and in my professional experience there are many issues that data can't cover. The most important issue to get a handle on that I can think of is that some of the highest-evaluated employees have the lowest satisfaction scores. This can be addressed with nothing more intricate than utilising HR to have simple sit-downs. Eyeballing the department in person, to get a handle on the department atmosphere and employee body language and to make sure assigning this extra work to HR is necessary, isn't a bad start. Whether the employees are feeling victimised by their previous successes, have been given a good evaluation score as a means of motivation (as opposed to a better salary), felt like it was just time to move to pastures new after an evaluation score which would result in a good reference, or left for any other reason, getting to know the employees on a personal level during their evaluation won't take up much extra time and will eliminate the majority of possibilities.
• The average monthly hours in this dataset, as witnessed already, can be either too low or too high. Both appear to be resulting in employee dissatisfaction in some fashion. Once more, with no assumptions re: this company / going purely on personal management experience, zero-hour contracts and / or an ineffectual feeling can result in employees leaving a workplace as much as being overworked.
• Time spent at the company: The five year tenure is worthy of investigation here, with years four & three as the next-most common timeframes for employees leaving. Once again, this can be partially resolved with annual reviews or interviews for employees who have spent two or more years at the company, to enquire as to what they would want for their future within the company (obviously within reason). Offering flexible scheduling for employees of a certain term (if not already implemented) and drawing up a company-specific rewards program or partner rewards could be one good way to utilise the less motivated HR employees. Reducing burnout & promoting a better work / life balance would be a good idea for those of a certain tenure leaving due to the additional strain of long hours and extra projects.
• The last evaluation has been analysed to a fault with no real concrete conclusions due to the black-box nature of this analysis. There is only the (mostly educated?) assumption that low-scoring employees will possibly 'jump ship' before they feel they will lose their job, while high-scoring employees may feel the additional pressure of a high-scoring evaluation, or even be energised by a good evaluation and look for a step-up in another company. The lack of employees leaving in the middle ground of the evaluation scores is *almost* evidence enough to lead me to believe those employees were happy to get on with their jobs unhindered by any additional pressure of either polarity.
• There is a visible sweet-spot as far as the number of projects goes: 3, 4 and 5 projects. Too few and the employees are unhappy; too many and the employees are only slightly happier but still leave, just in smaller numbers. Spreading the smaller project amounts out and, if possible, eliminating the requirement for 6 or more projects entirely for employees of a certain salary or hourly sum would be the most efficient method to resolve this, depending on the staff personalities involved.
• The Technical department has seen the highest figure of high-earning staff leave as well as a relatively high number of low-earning staff, which is a shame because the Technical department is among the highest evaluated and among the best for company longevity. They are putting in among the larger shares of monthly hours and have average to low promotion rates.
• Sales: The largest department and I would suspect one of the most important from a stakeholder perspective. Sales sees high average staff retention, high average promotion rates, high average monthly hours worked, high evaluation scores of the employees who left and relatively low salaries. If the Sales department is this large, as favoured as it appears to be, and works as long as the data would suggest, then the other departments will likely be struggling to keep up (depending on the product).
• Support has some of the lowest earners and some of the higher average monthly hours. Their low-earners experience the lowest satisfaction levels, resulting in the absolute highest amount of low-earners exiting from this department. In contrast, the Support employees have the highest median evaluation figure and the highest Q3 figure along with Technical. Support also has the highest median evaluation scores of any of the leavers. In short, as with some of the Technical and Sales departments' employees, they are highly-rated yet probably overworked and underpaid.
• HR has around 160 fewer total projects than the Product management department but almost as many clusters of number of projects == 2, which could be an issue. HR also has among the lowest Q1 figures for the last evaluation, and the lowest Q1 evaluation figures for employees who left. This department has among the highest staff turnover / least amount of time spent at the company, as well as being among the highest-paid in the clusters for low and medium average monthly hours & med-low satisfaction, which could result in employees feeling as if they're being paid above their worth & "jumping before they're pushed", as it were. They have some of the lowest average promotion rates as well as the second-lowest average monthly hours, and they are among the medium pay scale for the employees working long hours (possibly hence 'salary medium' appearing as the next most important feature). The lowest-earning HR staff are actually more satisfied than the medium-earning HR staff. Considering all of the information at hand, employees in this department might feel neglected, bored and / or laden with the feeling that they can't prove their worth. This could also be seen as a positive: those HR employees in the high satisfaction, high monthly average hour cluster were the lower earners. But is this a case of job satisfaction over salary, or is it that these employees feel that their worth is better shown through working long hours? This could be a bone of contention and would be better analysed further through department-level feedback. Then, utilising this department to take on the extra work necessary in investigating the other issues outlined would hopefully boost morale among some.
• Low and medium salaries. This needs no explanation and due to the fact that I don't know anything more about this company, I would prefer to not comment on any possible recommendations.
• Work accident: A bit of a non-issue here in the respect that those with no work accidents were the highest proportion of leavers, and this is likely due to the low probability of the employees experiencing accidents in the workplace. Although, because satisfaction is high on the importance charts and the employees with low satisfaction were largely those who experienced accidents, this would be one area of relatively high concern.