Analytical capability has been improving at an exponential rate in recent years and is crucial for the expansion and diversification of businesses. However, Human Resource departments are struggling to implement these new practices, even though human capital is one of the most important assets of a business.
This article looks at a classification model to determine its usefulness in finding patterns and predicting employee turnover and termination reasons. The model analyzed here is a Random Forest, which classifies individuals by combining many decision trees built from the explanatory variables, so that employees with similar characteristics end up in the same group.
Employee Turnover Data Description
The data used in this model is a simulated employee database created by Davide Polizzi and publicly available on Kaggle. It includes employee names and IDs, demographic variables, action outcomes, performance reviews, engagement surveys, and departmental classifiers.
The model groups employees by their employment status to identify common influences on employee turnover. The variables used are marital status, gender, department, performance reviews, diversity fair hires, pay rate, position, manager, engagement survey scores, employee satisfaction, and the number of special projects completed.
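Several of these variables are categorical, and many Random Forest implementations (scikit-learn among them) expect numeric inputs, so a preparation step is usually needed. The following is a minimal sketch of that step using pandas. The column names and the four rows are made up for illustration and are not taken from the Kaggle file.

```python
import pandas as pd

# Hypothetical column names and made-up rows; the real dataset may differ.
employees = pd.DataFrame({
    "marital_status": ["Single", "Married", "Single", "Divorced"],
    "department": ["Sales", "IT", "Sales", "Production"],
    "pay_rate": [22.5, 41.0, 19.0, 55.0],
    "engagement_score": [3.1, 4.6, 2.4, 4.9],
    "special_projects": [0, 5, 1, 6],
    "action": ["Active", "Voluntary", "For cause", "Active"],
})

# Separate the outcome (employment status) from the explanatory variables.
y = employees["action"]
X = employees.drop(columns="action")

# One-hot encode the categorical columns; numeric columns pass through unchanged.
X_encoded = pd.get_dummies(X, columns=["marital_status", "department"])

print(X_encoded.columns.tolist())
```

Each category becomes its own indicator column (for example `department_Sales`), which lets a tree split on "is in Sales" versus "is not in Sales".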
Random Forest Model of Employee Turnover
A Random Forest builds many decision trees, each trained on a different random sample of the data and each searching for the best splitting points for each variable, and then combines their decisions. For this example, the model groups employees by their termination status and identifies the most influential variables in determining an employee’s termination.
Two different variables represent this: a binary termination status and an Action ID, which differentiates between termination for cause and voluntary termination. While both offer useful insights, the Action ID provides more actionable results, since it separates voluntary from involuntary termination and gives an understanding of why employee turnover is occurring.
Figures 1-4 below show the results from the model, comparing the influence of each variable on Termination for Cause (left) against Voluntary Termination (right). The most influential variables were Manager ID, Pay Rate, Engagement Survey Scores, and Performance Scores.
For Manager ID, the impact is sharply split between low and high IDs, indicating that some managers mostly see employees leave voluntarily whereas others have greater numbers of terminations for cause. A more in-depth look at the data and at specific managers is necessary to determine why this might be. It could be an issue of engagement or managerial style that needs correcting, or it could be a matter of job level or position.
Other noteworthy observations come from the remaining influential variables: low pay, low and mid engagement scores, and high performance scores all increased voluntary terminations. In contrast, pay rates at either extreme, mid engagement scores, and low performance scores all increased the likelihood of termination for cause.
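To make the idea of a "best splitting point" concrete, here is a small sketch in plain Python. It scores every candidate threshold on one numeric variable by Gini impurity, a common splitting criterion, and then repeats the search on bootstrap resamples, which is roughly what each tree in a forest does. The pay rates and labels are invented for the example, and the article does not say which criterion its model used, so treat this as an illustration rather than the author's method.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one variable."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # start from the unsplit impurity
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best

def bootstrap(values, labels, rng):
    """Resample rows with replacement, as each tree in a forest does."""
    idx = [rng.randrange(len(values)) for _ in values]
    return [values[i] for i in idx], [labels[i] for i in idx]

# Made-up data: hourly pay rate and whether the employee left.
pay_rate = [15, 18, 20, 22, 40, 45, 50, 55]
status = ["left", "left", "left", "left", "stayed", "stayed", "stayed", "stayed"]

print(best_split(pay_rate, status))  # (31.0, 0.0)

# Different resamples can yield different thresholds; a forest combines them.
rng = random.Random(0)
for _ in range(3):
    sample_values, sample_labels = bootstrap(pay_rate, status, rng)
    print(best_split(sample_values, sample_labels))
```

A standard Random Forest also considers only a random subset of the variables at each split, which this single-variable sketch leaves out.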
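A three-class target of this kind (active, voluntary termination, termination for cause) could be modeled along the following lines. This is a sketch using scikit-learn's `RandomForestClassifier` on synthetic data. The column names, the class labels, and the rule that generates the outcome are all assumptions made so the example runs on its own, and the article does not state which software produced its results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 300

# Synthetic stand-in for the explanatory variables (hypothetical column names).
# Treating manager_id as a plain integer is a simplification.
X = pd.DataFrame({
    "pay_rate": rng.uniform(14, 80, n),
    "engagement_score": rng.uniform(1, 5, n),
    "performance_score": rng.integers(1, 5, n),  # values 1 to 4
    "manager_id": rng.integers(1, 20, n),
})

# Made-up rule, only so the example has some signal to find:
# low pay leads to voluntary exits, the lowest performance score to for-cause exits.
action = np.where(
    X["pay_rate"] < 25,
    "Voluntary",
    np.where(X["performance_score"] == 1, "For cause", "Active"),
)

# oob_score=True estimates accuracy from the rows each tree did not see.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, action)

# Impurity-based importances: one value per variable, summing to 1.
importance = pd.Series(forest.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False))
print("Out-of-bag accuracy:", forest.oob_score_)
```

Swapping `action` for a binary terminated-or-not column gives the simpler model. Keeping the three classes is what allows the voluntary and for-cause cases to be examined separately.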
Power of Employee Turnover Models
While this data was hypothetical and simulated, the model created here shows the potential of data analytics applied to human resources. For a real HR department, specific recommendations could include looking further into individual managers and the engagement issues that may be driving turnover, as well as other ways to hire and retain the best talent. Overall, the ability to determine the driving causes of turnover and to differentiate how these affect different individuals and departments would allow HR departments to better tailor programs and target the root cause of personnel issues.