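Employee attrition analysis means analyzing the behavior of employees who have left a company and comparing them with the employees who are still there. It helps identify which employees are likely to leave soon. So, if you want to learn how to analyze employee attrition, this article is for you. In this article, I will take you through the task of employee attrition analysis using Python.
Employee Attrition Analysis
Employee attrition analysis is a type of behavioral analysis in which we study the behavior and characteristics of employees who have left the company and compare them with those of current employees, in order to find the employees who may leave soon.
A high attrition rate is expensive for any company in terms of recruitment and training costs, lost productivity, and lower employee morale. By identifying the causes of attrition, a company can take measures to reduce it and retain valuable employees.
Employee Attrition Analysis using Python
Let's start this task by importing the necessary Python libraries and the dataset: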
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"

data = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
print(data.head())
Output
   Age  Attrition     BusinessTravel  DailyRate              Department  \
0   41        Yes      Travel_Rarely       1102                   Sales
1   49         No  Travel_Frequently        279  Research & Development
2   37        Yes      Travel_Rarely       1373  Research & Development
3   33         No  Travel_Frequently       1392  Research & Development
4   27         No      Travel_Rarely        591  Research & Development

   DistanceFromHome  Education  EducationField  EmployeeCount  EmployeeNumber  \
0                 1          2   Life Sciences              1               1
1                 8          1   Life Sciences              1               2
2                 2          2           Other              1               4
3                 3          4   Life Sciences              1               5
4                 2          1         Medical              1               7

   ...  RelationshipSatisfaction  StandardHours  StockOptionLevel  \
0  ...                         1             80                 0
1  ...                         4             80                 1
2  ...                         2             80                 0
3  ...                         3             80                 0
4  ...                         4             80                 1

   TotalWorkingYears  TrainingTimesLastYear  WorkLifeBalance  YearsAtCompany  \
0                  8                      0                1               6
1                 10                      3                3              10
2                  7                      3                3               0
3                  8                      3                3               8
4                  6                      3                3               2

   YearsInCurrentRole  YearsSinceLastPromotion  YearsWithCurrManager
0                   4                        0                     5
1                   7                        1                     7
2                   0                        0                     0
3                   7                        3                     0
4                   2                        2                     2

[5 rows x 35 columns]
Let's check whether this dataset contains any missing values:
print(data.isnull().sum())
Output
Age 0
Attrition 0
BusinessTravel 0
DailyRate 0
Department 0
DistanceFromHome 0
Education 0
EducationField 0
EmployeeCount 0
EmployeeNumber 0
EnvironmentSatisfaction 0
Gender 0
HourlyRate 0
JobInvolvement 0
JobLevel 0
JobRole 0
JobSatisfaction 0
MaritalStatus 0
MonthlyIncome 0
MonthlyRate 0
NumCompaniesWorked 0
Over18 0
OverTime 0
PercentSalaryHike 0
PerformanceRating 0
RelationshipSatisfaction 0
StandardHours 0
StockOptionLevel 0
TotalWorkingYears 0
TrainingTimesLastYear 0
WorkLifeBalance 0
YearsAtCompany 0
YearsInCurrentRole 0
YearsSinceLastPromotion 0
YearsWithCurrManager 0
dtype: int64
Now let's have a look at the distribution of age in the dataset:
sns.displot(data['Age'], kde=True)
plt.title('Distribution of Age')
plt.show()
Now let's look at how the employees who left are distributed across departments:
# Filter the data to show only "Yes" values in the "Attrition" column
attrition_data = data[data['Attrition'] == 'Yes']

# Calculate the count of attrition by department
attrition_by = attrition_data.groupby(['Department']).size().reset_index(name='Count')

# Create a donut chart
fig = go.Figure(data=[go.Pie(
    labels=attrition_by['Department'],
    values=attrition_by['Count'],
    hole=0.4,
    marker=dict(colors=['#3CAEA3', '#F6D55C']),
    textposition='inside'
)])

# Update the layout
fig.update_layout(
    title='Attrition by Department',
    font=dict(size=16),
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)

# Show the chart
fig.show()
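A donut chart built this way shows each department's share of all leavers, which is not the same thing as the attrition rate inside each department: a large department can contribute many leavers while still losing a smaller fraction of its staff. As a minimal sketch of the difference, the snippet below computes both numbers on a small made-up table (the toy values and the variable names `share_of_leavers` and `rate_by_department` are assumptions for illustration, not results from the real dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "Department": ["Sales"] * 4 + ["R&D"] * 10,
    "Attrition": ["Yes", "Yes", "No", "No"] + ["Yes"] * 3 + ["No"] * 7,
})

# Boolean mask: True for employees who left.
left = toy["Attrition"].eq("Yes")

# Share of all leavers that each department accounts for
# (this is what a pie chart of leaver counts displays).
share_of_leavers = toy.loc[left, "Department"].value_counts(normalize=True)

# Attrition rate within each department:
# leavers divided by that department's headcount.
rate_by_department = left.groupby(toy["Department"]).mean()

print(share_of_leavers)
print(rate_by_department)
```

In this toy table the larger department supplies most of the leavers but has the lower rate, which is why it can help to look at both views before drawing conclusions.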
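The chart shows that the Research & Development department accounts for a large share of the employees who left. Keep in mind that this is a share of all leavers, not an attrition rate within the department. Now let's look at the share of attrition by education field: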
attrition_by = attrition_data.groupby(['EducationField']).size().reset_index(name='Count')

# Create a donut chart
fig = go.Figure(data=[go.Pie(
    labels=attrition_by['EducationField'],
    values=attrition_by['Count'],
    hole=0.4,
    marker=dict(colors=['#3CAEA3', '#F6D55C']),
    textposition='inside'
)])

# Update the layout
fig.update_layout(
    title='Attrition by Educational Field',
    font=dict(size=16),
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)

# Show the chart
fig.show()
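We can see that employees with a Life Sciences background make up a large share of the leavers. Now let's look at the share of attrition by the number of years spent at the company: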
attrition_by = attrition_data.groupby(['YearsAtCompany']).size().reset_index(name='Count')

# Create a donut chart
fig = go.Figure(data=[go.Pie(
    labels=attrition_by['YearsAtCompany'],
    values=attrition_by['Count'],
    hole=0.4,
    marker=dict(colors=['#3CAEA3', '#F6D55C']),
    textposition='inside'
)])

# Update the layout
fig.update_layout(
    title='Attrition by Years at Company',
    font=dict(size=16),
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)

# Show the chart
fig.show()
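We can see that the largest group of leavers left after about one year at the company. Now let's look at the share of attrition by the number of years since the last promotion: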
attrition_by = attrition_data.groupby(['YearsSinceLastPromotion']).size().reset_index(name='Count')

# Create a donut chart
fig = go.Figure(data=[go.Pie(
    labels=attrition_by['YearsSinceLastPromotion'],
    values=attrition_by['Count'],
    hole=0.4,
    marker=dict(colors=['#3CAEA3', '#F6D55C']),
    textposition='inside'
)])

# Update the layout
fig.update_layout(
    title='Attrition by Years Since Last Promotion',
    font=dict(size=16),
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)

# Show the chart
fig.show()
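We can see that employees who have not received a promotion leave the company more often than employees who have. Now let's look at the share of attrition by gender: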
attrition_by = attrition_data.groupby(['Gender']).size().reset_index(name='Count')

# Create a donut chart
fig = go.Figure(data=[go.Pie(
    labels=attrition_by['Gender'],
    values=attrition_by['Count'],
    hole=0.4,
    marker=dict(colors=['#3CAEA3', '#F6D55C']),
    textposition='inside'
)])

# Update the layout
fig.update_layout(
    title='Attrition by Gender',
    font=dict(size=16),
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)

# Show the chart
fig.show()
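The charts in this section all repeat the same filter, group, and count step with a different column name. One way to cut down the repetition would be a small helper along these lines. The function name `attrition_counts` and its parameters are hypothetical, and the toy rows are made up purely to show the shape of the result:

```python
import pandas as pd


def attrition_counts(df, column, target="Attrition", positive="Yes"):
    """Count the employees who left, grouped by the values of `column`.

    Returns a DataFrame with two columns: `column` and 'Count'.
    """
    leavers = df[df[target] == positive]
    return leavers.groupby(column).size().reset_index(name="Count")


# Hypothetical toy rows, used only to demonstrate the helper.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female"],
    "Attrition": ["Yes", "No", "Yes", "No", "Yes"],
})

counts = attrition_counts(toy, "Gender")
print(counts)
```

The returned frame has the same label and Count columns that the pie charts above read from, so it could be passed to go.Pie in the same way.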
Men account for more of the leavers than women. Now let's look at attrition through the relationship between monthly income and employee age:
fig = px.scatter(data, x="Age", y="MonthlyIncome", color="Attrition", trendline="ols")
fig.update_layout(title="Age vs. Monthly Income by Attrition")
fig.show()
We can see that monthly income increases with age. We can also see that attrition is high among employees with a low monthly income.
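A scatter plot gives a visual impression, and a quick numeric summary can back it up. As a rough sketch, the snippet below compares the income of leavers and stayers with a groupby on a tiny made-up table (the income figures are invented for illustration, and `income_summary` is a name introduced here):

```python
import pandas as pd

# Hypothetical toy data; the incomes are made up for illustration.
toy = pd.DataFrame({
    "Attrition": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "MonthlyIncome": [2000, 2500, 6000, 3000, 5000, 9000],
})

# Median, mean, and group size of monthly income for leavers vs. stayers.
income_summary = toy.groupby("Attrition")["MonthlyIncome"].agg(["median", "mean", "count"])
print(income_summary)
```

The median is often the more robust choice for income-like columns, because a few very high salaries can pull the mean upward.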
So this is how we can analyze employee attrition.
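Employee Attrition Prediction Model
Now let's prepare a machine learning model to predict employee attrition. The dataset has many features with categorical values, so I will convert these categorical variables into numerical values: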
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data['Attrition'] = le.fit_transform(data['Attrition'])
data['BusinessTravel'] = le.fit_transform(data['BusinessTravel'])
data['Department'] = le.fit_transform(data['Department'])
data['EducationField'] = le.fit_transform(data['EducationField'])
data['Gender'] = le.fit_transform(data['Gender'])
data['JobRole'] = le.fit_transform(data['JobRole'])
data['MaritalStatus'] = le.fit_transform(data['MaritalStatus'])
data['Over18'] = le.fit_transform(data['Over18'])
data['OverTime'] = le.fit_transform(data['OverTime'])
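A note on this step: scikit-learn documents LabelEncoder as a tool for encoding target labels (y) rather than input features. Applied to a feature such as Department, it assigns integers in sorted label order, which implies an ordering that the categories do not really have. Tree-based models tend to cope with this reasonably well, but one-hot encoding is a common alternative. The following is a minimal sketch on a made-up three-row table, not a replacement for the code above:

```python
import pandas as pd

# Hypothetical toy rows, used only to show the two encodings.
toy = pd.DataFrame({
    "Department": ["Sales", "Research & Development", "Human Resources"],
    "OverTime": ["Yes", "No", "Yes"],
    "Age": [41, 49, 37],
})

# Binary column: an explicit 0/1 mapping keeps the meaning obvious.
toy["OverTime"] = toy["OverTime"].map({"No": 0, "Yes": 1})

# Nominal column with more than two values:
# one indicator column per category, so no artificial order is introduced.
encoded = pd.get_dummies(toy, columns=["Department"], dtype=int)
print(encoded.columns.tolist())
```

The trade-off is more columns in exchange for not encoding a fake ranking between departments.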
Now let's have a look at the correlation of each feature with attrition:
correlation = data.corr()
print(correlation["Attrition"].sort_values(ascending=False))
Output
Attrition 1.000000
OverTime 0.246118
MaritalStatus 0.162070
DistanceFromHome 0.077924
JobRole 0.067151
Department 0.063991
NumCompaniesWorked 0.043494
Gender 0.029453
EducationField 0.026846
MonthlyRate 0.015170
PerformanceRating 0.002889
BusinessTravel 0.000074
HourlyRate -0.006846
EmployeeNumber -0.010577
PercentSalaryHike -0.013478
Education -0.031373
YearsSinceLastPromotion -0.033019
RelationshipSatisfaction -0.045872
DailyRate -0.056652
TrainingTimesLastYear -0.059478
WorkLifeBalance -0.063939
EnvironmentSatisfaction -0.103369
JobSatisfaction -0.103481
JobInvolvement -0.130016
YearsAtCompany -0.134392
StockOptionLevel -0.137145
YearsWithCurrManager -0.156199
Age -0.159205
MonthlyIncome -0.159840
YearsInCurrentRole -0.160545
JobLevel -0.169105
TotalWorkingYears -0.171063
EmployeeCount NaN
Over18 NaN
StandardHours NaN
Name: Attrition, dtype: float64
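The NaN values for EmployeeCount, Over18, and StandardHours suggest that these columns hold the same value in every row, since correlation is undefined for a constant column. Such columns carry no information for a model, so it may be reasonable to drop them. Here is a minimal sketch on a made-up table (the name `constant_columns` is introduced for illustration):

```python
import pandas as pd

# Hypothetical toy data with two constant columns.
toy = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "EmployeeCount": [1, 1, 1, 1],
    "StandardHours": [80, 80, 80, 80],
    "Attrition": [1, 0, 1, 0],
})

# A column with a single unique value cannot help separate the classes.
constant_columns = [col for col in toy.columns if toy[col].nunique() == 1]

cleaned = toy.drop(columns=constant_columns)
print(constant_columns)
print(cleaned.columns.tolist())
```

EmployeeNumber also looks like an identifier rather than a meaningful feature, so it might be a candidate for removal as well, though that is a judgment call.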
Let's add a new feature to this data, a satisfaction score:
data['SatisfactionScore'] = data['EnvironmentSatisfaction'] + data['JobSatisfaction'] + data['RelationshipSatisfaction']
Now let's split the data into training and test sets:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Split the data into training and testing sets
X = data.drop(['Attrition'], axis=1)
y = data['Attrition']
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=42)
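When one class is much rarer than the other, a plain random split can leave the training and test sets with somewhat different class proportions. As a suggested variation (not what the code above does), passing stratify=y to train_test_split keeps the proportions aligned. The sketch below uses made-up arrays rather than the article's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 20% of them in the positive class.
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 20 + [0] * 80)

# stratify=y asks the splitter to preserve the class ratio in both parts.
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fraction of positive samples in each part.
print(ytrain.mean(), ytest.mean())
```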
Here is how we can train an employee attrition prediction model:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(xtrain, ytrain)

# Evaluate the model's performance
ypred = model.predict(xtest)
accuracy = accuracy_score(ytest, ypred)
print("Accuracy:", accuracy)
Output
Accuracy: 0.8662131519274376
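Accuracy alone can be hard to interpret for attrition data. If most employees in the test set stayed, a model that always predicts "stayed" already scores well without finding a single leaver. A rough sketch of two extra checks, a majority-class baseline and recall on the leaver class, is shown below with made-up labels (the arrays here are hypothetical and are not the article's predictions):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical labels: 8 employees stayed (0) and 2 left (1).
ytest = np.array([0] * 8 + [1] * 2)
# Hypothetical predictions: the model finds only one of the two leavers.
ypred = np.array([0] * 8 + [1, 0])

# Baseline: always predict the most frequent class.
majority_class = np.bincount(ytest).argmax()
baseline_pred = np.full_like(ytest, majority_class)
baseline_accuracy = accuracy_score(ytest, baseline_pred)

model_accuracy = accuracy_score(ytest, ypred)

# Recall on class 1: the fraction of actual leavers the model caught.
leaver_recall = recall_score(ytest, ypred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(ytest, ypred)

print("Baseline accuracy:", baseline_accuracy)
print("Model accuracy:", model_accuracy)
print("Recall on leavers:", leaver_recall)
print(cm)
```

Comparing the model's accuracy with the baseline, and checking recall on the leavers, gives a clearer picture of whether the model is actually useful for spotting employees at risk.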
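Summary
Employee attrition analysis is a type of behavioral analysis in which we study the behavior and characteristics of employees who have left the company and compare them with those of current employees, in order to find the employees who are likely to leave soon.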