Telecom Customer Churn Analysis

1. Dataset description

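Each row represents one customer, and each column holds a customer attribute described in the column metadata. The raw data has 7043 rows (customers) and 21 columns (features).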

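Field: description (values)
customerID: unique customer identifier
gender: customer gender (Male, Female)
SeniorCitizen: whether the customer is a senior citizen (0, 1)
Partner: whether the customer has a partner (No, Yes)
Dependents: whether the customer has dependents (No, Yes)
tenure: number of months the customer has been with the operator (continuous, 0-72)
PhoneService: whether the customer has phone service (Yes, No)
MultipleLines: whether the customer has multiple lines (Yes, No, No phone service)
InternetService: type of internet service (No, DSL, Fiber optic)
OnlineSecurity: whether the customer has online security (Yes, No, No internet service)
OnlineBackup: whether the customer has online backup (Yes, No, No internet service)
DeviceProtection: whether the customer has device protection (Yes, No, No internet service)
TechSupport: whether the customer has tech support (Yes, No, No internet service)
StreamingTV: whether the customer has streaming TV (Yes, No, No internet service)
StreamingMovies: whether the customer has streaming movies (Yes, No, No internet service)
Contract: contract term (Month-to-month, One year, Two year)
PaperlessBilling: whether the customer uses paperless billing (Yes, No)
PaymentMethod: payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
MonthlyCharges: monthly service charge (continuous)
TotalCharges: total amount charged to date (continuous)
Churn: churn label (No, Yes)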
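The value sets listed for each field can be checked against the data itself. The following is a minimal sketch, assuming a small hypothetical frame `sample` that stands in for the real file; the helper name `category_levels` is illustrative and not part of the original analysis.

```python
import pandas as pd

# Hypothetical stand-in for a few columns of the real dataset.
sample = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "Contract": ["Month-to-month", "One year", "Month-to-month"],
    "tenure": [1, 34, 2],
})

def category_levels(frame):
    """Return the sorted distinct values of every non-numeric column."""
    return {
        col: sorted(frame[col].unique())
        for col in frame.select_dtypes(exclude="number").columns
    }

levels = category_levels(sample)
print(levels)
```

Run against the full frame, a mapping like this makes it easy to spot spelling or capitalization differences between the documentation and the actual values.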

2. Analysis approach

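Identify the features associated with churn, analyze how each of them affects the churn rate, build a profile of high-churn customers, and propose retention measures for that group.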
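As an illustration of the first step, the churn rate can be compared across the levels of a single feature. This is a sketch under stated assumptions: the frame `toy` and the helper `churn_rate_by` are hypothetical and the rows are made up, so the numbers say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical toy data; the real analysis would use the full dataset.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

def churn_rate_by(frame, feature):
    """Share of churned customers within each level of `feature`."""
    churned = frame["Churn"].eq("Yes")  # boolean flag per customer
    return churned.groupby(frame[feature]).mean().sort_values(ascending=False)

rates = churn_rate_by(toy, "Contract")
print(rates)
```

Features whose levels show clearly different rates are candidates for the closer analysis described above.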

3. Data preprocessing

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime

%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
df = pd.read_csv('电信运营商客户数据集.csv')
df.head()
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity ... DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 7590-VHVEG Female 0 Yes No 1 No No phone service DSL No ... No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 5575-GNVDE Male 0 No No 34 Yes No DSL Yes ... Yes No No No One year No Mailed check 56.95 1889.5 No
2 3668-QPYBK Male 0 No No 2 Yes No DSL Yes ... No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 7795-CFOCW Male 0 No No 45 No No phone service DSL Yes ... Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 9237-HQITU Female 0 No No 2 Yes No Fiber optic No ... No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes

5 rows × 21 columns

# Inspect column dtypes and non-null counts
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 
 17  PaymentMethod     7043 non-null   object 
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   object 
 20  Churn             7043 non-null   object 
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

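No column reports null values. Note, however, that TotalCharges is stored as object rather than as a numeric type, so it needs a closer look.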

# Check for duplicate rows
int(df.duplicated().sum())
0
# Check for duplicate customer IDs
int(df['customerID'].duplicated().sum())
0

The dataset therefore contains 7043 distinct customers.

# Convert TotalCharges to float; values that cannot be parsed become NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 
 17  PaymentMethod     7043 non-null   object 
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7032 non-null   float64
 20  Churn             7043 non-null   object 
dtypes: float64(2), int64(2), object(17)
memory usage: 1.1+ MB

After the conversion, TotalCharges has 11 missing values (7032 non-null out of 7043).
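To see why a column with no reported nulls can gain missing values after conversion, consider the sketch below. The series `raw` is hypothetical; the blank string is an assumed example of an unparseable entry, since the source does not show which raw strings failed to parse.

```python
import pandas as pd

# Hypothetical values; the blank string mimics an entry that cannot be parsed.
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors="coerce" replaces anything that is not a valid number with NaN.
converted = pd.to_numeric(raw, errors="coerce")

# Original strings that were present but failed to parse.
unparsed = raw[converted.isna() & raw.notna()]

print(converted.isna().sum())
print(unparsed.tolist())
```

Looking at the original strings behind each NaN, as `unparsed` does here, helps decide whether they represent true gaps or a formatting issue.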

# Show the rows where TotalCharges is missing
df[df['TotalCharges'].isna()]
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity
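The source is cut off here and does not show how these rows were handled. As a hedged sketch only, two common options are to drop the rows or to fill the missing totals; the frame `toy` and its values are hypothetical, and filling with 0 assumes the affected customers have not been billed yet, which should be verified (for example by inspecting `tenure`) before applying it.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; values are illustrative, not taken from the dataset.
toy = pd.DataFrame({
    "tenure": [0, 12, 0, 5],
    "MonthlyCharges": [52.55, 20.0, 80.85, 70.0],
    "TotalCharges": [np.nan, 240.0, np.nan, 350.0],
})

missing = toy["TotalCharges"].isna()
print(toy.loc[missing, ["tenure", "MonthlyCharges"]])  # inspect before deciding

# Option 1: discard rows without a total.
dropped = toy.dropna(subset=["TotalCharges"])

# Option 2 (assumption: no charges accrued yet): fill with 0.
filled = toy.fillna({"TotalCharges": 0.0})
```

With only a handful of affected rows out of several thousand, either choice is unlikely to change aggregate results much, but the decision should be stated explicitly.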