SMOTE sampling for imbalanced data

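This post shows how to use the SMOTE algorithm to deal with class imbalance in machine learning. By adjusting the sampling strategy, the number of samples in each class can be balanced, which can improve the model's predictive performance.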
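To make the idea concrete, here is a minimal pure-Python sketch of the interpolation step at the heart of SMOTE: pick a minority sample, pick one of its k nearest minority neighbors, and create a new point somewhere on the line segment between them. This is not imblearn's implementation; the function name `smote_sample` and the toy points are made up for illustration.

```python
import math
import random


def smote_sample(minority, k=2, n_new=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (the base itself is excluded)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy minority class with three 2-D points
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
new_points = smote_sample(minority, k=2, n_new=4)
print(len(new_points))
```

Because every new point is a blend of two existing minority points, each of its coordinates stays within the range spanned by the original minority samples.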


from imblearn.over_sampling import SMOTE
import pandas as pd 
C:\ProgramData\Anaconda3\lib\importlib\_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
C:\ProgramData\Anaconda3\lib\importlib\_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
df = pd.read_csv('base_done.csv')
data = df[:20].iloc[:,1:10]
data
    sex    age  provider  level  verified  using_time  regist_type  card_a_cnt  card_b_cnt
0     0  24853         0      1         0       24713            1       24719       24712
1     1  25011         0      1         0       24743            7       24712       24712
2     1  24877         0      2         0       24744            7       24719       24725
3     0  24925         0      2         0       24715            1       24712       24712
4     1  24877         2      1         0       24706            3       24712       24712
5     0  24944         0      2         0       24720            1       24719       24712
6     0  24840         0      2         0       24727            1       24712       24712
7     0  24944         0      2         0       24709            1       24712       24712
8     0  24908         0      2         0       24730            1       24712       24712
9     1  24956         0      2         0       24741            7       24719       24719
10    0  24920         0      2         0       24741            7       24719       24719
11    0  24871         0      2         0       24722            2       24712       24712
12    1  24889         0      2         0       24725            1       24719       24719
13    1  24865         2      2         0       24710            3       24712       24712
14    0  24944         0      2         0       24727            1       24712       24712
15    1  24931         0      2         0       24733            7       24712       24719
16    0  24963         0      1         0       24721            2       24712       24712
17    0  24877         0      2         0       24727            1       24712       24712
18    0  24901         0      2         0       24733            7       24725       24719
19    0  24859         2      1         0       24713            3       24712       24712
X = data.drop(columns='provider')  # features, kept as a DataFrame so the column names survive resampling
y = data.provider  # target label
data.provider.value_counts()
0    17
2     3
Name: provider, dtype: int64
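With counts this lopsided, two things are worth working out before calling SMOTE: the target count per class (what goes into the `sampling_strategy` dict) and the largest usable `k_neighbors`. The sketch below does both with the standard library only. The helper `plan_smote` and its `target_ratio` parameter are hypothetical names for this post, not part of imblearn. The neighbor limit reflects the fact that SMOTE looks for `k_neighbors` other samples within the class it oversamples, so that class needs at least `k_neighbors + 1` samples.

```python
from collections import Counter


def plan_smote(labels, target_ratio=1.0):
    """Return (sampling_strategy dict, largest usable k_neighbors)."""
    counts = Counter(labels)
    majority = max(counts.values())
    # Raise each class to target_ratio * majority size; never shrink a class.
    strategy = {cls: max(n, round(target_ratio * majority))
                for cls, n in counts.items()}
    # Only classes that will actually be oversampled constrain k_neighbors.
    oversampled = [counts[cls] for cls in counts if strategy[cls] > counts[cls]]
    max_k = min(oversampled) - 1 if oversampled else None
    return strategy, max_k


# Same class counts as the value_counts() output above: 17 of class 0, 3 of class 2
labels = [0] * 17 + [2] * 3
strategy, max_k = plan_smote(labels)
print(strategy)  # {0: 17, 2: 17}
print(max_k)     # 2
```

With only 3 samples in class 2, `k_neighbors` can be at most 2, which is why the default of 5 would not work on this slice of the data.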
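# sampling_strategy: target number of samples per class (the default balances the classes 1:1)
# k_neighbors: the k used in SMOTE's nearest-neighbor search
sm = SMOTE(sampling_strategy={0: 17, 2: 15}, k_neighbors=2)
X_res, y_res = sm.fit_resample(X, y)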
y_res.value_counts()
0    17
2    15
Name: provider, dtype: int64
X_res
    sex    age  level  verified  using_time  regist_type  card_a_cnt  card_b_cnt
0     0  24853      1         0       24713            1       24719       24712
1     1  25011      1         0       24743            7       24712       24712
2     1  24877      2         0       24744            7       24719       24725
3     0  24925      2         0       24715            1       24712       24712
4     1  24877      1         0       24706            3       24712       24712
5     0  24944      2         0       24720            1       24719       24712
6     0  24840      2         0       24727            1       24712       24712
7     0  24944      2         0       24709            1       24712       24712
8     0  24908      2         0       24730            1       24712       24712
9     1  24956      2         0       24741            7       24719       24719
10    0  24920      2         0       24741            7       24719       24719
11    0  24871      2         0       24722            2       24712       24712
12    1  24889      2         0       24725            1       24719       24719
13    1  24865      2         0       24710            3       24712       24712
14    0  24944      2         0       24727            1       24712       24712
15    1  24931      2         0       24733            7       24712       24719
16    0  24963      1         0       24721            2       24712       24712
17    0  24877      2         0       24727            1       24712       24712
18    0  24901      2         0       24733            7       24725       24719
19    0  24859      1         0       24713            3       24712       24712
20    0  24860      1         0       24712            3       24712       24712
21    0  24871      1         0       24708            3       24712       24712
22    0  24872      1         0       24707            3       24712       24712
23    1  24865      1         0       24709            3       24712       24712
24    0  24863      1         0       24710            3       24712       24712
25    0  24859      1         0       24712            3       24712       24712
26    1  24870      1         0       24708            3       24712       24712
27    1  24875      1         0       24706            3       24712       24712
28    1  24872      1         0       24707            3       24712       24712
29    0  24873      1         0       24707            3       24712       24712
30    0  24862      1         0       24711            3       24712       24712
31    0  24859      1         0       24712            3       24712       24712
32    0  24873      1         0       24707            3       24712       24712
33    1  24866      1         0       24709            3       24712       24712
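A quick sanity check on the output: SMOTE builds each synthetic sample by interpolating between two existing minority samples, so every feature of a synthetic row should fall within the range of the original `provider == 2` rows. The sketch below runs that check in plain Python on values copied from the tables above. It assumes that rows 4, 13 and 19 are the original minority samples and that rows from index 20 onward are the synthetic ones; only the first four synthetic rows are included to keep it short.

```python
# Original minority rows (index 4, 13, 19), provider column dropped.
# Columns: sex, age, level, verified, using_time, regist_type, card_a_cnt, card_b_cnt
minority = [
    [1, 24877, 1, 0, 24706, 3, 24712, 24712],
    [1, 24865, 2, 0, 24710, 3, 24712, 24712],
    [0, 24859, 1, 0, 24713, 3, 24712, 24712],
]

# Rows 20-23 of X_res, assumed here to be SMOTE-generated
synthetic = [
    [0, 24860, 1, 0, 24712, 3, 24712, 24712],
    [0, 24871, 1, 0, 24708, 3, 24712, 24712],
    [0, 24872, 1, 0, 24707, 3, 24712, 24712],
    [1, 24865, 1, 0, 24709, 3, 24712, 24712],
]

# Per-feature minimum and maximum over the original minority rows
lows = [min(col) for col in zip(*minority)]
highs = [max(col) for col in zip(*minority)]


def inside_box(row):
    """True if every feature lies within the minority class's observed range."""
    return all(lo <= v <= hi for v, lo, hi in zip(row, lows, highs))


all_inside = all(inside_box(row) for row in synthetic)
print(all_inside)  # True
```

This also explains why the synthetic rows look so similar to each other: with only three minority samples to interpolate between, the new points are confined to a very small region of feature space.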
