StratifiedKFold Test

Understanding and Testing StratifiedKFold

Reference: https://zhuanlan.zhihu.com/p/150446294

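StratifiedKFold splits the data by stratification (the idea behind stratified random sampling):
the proportion of each class in every validation fold is kept as close as possible to its proportion in the original sample.
This is why StratifiedKFold needs the labels to be passed in when it performs the split.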
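
The stratification idea described above can be sketched in plain NumPy. This is a simplified illustration only, not scikit-learn's actual allocation procedure: the helper name `stratified_folds` and its `seed` parameter are hypothetical. For each class it shuffles that class's sample indices and deals them out to the folds in turn, so every fold receives a near-equal share of each class.

```python
import numpy as np

def stratified_folds(y, n_splits, seed=0):
    """Return a fold id for every sample, balanced per class.

    Hypothetical helper for illustration; StratifiedKFold in
    scikit-learn allocates samples differently.
    """
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # positions of this class
        rng.shuffle(idx)                  # random order within the class
        # deal the class members to folds 0, 1, ..., n_splits-1 in turn
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of

y = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0])
fold_of = stratified_folds(y, n_splits=3)
for k in range(3):
    test_idx = np.flatnonzero(fold_of == k)
    counts = np.bincount(y[test_idx], minlength=2)
    print("fold", k, "TEST:", test_idx, "class counts:", counts)
```

With 6 samples of class 0 and 4 of class 1, each fold gets 2 samples of class 0, and the class 1 samples are spread 2/1/1 across the three folds. Which indices land in which fold depends on the seed.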

Test code

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [5, 9], [1, 5], [3, 9], [5, 8], [1, 1], [1, 4]])
y = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0])

print('X:',X)
print('y:',y)


print('-'*20, 'KFold')
kf = KFold(n_splits=3, shuffle=True, random_state=2020)
# KFold.split only needs the data; the labels are not required
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]


print('-'*20, 'StratifiedKFold')
kf = StratifiedKFold(n_splits=3, shuffle=True, random_state=2020)
# StratifiedKFold.split needs the labels as well as the data, because the folds are stratified by class
for train_index, test_index in kf.split(X,y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Output

X: [[1 2]
 [3 4]
 [1 2]
 [3 4]
 [5 9]
 [1 5]
 [3 9]
 [5 8]
 [1 1]
 [1 4]]
y: [0 1 1 1 0 0 1 0 0 0]
-------------------- KFold
TRAIN: [0 3 5 6 7 8] TEST: [1 2 4 9]
TRAIN: [0 1 2 3 4 8 9] TEST: [5 6 7]
TRAIN: [1 2 4 5 6 7 9] TEST: [0 3 8]
-------------------- StratifiedKFold
TRAIN: [0 2 5 6 7 8] TEST: [1 3 4 9]
TRAIN: [0 1 3 4 6 8 9] TEST: [2 5 7]
TRAIN: [1 2 3 4 5 7 9] TEST: [0 6 8]
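
To check the stratification directly rather than reading it off the indices, the class counts in each test fold can be printed. The sketch below reuses the same `X`, `y`, and splitter settings as the test code; the names `skf` and `fold_counts` are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [5, 9],
              [1, 5], [3, 9], [5, 8], [1, 1], [1, 4]])
y = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0])

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=2020)
fold_counts = []
for train_index, test_index in skf.split(X, y):
    # count how many samples of class 0 and class 1 fall in this test fold
    counts = np.bincount(y[test_index], minlength=2)
    fold_counts.append(counts.tolist())
    print("TEST:", test_index, "class 0:", counts[0], "class 1:", counts[1])
```

Applied to the StratifiedKFold output shown above, the test folds [1 3 4 9], [2 5 7], and [0 6 8] contain 2/2, 2/1, and 2/1 samples of class 0 and class 1 respectively, which is as close to the overall 6:4 ratio as three folds of ten samples allow.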
