How cross-validation works
scikit-learn provides the cross_validation module:
from sklearn import cross_validation
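n is the number of samples. n_folds is the number of parts the dataset is split into, which is also the number of validation rounds that are run.
Once the k_fold object has been initialized it already holds the full split: from the parameters n and n_folds, it has divided the n samples into n_folds parts. Each validation round uses one of those parts as the test set and the remaining n_folds-1 parts as the training set, and this is repeated n_folds times so that every part serves as the test set exactly once.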
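The splitting rule described above can be sketched in plain Python without scikit-learn. This is only an illustration of the idea, not the library's actual implementation: the function name kfold_indices is hypothetical, and the way it hands leftover samples to the earliest folds when n is not divisible by n_folds is an assumption of this sketch.

```python
def kfold_indices(n, n_folds):
    """Yield (train_indices, test_indices) once per fold.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if not 2 <= n_folds <= n:
        raise ValueError("n_folds must be between 2 and n")
    # Assumption of this sketch: when n is not divisible by n_folds,
    # the first (n % n_folds) folds each receive one extra sample.
    base, extra = divmod(n, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        # One contiguous block of indices is held out as the test set...
        test = list(range(start, stop))
        # ...and every other index goes into the training set.
        train = list(range(0, start)) + list(range(stop, n))
        yield train, test
        start = stop


folds = list(kfold_indices(n=6, n_folds=3))
for train_indices, test_indices in folds:
    print('Train: %s | test: %s' % (train_indices, test_indices))
```

Note that the cross_validation module was deprecated in scikit-learn 0.18 and removed in 0.20. In newer releases the equivalent class is sklearn.model_selection.KFold, which takes n_splits instead of n and n_folds and produces the index pairs through its split() method.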
>>> from sklearn import cross_validation
>>> k_fold = cross_validation.KFold(n=6, n_folds=3)
>>> for train_indices, test_indices in k_fold:
...     print('Train: %s | test: %s' % (train_indices, test_indices))
Train: [2 3 4 5] | test: [0 1]
Train: [0 1 4 5] | test: [2 3]
Train: [0 1 2