http://blog.youkuaiyun.com/u013630349/article/details/47133283
This post implements K-fold cross-validation with the StratifiedKFold class from Python's sklearn (scikit-learn) package.
For the idea behind the method, see: http://scikit-learn.org/stable/modules/cross_validation.html
StratifiedKFold is a variation of k-fold which returns stratified folds: each set contains approximately the same percentage of samples of each target class as the complete set.
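【In other words】
StratifiedKFold splits the dataset so that every fold keeps roughly the same proportion of samples from each class as the full dataset.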
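To make the idea concrete, here is a minimal pure-Python sketch of stratified fold assignment. The function name `stratified_folds` and the round-robin dealing scheme are illustrative assumptions, not scikit-learn's actual implementation; the sketch only shows why each fold ends up with the same class mix.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds, dealing each class out round-robin.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # Group the sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal the indices of each class across the folds in turn,
    # so every fold receives an (almost) equal share of every class.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for pos, idx in enumerate(by_class[label]):
            folds[pos % k].append(idx)
    return folds


# 200 samples in 10 classes of 20 samples each, split into 4 folds.
labels = [i % 10 for i in range(200)]
folds = stratified_folds(labels, 4)

for fold in folds:
    counts = defaultdict(int)
    for idx in fold:
        counts[labels[idx]] += 1
    print(len(fold), dict(counts))
```

With 20 samples per class and 4 folds, every fold receives 5 samples of each class. When the class sizes are not divisible by the number of folds, the shares can only be approximately equal.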
For other splitting methods, see: http://scikit-learn.org/stable/modules/cross_validation.html
Without further ado, here is the code.
【Source code】
- import numpy
- import h5py
- import sklearn
- from sklearn import cluster,cross_validation
- from sklearn.cluster import AgglomerativeClustering
- from sklearn.cross_validation import StratifiedKFold  # requires scikit-learn < 0.20
- ## Generate a random matrix and save it
- #arr = numpy.random.random([200,400])
- #labvec = []
- #for i in numpy.arange(0,200):
- #    j = i%10
- #    arr[i,j*20:j*20+20] = arr[i,j*20:j*20+20]+10
- #    labvec.append(j)
- #arr = arr.T
- #file = h5py.File('arr.mat','w')
- #file.create_dataset('arr', data = arr)
- #file.close()
- #file = h5py.File('labvec.mat','w')
- #file.create_dataset('labvec', data = labvec)
- #file.close()
- # Open the files in read mode
- myfile=h5py.File('arr.mat','r')
- arr = myfile['arr'][:]
- myfile.close()
- arr = arr.T
- myfile=h5py.File('labvec.mat','r')
- labvec = myfile['labvec'][:]
- myfile.close()
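- # Old-style API: pass the labels and the number of folds; the object itself is iterable
- skf = StratifiedKFold(labvec, 4)
- # Collect the train/test index arrays of each of the 4 folds
- train_set = []
- test_set = []
- for train, test in skf:
-     train_set.append(train)
-     test_set.append(test)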
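The listing above uses the `sklearn.cross_validation` module, which was removed in scikit-learn 0.20. On newer versions the same split can be sketched with `sklearn.model_selection`, roughly as follows. To keep the example self-contained, the data is generated in memory rather than read from `arr.mat` and `labvec.mat`; the seed and the in-memory construction are assumptions, not part of the original post.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the arr / labvec files used in the post:
# 200 samples, 400 features, 10 classes with 20 samples each.
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
arr = rng.random((200, 400))
labvec = np.arange(200) % 10
for i, j in enumerate(labvec):
    # Shift a class-specific block of 20 features, as in the post.
    arr[i, j * 20:(j + 1) * 20] += 10

# New-style API: the number of folds goes to the constructor,
# and the data and labels go to split().
skf = StratifiedKFold(n_splits=4)
train_set, test_set = [], []
for train, test in skf.split(arr, labvec):
    train_set.append(train)
    test_set.append(test)

# Print the per-class sample counts of each test fold.
for test in test_set:
    print(np.bincount(labvec[test]))
```

`split()` yields arrays of row indices, so `arr[train]` and `labvec[train]` select the training samples and labels of a fold.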
For details, see:
http://scikit-learn.org/stable/modules/cross_validation.html