Section I: Brief Introduction on StratifiedKFold
A slight improvement over the standard k-fold cross-validation approach is stratified k-fold cross-validattion, which can yeild better bias and variance estimates, especially in case of unequal class proportions. In stratified cross-validattion, the class proportionss are preserved in each fold to ensure that each fold is representative of the class proportions in the training dataset.
FROM
Sebastian Raschka, Vahid Mirjalili. Python机器学习第二版. 南京:东南大学出版社,2018.
Section II: Code and Analyses
代码
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import numpy as np
from sklearn.model_selection import StratifiedKFold
import warnings
warnings.filterwarnings("ignore")
#Section 1: Load Breast data, i.e., Benign and Malignant
breast=datasets.load_breast_cancer()
X=breast.data
y=breast.target
X_train,X_test,y_train,y_test=\
train_test_split(X,y,test_size=0.2,stratify=y,random_state=1