Section I: Brief Introduction to Learning Curves
If a model is too complex for a given training dataset (there are too many degrees of freedom or parameters in the model), it tends to overfit the training data and does not generalize well to unseen data. Collecting more training samples often helps reduce the degree of overfitting. In practice, however, collecting more data can be very expensive or simply not feasible. By plotting the model's training and validation accuracies as functions of the training set size, we can detect whether the model suffers from high variance or high bias, and whether collecting more data could help address the problem.
FROM
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, Second Edition. Nanjing: Southeast University Press, 2018.
Section II: Code Bundle and Analyses
Code
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import numpy as np
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
plt.rcParams
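
The listing above contains only the imports. As a rough sketch of how they might be combined, the block below computes training and validation accuracies at ten training set sizes with `learning_curve`. The dataset (`load_breast_cancer`), the pipeline steps, the split ratio, and all hyperparameters are assumptions made for illustration; they are not taken from the article.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in breast cancer dataset stands in for the article's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Assumption: a standardize-then-classify pipeline; hyperparameters are illustrative.
pipe_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=10000, random_state=1),
)

# Evaluate the pipeline on 10 growing fractions of the training data,
# using 10-fold cross-validation at each size.
train_sizes, train_scores, valid_scores = learning_curve(
    estimator=pipe_lr,
    X=X_train,
    y=y_train,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=10,
)

# Average over the folds: one mean accuracy per training set size.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, train_mean, valid_mean):
    print(f"{int(n):4d} samples: train={tr:.3f}, validation={va:.3f}")
```

Passing `train_sizes`, `train_mean`, and `valid_mean` to `plt.plot` would then draw the two curves; the standard deviation across folds can be shaded with `plt.fill_between` to show the variance of the estimates.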

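By plotting learning curves, we can analyze how a model performs on the training and validation sets and so identify overfitting or underfitting. If training accuracy stays high while validation accuracy remains clearly lower as the training set grows, leaving a persistent gap between the two curves, the model is overfitting (high variance). If both accuracies level off at a low value close to each other, the model is probably underfitting (high bias).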
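
The reading of the two curves can be turned into a small rule of thumb. The helper below is a hypothetical sketch, not part of scikit-learn or the book: the function name `diagnose` and both thresholds (`target_acc`, `max_gap`) are assumptions and would need to be chosen per problem.

```python
def diagnose(train_acc, valid_acc, target_acc=0.95, max_gap=0.03):
    """Classify a fit from the accuracies at the largest training set size.

    target_acc and max_gap are illustrative thresholds, not standard values.
    """
    if train_acc < target_acc:
        # Even the training data is fit poorly: the model is too simple.
        return "high bias (underfitting)"
    if train_acc - valid_acc > max_gap:
        # Training accuracy is fine but does not carry over to unseen data.
        return "high variance (overfitting)"
    return "reasonable bias-variance trade-off"


print(diagnose(0.80, 0.78))
print(diagnose(0.99, 0.90))
print(diagnose(0.97, 0.96))
```

In the high-variance case, more training data or stronger regularization may help; in the high-bias case, more data is unlikely to help, and a more flexible model or better features are usually the next step.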



