Stanford Machine Learning Lecture 6 (Part 1): Advice for Applying Machine Learning - Deciding What to Try Next

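This post presents a simple technique that helps you avoid blind trial and error when applying machine learning, saving time and improving algorithm performance. By splitting the dataset sensibly into a training set, a validation set, and a test set, we can estimate more accurately how well an algorithm generalizes to new data. The post also covers how regularization shifts the balance between bias and variance, and how learning curves can be used to diagnose whether an algorithm suffers from high bias or from overfitting (high variance). Finally, it gives an overall guide to choosing effective adjustments when the prediction error is large.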



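This post introduces a pretty simple technique for ruling out some of the possible adjustments, so that you do not tune the algorithm aimlessly, avoid pointless changes, and save time. Machine learning diagnostics of this kind point you toward effective ways of improving an algorithm's performance.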

These diagnostics also help you find out in advance that something you were planning to try will not pay off.

1. How to evaluate algorithms

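When fitting parameters, we choose the parameters that minimize the training error. A smaller training error does not necessarily mean a better hypothesis, however, because a hypothesis can overfit the training set and fail to generalize to new examples.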

First, split the dataset into a training set and a test set, as shown in the figure below.

Next, learn the parameters on the training set and compute the error on the test set.
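The split-then-evaluate procedure can be sketched in a few lines. This is a minimal illustration, not the course's own code: the 70/30 split ratio, the synthetic data, and the helper names `split_train_test`, `fit_linear`, and `squared_error` are all assumptions made for the example.

```python
import numpy as np

def split_train_test(X, y, train_frac=0.7, seed=0):
    """Shuffle the examples, then split them into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], y[train], X[test], y[test]

def fit_linear(X, y):
    """Least-squares fit; a column of ones is prepended for the intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def squared_error(theta, X, y):
    """J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    residual = A @ theta - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data (an assumption for illustration): y = 3x + 1 + noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 1 + rng.normal(0, 0.1, size=100)

X_train, y_train, X_test, y_test = split_train_test(X, y)
theta = fit_linear(X_train, y_train)           # parameters come from the training set only
train_error = squared_error(theta, X_train, y_train)
test_error = squared_error(theta, X_test, y_test)
print(f"J_train={train_error:.4f}  J_test={test_error:.4f}")
```

Shuffling before splitting matters when the data is stored in some order; otherwise the test set may not resemble the training set.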

2. Model Selection and Train/Validation/Test Sets

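Suppose you want to decide the degree of the polynomial to fit to the dataset (that is, which features the algorithm should include), or the value of the regularization parameter lambda. Making this kind of choice is called model selection.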

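Following the earlier approach, we would split the dataset into a training set and a test set, fit theta on the training set for each candidate model, and use that theta to compute the test error on the test set, as shown in the figure below.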

Here the test error is smallest when d = 5, so we take the fifth-degree polynomial as the final model.

At this point we might ask: how well does the model generalize? All we can do is look at how the chosen 5th-order polynomial hypothesis performs on the test set.

But this is not a fair estimate of how well the hypothesis generalizes.

The reason is that we fit an extra parameter, d (the degree of the polynomial), to the test set. Because we used the test set to choose d, evaluating the hypothesis on that same test set is no longer fair: the hypothesis is likely to do better on this test set than it would on new examples it has not seen before. This is what we need to correct.

To put it another way, recall the earlier setting in which the dataset was split into a training set and a test set. We saw that if we fit some set of parameters to the training set, the performance of the fitted model on the training set is not predictive of how well the hypothesis generalizes to new examples. This is because the parameters were fitted to the training set, so they are likely to do well on it even if they do not do well on other examples.

To summarize: we fit the parameter d to the test set, and having done so means that the performance of the hypothesis on that test set is not a fair estimate of how well it is likely to do on new examples we have not seen before.

To address this problem in a model selection setting, we split the dataset not just into a training set and a test set, but into a training set, a validation set, and a test set, as shown in the figure below.

The corresponding errors are computed as follows.
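Assuming the squared-error cost of linear regression, the three errors would take the following form. The symbols m, m_cv, and m_test (the number of examples in the training, validation, and test sets) are notational assumptions chosen for this sketch.

```latex
\begin{align*}
J_{train}(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \\
J_{cv}(\theta) &= \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2 \\
J_{test}(\theta) &= \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2
\end{align*}
```

All three use the same formula; only the set of examples over which the sum is taken changes.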

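We use the training set to obtain the parameters theta and the validation set to choose d. Since d has now been fitted to the validation set, the validation set can no longer be used to estimate the generalization error, so we use the test set for that.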

In short, use the validation set to select the model, then evaluate the selected model on the test set, as shown in the figure below.
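The whole selection loop might look like the sketch below. The 60/20/20 split, the candidate degrees 1 to 10, the synthetic cubic data, and the helper names are assumptions for illustration only.

```python
import numpy as np

def poly_features(x, degree):
    """Columns [1, x, x^2, ..., x^degree] for a 1-D input array x."""
    return np.vander(x, degree + 1, increasing=True)

def fit(x, y, degree):
    """Least-squares polynomial fit of the given degree."""
    theta, *_ = np.linalg.lstsq(poly_features(x, degree), y, rcond=None)
    return theta

def error(theta, x, y):
    """Squared error 1/(2m) * sum((h(x) - y)^2); the degree is inferred from theta."""
    residual = poly_features(x, len(theta) - 1) @ theta - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data (an assumption): a cubic plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 1 - 2 * x + 3 * x**3 + rng.normal(0, 0.2, size=200)

# 60% training, 20% validation, 20% test
idx = rng.permutation(len(x))
tr, va, te = idx[:120], idx[120:160], idx[160:]

degrees = range(1, 11)
thetas = {d: fit(x[tr], y[tr], d) for d in degrees}                 # theta from the training set
val_errors = {d: error(thetas[d], x[va], y[va]) for d in degrees}   # d from the validation set
best_d = min(val_errors, key=val_errors.get)
test_error = error(thetas[best_d], x[te], y[te])                    # generalization estimate

print(f"selected d={best_d}, J_cv={val_errors[best_d]:.4f}, J_test={test_error:.4f}")
```

The test set is touched exactly once, after d has been fixed, which is what keeps the final number a fair estimate.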

3. Bias/Variance

The figure below sketches how the training error and the validation error change with the degree of the polynomial.

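When the degree of the polynomial is small, the model has low variance and high bias (it underfits): both the training error and the validation error are high. When the degree is large, the model has high variance and low bias (it overfits): the training error is low, but the validation error is much higher.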
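One way to see this pattern is to tabulate both errors for a range of degrees. The sketch below assumes a small synthetic training set (20 noisy samples of a sine curve) so that overfitting shows up at higher degrees; the data and helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(m):
    """Synthetic data (an assumption): noisy samples of sin(3x) on [-1, 1]."""
    x = rng.uniform(-1, 1, size=m)
    return x, np.sin(3 * x) + rng.normal(0, 0.2, size=m)

def design(x, d):
    """Polynomial feature matrix with columns [1, x, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)

def half_mse(theta, x, y):
    """Squared error 1/(2m) * sum((h(x) - y)^2)."""
    r = design(x, len(theta) - 1) @ theta - y
    return float(r @ r) / (2 * len(y))

x_train, y_train = make_data(20)    # small on purpose
x_val, y_val = make_data(200)

train_errors, val_errors = [], []
for d in range(1, 11):
    theta, *_ = np.linalg.lstsq(design(x_train, d), y_train, rcond=None)
    train_errors.append(half_mse(theta, x_train, y_train))
    val_errors.append(half_mse(theta, x_val, y_val))

for d, (jt, jv) in enumerate(zip(train_errors, val_errors), start=1):
    print(f"d={d:2d}  J_train={jt:.4f}  J_cv={jv:.4f}")
```

Because each higher-degree model contains the lower-degree ones, the training error can only go down as d grows; the validation error is the one that reveals when the extra flexibility stops helping.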

4. Regularization and Bias/Variance

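Regularization helps prevent overfitting. This part looks at how regularization affects bias and variance. The figure below uses linear regression as an example to show the effect of the regularization term on bias and variance.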

The definitions of the various errors are given below.
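For regularized linear regression, the definitions would look as follows. This is a sketch under the usual squared-error assumptions; n denotes the number of features and theta_0 is left out of the penalty.

```latex
\begin{align*}
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
             + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2 \\
J_{train}(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
\end{align*}
```

J(theta) is the objective minimized during training. J_train, and likewise J_cv and J_test computed over the validation and test sets, omit the regularization term, so they measure only how well the hypothesis fits the data.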

Following the model selection procedure from Part 2, the model selection process with a regularization term is as follows.

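For each candidate value of lambda, theta is computed on the training set; the appropriate theta and lambda are then chosen according to the validation error on the validation set. In the figure above, the 5th model is selected in the end, and its test error is then computed.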
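A minimal sketch of this procedure, using the regularized normal equation for linear regression, is shown below. The candidate list (0, then 0.01 doubled up to 10.24), the degree-8 polynomial features, and the synthetic data are assumptions; on this data the selected model need not be the 5th one.

```python
import numpy as np

def design(x, degree=8):
    """Polynomial feature matrix with columns [1, x, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_regularized(A, y, lam):
    """Minimize 1/(2m) * (||A @ theta - y||^2 + lam * sum(theta[1:]**2))."""
    penalty = np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # theta_0 (the intercept) is not regularized
    return np.linalg.solve(A.T @ A + lam * penalty, A.T @ y)

def error(theta, A, y):
    """Squared error WITHOUT the regularization term."""
    r = A @ theta - y
    return float(r @ r) / (2 * len(y))

def make_data(rng, m):
    """Synthetic data (an assumption): noisy samples of sin(3x)."""
    x = rng.uniform(-1, 1, size=m)
    return design(x), np.sin(3 * x) + rng.normal(0, 0.3, size=m)

rng = np.random.default_rng(0)
A_train, y_train = make_data(rng, 30)
A_val, y_val = make_data(rng, 100)
A_test, y_test = make_data(rng, 100)

lambdas = [0.0] + [0.01 * 2**k for k in range(11)]  # 0, 0.01, 0.02, ..., 10.24

thetas = [fit_regularized(A_train, y_train, lam) for lam in lambdas]  # training set
train_errors = [error(t, A_train, y_train) for t in thetas]
val_errors = [error(t, A_val, y_val) for t in thetas]                 # validation set

best = int(np.argmin(val_errors))
best_lambda = lambdas[best]
test_error = error(thetas[best], A_test, y_test)                      # test set, once
print(f"selected lambda={best_lambda}, J_test={test_error:.4f}")
```

Note that lambda enters only the fitting step. The errors used for selection and for the final report are computed without the penalty, otherwise models with different lambda values would not be comparable.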

The figure below sketches how the training error and the validation error change as lambda varies, along with the corresponding changes in bias and variance.

5. Learning curves

A learning curve is used to tell whether an algorithm suffers from high bias, high variance, or both.

The figure below sketches how the validation error and the training error change with the training set size.

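For an algorithm that already has high bias, the figure below sketches how the validation error and the training error change with the training set size. Both errors are large, and the gap between them keeps shrinking. Note that when an algorithm already has high bias, adding more training data does not help improve it.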

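For an algorithm that already has high variance, the figure below sketches how the validation error and the training error change with the training set size. The gap between the two errors is large at first and narrows as the number of examples grows. Note that when an algorithm already has high variance, adding more training data is likely to help.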

It follows that each time we adjust the algorithm, we can plot its learning curve and use it to decide what to adjust next.
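The numbers behind a learning curve can be computed as in the sketch below. It deliberately fits a straight line to sine-shaped data, which is an assumed example of a high-bias model; the data, the grid of training set sizes, and the helper names are illustrative.

```python
import numpy as np

def design(x, degree):
    """Polynomial feature matrix with columns [1, x, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def half_mse(theta, A, y):
    """Squared error 1/(2m) * sum((h(x) - y)^2)."""
    r = A @ theta - y
    return float(r @ r) / (2 * len(y))

def learning_curve(A_train, y_train, A_val, y_val, sizes):
    """Train on the first m examples for each m in sizes.

    The training error is measured on those m examples only;
    the validation error is always measured on the full validation set.
    """
    train_errors, val_errors = [], []
    for m in sizes:
        theta, *_ = np.linalg.lstsq(A_train[:m], y_train[:m], rcond=None)
        train_errors.append(half_mse(theta, A_train[:m], y_train[:m]))
        val_errors.append(half_mse(theta, A_val, y_val))
    return train_errors, val_errors

def target(x):
    """Synthetic ground truth (an assumption)."""
    return np.sin(3 * x)

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=100)
x_val = rng.uniform(-1, 1, size=100)
y_train = target(x_train) + rng.normal(0, 0.1, size=100)
y_val = target(x_val) + rng.normal(0, 0.1, size=100)

sizes = list(range(2, 101, 7))  # 2, 9, 16, ..., 100
train_errors, val_errors = learning_curve(
    design(x_train, 1), y_train, design(x_val, 1), y_val, sizes
)
for m, jt, jv in zip(sizes, train_errors, val_errors):
    print(f"m={m:3d}  J_train={jt:.4f}  J_cv={jv:.4f}")
```

Plotting `train_errors` and `val_errors` against `sizes` gives the curve itself. Changing the degree passed to `design` (for example to a high degree with few examples) lets you compare the high-bias and high-variance shapes on the same data.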

6. Deciding what to do next

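At the beginning of this post we noted that when an algorithm's prediction error is large, a number of adjustments can be made. The figure below shows which of those adjustments help with a high-bias problem and which help with a high-variance problem. (These conclusions can be read off the curves in Parts 3, 4, and 5.)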
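The mapping from diagnosis to adjustment can be written down as a small lookup. This is only a rule-of-thumb sketch: the remedy lists are an assumption based on the options commonly given for this lecture, and the `target_error` and `gap_ratio` thresholds in the hypothetical `diagnose` helper are illustrative, not part of the course material.

```python
# Assumed remedy lists; adjust them to the options you are actually considering.
REMEDIES = {
    "high bias": [
        "try getting additional features",
        "try adding polynomial features",
        "try decreasing lambda",
    ],
    "high variance": [
        "get more training examples",
        "try a smaller set of features",
        "try increasing lambda",
    ],
}

def diagnose(train_error, val_error, target_error, gap_ratio=2.0):
    """Return the list of suspected problems.

    high bias:     the training error itself is above the acceptable level.
    high variance: the validation error is unacceptable and much larger
                   than the training error (by the assumed factor gap_ratio).
    """
    problems = []
    if train_error > target_error:
        problems.append("high bias")
    if val_error > target_error and val_error > gap_ratio * train_error:
        problems.append("high variance")
    return problems

def suggest(train_error, val_error, target_error):
    """Flatten the remedies for every diagnosed problem into one list."""
    return [
        remedy
        for problem in diagnose(train_error, val_error, target_error)
        for remedy in REMEDIES[problem]
    ]

print(diagnose(0.30, 0.32, 0.05))  # both errors high, small gap
print(diagnose(0.02, 0.30, 0.05))  # low training error, large gap
print(suggest(0.02, 0.30, 0.05))
```

In practice the learning curve is a better guide than any fixed threshold; the function only encodes the two patterns described in Part 5.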

Next, a few words on overfitting in neural networks. Although a small NN is less prone to overfitting, using a large NN and addressing overfitting with regularization often works better.

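The number of hidden layers can be chosen by cross validation: train networks with different numbers of hidden layers and pick the one with the lowest cross validation error.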

In addition, when the cross validation error is much larger than the training error, is increasing the number of hidden units likely to help? The answer is no: the network is currently suffering from high variance, so adding hidden units is unlikely to help.