Machine Learning Diagnostic

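This article discusses the common problems of high bias and high variance in machine learning and describes how to diagnose and address them with tools such as learning curves and tuning of the regularization parameter. It also discusses how to choose an appropriate model complexity and neural network architecture.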

When our predictions show unacceptably large errors, we can troubleshoot by trying one or more of the following, and then evaluate the new hypothesis:

  • Getting more training examples
  • Trying smaller sets of features
  • Trying additional features
  • Trying polynomial features
  • Increasing or decreasing λ

Evaluating a hypothesis
  1. Randomly split the data into two sets: a training set (70%) and a test set (30%).
  2. Learn Θ by minimizing Jtrain(Θ) on the training set.
  3. Compute the test set error Jtest(Θ).
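The three steps above can be sketched as follows. This is a minimal illustration, assuming linear regression with a squared-error cost and synthetic data; the helper names `split_train_test`, `fit_linear` and `squared_error` are hypothetical and do not come from the original notes.

```python
import numpy as np

def split_train_test(X, y, train_frac=0.7, seed=0):
    """Shuffle the examples, then split them into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(round(train_frac * len(y)))
    train, test = idx[:cut], idx[cut:]
    return X[train], y[train], X[test], y[test]

def fit_linear(X, y):
    """Least-squares Theta for h(x) = theta_0 + theta^T x (minimizes J_train)."""
    A = np.column_stack([np.ones(len(y)), X])   # prepend the bias column
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def squared_error(theta, X, y):
    """J(Theta) = 1/(2m) * sum((h(x) - y)^2), with no regularization term."""
    A = np.column_stack([np.ones(len(y)), X])
    residual = A @ theta - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data, purely for illustration: y = 3 + 2x + noise.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 + 2 * X[:, 0] + rng.normal(scale=0.1, size=100)

X_train, y_train, X_test, y_test = split_train_test(X, y)   # step 1
theta = fit_linear(X_train, y_train)                        # step 2
j_train = squared_error(theta, X_train, y_train)
j_test = squared_error(theta, X_test, y_test)               # step 3
print(f"J_train = {j_train:.4f}, J_test = {j_test:.4f}")
```

Shuffling before the split matters when the data file is ordered (for example by date or by label); without it, the test set may not resemble the training set.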


The test set error
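The source gives no formula in text for this step. As an assumption, the definitions usually used are the squared-error cost for linear regression and the misclassification (0/1) error for classification; the symbol m_test (the number of test examples) is introduced here for illustration.

```latex
% Linear regression: squared-error cost on the test set
J_{test}(\Theta) = \frac{1}{2 m_{test}} \sum_{i=1}^{m_{test}}
    \left( h_\Theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2

% Classification: misclassification (0/1) error
err(h_\Theta(x), y) =
\begin{cases}
1 & \text{if } h_\Theta(x) \ge 0.5 \text{ and } y = 0,
    \text{ or } h_\Theta(x) < 0.5 \text{ and } y = 1 \\
0 & \text{otherwise}
\end{cases}

\text{Test error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}}
    err\left( h_\Theta(x_{test}^{(i)}),\, y_{test}^{(i)} \right)
```

The second quantity is simply the fraction of test examples that were misclassified.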


Given many models with different polynomial degrees

1. Split the dataset into three sets:

  • Training set: 60%
  • Cross validation set: 20%
  • Test set: 20%

2. Calculate three separate error values, one for each set:

  • Optimize the parameters Θ on the training set for each polynomial degree.
  • Find the polynomial degree d with the lowest error on the cross validation set.
  • Estimate the generalization error on the test set with Jtest(Θ(d)), where Θ(d) denotes the parameters learned for the polynomial degree d that had the lowest cross validation error.

*This way, the degree of the polynomial d has not been trained using the test set.

**We would generally expect JCV(Θ) to be lower than Jtest(Θ), because an extra parameter (d, the degree of the polynomial) has been fit to the cross validation set.
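The selection procedure above might look like this in practice. It is a sketch under stated assumptions: one-dimensional synthetic data, `np.polyfit` as the fitting routine, and candidate degrees 1 through 8, none of which are specified in the notes.

```python
import numpy as np

def cost(coeffs, x, y):
    """Unregularized squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    residual = np.polyval(coeffs, x) - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic 1-D data, purely for illustration: a cubic plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = x**3 - 2 * x + rng.normal(scale=0.5, size=200)

# 60% training, 20% cross validation, 20% test, after shuffling.
idx = rng.permutation(len(x))
train, cv, test = idx[:120], idx[120:160], idx[160:]

# 1) Fit Theta^(d) on the training set for each candidate degree d.
degrees = range(1, 9)
models = {d: np.polyfit(x[train], y[train], d) for d in degrees}

# 2) Pick the degree with the lowest cross validation error.
j_cv = {d: cost(models[d], x[cv], y[cv]) for d in degrees}
best_d = min(j_cv, key=j_cv.get)

# 3) Report the generalization error of that one model on the test set.
j_test = cost(models[best_d], x[test], y[test])
print(f"best d = {best_d}, J_cv = {j_cv[best_d]:.3f}, J_test = {j_test:.3f}")
```

The test set is touched exactly once, after d has been chosen, which is what keeps Jtest an honest estimate.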


Diagnosing Bias vs. Variance

High bias (underfitting):
Both Jtrain(Θ) and JCV(Θ) will be high. Also, JCV(Θ) ≈ Jtrain(Θ).

High variance (overfitting):
Jtrain(Θ) will be low and JCV(Θ) will be much greater than Jtrain(Θ).
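The two rules can be turned into a rough check. This is only a heuristic sketch: the function `diagnose` and its thresholds `acceptable_error` and `gap_factor` are hypothetical, since the notes do not say how high "high" is, and sensible values depend on the problem.

```python
def diagnose(j_train, j_cv, acceptable_error, gap_factor=2.0):
    """Rough bias/variance label from two error values.

    acceptable_error and gap_factor are hypothetical thresholds chosen by
    the user; they are not part of the original notes.
    """
    if j_train > acceptable_error:
        # Training error is already high, so the model underfits.
        return "high bias"
    if j_cv > gap_factor * j_train and j_cv > acceptable_error:
        # Training error is low but cross validation error is much larger.
        return "high variance"
    return "ok"

print(diagnose(j_train=4.1, j_cv=4.3, acceptable_error=1.0))   # high bias
print(diagnose(j_train=0.2, j_cv=3.5, acceptable_error=1.0))   # high variance
```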


Choosing the regularization parameter λ
In order to choose the model and the regularization term λ, we need to:

  1. Create a list of lambdas (e.g. λ∈{0,0.01,0.02,0.04,0.08,0.16,0.32,0.64,1.28,2.56,5.12,10.24});
  2. Create a set of models with different degrees or any other variants.
  3. Iterate through the λs and, for each λ, go through all the models to learn some Θ.
  4. Compute the cross validation error JCV(Θ) using the Θ learned with that λ, but evaluate JCV(Θ) itself without the regularization term (i.e. with λ = 0).
  5. Select the combination of model and λ that produces the lowest error on the cross validation set.
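A minimal sketch of this grid search is shown below. It assumes regularized linear regression on polynomial features, solved with the regularized normal equation; the helper names, the synthetic data and the candidate degrees (2, 4, 8) are illustrative assumptions rather than part of the notes.

```python
import numpy as np

def poly_features(x, degree):
    """Columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(A, y, lam):
    """Minimize 1/(2m) * (||A theta - y||^2 + lam * sum(theta[1:]^2))."""
    reg = lam * np.eye(A.shape[1])
    reg[0, 0] = 0.0                      # the bias term is not regularized
    return np.linalg.solve(A.T @ A + reg, A.T @ y)

def cost(theta, A, y):
    """Squared-error cost with no regularization term (lambda = 0)."""
    residual = A @ theta - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=100)
idx = rng.permutation(100)
train, cv, test = idx[:60], idx[60:80], idx[80:]

lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
degrees = [2, 4, 8]

results = {}
for lam in lambdas:
    for d in degrees:
        # Theta is learned WITH regularization on the training set...
        theta = fit_ridge(poly_features(x[train], d), y[train], lam)
        # ...but J_cv is measured WITHOUT the regularization term.
        results[(lam, d)] = cost(theta, poly_features(x[cv], d), y[cv])

best_lam, best_d = min(results, key=results.get)

# Refit the winning combination and report its error on the held-out test set.
theta = fit_ridge(poly_features(x[train], best_d), y[train], best_lam)
j_test = cost(theta, poly_features(x[test], best_d), y[test])
print(f"best lambda = {best_lam}, best degree = {best_d}, J_test = {j_test:.4f}")
```

Leaving the regularization term out of JCV keeps the comparison fair: otherwise models trained with larger λ would be penalized twice.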

Learning Curves
Experiencing high bias:

  • Low training set size: Jtrain(Θ) will be low and JCV(Θ) will be high.
  • Large training set size: both Jtrain(Θ) and JCV(Θ) will be high, with Jtrain(Θ) ≈ JCV(Θ).

If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.

Experiencing high variance:

  • Low training set size: Jtrain(Θ) will be low and JCV(Θ) will be high.
  • Large training set size: Jtrain(Θ) increases with training set size and JCV(Θ) continues to decrease without leveling off. Also, Jtrain(Θ) < JCV(Θ), but the difference between them remains significant.

If a learning algorithm is suffering from high variance, getting more training data is likely to help.

In practice, especially for small training sets, when you plot learning curves to debug your algorithms, it is often helpful to average across multiple sets of randomly selected examples to determine the training error and cross validation error.
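The averaging described above can be sketched as follows. The function `learning_curve`, its `repeats` parameter and the synthetic data are assumptions made for illustration; a straight line is deliberately fitted to quadratic data so that the curves show the high-bias pattern.

```python
import numpy as np

def cost(coeffs, x, y):
    """Unregularized squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    residual = np.polyval(coeffs, x) - y
    return float(residual @ residual) / (2 * len(y))

def learning_curve(x_train, y_train, x_cv, y_cv, degree, sizes,
                   repeats=20, seed=0):
    """Average J_train and J_cv over `repeats` random subsets of each size m."""
    rng = np.random.default_rng(seed)
    j_train, j_cv = [], []
    for m in sizes:
        tr, cv = [], []
        for _ in range(repeats):
            pick = rng.choice(len(x_train), size=m, replace=False)
            coeffs = np.polyfit(x_train[pick], y_train[pick], degree)
            # Training error is measured on the m examples actually used.
            tr.append(cost(coeffs, x_train[pick], y_train[pick]))
            # Cross validation error is measured on the full CV set.
            cv.append(cost(coeffs, x_cv, y_cv))
        j_train.append(float(np.mean(tr)))
        j_cv.append(float(np.mean(cv)))
    return j_train, j_cv

# Synthetic data, purely for illustration: quadratic target, linear model.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x**2 + rng.normal(scale=1.0, size=200)
x_train, y_train, x_cv, y_cv = x[:150], y[:150], x[150:], y[150:]

sizes = [3, 6, 12, 25, 50, 100]
j_train, j_cv = learning_curve(x_train, y_train, x_cv, y_cv,
                               degree=1, sizes=sizes, repeats=50)
for m, jt, jc in zip(sizes, j_train, j_cv):
    print(f"m = {m:3d}  J_train = {jt:7.3f}  J_cv = {jc:7.3f}")
```

Plotting `j_train` and `j_cv` against `sizes` gives the learning curve; changing `degree` to a larger value lets you compare against a more flexible model on the same data.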


Deciding What to Do Next Revisited

  • Getting more training examples: Fixes high variance
  • Trying smaller sets of features: Fixes high variance
  • Adding features: Fixes high bias
  • Adding polynomial features: Fixes high bias
  • Decreasing λ: Fixes high bias
  • Increasing λ: Fixes high variance

Diagnosing Neural Networks

  • A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.
  • A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.

Using a single hidden layer is a good starting default. You can train networks with different numbers of hidden layers on the training set, compare them on the cross validation set, and then select the one that performs best.


Model Complexity Effects:

  1. Lower-order polynomials (low model complexity) have high bias and low variance. In this case, the model fits poorly consistently.
  2. Higher-order polynomials (high model complexity) fit the training data extremely well and the test data extremely poorly. These have low bias on the training data, but very high variance.