Underfitting, or high bias: the hypothesis function h maps poorly to the trend of the data.
It is usually caused by a function that is too simple or one that uses too few features.
Overfitting, or high variance: the hypothesis fits the available data but does not generalize well to predict new data.
It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.
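A quick way to see both failure modes is to fit polynomials of increasing degree to the same noisy data. A minimal numpy sketch (the data and the degrees are made up for illustration, not taken from the notes):

```python
# Sketch: low-degree fit underfits (high bias), very high degree overfits (high variance).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)  # noisy target

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, deg=degree)     # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    train_mse = np.mean((y_hat - y) ** 2)
    print(f"degree={degree}  training MSE={train_mse:.4f}")

# degree 1 misses the trend entirely; degree 9 drives the training error down
# with unnecessary wiggles, but will not generalize to new data.
```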
To address it:
1) Reduce the number of features:
1. Manually select which features to keep.
2. Use a model selection algorithm.
2) Regularization:
1. Keep all the features, but reduce the magnitude of the parameters θj.
2. Regularization works well when we have a lot of slightly useful features.
Regularization:
1. Regularized linear regression
Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function:

J(θ) = 1/(2m) [ Σ_{i=1..m} (hθ(x^(i)) − y^(i))² + λ Σ_{j=1..n} θj² ]

λ is the regularization parameter; it controls how heavily large parameter values are penalized.
If λ is chosen to be too large, it may smooth out the function too much and cause underfitting.
As a result, we see that the new hypothesis (depicted by the pink curve) looks like a quadratic function but fits the data better, because the extra higher-order terms end up with very small θ values.
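The modified cost is straightforward to compute; a minimal numpy sketch (the helper name regularized_cost and the shapes are my own assumptions, with X containing a leading column of ones):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """X: (m, n+1) design matrix with a leading column of ones; y: (m,)."""
    m = len(y)
    errors = X @ theta - y                      # h_theta(x) - y for every example
    penalty = lam * np.sum(theta[1:] ** 2)      # theta_0 is conventionally not penalized
    return (errors @ errors + penalty) / (2 * m)
```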
Actually, the update multiplies θj by (1 − αλ/m), which is slightly less than 1,
so it shrinks the parameter a little bit before doing the same gradient step as before.
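Written out as code, the shrink factor is easy to see; a minimal sketch (the helper gradient_step and its arguments are assumptions, not from the notes):

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One gradient-descent step for regularized linear regression."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m         # unregularized gradient, shape (n+1,)
    shrink = 1 - alpha * lam / m             # the factor that is slightly below 1
    new_theta = theta * shrink - alpha * grad
    new_theta[0] = theta[0] - alpha * grad[0]  # theta_0 is not shrunk
    return new_theta
```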
Using regularization also takes care of any non-invertibility issues of the XᵀX matrix.
If m ≤ n, then XᵀX is non-invertible. However, when we add the term λ⋅L, where L is the (n+1)×(n+1) identity matrix with its top-left entry set to 0, XᵀX + λ⋅L becomes invertible.
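A minimal sketch of the regularized normal equation under these assumptions (the helper name normal_equation is mine; X is assumed to include the column of ones):

```python
import numpy as np

def normal_equation(X, y, lam):
    """theta = (X^T X + lam*L)^(-1) X^T y, with the intercept left unpenalized."""
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0                              # zero the (0,0) entry so theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```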
2. Regularized logistic regression
We can regularize logistic regression in the same way, by adding a penalty term to its cost function:

J(θ) = −(1/m) Σ_{i=1..m} [ y^(i) log(hθ(x^(i))) + (1 − y^(i)) log(1 − hθ(x^(i))) ] + λ/(2m) Σ_{j=1..n} θj²

The θ vector is indexed from 0 to n (holding n+1 values, θ0 through θn), and the regularization sum explicitly skips θ0.
B.t.w., regularization does not make J(θ) non-convex: the regularized logistic regression cost is still convex, so gradient descent (with an appropriate learning rate α) still converges to the global minimum when λ > 0.
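A minimal sketch of the regularized logistic regression cost and gradient, with the penalty and its gradient term skipping θ0 (function names and shapes are assumptions, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grad(theta, X, y, lam):
    """X: (m, n+1) with a leading column of ones; y: (m,) of 0/1 labels."""
    m = len(y)
    h = sigmoid(X @ theta)
    # cross-entropy term plus a penalty on theta[1:] only
    cost = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    cost += lam / (2 * m) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += lam / m * theta[1:]          # theta_0 gets no regularization term
    return cost, grad
```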