Regularizing your Neural Network
Regularization
If you suspect your neural network is overfitting your data, that is, you have a high variance problem, one of the first things you should try is probably regularization. The other way to address high variance is to get more training data, which is also quite reliable. But you can't always get more training data, or it could be expensive to get. Adding regularization will often help to prevent overfitting, or to reduce variance in your network.
So let's see how regularization works. Let's develop these ideas using logistic regression. Recall that for logistic regression, you try to minimize the cost function J(w, b), which is the average over your training examples of the losses on the individual predictions, where w and b are the parameters of logistic regression: w is an n_x-dimensional parameter vector, and b is a real number. To add regularization to logistic regression, you add to the cost function this term: lambda over 2m times the norm of w squared, where lambda is called the regularization parameter. I'll say more about that in a second. Here, the norm of w squared is just equal to the sum from j equals 1 to n_x of w_j squared, which can also be written w transpose w; it's just the squared Euclidean norm of the parameter vector w. This is called L2 regularization, because you're using the Euclidean norm, also called the L2 norm, of the parameter vector w.

Now, why do you regularize just the parameter w? Why don't we add something about b as well? In practice, you could do this, but I usually just omit it. Because if you look at your parameters, w is usually a pretty high-dimensional parameter vector, especially with a high variance problem. Maybe w just has a lot of parameters, so you aren't fitting all the parameters well, whereas b is just a single number.
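Written out, the L2-regularized logistic regression cost described above is:

J(w, b) = (1/m) * sum_{i=1..m} L(y_hat^(i), y^(i)) + (lambda / (2m)) * ||w||_2^2,  where ||w||_2^2 = sum_{j=1..n_x} w_j^2 = w^T w

As a minimal sketch of how this cost might be computed in numpy (the function name, the argument shapes, and the use of lambd for the regularization parameter are illustrative assumptions, not taken from the lecture):

import numpy as np

def regularized_cost(w, b, X, Y, lambd):
    # w: (n_x, 1) parameter vector, b: scalar bias
    # X: (n_x, m) inputs, Y: (1, m) labels, lambd: regularization parameter lambda
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))             # sigmoid predictions y_hat
    cross_entropy = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    l2_penalty = (lambd / (2 * m)) * np.sum(np.square(w))   # (lambda / 2m) * ||w||_2^2
    return cross_entropy + l2_penalty

Note that only w enters the penalty term, matching the point above about usually omitting b from the regularization.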