Course 2 Improving Deep Neural Networks, Week 1: L2 Regularization and Dropout Regularization (Random Deactivation Regularization)

Regularization Techniques in Deep Learning: L2 and Dropout in Practice

Original article

License: CC 4.0 BY-SA

Tags:

#L2Regularization #DropoutRegularization #RandomDeactivationRegularization

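This article shows how to use L2 regularization and Dropout (random deactivation) to reduce overfitting in deep neural networks. A worked example walks through building a regularized model, including loading the data, initializing the parameters, forward propagation, and backpropagation, along with the implementation details of L2 regularization and Dropout. The experiments show that regularization effectively improves the model's performance on the test set.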

When a neural network overfits the data, that is, when it suffers from high variance, there are several ways to address the problem:

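1. Regularization
1.1 L2 regularization
L2 regularization adds the penalty $\frac{\lambda}{2m}\sum_l \|W^{[l]}\|_F^2$ to the cost $J$, so minimizing $J$ also pushes the weights toward 0. If the regularization parameter $\lambda$ is set large enough, the weight matrices $W$ are driven close to 0, so many hidden units end up with near-zero weights and their influence is largely eliminated. This simplifies the network and moves it from high variance toward high bias.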
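As a minimal sketch of the L2-regularized cost, the penalty $\frac{\lambda}{2m}\sum_l \|W^{[l]}\|_F^2$ is simply added to the data term. This assumes the binary cross-entropy used by the model below and a `parameters` dictionary keyed "W1", "b1", ..., "WL", "bL"; the function name `compute_cost_with_l2` is hypothetical.

```python
import numpy as np

def compute_cost_with_l2(a3, Y, parameters, lambd):
    """Hypothetical helper: cross-entropy cost plus (lambd / (2m)) * sum_l ||W[l]||_F^2.

    Assumes a3 and Y have shape (1, m) and parameters holds "W1".."WL" and "b1".."bL".
    """
    m = Y.shape[1]
    # Unregularized binary cross-entropy, averaged over the m examples
    cross_entropy = -np.sum(Y * np.log(a3) + (1 - Y) * np.log(1 - a3)) / m
    # Squared Frobenius norm of every weight matrix; biases are not penalized
    L = len(parameters) // 2
    l2_sum = sum(np.sum(np.square(parameters["W" + str(l)])) for l in range(1, L + 1))
    return cross_entropy + lambd / (2 * m) * l2_sum
```

The argument is spelled `lambd` because `lambda` is a reserved word in Python.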
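1.2 Dropout (random deactivation) regularization
Dropout goes through every layer of the network and assigns each node a probability of being eliminated. Dropping some nodes on each training pass leaves a smaller, simpler network. Because any unit may disappear, the network cannot rely too heavily on a single node, which also tends to shrink the weights. Both effects reduce high variance (overfitting the training set).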
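A minimal sketch of *inverted* dropout, the common variant in which kept activations are rescaled during training so that nothing needs to change at test time. The helper names `dropout_forward` and `dropout_backward` and the use of NumPy's random `Generator` are assumptions for illustration; `keep_prob` is the probability of keeping a unit, i.e. 1 minus the elimination probability.

```python
import numpy as np

def dropout_forward(a, keep_prob, rng):
    """Hypothetical helper: inverted dropout on activations a (units x examples).

    Each unit is kept with probability keep_prob; kept activations are divided
    by keep_prob so the expected value of the layer output is unchanged.
    Returns the masked activations and the boolean mask needed by backprop.
    """
    mask = rng.random(a.shape) < keep_prob  # True means "keep this unit"
    return a * mask / keep_prob, mask

def dropout_backward(da, mask, keep_prob):
    """Pass gradients only through the units kept in the forward pass, with the same scaling."""
    return da * mask / keep_prob
```

At test time no mask is applied: the division by `keep_prob` during training already keeps the expected activations on the same scale.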
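2. Data augmentation
Enlarge the training set by transforming existing examples, for instance by flipping images horizontally or cropping them.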
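A sketch of the two transformations mentioned above, for a single image stored as a NumPy array of shape (H, W, C). The function `augment` and its crop-size arguments are hypothetical.

```python
import numpy as np

def augment(image, crop_h, crop_w, rng):
    """Hypothetical helper: return a horizontally flipped copy and one random crop of an (H, W, C) image."""
    flipped = image[:, ::-1, :]  # mirror left <-> right along the width axis
    h, w = image.shape[:2]
    # Pick a random top-left corner so the crop fits inside the image
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    crop = image[top:top + crop_h, left:left + crop_w, :]
    return flipped, crop
```

Each transformed image keeps the label of the original, which is what makes the extra examples useful for training.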
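3. Early stopping
While running gradient descent, plot the training error (or the cost function $J$) together with the validation-set error. The weights start small and grow as training proceeds, so stopping early, around the point where the validation error stops decreasing, yields weights with a mid-sized Frobenius norm $\|W^{[l]}\|_F^2 = \sum_k \sum_j \left(W_{k,j}^{[l]}\right)^2$.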
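One common way to decide *when* to stop is a patience rule: keep the epoch with the lowest validation loss and stop once it has not improved for `patience` consecutive epochs. This rule and the function `early_stop_epoch` are assumptions for illustration, not something the article specifies; the sketch works on a precomputed list of validation losses.

```python
def early_stop_epoch(val_losses, patience):
    """Hypothetical helper: return the epoch whose weights early stopping would keep.

    Scans validation losses in order and stops once `patience` consecutive
    epochs fail to improve on the best loss seen so far.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_epoch
```

For example, `early_stop_epoch([1.0, 0.8, 0.6, 0.65, 0.7, 0.5], patience=2)` returns 2: training stops after two non-improving epochs and never sees the later 0.5.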

I. Building the regularized model

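Problem statement: You have just been hired as an AI expert by the French Football Corporation. They would like you to recommend positions where France's goalkeeper should kick the ball so that the French team's players can reach it more easily. You are given a 2D dataset of France's passes from the past 10 games.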

1.2 Importing the data

Packages

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import scipy.io as sio

Load the data

def load_2D_dataset(is_plot=True):
    data = sio.loadmat('datasets/data2.mat')
    train_X = data['X'].T
    train_Y = data['y'].T
    test_X = data['Xval'].T
    test_Y = data['yval'].T
    if is_plot:
        plt.scatter(train_X[0, :], train_X[1, :], c=np.squeeze(train_Y), s=40,
                    cmap=plt.cm.Spectral)  # np.squeeze flattens the (1, m) labels to 1-D for the color argument

    return train_X, train_Y, test_X, test_Y

Load and plot the dataset

plt.rcParams['figure.figsize'] = (7.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# Load the data
train_X, train_Y, test_X, test_Y = load_2D_dataset(is_plot=True)
plt.show()

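Each point corresponds to a position where the ball may land. A blue point means a French player won the ball, and a red point means an opposing player won it. The goal is to use a model to draw a decision boundary that separates the positions where our players are likely to win the ball.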

1.3 Initializing the parameters

def initialize_parameters(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layer_dims[1], layer_dims[0])
                    b1 -- bias vector of shape (layer_dims[1], 1)
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)

    Tips:
    - For example: the layer_dims for the "Planar Data classification model" would have been [2,2,1].
    This means W1's shape was (2,2), b1 was (2,1), W2 was (1,2) and b2 was (1,1). Now you have to generalize it!
    - In the for loop, use parameters['W' + str(l)] to access Wl, where l is the iterative integer.
    """

    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers, including the input layer

    for l in range(1, L):
        # Scale by 1/sqrt(fan-in) so the variance of each layer's pre-activations stays roughly constant
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) / np.sqrt(layer_dims[l - 1])
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

        assert (parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l - 1]))
        assert (parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters

1.4 Forward propagation

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(x)
    """
    s = 1 / (1 + np.exp(-x))
    return s
    
def relu(x):
    """
    Compute the relu of x

    Arguments:
    x -- A scalar or numpy array of any size.

    Return:
    s -- relu(x)
    """
    s = np.maximum(0, x)

    return s

def forward_propagation(X, parameters):
    """
    Implements the forward propagation presented in Figure 2:
    LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.

    Arguments:
    X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
                    W1 -- weight matrix of shape (layer_dims[1], layer_dims[0])
                    b1 -- bias vector of shape (layer_dims[1], 1)
                    W2 -- weight matrix of shape (layer_dims[2], layer_dims[1])
                    b2 -- bias vector of shape (layer_dims[2], 1)
                    W3 -- weight matrix of shape (layer_dims[3], layer_dims[2])
                    b3 -- bias vector of shape (layer_dims[3], 1)

    Returns:
    a3 -- output of the final sigmoid activation, of shape (1, number of examples)
    cache -- tuple of intermediate values (z, a, W, b of each layer) needed by backward_propagation()
    """

    # retrieve parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    z1 = np.dot(W1, X) + b1
    a1 = relu(z1)
    z2 = np.dot(W2, a1) + b2
    a2 = relu(z2)
    z3 = np.dot(W3, a2) + b3
    a3 = sigmoid(z3)

    cache = (z1, a1, W1, b1, z2, a2, W2, b2, z3, a3, W3, b3)

    return a3, cache

1.5 Computing the cost

def compute_cost(a3, Y):
    """
    Implement the cost function

    Arguments:
    a3 -- post-activation, output of forward propagation
    Y -- "true" labels vector, same shape as a3

    Returns:
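    cost -- value of the cost function
    """
    m = Y.shape[1]

    # Binary cross-entropy averaged over the m examples
    logprobs = np.multiply(-np.log(a3), Y) + np.multiply(-np.log(1 - a3), 1 - Y)
    cost = 1. / m * np.sum(logprobs)

    return cost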

1.6 Backward propagation

def backward_propagation(X, Y, cache):
    """
    Implement the backward propagation presented in figure 2.

    Arguments:
    X -- input dataset, of shape (input size, number of examples)
    Y -- true "label" vector of shape (1, number of examples), containing 0 or 1
    cache -- cache output from forward_propagation()

    Returns:
    gradients -- A dictionary with the gradients with respect to each parameter, activation and pre-activation variables
    """
    m = X.shape[1]
    (z1, a1, W1, b1, z2, a2, W2, b2, z3, a3, W3, b3) = cache

    dz3 = 1. / m * (a3 - Y)
    dW3 = np.dot(dz3, a2.T)
    db3 = np.sum(dz3, axis=1, keepdims=True)

    da2 = np.dot(W3.T, dz3)
    dz2 = np.multiply(da2, np.int64(a2 > 0))  # relu_back
    dW2 = np.dot(dz2, a1.T)
    db2 = np.sum(dz2, axis=1, keepdims=True)

    da1 = np.dot(W2.T, dz2)
    dz1 = np.multiply(da1, np.int64(a1 > 0))
    dW1 = np.dot(dz1, X.T)
    db1 = np.sum(dz1, axis=1, keepdims=True)

    gradients = {"dz3": dz3, "dW3": dW3, "db3": db3,
                 "da2": da2, "dz2": dz2, "dW2": dW2, "db2": db2,
                 "da1": da1, "dz1": dz1, "dW1": dW1, "db1": db1}

    return gradients
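With L2 regularization, backpropagation changes only in the weight gradients: differentiating $\frac{\lambda}{2m}\sum_l \|W^{[l]}\|_F^2$ adds $\frac{\lambda}{m}W^{[l]}$ to each $dW^{[l]}$, while the bias gradients stay the same. A minimal sketch, assuming the gradients come from the unregularized `backward_propagation` above; the helper name `add_l2_to_gradients` is hypothetical.

```python
import numpy as np

def add_l2_to_gradients(grads, parameters, lambd, m):
    """Hypothetical helper: return a copy of grads where every dW[l] includes (lambd / m) * W[l].

    Assumes grads holds "dW1".."dWL" computed WITHOUT regularization and
    parameters holds "W1".."WL"; m is the number of training examples.
    Bias gradients are copied unchanged because biases are not penalized.
    """
    reg_grads = dict(grads)  # shallow copy so the caller's dictionary is not modified
    L = len(parameters) // 2
    for l in range(1, L + 1):
        reg_grads["dW" + str(l)] = grads["dW" + str(l)] + (lambd / m) * parameters["W" + str(l)]
    return reg_grads
```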

1.7 Updating the parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of backward_propagation()
    learning_rate -- step size used by gradient descent

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters['W' + str(i)] = ...
                  parameters['b' + str(i)] = ...
    """

    L = len(parameters) // 2  # number of layers in the neural networks

    # Update rule for each parameter
    for k in range(L):
        parameters["W" + str(k + 1)] = parameters["W" + str(k + 1)] - learning_rate * grads["dW" + str(k + 1)]
        parameters["b" + str(k + 1)] = parameters["b" + str(k + 1)] - learning_rate * grads["db" + str(k + 1)]

    return parameters
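Once the L2 term $\frac{\lambda}{m}W^{[l]}$ is included in each $dW^{[l]}$, the gradient-descent step in `update_parameters` can be rewritten to show why L2 regularization is often called "weight decay". Here $\alpha$ stands for `learning_rate` and $dW^{[l]}_{\text{backprop}}$ for the gradient of the unregularized cost; both symbols are introduced only for this derivation:

$$
\begin{aligned}
dW^{[l]} &= dW^{[l]}_{\text{backprop}} + \frac{\lambda}{m} W^{[l]} \\
W^{[l]} &:= W^{[l]} - \alpha\, dW^{[l]} = \left(1 - \frac{\alpha\lambda}{m}\right) W^{[l]} - \alpha\, dW^{[l]}_{\text{backprop}}
\end{aligned}
$$

Since $1 - \frac{\alpha\lambda}{m}$ is slightly less than 1 for typical settings, each step first shrinks the weights a little before applying the usual update, which is the mechanism behind the weights being pushed toward 0.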

1.8 Prediction

def predict(X, y, parameters):
    """
    This function is used to predict the results of an n-layer neural network.

    Arguments:
    X -- data set of examples you would like to label
    y -- true labels for X, of shape (1, number of examples)
    parameters -- parameters of the trained model

    Returns:
    p -- predictions for the given dataset X
    """

    m = X.shape[1]
    p = np.zeros((1, m), dtype=np.