1 Helper Functions
In terms of a single iteration:
input → compute the linear part and cache it as linear_cache → compute the activation and cache it as activation_cache → backpropagate → update the parameters, as sketched below.
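A rough sketch of how one such iteration could be wired together is shown next. Only the forward-propagation pieces are built in this section; compute_cost appears in section 1.3, while L_model_backward and update_parameters are assumed helpers from later parts of the assignment, not functions defined here.

# Hypothetical driver for a single iteration (the backward-pass helper
# names are assumptions, not functions defined in this section).
def one_iteration(X, Y, parameters, learning_rate=0.0075):
    AL, caches = L_model_forward(X, parameters)       # forward pass (section 1.2)
    cost = compute_cost(AL, Y)                        # cross-entropy cost (section 1.3)
    grads = L_model_backward(AL, Y, caches)           # backward pass (assumed, later section)
    parameters = update_parameters(parameters, grads, learning_rate)  # gradient step (assumed)
    return parameters, cost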
1.1 Initialize Parameters
# GRADED FUNCTION: initialize_parameters_deep

import numpy as np

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers in the network, including the input layer

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        ### END CODE HERE ###

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters
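For instance, with a hypothetical layer-size list [5, 4, 3]:

parameters = initialize_parameters_deep([5, 4, 3])
print(parameters["W1"].shape, parameters["b1"].shape)   # (4, 5) (4, 1)
print(parameters["W2"].shape, parameters["b2"].shape)   # (3, 4) (3, 1)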
1.2 Forward Propagation
1.2.1 Forward propagation: compute Z and cache the corresponding A, W, b
# GRADED FUNCTION: linear_forward

def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments: A, W, b

    Returns:
    Z -- the input of the activation function, also called the pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b"; stored for computing the backward pass efficiently
    """
    ### START CODE HERE ### (≈ 1 line of code)
    Z = np.dot(W, A) + b
    ### END CODE HERE ###

    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)

    return Z, cache
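A quick sanity check with hypothetical shapes (a layer of 2 units fed by 3 inputs, over 4 examples):

A = np.random.randn(3, 4)    # activations from the previous layer: (3 units, 4 examples)
W = np.random.randn(2, 3)    # current layer weights: (2 units, 3 inputs)
b = np.random.randn(2, 1)    # current layer biases
Z, linear_cache = linear_forward(A, W, b)
print(Z.shape)               # (2, 4)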
1.2.2 Forward propagation: compute the activation value and cache it
linear_cache is the linear cache: for each Z it stores the corresponding A, W, b.
activation_cache is the activation cache: for each layer it stores the pre-activation value Z, which the backward pass of the activation needs.
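The sigmoid and relu helpers called below are provided by the assignment's utility module; a minimal sketch of the behaviour assumed here (each returns the post-activation value A plus an activation_cache holding Z) looks like this:

def sigmoid(Z):
    # Sigmoid activation; the returned cache is Z, needed by the backward pass.
    A = 1 / (1 + np.exp(-Z))
    return A, Z

def relu(Z):
    # ReLU activation; the returned cache is Z, needed by the backward pass.
    A = np.maximum(0, Z)
    return A, Z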
# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache
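Continuing the hypothetical shapes from the linear_forward example:

A_prev = np.random.randn(3, 4)
W = np.random.randn(2, 3)
b = np.random.randn(2, 1)
A, cache = linear_activation_forward(A_prev, W, b, activation="relu")
print(A.shape)    # (2, 4); cache == (linear_cache, activation_cache)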
1.2.3 Forward propagation: the full L-layer network
# GRADED FUNCTION: L_model_forward

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the whole model: the first L-1 layers use relu,
    the L-th layer uses sigmoid. Every layer's cache is stored.

    Returns:
    AL -- the activation value of the last layer
    caches -- the list of cached values, L of them, indexed 0 to L-1
    """
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    # Compute and cache the activations of the first L-1 layers; range(1, L) excludes layer L
    for l in range(1, L):
        A_prev = A
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu')
        caches.append(cache)
        ### END CODE HERE ###

    # Compute and cache the sigmoid activation of layer L
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation='sigmoid')
    caches.append(cache)
    ### END CODE HERE ###

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches
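Putting the forward pass together on hypothetical input with 5 features and 4 examples (the last layer must be a single sigmoid unit to satisfy the assert):

X = np.random.randn(5, 4)                            # 5 features, 4 examples
parameters = initialize_parameters_deep([5, 4, 1])   # network ends in one sigmoid unit
AL, caches = L_model_forward(X, parameters)
print(AL.shape, len(caches))                         # (1, 4) 2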
1.3 Compute the Cost Function
Now that forward propagation is in place (backward propagation follows), the cost also needs to be computed, so we can check whether the model is actually learning.
Compute the cross-entropy cost $J$, using the following formula:

$$J = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right) \right) \tag{7}$$
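A minimal sketch of the cost computation implied by formula (7); the np.squeeze call just makes sure the result is a plain scalar rather than a 1x1 array:

def compute_cost(AL, Y):
    # AL -- probability vector from L_model_forward, shape (1, number of examples)
    # Y  -- true label vector, shape (1, number of examples)
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    cost = np.squeeze(cost)   # e.g. [[17]] would become 17
    assert(cost.shape == ())
    return cost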