Deep Learning: Implementing an L-Layer Neural Network from Scratch - 优快云 Blog

Article link: https://blog.youkuaiyun.com/econe_wei/article/details/90636304

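This article shows how to implement a deep neural network from scratch using only numpy. It covers initialization, forward propagation, backward propagation, and parameter updates, with the sigmoid and ReLU activation functions. It walks through the linear forward step, the linear-activation forward step, and the full L-layer model to explain how the network operates.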

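Introduction

This post implements a deep neural network from the ground up using only the numpy library. For the underlying math, see Andrew Ng's Deep Learning course.
Suggestion:
To give an overall picture, the inputs and outputs of the main helper functions are listed below, so you can quickly see how the functions interact.
With that picture in mind, you can then dig into the implementation details of each function.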

parameters = initialize_parameters_deep(layers_dims)

# forward propagation
Z, cache = linear_forward(A, W, b)
A, cache = linear_activation_forward(A_prev, W, b, activation)
AL, caches = L_model_forward(X, parameters)

# cost function
cost = compute_cost(AL, Y)

# backward propagation
dA_prev, dW, db = linear_activation_backward(dA, cache, activation)
grads = L_model_backward(AL, Y, caches)
parameters = update_parameters(parameters, grads, learning_rate)

# compute sigmoid and ReLU function, and corresponding dZ
A, cache = sigmoid(Z)
A, cache = relu(Z)
dZ = relu_backward(dA, cache)
dZ = sigmoid_backward(dA, cache)
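
The four activation helpers listed above can be sketched as follows. This is a minimal sketch, not necessarily the exact helper code used in the course: it assumes that the `cache` returned by `sigmoid` and `relu` is simply `Z`, and that the backward helpers apply the chain rule $dZ = dA \cdot g'(Z)$.

```python
import numpy as np

def sigmoid(Z):
    """Return A = 1 / (1 + exp(-Z)) and a cache holding Z (assumed cache layout)."""
    A = 1.0 / (1.0 + np.exp(-Z))
    return A, Z

def relu(Z):
    """Return A = max(0, Z) element-wise and a cache holding Z (assumed cache layout)."""
    A = np.maximum(0, Z)
    return A, Z

def sigmoid_backward(dA, cache):
    """Chain rule for sigmoid: dZ = dA * s * (1 - s), where s = sigmoid(Z)."""
    Z = cache
    s = 1.0 / (1.0 + np.exp(-Z))
    return dA * s * (1.0 - s)

def relu_backward(dA, cache):
    """Chain rule for ReLU: the gradient passes through where Z > 0 and is zero elsewhere."""
    Z = cache
    dZ = np.array(dA, copy=True)  # copy so the caller's dA is not modified
    dZ[Z <= 0] = 0
    return dZ
```

Each backward helper takes the upstream gradient `dA` together with the cache saved during the forward pass and returns an array with the same shape as `Z`.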

1 - Packages

import numpy as np

2 - Outline of the Assignment

[Figure: implementation workflow]

3 - Initialization

3.2 - L-layers Neural Network

$n^{[l]}$ denotes the number of units in layer $l$.
Suppose the input $X$ has shape (12288, 209) (that is, $m = 209$ examples). Then:
[Figure: layer-by-layer dimensions for this input]
Initialization of an L-layer neural network

def initialize_parameters_deep(layers_dims):
    """
    Input:
    layers_dims -- Python list holding the number of units in each layer.
                   e.g. layers_dims = [2, 3, 2]: the input layer has 2 units, one hidden layer has 3 units, and the output layer has 2 units
    Output/return:
    parameters -- Python dictionary containing the initialized parameters:
                  Wl : parameters['W' + str(l)], shape (layers_dims[l], layers_dims[l-1])
                  bl : parameters['b' + str(l)], shape (layers_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}         # declare the dict first, then add keys inside the for loop
    L = len(layers_dims)    # number of entries in layers_dims, i.e. the number of layers including the input layer

    for l in range(1, L):
        # W has shape (units in layer l, units in layer l-1) so that np.dot(W, A) is defined;
        # small zero-mean random values break the symmetry between units
        parameters["W" + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))

        # check the shapes of the parameters
        assert(parameters["W" + str(l)].shape == (layers_dims[l], layers_dims[l-1]))
        assert(parameters["b" + str(l)].shape == (layers_dims[l], 1))

    return parameters
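
To make the shape convention concrete, here is a small sketch that only computes shapes and allocates no arrays. It assumes the convention $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$, under which $W^{[l]}$ must have shape $(n^{[l]}, n^{[l-1]})$ and $b^{[l]}$ shape $(n^{[l]}, 1)$. The helper name `parameter_shapes` and the hidden-layer sizes 20 and 7 are hypothetical choices for illustration; only the input size 12288 and $m = 209$ come from the text.

```python
def parameter_shapes(layers_dims, m):
    """Hypothetical helper: list the shapes of W, b and Z for every layer.

    layers_dims -- units per layer, input layer first
    m           -- number of examples (columns of X)
    """
    shapes = {}
    for l in range(1, len(layers_dims)):
        shapes["W" + str(l)] = (layers_dims[l], layers_dims[l - 1])
        shapes["b" + str(l)] = (layers_dims[l], 1)
        shapes["Z" + str(l)] = (layers_dims[l], m)  # b is broadcast across the m columns
    return shapes

# Input size and m are from the text; the hidden sizes 20 and 7 are assumed.
shapes = parameter_shapes([12288, 20, 7, 1], m=209)
for name, shape in shapes.items():
    print(name, shape)
```

The number of columns stays at $m$ through every layer, while the number of rows follows `layers_dims`.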

4 - Forward propagation module

4.1 - Linear Forward

The linear forward function (vectorized over all examples) computes the following equation:
$Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$
where $A^{[0]} = X$

def linear_forward(A, W, b):
    """
    Input:
    A -- activations from the previous layer (or the input data X): shape (size of previous layer, number of examples)
    W -- weight matrix: shape (size of current layer, size of previous layer)
    b -- bias vector: shape (size of current layer, 1)
    
    Output/return:
    Z -- the input to the activation function (the pre-activation value)
    cache -- Python tuple (A, W, b), stored for use in the backward pass
    """
    Z = np.dot(W, A) + b      # broadcasting rule
    
    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)
    
    return Z, cache
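
The broadcasting step in `Z = np.dot(W, A) + b` can be checked on a tiny example. The values below are made up for illustration: a layer with 3 units, 2 inputs, and 2 examples. The column vector `b` of shape (3, 1) is added to every column of `np.dot(W, A)`, which is equivalent to tiling `b` across the examples explicitly.

```python
import numpy as np

W = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # shape (3, 2): 3 units in this layer, 2 in the previous one
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # shape (2, 2): 2 features, 2 examples
b = np.array([[1.0],
              [2.0],
              [3.0]])        # shape (3, 1): one bias per unit

Z = np.dot(W, A) + b         # b is broadcast across the 2 example columns

# The same result with the bias tiled by hand, to make the broadcast explicit
Z_explicit = np.dot(W, A) + np.tile(b, (1, A.shape[1]))

print(Z.shape)   # (3, 2)
print(Z)
```

Here `A` is the identity, so `np.dot(W, A)` equals `W` and each row of `Z` is the matching row of `W` shifted by that unit's bias.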

4.2 - Linear Activation Forward

The network uses two activation functions:

Sigmoid: $\sigma(Z) = \sigma(W A + b) = \frac{1}{1 + e^{-(W A + b)}}$. The predefined sigmoid function returns two values: the activation value "A" and a "cache" that stores "Z" (used as input to the corresponding backward function). Use it as follows: