A Guide to Building Deep Neural Networks: Implementing a Multi-Layer Network from Scratch

Project: deep-learning-coursera (Deep Learning Specialization by Andrew Ng on Coursera). Repository: https://gitcode.com/gh_mirrors/de/deep-learning-coursera

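Deep neural networks are among the most powerful tools in machine learning and can model complex nonlinear problems. This article walks step by step through implementing a complete deep neural network, covering the key building blocks: parameter initialization, forward propagation, and backward propagation.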

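Neural Network Basics: A Quick Review

Before building anything, a few key concepts need to be clear:

Layer: a neural network is made up of layers: an input layer, one or more hidden layers, and an output layer
Activation function: introduces nonlinearity; ReLU and Sigmoid are common choices
Forward propagation: the pass in which data flows from the input layer to the output layer
Backward propagation: the pass in which the error is propagated from the output layer back toward the input in order to update the parameters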

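Parameter Initialization

Initializing a Two-Layer Network

A simple two-layer network (input layer → hidden layer → output layer) needs two sets of parameters, one weight matrix and one bias vector per layer: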

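import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    # n_x, n_h, n_y: sizes of the input, hidden, and output layers
    np.random.seed(1)  # fixed seed so the results are reproducible
    W1 = np.random.randn(n_h, n_x) * 0.01  # hidden-layer weights
    b1 = np.zeros((n_h, 1))                # hidden-layer biases
    W2 = np.random.randn(n_y, n_h) * 0.01  # output-layer weights
    b2 = np.zeros((n_y, 1))                # output-layer biases

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters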

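The weights are drawn from a standard normal distribution and multiplied by 0.01 to keep the initial values small, and the biases start at zero. This is a common initialization strategy in deep learning: the random weights ensure that the hidden units do not all compute the same function, so the biases can safely be zero.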
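To see why the weights are randomized, the following minimal sketch compares a constant weight matrix with a random one. The shapes and the names W_const, W_rand, same_const, and same_rand are illustrative assumptions, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))  # 3 input features, 5 examples

# Hypothetical weight matrices for a hidden layer with 4 units
W_const = np.full((4, 3), 0.01)               # every unit has identical weights
W_rand = rng.standard_normal((4, 3)) * 0.01   # small random weights

Z_const = W_const @ X
Z_rand = W_rand @ X

# With constant weights every hidden unit produces the same row of outputs
same_const = bool(np.allclose(Z_const, Z_const[0]))
same_rand = bool(np.allclose(Z_rand, Z_rand[0]))
print(same_const, same_rand)
```

When all units compute the same values they also receive the same gradients, so they stay identical during training; random initialization avoids this.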

Initializing a Multi-Layer Network

A deeper L-layer network needs a more general initializer:

def initialize_parameters_deep(layer_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers, counting the input layer
    
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(
            layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        
    return parameters

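Here layer_dims is a list holding the number of units in each layer, input layer included. For example, [5,4,3] means:

5 units in the input layer
4 units in the hidden layer
3 units in the output layer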
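As a quick check of the shapes this produces, the sketch below derives the expected parameter shapes directly from layer_dims, following the same indexing as the loop above. The shapes dictionary is a hypothetical helper used only for illustration.

```python
layer_dims = [5, 4, 3]

# Expected shape of each parameter, using the same indexing as the initializer
shapes = {}
for l in range(1, len(layer_dims)):
    shapes[f"W{l}"] = (layer_dims[l], layer_dims[l - 1])  # (units in layer l, units in layer l-1)
    shapes[f"b{l}"] = (layer_dims[l], 1)                  # one bias per unit

for name, shape in shapes.items():
    print(name, shape)
```

Each weight matrix has one row per unit in its own layer and one column per unit in the previous layer, which is what makes the product W·A well defined.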

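Implementing Forward Propagation

Linear Forward Step

The linear step is the basic operation of every layer. Its formula is Z = WA + b, where A is the activation output of the previous layer (the input X for the first layer).

The implementation: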

def linear_forward(A, W, b):
    Z = np.dot(W, A) + b
    cache = (A, W, b)
    return Z, cache

The cache stores the input A, the weights W, and the bias b, which are needed again during backward propagation.
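The sketch below illustrates the shapes involved in the linear step, assuming each column of A holds one example. The sizes (3 input units, 4 output units, 5 examples) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # 3 units in the previous layer, 5 examples
W = rng.standard_normal((4, 3))   # 4 units in the current layer
b = np.arange(4, dtype=float).reshape(4, 1)

# b has shape (4, 1); NumPy broadcasts it across the 5 example columns
Z = np.dot(W, A) + b
print(Z.shape)
```

Because of broadcasting, the same bias vector is added to every example, so b never needs to be tiled by hand.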

Linear + Activation Forward Step

In a neural network, the linear transformation is usually followed by an activation function:

def linear_activation_forward(A_prev, W, b, activation):
    # sigmoid(Z) and relu(Z) are helper functions expected to return
    # (A, activation_cache); they are not defined in this snippet
    Z, linear_cache = linear_forward(A_prev, W, b)
    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        A, activation_cache = relu(Z)
    else:
        raise ValueError(f"unsupported activation: {activation!r}")

    cache = (linear_cache, activation_cache)
    return A, cache
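The function above calls sigmoid and relu, which this article does not define. A minimal sketch of such helpers follows, assuming each one returns the activation together with Z as its cache so that the call sites above work unchanged. The helpers in the original project may differ in detail.

```python
import numpy as np

def sigmoid(Z):
    """Element-wise logistic function. Returns (A, cache) with cache = Z."""
    A = 1.0 / (1.0 + np.exp(-Z))
    return A, Z

def relu(Z):
    """Element-wise max(0, Z). Returns (A, cache) with cache = Z."""
    A = np.maximum(0, Z)
    return A, Z

Z = np.array([[-2.0, 0.0, 3.0]])
A_sig, _ = sigmoid(Z)
A_relu, _ = relu(Z)
print(A_sig)
print(A_relu)
```

Caching Z is a design choice: both activation derivatives can be computed from Z alone during the backward pass.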

Forward Propagation Through the Full Network

In an L-layer network, the first L-1 layers use ReLU and the last layer uses Sigmoid:

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers (each has one W and one b)
    
    # The first L-1 layers use ReLU
    for l in range(1, L):
        A_prev = A 
        A, cache = linear_activation_forward(
            A_prev, parameters['W'+str(l)], parameters['b'+str(l)], "relu")
        caches.append(cache)
    
    # The last layer uses Sigmoid
    AL, cache = linear_activation_forward(
        A, parameters['W'+str(L)], parameters['b'+str(L)], "sigmoid")
    caches.append(cache)
    
    return AL, caches
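To try the whole forward pass end to end, here is a compact, self-contained sketch of the same ReLU → ... → Sigmoid structure. It omits the caches for brevity. The function name forward, the layer sizes [5, 4, 3, 1], and the random input are assumptions made for this illustration.

```python
import numpy as np

def forward(X, parameters):
    """Compact forward pass: ReLU for layers 1..L-1, sigmoid for layer L."""
    A = X
    L = len(parameters) // 2  # one W and one b per layer
    for l in range(1, L + 1):
        Z = parameters[f"W{l}"] @ A + parameters[f"b{l}"]
        A = np.maximum(0, Z) if l < L else 1.0 / (1.0 + np.exp(-Z))
    return A

rng = np.random.default_rng(0)
layer_dims = [5, 4, 3, 1]  # hypothetical sizes: 5 inputs, two hidden layers, 1 output
parameters = {}
for l in range(1, len(layer_dims)):
    parameters[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))

X = rng.standard_normal((5, 4))  # 5 features, 4 examples
AL = forward(X, parameters)
print(AL.shape)  # (1, 4): one prediction per example
```

With weights scaled by 0.01 and zero biases, the pre-activation of the last layer is close to zero, so every output of an untrained network sits near 0.5.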