Deep Learning from Scratch: Building a Neural Network System Step by Step
Still puzzled by the black box of deep learning? Want to genuinely understand how neural networks work instead of just calling APIs? This article walks you through building a complete deep learning system from scratch, so you can master the core mechanics of neural networks for yourself.
What you will get from this article:
- ✅ Complete implementations of the basic building blocks of neural networks
- ✅ The evolution path from the perceptron to convolutional networks
- ✅ The mathematics behind gradient descent and backpropagation
- ✅ A hands-on MNIST handwritten digit recognition project
- ✅ Performance optimization and hyperparameter tuning techniques
Project Overview
"Deep Learning from Scratch" is a complete deep learning tutorial project implemented in Python and NumPy, with no dependence on any deep learning framework. The project spans 8 chapters and builds up a systematic understanding of deep learning, progressing step by step from the simplest logic gates to full convolutional neural networks.
Environment Setup
System Requirements
- Python 3.6+
- NumPy 1.19+
- Matplotlib 3.3+
Installing Dependencies
pip install numpy matplotlib
Chapter 1: Python Basics and a NumPy Primer
The project starts with basic Python classes and NumPy operations, laying a solid foundation for the neural network implementations that follow.
# ch01/man.py - a minimal class example
class Man:
    """Sample class"""
    def __init__(self, name):
        self.name = name
        print("Initialized!")
    def hello(self):
        print("Hello " + self.name + "!")
    def goodbye(self):
        print("Good-bye " + self.name + "!")

# Usage
m = Man("Zhang San")
m.hello()
m.goodbye()
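The chapter also reviews the NumPy operations that every later chapter relies on: array creation, elementwise arithmetic, broadcasting, and matrix products. A few illustrative lines (not tied to a specific project file):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(x + y)         # elementwise addition: [3. 6. 9.]
print(x * 10)        # broadcasting with a scalar: [10. 20. 30.]

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.dot(A, B))  # matrix product, the core of every layer's forward pass
print(A.shape)       # (2, 2)
```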
Chapter 2: Perceptrons and Logic Gates
The perceptron is the basic building block of neural networks. This chapter uses it to implement the elementary logic gates.
AND Gate Implementation
# ch02/and_gate.py
import numpy as np

def AND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])  # weights
    b = -0.7                  # bias
    tmp = np.sum(w * x) + b
    return 1 if tmp > 0 else 0

# Test all input combinations
for xs in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    y = AND(xs[0], xs[1])
    print(f"{xs} -> {y}")
Logic Gate Comparison
| Gate | Input (0,0) | Input (1,0) | Input (0,1) | Input (1,1) | Implementation complexity |
|---|---|---|---|---|---|
| AND | 0 | 0 | 0 | 1 | Simple |
| OR | 0 | 1 | 1 | 1 | Simple |
| NAND | 1 | 1 | 1 | 0 | Simple |
| XOR | 0 | 1 | 1 | 0 | Requires multiple layers (see the sketch below) |
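A single-layer perceptron cannot represent XOR, but a two-layer combination of the gates above can. Here is a minimal sketch in the spirit of the book's ch02/xor_gate.py, with NAND and OR defined inline (they differ from AND only in their weights and bias; AND is the function from the previous listing):

```python
import numpy as np

def NAND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])
    b = 0.7
    return 1 if np.sum(w * x) + b > 0 else 0

def OR(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.2
    return 1 if np.sum(w * x) + b > 0 else 0

def XOR(x1, x2):
    s1 = NAND(x1, x2)   # first (hidden) layer
    s2 = OR(x1, x2)
    return AND(s1, s2)  # second (output) layer

for xs in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(f"{xs} -> {XOR(xs[0], xs[1])}")  # 0, 1, 1, 0
```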
Chapter 3: Neural Network Basics
Activation Functions
The heart of a neural network is its nonlinear activation functions. The project implements several commonly used ones:
# common/functions.py
import numpy as np

def sigmoid(x):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-x))

def relu(x):
    """ReLU activation function"""
    return np.maximum(0, x)

def softmax(x):
    """Softmax function"""
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T
    x = x - np.max(x)  # guard against numerical overflow
    return np.exp(x) / np.sum(np.exp(x))
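A quick sanity check of these functions on a small array (illustrative only, assuming the file above is importable as common.functions):

```python
import numpy as np
from common.functions import sigmoid, relu, softmax

a = np.array([-1.0, 0.0, 2.0])
print(sigmoid(a))   # every value squashed into (0, 1)
print(relu(a))      # [0. 0. 2.]
y = softmax(a)
print(y, y.sum())   # a probability distribution; the sum is 1.0
```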
MNIST Handwritten Digit Recognition
# ch03/neuralnet_mnist.py
import numpy as np
from dataset.mnist import load_mnist
from common.functions import sigmoid, softmax

def predict(network, x):
    """Forward pass of a three-layer neural network"""
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)
    return y

# Load the test data and the pretrained weights
# (get_data() and init_network() are defined in the same file in the book's repo)
x, t = get_data()
network = init_network()

# Compute the classification accuracy
accuracy_cnt = 0
for i in range(len(x)):
    y = predict(network, x[i])
    p = np.argmax(y)
    if p == t[i]:
        accuracy_cnt += 1
print(f"Accuracy: {float(accuracy_cnt) / len(x):.4f}")
Chapter 4: Gradient Descent and Optimization
Numerical Gradient Computation
# common/gradient.py
import numpy as np

def numerical_gradient(f, x):
    """Numerical gradient via central differences (1-D version)"""
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        # f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        # f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value
    return grad
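With a numerical gradient available, plain gradient descent takes only a few lines. A minimal sketch (the book provides a similar gradient_descent helper in ch04) that minimizes f(x0, x1) = x0² + x1²:

```python
import numpy as np

def gradient_descent(f, init_x, lr=0.1, step_num=100):
    """Repeatedly step against the numerical gradient of f."""
    x = init_x
    for _ in range(step_num):
        grad = numerical_gradient(f, x)  # defined above
        x -= lr * grad
    return x

def function_2(x):
    return x[0] ** 2 + x[1] ** 2

init_x = np.array([-3.0, 4.0])
print(gradient_descent(function_2, init_x, lr=0.1, step_num=100))
# converges toward the minimum at (0, 0)
```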
Optimizers
The project implements several optimization algorithms:
# common/optimizer.py
import numpy as np

class SGD:
    """Stochastic gradient descent"""
    def __init__(self, lr=0.01):
        self.lr = lr
    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]

class Adam:
    """Adam optimizer"""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None
        self.v = None
    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = {}, {}
            for key, val in params.items():
                self.m[key] = np.zeros_like(val)
                self.v[key] = np.zeros_like(val)
        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)
        for key in params.keys():
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * grads[key]
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * (grads[key]**2)
            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)
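Every optimizer exposes the same update(params, grads) interface, so they can be swapped without touching the training loop. A minimal usage sketch with made-up parameter and gradient dictionaries:

```python
import numpy as np

params = {'W1': np.random.randn(2, 3), 'b1': np.zeros(3)}
grads = {'W1': np.full((2, 3), 0.1), 'b1': np.full(3, 0.1)}

optimizer = Adam(lr=0.001)       # or SGD(lr=0.01): identical interface
optimizer.update(params, grads)  # parameters are updated in place
```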
Chapter 5: Backpropagation
Backpropagation is the core algorithm for training neural networks: it applies the chain rule to compute gradients efficiently.
Layer Abstraction
# common/layers.py
import numpy as np

class Affine:
    """Fully connected (affine) layer"""
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None
    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        return out
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)
        return dx
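Every layer follows the same forward/backward protocol, which is what allows them to be chained in an OrderedDict later on. For comparison, here is a ReLU layer in the same style, as it appears in the book's common/layers.py:

```python
import numpy as np

class Relu:
    """ReLU layer with the same forward/backward interface"""
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)   # remember where the input was non-positive
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        dout[self.mask] = 0    # the gradient is blocked wherever the input was <= 0
        return dout
```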
Chapter 6: Hyperparameter Tuning and Regularization
Weight Initialization Strategies
# Comparing weight initialization schemes
import numpy as np

def weight_init_compare():
    """Compare different weight initialization methods"""
    node_num = 100
    # Xavier initialization (suited to sigmoid/tanh activations)
    w_xavier = np.random.randn(node_num, node_num) * np.sqrt(1.0 / node_num)
    # He initialization (suited to ReLU activations)
    w_he = np.random.randn(node_num, node_num) * np.sqrt(2.0 / node_num)
    return w_xavier, w_he
Dropout Regularization
import numpy as np

class Dropout:
    """Dropout layer"""
    def __init__(self, dropout_ratio=0.5):
        self.dropout_ratio = dropout_ratio
        self.mask = None
    def forward(self, x, train_flg=True):
        if train_flg:
            self.mask = np.random.rand(*x.shape) > self.dropout_ratio
            return x * self.mask
        else:
            return x * (1.0 - self.dropout_ratio)
    def backward(self, dout):
        return dout * self.mask
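During training a fresh random mask is drawn on every forward pass; at inference time the activations are simply scaled instead. A quick illustration (illustrative values only):

```python
import numpy as np

dropout = Dropout(dropout_ratio=0.5)
x = np.ones((2, 4))

print(dropout.forward(x, train_flg=True))   # roughly half the entries zeroed out
print(dropout.forward(x, train_flg=False))  # every entry scaled by 0.5
```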
Chapter 7: Convolutional Neural Networks
CNN Architecture
# ch07/simple_convnet.py
from collections import OrderedDict
import numpy as np

class SimpleConvNet:
    """Simple CNN: Conv - ReLU - Pool - Affine - ReLU - Affine - Softmax"""
    def __init__(self, input_dim=(1, 28, 28),
                 conv_param={'filter_num': 30, 'filter_size': 5, 'pad': 0, 'stride': 1},
                 hidden_size=100, output_size=10, weight_init_std=0.01):
        # Convolution layer parameters
        filter_num = conv_param['filter_num']
        filter_size = conv_param['filter_size']
        filter_pad = conv_param['pad']
        filter_stride = conv_param['stride']
        input_size = input_dim[1]  # 28 for MNIST
        # Output sizes of the convolution and pooling layers
        conv_output_size = (input_size - filter_size + 2*filter_pad) / filter_stride + 1
        pool_output_size = int(filter_num * (conv_output_size/2) * (conv_output_size/2))
        # Initialize the weights
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(filter_num, input_dim[0], filter_size, filter_size)
        self.params['b1'] = np.zeros(filter_num)
        self.params['W2'] = weight_init_std * np.random.randn(pool_output_size, hidden_size)
        self.params['b2'] = np.zeros(hidden_size)
        self.params['W3'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b3'] = np.zeros(output_size)
        # Assemble the layers
        self.layers = OrderedDict()
        self.layers['Conv1'] = Convolution(self.params['W1'], self.params['b1'], conv_param['stride'], conv_param['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        self.layers['Affine1'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W3'], self.params['b3'])
        self.last_layer = SoftmaxWithLoss()
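Because every layer exposes forward and backward, inference and loss computation reduce to iterating over the OrderedDict. The following methods (part of SimpleConvNet in the book's repository, shown here slightly abridged) illustrate this:

```python
    # these methods belong inside the SimpleConvNet class above
    def predict(self, x):
        """Run the input through every layer in order."""
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    def loss(self, x, t):
        """Cross-entropy loss computed by the final SoftmaxWithLoss layer."""
        y = self.predict(x)
        return self.last_layer.forward(y, t)
```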
Implementing the Convolution Operation
class Convolution:
    """Convolution layer"""
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W  # filter weights, shape (FN, C, FH, FW)
        self.b = b  # biases
        self.stride = stride
        self.pad = pad
    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        # Output spatial dimensions
        out_h = int(1 + (H + 2*self.pad - FH) / self.stride)
        out_w = int(1 + (W + 2*self.pad - FW) / self.stride)
        # im2col expansion: turn each receptive field into a row
        col = im2col(x, FH, FW, self.stride, self.pad)
        col_W = self.W.reshape(FN, -1).T
        out = np.dot(col, col_W) + self.b
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)
        return out
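The Pooling layer referenced by SimpleConvNet works the same way: im2col flattens each window into a row, then a max is taken per row. A forward-only sketch in the style of the book's common/layers.py, assuming the same im2col helper:

```python
class Pooling:
    """Max-pooling layer (forward pass only)"""
    def __init__(self, pool_h, pool_w, stride=2, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad

    def forward(self, x):
        N, C, H, W = x.shape
        out_h = int(1 + (H - self.pool_h) / self.stride)
        out_w = int(1 + (W - self.pool_w) / self.stride)

        # flatten each pooling window into a row, then take the max per row
        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h * self.pool_w)
        out = np.max(col, axis=1)
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)
        return out
```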
Chapter 8: Deep Learning in Practice and Optimization
A Deeper Network Architecture
# ch08/deep_convnet.py
class DeepConvNet:
    """Deep convolutional network"""
    def __init__(self, input_dim=(1, 28, 28),
                 conv_param_1={'filter_num': 16, 'filter_size': 3, 'pad': 1, 'stride': 1},
                 conv_param_2={'filter_num': 16, 'filter_size': 3, 'pad': 1, 'stride': 1},
                 conv_param_3={'filter_num': 32, 'filter_size': 3, 'pad': 1, 'stride': 1},
                 conv_param_4={'filter_num': 32, 'filter_size': 3, 'pad': 2, 'stride': 1},
                 conv_param_5={'filter_num': 64, 'filter_size': 3, 'pad': 1, 'stride': 1},
                 hidden_size=50, output_size=10):
        # A deeper stack of layers (weight initialization into self.params omitted in this excerpt)
        self.layers = OrderedDict()
        self.layers['Conv1'] = Convolution(self.params['W1'], self.params['b1'], conv_param_1['stride'], conv_param_1['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Conv2'] = Convolution(self.params['W2'], self.params['b2'], conv_param_2['stride'], conv_param_2['pad'])
        self.layers['Relu2'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        # ... more layers
Streamlining the Training Loop
# common/trainer.py (simplified)
import numpy as np
from common.optimizer import SGD, Adam

class Trainer:
    """Drives the training loop for a network"""
    def __init__(self, network, x_train, t_train, x_test, t_test,
                 epochs=20, mini_batch_size=100,
                 optimizer='SGD', optimizer_param={'lr': 0.01},
                 evaluate_sample_num_per_epoch=None, verbose=True):
        self.network = network
        self.x_train = x_train
        self.t_train = t_train
        self.x_test = x_test
        self.t_test = t_test
        self.epochs = epochs
        self.mini_batch_size = mini_batch_size
        self.verbose = verbose
        # Resolve the optimizer by name (evaluate_sample_num_per_epoch is unused in this simplified version)
        self.optimizer = {'sgd': SGD, 'adam': Adam}[optimizer.lower()](**optimizer_param)
        self.train_acc_list = []
        self.test_acc_list = []

    def train(self):
        """Simplified training loop (one mini-batch per epoch shown here)"""
        for epoch in range(self.epochs):
            # Draw a random mini-batch
            batch_mask = np.random.choice(self.x_train.shape[0], self.mini_batch_size)
            x_batch = self.x_train[batch_mask]
            t_batch = self.t_train[batch_mask]
            # Compute gradients via backpropagation
            grads = self.network.gradient(x_batch, t_batch)
            # Update the parameters
            self.optimizer.update(self.network.params, grads)
            # Evaluate on the training and test sets
            train_acc = self.network.accuracy(self.x_train, self.t_train)
            test_acc = self.network.accuracy(self.x_test, self.t_test)
            self.train_acc_list.append(train_acc)
            self.test_acc_list.append(test_acc)
            if self.verbose:
                print(f"Epoch {epoch}: Train Acc={train_acc:.4f}, Test Acc={test_acc:.4f}")
Performance Comparison and Best Practices
Accuracy of Different Architectures
| Network type | Parameter count | MNIST accuracy | Training time | Typical use case |
|---|---|---|---|---|
| Fully connected network | ~50K | 98.2% | Fast | Simple classification |
| Simple CNN | ~100K | 99.1% | Medium | Image recognition |
| Deep CNN | ~1M | 99.4% | Slow | Complex tasks |
Hyperparameter Tuning Guide
# Hyperparameter optimization example
def hyperparameter_optimization():
    """One trial of random hyperparameter search (log-uniform sampling)"""
    weight_decay = 10 ** np.random.uniform(-8, -4)
    lr = 10 ** np.random.uniform(-6, -2)
    # Build the network (MultiLayerNet is provided by common/multi_layer_net.py)
    network = MultiLayerNet(input_size=784,
                            hidden_size_list=[100, 100, 100, 100],
                            output_size=10,
                            weight_decay_lambda=weight_decay)
    # Train and evaluate
    trainer = Trainer(network, x_train, t_train, x_test, t_test,
                      epochs=5, mini_batch_size=100,
                      optimizer='SGD', optimizer_param={'lr': lr},
                      verbose=False)
    trainer.train()
    return trainer.test_acc_list[-1]
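A full search simply calls this function many times and keeps the best result (the book's ch06/hyperparameter_optimization.py runs about 100 such trials and also records the sampled lr and weight_decay of each one). A minimal driver loop under that assumption:

```python
# Run several random-search trials and keep the best validation accuracy.
# A real search would also return lr / weight_decay from each trial.
results = []
for trial in range(20):  # the number of trials here is arbitrary
    val_acc = hyperparameter_optimization()
    results.append(val_acc)
    print(f"trial {trial}: val acc = {val_acc:.4f}")

print(f"best validation accuracy: {max(results):.4f}")
```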
Summary and Outlook
Working through this from-scratch project, we implemented a complete neural network system and, more importantly, came to understand how each component works. From the simplest logic gates to convolutional networks, every step reflects the fundamental principles of deep learning.
Key Takeaways
- Mathematical foundations: a working understanding of the math behind gradient descent and backpropagation
- Engineering practice: the implementation details and optimization tricks involved in building neural networks
- Systems thinking: how to design and assemble a complete deep learning system
- Debugging skills: the ability to analyze and resolve problems in neural network training
Suggested Next Steps
- Implement more complex architectures (e.g. ResNet, Transformer)
- Explore other application areas (natural language processing, speech recognition)
- Study model compression and deployment optimization
- Contribute to open-source projects to build hands-on experience
This project provides a solid foundation for learning deep learning: instead of remaining a passive "framework user", you become an engineer who genuinely understands and can build neural network systems.
If this article helped you, please like, bookmark, and follow! More hands-on deep learning content is on the way.
Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.



