Dive into Deep Learning (《动手学深度学习》), PyTorch Edition, Study Notes Part 1: From Preliminaries to Modern Convolutional Neural Networks

Article link: https://blog.youkuaiyun.com/andrew_1219/article/details/142746808

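Preface

I have some theoretical background in machine learning and deep learning but am not yet fluent with PyTorch in practice, so I plan to do a focused sprint before starting my new job.

This article contains my study notes on the book Dive into Deep Learning (《动手学深度学习》).

It mainly records code implementations and the problems encountered along the way; it does not fully cover the underlying theory.

Reference:

Dive into Deep Learning (《动手学深度学习》)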

I. Preliminaries

1. Data Manipulation

1.1 Getting Started

Creating a row vector

import torch

# Create a row vector
x = torch.arange(12)

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Error: module 'numpy' has no attribute 'array'

Solution:

In my case the installed numpy version (1.21.5) was too new; reinstalling numpy 1.21.0 with the following commands fixed it

pip uninstall numpy

pip install numpy==1.21

Basic tensor operations

# Number of elements
print(x.numel())
# Shape of the tensor
print(x.shape)
# Change the shape
print(x.reshape(3, 4))

12
torch.Size([12])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

Creating tensors

# All-zeros tensor
print(torch.zeros((2, 3, 4)))
# All-ones tensor
print(torch.ones((2, 3, 4)))
# Random tensor (uniform on [0, 1))
print(torch.rand((2, 3, 4)))
# Initialize from given values at creation
print(torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]]))

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])
tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
tensor([[[0.2715, 0.4234, 0.4764, 0.5638],
         [0.0958, 0.8449, 0.0129, 0.3975],
         [0.4510, 0.2093, 0.6003, 0.6838]],

        [[0.7996, 0.2331, 0.8481, 0.6440],
         [0.6056, 0.7846, 0.6360, 0.6849],
         [0.0169, 0.4028, 0.7457, 0.1688]]])
tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])

1.2 Operators

Basic operations

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
# Element-wise arithmetic
print(x + y, x - y, x * y, x / y, x ** y)
# Natural exponential
print(torch.exp(x))
# Sum
print(torch.sum(x))
# Element-wise comparison
print(x == y)

tensor([ 3.,  4.,  6., 10.]) tensor([-1.,  0.,  2.,  6.]) tensor([ 2.,  4.,  8., 16.]) tensor([0.5000, 1.0000, 2.0000, 4.0000]) tensor([ 1.,  4., 16., 64.])
tensor([2.7183e+00, 7.3891e+00, 5.4598e+01, 2.9810e+03])
tensor(15.)
tensor([False,  True, False, False])

Concatenation

# Tensor concatenation
X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
# Along rows (dim=0): (3, 4) and (3, 4) give (6, 4)
print(torch.cat((X, Y), dim=0))
# Along columns (dim=1): (3, 4) and (3, 4) give (3, 8)
print(torch.cat((X, Y), dim=1))

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]])
tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])

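1.3 Broadcasting

PyTorch allows element-wise operations between tensors of different shapes.

Two tensors can be broadcast to a common shape when, comparing their dimensions from the trailing (last) one backwards, every pair of dimensions satisfies one of the following:

the two sizes are equal
or one of the sizes is 1
or the dimension does not exist in one of the tensors

# Broadcasting
# Simple case
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
print(a)
print(b)
print(a + b, '\n')
# More complex case
# Counting from the trailing dimension: d's last dimension is 1, the second-to-last dimensions of c and d are equal (3), and d has no third-to-last dimension, so they can be broadcast
c = torch.arange(12).reshape((2, 3, 2))
d = torch.arange(3).reshape((3, 1))
print(c)
print(d)
print(c + d)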

tensor([[0],
        [1],
        [2]])
tensor([[0, 1]])
tensor([[0, 1],
        [1, 2],
        [2, 3]]) 

tensor([[[ 0,  1],
         [ 2,  3],
         [ 4,  5]],

        [[ 6,  7],
         [ 8,  9],
         [10, 11]]])
tensor([[0],
        [1],
        [2]])
tensor([[[ 0,  1],
         [ 3,  4],
         [ 6,  7]],

        [[ 6,  7],
         [ 9, 10],
         [12, 13]]])
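
The broadcasting rules can also be checked without doing any arithmetic. The following is a minimal sketch using torch.broadcast_shapes on the shapes from the example above; the incompatible pair (2, 3, 2) and (3,) is my own illustration and is not from the book.

```python
import torch

# Shapes taken from the example above
c = torch.arange(12).reshape((2, 3, 2))
d = torch.arange(3).reshape((3, 1))

# broadcast_shapes applies the trailing-dimension rules without allocating data
result_shape = torch.broadcast_shapes(c.shape, d.shape)
print(result_shape)  # torch.Size([2, 3, 2])

# An incompatible pair: the trailing sizes 2 and 3 differ and neither is 1
try:
    torch.broadcast_shapes((2, 3, 2), (3,))
    compatible = True
except RuntimeError:
    compatible = False
print(compatible)  # False
```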

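1.4 Indexing and Slicing

Usage is the same as in numpy

# Indexing and slicing, same as numpy
X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
print(X)
# Read values with indexing and slicing
print(X[-1])    # equivalent to X[X.shape[0] - 1]
print(X[1: 3])  # half-open: X[a:b] selects rows in [a, b)
# Assign values with indexing and slicing
X[1, 2] = 9
print(X)
X[0:2, :] = 12
print(X)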

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
tensor([ 8.,  9., 10., 11.])
tensor([[ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  9.,  7.],
        [ 8.,  9., 10., 11.]])
tensor([[12., 12., 12., 12.],
        [12., 12., 12., 12.],
        [ 8.,  9., 10., 11.]])

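1.5 Saving Memory

# Wasteful: allocates a new tensor and rebinds X to it
X = X + Y
# Memory-saving: both forms write the result into X's existing memory
X += Y
X[:] = X + Y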
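
One way to see the difference is to compare id(X) before and after each statement. This is a small sketch with tensors I made up for illustration (Y is simply all ones here); note that X[:] = X + Y still creates a temporary for X + Y, but the result is copied back into the memory X already owns.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.ones((3, 4))

before = id(X)
X = X + Y              # allocates a new tensor and rebinds the name X
rebinding_reuses = id(X) == before

before = id(X)
X += Y                 # in-place update, X is still the same object
inplace_reuses = id(X) == before

before = id(X)
X[:] = X + Y           # writes the result back into X's existing memory
slice_reuses = id(X) == before

print(rebinding_reuses, inplace_reuses, slice_reuses)  # False True True
```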

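1.6 Conversion to Other Python Objects

Use .numpy() to convert a tensor to an ndarray, and torch.tensor() to convert an ndarray to a tensor

Use .item() to extract a single element as a Python scalar

# tensor to ndarray
A = X.numpy()
# ndarray to tensor
B = torch.tensor(A)
print(type(A), type(B))
# Use .item() to extract a single element as a Python scalar
a = torch.tensor([3.5])
print(a, a.item(), float(a), int(a))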

<class 'numpy.ndarray'> <class 'torch.Tensor'>
tensor([3.5000]) 3.5 3.5 3
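
A detail worth keeping in mind with these conversions is whether memory is shared. The sketch below is my own illustration for CPU tensors: .numpy() and torch.from_numpy() share memory with the source, while torch.tensor() makes a copy.

```python
import torch

t = torch.zeros(3)
a = t.numpy()            # tensor -> ndarray, shares memory with t (CPU tensors)
b = torch.tensor(a)      # ndarray -> tensor, copies the data
c = torch.from_numpy(a)  # ndarray -> tensor, shares memory with a

t += 1                   # in-place change to the original tensor
print(a)  # [1. 1. 1.]
print(b)  # tensor([0., 0., 0.])
print(c)  # tensor([1., 1., 1.])
```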

2. Data Preprocessing

2.1 Reading the Dataset

Read the dataset with pandas

import os.path
import pandas as pd

data_file = os.path.join('..', 'datas', 'heart', 'heart.csv')
data = pd.read_csv(data_file)
print(data.head())

   age  sex  cp  trestbps  chol  ...  oldpeak  slope  ca        thal  target
0   63    1   1       145   233  ...      2.3      3   0       fixed       0
1   67    1   4       160   286  ...      1.5      2   3      normal       1
2   67    1   4       120   229  ...      2.6      2   2  reversible       0
3   37    1   3       130   250  ...      3.5      3   0      normal       0
4   41    0   2       130   204  ...      1.4      1   0      normal       0

[5 rows x 14 columns]

2.2 Handling Missing Values

See: Pandas Data Analysis Study Notes - Juejin (juejin.cn)
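
For quick reference, here is a minimal sketch of two common approaches: filling numeric gaps with the column mean and one-hot encoding a categorical column with an extra indicator for missing values. The toy DataFrame is hypothetical and only borrows two column names from the heart dataset; it is not the data used in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the heart dataset used in these notes
toy = pd.DataFrame({
    'chol': [233.0, np.nan, 229.0],
    'thal': ['fixed', None, 'normal'],
})

# Numeric column: replace NaN with the column mean
toy['chol'] = toy['chol'].fillna(toy['chol'].mean())
# Categorical column: one-hot encode, adding an indicator column for missing values
toy = pd.get_dummies(toy, columns=['thal'], dummy_na=True)
print(toy)
```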

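2.3 Conversion to Tensors

# Features: all columns except the last two (the string-valued thal and the target); label: the last column
x, y = data.iloc[:, :-2], data.iloc[:, -1]
print(x.head())
print(y.head())
# Convert to ndarray first, then to tensor
X = torch.tensor(x.to_numpy(dtype=float))
Y = torch.tensor(y.to_numpy(dtype=float))
print(type(X), X.shape)
print(type(Y), Y.shape)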

   age  sex  cp  trestbps  chol  ...  thalach  exang  oldpeak  slope  ca
0   63    1   1       145   233  ...      150      0      2.3      3   0
1   67    1   4       160   286  ...      108      1      1.5      2   3
2   67    1   4       120   229  ...      129      1      2.6      2   2
3   37    1   3       130   250  ...      187      0      3.5      3   0
4   41    0   2       130   204  ...      172      0      1.4      1   0

[5 rows x 12 columns]
0    0
1    1
2    0
3    0
4    0
Name: target, dtype: int64
<class 'torch.Tensor'> torch.Size([303, 12])
<class 'torch.Tensor'> torch.Size([303])

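3. Linear Algebra

Scalars, vectors, matrices, tensors, properties of tensor arithmetic, and reduction are similar to numpy, so they are skipped here

Summation and its applications

x = torch.arange(4, dtype=torch.float32)

# Sum
print(x, x.sum())
# Column sums along axis 0 (this reduces the axis; pass keepdims=True for a non-reducing sum)
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
print(A)
print(A.sum(axis=0))
print(A / A.sum(axis=0))    # Normalize each column via broadcasting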

tensor([0., 1., 2., 3.]) tensor(6.)
tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]])
tensor([40., 45., 50., 55.])
tensor([[0.0000, 0.0222, 0.0400, 0.0545],
        [0.1000, 0.1111, 0.1200, 0.1273],
        [0.2000, 0.2000, 0.2000, 0.2000],
        [0.3000, 0.2889, 0.2800, 0.2727],
        [0.4000, 0.3778, 0.3600, 0.3455]])
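
For comparison, a non-reducing sum keeps the summed axis with size 1, which is what makes row-wise normalization broadcast correctly. The sketch below uses the same A; dividing by A.sum(axis=1) without keepdims=True would fail, because shapes (5, 4) and (5,) do not broadcast.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

row_sum = A.sum(axis=1)                      # shape (5,): the axis is dropped
row_sum_keep = A.sum(axis=1, keepdims=True)  # shape (5, 1): the axis is kept

# With keepdims=True the result broadcasts against A row by row
normalized = A / row_sum_keep
print(row_sum.shape, row_sum_keep.shape)
print(normalized.sum(axis=1))  # each row sums to 1 (up to floating-point error)
```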

Matrix and vector operations

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)
# Dot product
print(x, y, torch.dot(x, y))
# Matrix-vector product
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
print(torch.mv(A, x))
# Matrix multiplication
B = torch.ones(4, 3)
print(torch.mm(A, B))

tensor([0., 1., 2., 3.]) tensor([1., 1., 1., 1.]) tensor(6.)
tensor([ 14.,  38.,  62.,  86., 110.])
tensor([[ 6.,  6.,  6.],
        [22., 22., 22.],
        [38., 38., 38.],
        [54., 54., 54.],
        [70., 70., 70.]])

Norms

# L2 norm of a vector and Frobenius norm of a matrix
u = torch.tensor([3.0, -4.0])
print(torch.norm(u))
A = torch.ones((4, 9))
print(torch.norm(A))
# L1 norm
print(torch.abs(u).sum())

tensor(5.)
tensor(6.)
tensor(7.)

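4. Calculus

Plotting curves

References:
Using xscale and yscale: Axis scales, Matplotlib 3.9.0 documentation (Matplotlib Chinese site)
Meaning of plt.gca: Learning matplotlib plt.gca() (优快云 blog)
fmts in detail: The plot function in matplotlib.pyplot
Meaning of fig, axes, etc.: The meaning of plt, fig, axes, and axis (优快云 blog)

Plotting function: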

from matplotlib import pyplot as plt
import numpy as np

def plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None,
         ylim=None, xscale='linear', yscale='linear',
         fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5)):
    """
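    :param X: independent variable(s), one sequence per curve
    :param Y: dependent variable(s), one sequence per curve
    :param xlabel: label of the x-axis
    :param ylabel: label of the y-axis
    :param legend: legend entries
    :param xlim: range of the x-axis
    :param ylim: range of the y-axis
    :param xscale: scale of the x-axis, 'linear' by default
    :param yscale: scale of the y-axis, 'linear' by default
    :param fmts: line formats; by default '-' is a solid line, 'm--' a magenta dashed line, 'g-.' a green dash-dot line, 'r:' a red dotted line
    :param figsize: size of the whole figure
    :return: None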
    """
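    # Set the figure size
    plt.figure(figsize=figsize)
    # Set the axis ranges
    if xlim is not None:
        plt.xlim(xlim)
    if ylim is not None:
        plt.ylim(ylim)
    # Axis labels
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    # Axis scales
    plt.xscale(xscale)
    plt.yscale(yscale)
    # If only X is given, treat it as the y-values and plot against the index
    if Y is None:
        X, Y = [range(len(y)) for y in X], X
    # Draw one curve per (x, y, fmt) triple
    for x, y, fmt in zip(X, Y, fmts):
        plt.plot(x, y, fmt)

    # Draw the legend only if entries were given
    if legend:
        plt.legend(legend)
    plt.show()
    plt.close()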

Exercise 1: plot the function y = f(x) = x^3 - 1/x and its tangent line at x = 1

"""
Exercise 1
Plot the function y = f(x) = x^3 - 1/x and its tangent line at x = 1

f(1) = 0
f'(x) = 3x^2 + 1/x^2    f'(1) = 4
So the tangent line at x = 1 is y = 4x - 4
"""


def f(x):
    return x**3 - 1/x


x = np.arange(0.1, 3, 0.1)
plot(X=[x, x], Y=[f(x), 4 * x - 4],
     xlabel='x', ylabel='f(x)',
     legend=['f(x)', 'Tangent Line(x=1)'])

[Figure: f(x) = x^3 - 1/x and its tangent line at x = 1]
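
The slope of the tangent line can be double-checked with autograd instead of by hand. This is a small sketch of my own (the names x0, slope, and intercept are only for illustration): since f'(x) = 3x^2 + 1/x^2, the slope at x = 1 should come out as 4, giving the line y = 4x - 4.

```python
import torch


def f(x):
    return x ** 3 - 1 / x


x0 = torch.tensor(1.0, requires_grad=True)
y0 = f(x0)
y0.backward()

slope = x0.grad.item()               # f'(1) = 3 * 1**2 + 1 / 1**2
intercept = y0.item() - slope * 1.0  # the tangent passes through (1, f(1))
print(slope, intercept)              # 4.0 -4.0
```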

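5. Automatic Differentiation

5.1 Basic Usage

import torch

# Enable gradient tracking
# Option 1: set it at creation
x = torch.arange(4.0, requires_grad=True)
# Option 2: call .requires_grad_(True) on an existing tensor
# x = torch.arange(4.0)
# x.requires_grad_(True)

y = 2 * torch.dot(x, x)
print(y)    # y now carries a grad_fn, i.e. it is part of a computational graph

# y can be differentiated with respect to x
y.backward()
print(x.grad)

# Gradients accumulate by default, so they usually need to be cleared
x.grad.zero_()
print(x.grad)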

tensor(28., grad_fn=<MulBackward0>)
tensor([ 0.,  4.,  8., 12.])
tensor([0., 0., 0., 0.])
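
To make the accumulation behaviour concrete, the sketch below (my own illustration) runs two backward passes on the same x, first without and then with zeroing the gradient in between.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# First backward pass: the gradient of 2 * dot(x, x) is 4x
(2 * torch.dot(x, x)).backward()
first = x.grad.clone()

# Second backward pass without zeroing: the new gradient (all ones) is added on top
x.sum().backward()
accumulated = x.grad.clone()

# Zero the gradient, then repeat the second pass to get the clean result
x.grad.zero_()
x.sum().backward()
clean = x.grad.clone()

print(first)        # tensor([ 0.,  4.,  8., 12.])
print(accumulated)  # tensor([ 1.,  5.,  9., 13.])
print(clean)        # tensor([1., 1., 1., 1.])
```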

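5.2 Backpropagation for Non-Scalar Outputs

For $y = x \odot x$

we get

$\frac{\partial y_i}{\partial x_i} = 2 x_i$

# Backpropagating from a non-scalar output requires reducing it to a scalar first, here with .sum()
x = torch.arange(4.0, requires_grad=True)
y = x * x
y.sum().backward()
print(x.grad)
x.grad.zero_()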

tensor([0., 2., 4., 6.])
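
An equivalent alternative to calling .sum() is to pass an upstream gradient of ones to backward(). The following sketch, which is my own addition rather than something from the notes, shows that both routes give the same x.grad.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# Option 1: reduce to a scalar first
(x * x).sum().backward()
grad_via_sum = x.grad.clone()

# Option 2: pass a tensor of ones as the upstream gradient
x.grad.zero_()
y = x * x
y.backward(gradient=torch.ones_like(y))
grad_via_ones = x.grad.clone()

print(grad_via_sum, grad_via_ones)  # both are tensor([0., 2., 4., 6.])
```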

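5.3 Detaching Computation

Suppose z = y * x and y = x * x, and we want to treat y as a constant, considering only the role x plays after y has been computed

We then detach y to obtain a new variable u, which discards the information about how y was computed in the graph

# Detaching computation
# Without detach
x = torch.arange(4.0, requires_grad=True)
y = x * x
z = y * x
z.sum().backward()
print(x.grad, x.grad == y)
# With detach
x = torch.arange(4.0, requires_grad=True)
y = x * x
u = y.detach()
z = u * x
z.sum().backward()
print(x.grad, x.grad == u)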

tensor([ 0.,  3., 12., 27.]) tensor([ True, False, False, False])
tensor([0., 1., 4., 9.]) tensor([True, True, True, True])

5.4 Gradients Through Python Control Flow

# f(a) is piecewise linear in a: for a given a, f(a) = k * a for some constant k, so the gradient equals f(a) / a
def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c


a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()
print(a.grad == d / a)

tensor(True)

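6. Probability

Reference: The multinomial distribution in PyTorch: multinomial.Multinomial().sample() explained - Zhihu (zhihu.com)

import torch
from torch.distributions import multinomial

probs = torch.ones(6)
# total_count is the number of draws per sample; probs is a tensor of (possibly unnormalized) probabilities, one per outcome
multinomial_distribution = multinomial.Multinomial(total_count=1, probs=probs)

# Draw a sample
print(multinomial_distribution.sample())
# Log-probabilities of the outcomes
print(multinomial_distribution.logits)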

tensor([0., 0., 1., 0., 0., 0.])
tensor([-1.7918, -1.7918, -1.7918, -1.7918, -1.7918, -1.7918])
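
Raising total_count turns a single sample into a vector of counts, which gives a quick empirical estimate of the probabilities. This is a sketch of my own; the seed and the choice of 1000 draws are arbitrary.

```python
import torch
from torch.distributions import multinomial

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

probs = torch.ones(6) / 6
# One sample with total_count=1000 returns how often each of the 6 outcomes occurred
counts = multinomial.Multinomial(total_count=1000, probs=probs).sample()
freqs = counts / counts.sum()

print(counts.sum())  # tensor(1000.)
print(freqs)         # each entry should be close to 1/6
```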

II. Linear Neural Networks

1. Linear Regression from Scratch

1.1 Generating the Dataset

import torch
from matplotlib import pyplot as plt

def synthetic_data(w, b, num_examples):  #@save
    """Generate y = Xw + b + noise"""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))


true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

# Look at how the second feature relates to the labels
plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy())
plt.show()

[Figure: scatter plot of the second feature against the labels]

1.2 Reading the Dataset

import random

# Read the dataset in minibatches
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))     # sample indices
    random.shuffle(indices)                 # shuffle the order
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

batch_size = 5
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break

tensor([[ 1.0637,  0.3883],
        [ 1.3318,  0.7545],
        [ 1.0563,  1.2710],
        [-0.6162, -0.2641],
        [ 0.2506,  1.1129]]) 
 tensor([[5.0095],
        [4.3107],
        [1.9846],
        [3.8708],
        [0.9319]])

1.3 Model Definition and Training

Reference:

What with torch.no_grad() does: [pytorch] with torch.no_grad(): usage explained (优快云 blog)

# Initialize model parameters
w = torch.normal(0, 0.01, size=(2,1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Define the model
def linear_reg(w, b, X):
    return torch.matmul(X, w) + b
# Define the loss function
def squared_loss(y_hat, y):
    return (y - y_hat.reshape(y.shape)) ** 2 / 2
# Define the optimization algorithm
def sgd(params, lr, batch_size):
    """
    Minibatch stochastic gradient descent
    :param params: parameters to update
    :param lr: learning rate
    :param batch_size: batch size, used to average the summed gradient
    """
    with torch.no_grad():
        # Operations inside torch.no_grad() are not tracked by autograd, so the update itself is not recorded in the graph
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()
# Training
lr = 0.01
num_epochs = 3
net = linear_reg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(w, b, X), y)  # minibatch loss on X and y
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # update the parameters using their gradients
    with torch.no_grad():
        train_l = loss(net(w, b, features), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

epoch 1, loss 0.292531
epoch 2, loss 0.005235
epoch 3, loss 0.000142

2. Concise Implementation of Linear Regression

References:

The Python star operators: A complete guide to * and ** in Python (优快云 blog)

TensorDataset and DataLoader: A detailed look at DataLoader and TensorDataset in PyTorch (优快云 blog)

torch provides a more convenient way to read data: wrap the tensors in a TensorDataset and combine it with a DataLoader to get the same effect as the data_iter above

The core features of DataLoader are batched loading, shuffling, and parallel loading

from torch.utils import data

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)


def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator"""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)


batch_size = 10
data_iter = load_array((features, labels), batch_size)

The steps are the same as before: define the model -> initialize the model parameters -> define the loss function -> define the optimization algorithm -> train

from torch import nn

# Define the model
net = nn.Sequential(nn.Linear(2, 1))
# Initialize model parameters
net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
# Define the loss function
loss = nn.MSELoss()
# Define the optimization algorithm
trainer = torch.optim.SGD(net.parameters(), lr=0.01)
# Training
num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')

epoch 1, loss 0.552002
epoch 2, loss 0.009066
epoch 3, loss 0.000246
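
Besides the loss, a useful sanity check is to compare the learned parameters with the true ones. The sketch below is a self-contained variant of the training loop above under a few assumptions of my own: a fixed seed, PyTorch's default initialization for nn.Linear, and 10 epochs instead of 3 so that the estimates have settled.

```python
import torch
from torch import nn
from torch.utils import data

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Same synthetic setup as in the notes: y = Xw + b + noise
true_w, true_b = torch.tensor([2, -3.4]), 4.2
X = torch.normal(0, 1, (1000, 2))
y = (torch.matmul(X, true_w) + true_b + torch.normal(0, 0.01, (1000,))).reshape(-1, 1)

loader = data.DataLoader(data.TensorDataset(X, y), batch_size=10, shuffle=True)
net = nn.Sequential(nn.Linear(2, 1))
loss = nn.MSELoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.01)

for epoch in range(10):  # more epochs than the notes use, so the estimates settle
    for Xb, yb in loader:
        trainer.zero_grad()
        loss(net(Xb), yb).backward()
        trainer.step()

# Estimation errors: both should be close to zero
w_err = true_w - net[0].weight.data.reshape(true_w.shape)
b_err = true_b - net[0].bias.data
print('error in w:', w_err)
print('error in b:', b_err)
```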

3. softmax

From the cross-entropy loss

$l(y, \hat{y}) = - \sum_{j=1}^{q} y_j \log \hat{y}_j$

and, when the softmax function is used,

$\hat{y}_j = \mathrm{softmax}(o)_j = \frac{e^{o_j}}{\sum_{k=1}^q e^{o_k}}$