Machine Learning Notes: Task Two

YINGZHECHENG

Last modified 2023-04-04 11:37:06

Tags: deep learning, python, pytorch

First published 2023-03-23 23:56:41

Article link: https://blog.youkuaiyun.com/YINGZHECHENG/article/details/129742164

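This article introduces basic concepts of linear algebra, including the representation of and operations on scalars, vectors, matrices, and higher-dimensional tensors, with PyTorch code examples for each. It also covers matrix transposition, symmetry, and reduction. Finally, it discusses automatic differentiation and shows how to compute gradients with PyTorch, which is essential for optimization in deep learning.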

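Linear Algebra

Scalars

A scalar is represented by a tensor with a single element. The code below instantiates two scalars and performs some familiar arithmetic operations on them: addition, subtraction, multiplication, division, and exponentiation.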

 import torch
 
 x = torch.tensor([3.0])
 y = torch.tensor([2.0])
 x + y,x - y,x *y,x / y,x ** y

 (tensor([5.]), tensor([1.]), tensor([6.]), tensor([1.5000]), tensor([9.]))

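Vectors

A vector can be viewed as an ordered list of scalar values (geometrically, a quantity with both magnitude and direction). Below we create one and index into it.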

 X = torch.arange(3)
 X,X[2]

 (tensor([0, 1, 2]), tensor(2))

Get the length of the tensor

 len(X)

Matrices

Specifying two dimensions m and n creates an m×n matrix. Below we create a 5×4 matrix.

 A = torch.arange(20).reshape(5,4)
 A

 tensor([[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15],
         [16, 17, 18, 19]])

Transpose of a matrix (the matrix flipped across its main diagonal)

 A.T

 tensor([[ 0,  4,  8, 12, 16],
         [ 1,  5,  9, 13, 17],
         [ 2,  6, 10, 14, 18],
         [ 3,  7, 11, 15, 19]])

A symmetric matrix satisfies $A=A^T$

 B = torch.tensor([[1,2,3],[2,0,4],[3,4,5]])
 B,B.T

 (tensor([[1, 2, 3],
          [2, 0, 4],
          [3, 4, 5]]),
  tensor([[1, 2, 3],
          [2, 0, 4],
          [3, 4, 5]]))

 B == B.T

 tensor([[True, True, True],
         [True, True, True],
         [True, True, True]])

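Higher-dimensional tensors

Just as vectors generalize scalars and matrices generalize vectors, we can build data structures with even more axes.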

 X = torch.arange(24).reshape(2,3,4)
 X

 tensor([[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11]],
 
         [[12, 13, 14, 15],
          [16, 17, 18, 19],
          [20, 21, 22, 23]]])

Tensor operations

Given any two tensors of the same shape, the result of any elementwise binary operation is a tensor of that same shape.

 A = torch.arange(20,dtype=torch.float32).reshape(5,4)
 B = A.clone()
 A,A+B

 (tensor([[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.],
          [16., 17., 18., 19.]]),
  tensor([[ 0.,  2.,  4.,  6.],
          [ 8., 10., 12., 14.],
          [16., 18., 20., 22.],
          [24., 26., 28., 30.],
          [32., 34., 36., 38.]]))

The elementwise multiplication of two matrices is called the Hadamard product (mathematical symbol ⊙)

 A * B

 tensor([[  0.,   1.,   4.,   9.],
         [ 16.,  25.,  36.,  49.],
         [ 64.,  81., 100., 121.],
         [144., 169., 196., 225.],
         [256., 289., 324., 361.]])

Compute the sum of all elements

 A,A.sum()

 (tensor([[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.],
          [16., 17., 18., 19.]]),
  tensor(190.))

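Reduction

By default, calling the sum function reduces a tensor along all of its axes, turning it into a scalar. We can also specify the axes along which the tensor is reduced by summation. Taking a matrix as an example, to reduce the row dimension (axis 0) by summing the elements of all rows, specify axis=0 when calling the function. Because the input matrix is reduced along axis 0 to produce the output vector, the dimension of axis 0 disappears from the output shape.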

 A_sum_axis0 = A.sum(axis=0)
 A_sum_axis0, A_sum_axis0.shape

 (tensor([40., 45., 50., 55.]), torch.Size([4]))

 A_sum_axis1 = A.sum(axis=1)
 A_sum_axis1, A_sum_axis1.shape

 (tensor([ 6., 22., 38., 54., 70.]), torch.Size([5]))

 A.sum(axis=[0, 1])

 tensor(190.)

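A quantity related to the sum is the mean (also called the average). We compute the mean by dividing the sum by the total number of elements. In code, we can call a function that computes the mean of a tensor of any shape.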

 A,A.mean(),A.sum()/A.numel()

 (tensor([[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.],
          [16., 17., 18., 19.]]),
  tensor(9.5000),
  tensor(9.5000))

Likewise, the mean function can reduce a tensor along a specified axis

 A.mean(axis=0),A.sum(axis=0)/A.shape[0]

 (tensor([ 8.,  9., 10., 11.]), tensor([ 8.,  9., 10., 11.]))

Non-reduction sum

Keep the number of axes unchanged when computing a sum or mean

 sum_A = A.sum(axis=0,keepdims=True)
 sum_A

 tensor([[40., 45., 50., 55.]])

Using broadcasting, divide A by sum_A to get each element's share of its column total (sum_A holds the column sums)

 A/sum_A

 tensor([[0.0000, 0.0222, 0.0400, 0.0545],
         [0.1000, 0.1111, 0.1200, 0.1273],
         [0.2000, 0.2000, 0.2000, 0.2000],
         [0.3000, 0.2889, 0.2800, 0.2727],
         [0.4000, 0.3778, 0.3600, 0.3455]])
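As a quick sanity check on the broadcast division above, the following minimal sketch confirms that the shares in each column add up to 1. The names ratio and col_totals are introduced here for illustration only.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
sum_A = A.sum(axis=0, keepdims=True)   # shape (1, 4): one total per column

# Broadcasting stretches sum_A along axis 0, so every row of A is divided
# by the same vector of column totals.
ratio = A / sum_A                      # illustrative name

col_totals = ratio.sum(axis=0)         # the shares in each column add up to 1
print(ratio.shape, col_totals)
```

Because sum_A keeps its axis of size 1, the shapes (5, 4) and (1, 4) are compatible for broadcasting without any reshape.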

The cumulative sum of the elements of A along an axis; along axis 0 the result $C$ satisfies $C_{ij} = \sum_{k=0}^{i}A_{kj}$ $(i=0,1,2,\ldots)$

 A.cumsum(axis=0)

 tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  6.,  8., 10.],
         [12., 15., 18., 21.],
         [24., 28., 32., 36.],
         [40., 45., 50., 55.]])
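To make the running-sum behaviour concrete, here is a small sketch that rebuilds the cumulative sum along axis 0 with an explicit loop and compares it with cumsum. The names manual and running are illustrative, not part of the original notes.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

# Build the running sum along axis 0 by hand: row i of the result is the
# sum of rows 0..i of A.
manual = torch.zeros_like(A)
running = torch.zeros(A.shape[1])
for i in range(A.shape[0]):
    running = running + A[i]
    manual[i] = running

print(torch.equal(manual, A.cumsum(axis=0)))
```

The last row of the cumulative sum equals A.sum(axis=0), while the shape of A is preserved.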

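Dot product

We have studied elementwise operations, sums, and averages. Another of the most fundamental operations is the dot product. Given two vectors $x, y \in \mathbb{R}^d$, their dot product $x^Ty$ (or $\langle x, y \rangle$) is the sum of the products of the elements at the same positions: $x^Ty = \sum_{i=1}^dx_iy_i$.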

 x = torch.tensor([1,2,3,4],dtype=torch.float32)
 y = torch.tensor([2,1,4,3],dtype=torch.float32)
 x,y,torch.dot(x,y)

 (tensor([1., 2., 3., 4.]), tensor([2., 1., 4., 3.]), tensor(28.))

We can also express the dot product of two vectors as an elementwise multiplication followed by a sum

 torch.sum(x * y)

 tensor(28.)

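Matrix-vector products

The matrix-vector product $Ax$, for $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$, is a column vector of length $m$ whose $i$-th element is the dot product $A_i^Tx$, where $A_i^T$ denotes the $i$-th row of $A$.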

 A.shape,x.shape,torch.mv(A,x)

 (torch.Size([5, 4]), torch.Size([4]), tensor([ 20.,  60., 100., 140., 180.]))
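The row-by-row description can be checked directly. The sketch below computes one dot product per row of A and compares the stacked result with torch.mv; row_dots is a name chosen here for illustration.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

# One dot product per row of A, stacked into a vector of length 5
row_dots = torch.stack([torch.dot(A[i], x) for i in range(A.shape[0])])

print(row_dots)
print(torch.allclose(row_dots, torch.mv(A, x)))
```

Note that the number of columns of A (4) must match the length of x for the product to be defined.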

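Matrix multiplication

For $A \in \mathbb{R}^{n \times k}$ and $B \in \mathbb{R}^{k \times m}$, we can think of the matrix-matrix product $AB$ as simply performing $m$ matrix-vector products, one per column of $B$, and stitching the results together to form an $n \times m$ matrix.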

 B = torch.ones(4,3)
 A,B,torch.mm(A,B)

 (tensor([[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.],
          [16., 17., 18., 19.]]),
  tensor([[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]]),
  tensor([[ 6.,  6.,  6.],
          [22., 22., 22.],
          [38., 38., 38.],
          [54., 54., 54.],
          [70., 70., 70.]]))
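The "stitching" view of matrix multiplication can be sketched as follows: one matrix-vector product per column of B, with the results placed side by side. The names cols and stitched are illustrative only.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)   # n x k = 5 x 4
B = torch.ones(4, 3)                                      # k x m = 4 x 3

# One matrix-vector product per column of B, placed side by side as columns
cols = [torch.mv(A, B[:, j]) for j in range(B.shape[1])]
stitched = torch.stack(cols, dim=1)                       # shape (5, 3)

print(stitched.shape)
print(torch.allclose(stitched, torch.mm(A, B)))
```

In practice torch.mm is the call to use; the loop is only there to show what it computes.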

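Norms

Supplement: three vector norms come up often in deep learning: $L_0$, the number of nonzero elements (not a norm in the strict mathematical sense); $L_1=\sum_{i=1}^n|x_i|$; and $L_2 = \sqrt{\sum_{i=1}^n{x_i^2}}$. The $L_2$ norm is the square root of the sum of the squared elements, i.e. the length of the vector.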

 u = torch.tensor([3.0,-4.0])
 torch.norm(u)

 tensor(5.)

 torch.abs(u).sum()

 tensor(7.)
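For comparison, the three quantities can be computed side by side from their definitions. This is a minimal sketch: the names l0, l1, and l2 are illustrative, and torch.linalg.vector_norm is shown as one alternative way to get the $L_1$ and $L_2$ values, assuming a reasonably recent PyTorch version.

```python
import torch

u = torch.tensor([3.0, -4.0])

l0 = torch.count_nonzero(u)        # number of nonzero entries
l1 = torch.abs(u).sum()            # sum of absolute values
l2 = torch.sqrt((u * u).sum())     # square root of the sum of squares

print(l0, l1, l2)

# The same L1 and L2 values via the `ord` argument of vector_norm
print(torch.linalg.vector_norm(u, ord=1), torch.linalg.vector_norm(u, ord=2))
```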

The Frobenius norm of a matrix (also called the F-norm) is the square root of the sum of the squares of its elements:

$||X||_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^nx_{ij}^2}$

 z = torch.arange(6,dtype=torch.float32).reshape(2,3)
 z,torch.norm(z)

 (tensor([[0., 1., 2.],
          [3., 4., 5.]]),
  tensor(7.4162))

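Norms and objectives

In deep learning we are often trying to solve optimization problems: maximize the probability assigned to the observed data, or minimize the distance between predictions and the true observations. We represent items (such as words, products, or news articles) as vectors so that the distance between similar items is minimized and the distance between dissimilar items is maximized. The objective, perhaps the most important component of a deep learning algorithm besides the data, is usually expressed as a norm.

Summary

This section covered code implementations of linear algebra. The most important points are the individual operations and the differences between them, for example the dot product of vectors, the matrix-vector product, and matrix multiplication. It also covered two common reduction methods, summing and averaging along a specified axis, and introduced norms: the $L_0$, $L_1$, and $L_2$ norms of a vector and the Frobenius norm of a matrix, along with the role norms play in deep learning, such as expressing optimization objectives.

Exercises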

 # 1.

 A = torch.arange(6.0).reshape(2,3)
 A,A.T.T,A.T.T == A

 (tensor([[0., 1., 2.],
          [3., 4., 5.]]),
  tensor([[0., 1., 2.],
          [3., 4., 5.]]),
  tensor([[True, True, True],
          [True, True, True]]))

 # 2.
 B = 2 * torch.arange(6.0).reshape(2,3)
 B,A, A.T + B.T == (A + B).T

 (tensor([[ 0.,  2.,  4.],
          [ 6.,  8., 10.]]),
  tensor([[0., 1., 2.],
          [3., 4., 5.]]),
  tensor([[True, True],
          [True, True],
          [True, True]]))

 # 3.
 X = torch.tensor([[2,1],[0,9]])
 X,X+X.T

 (tensor([[2, 1],
          [0, 9]]),
  tensor([[ 4,  1],
          [ 1, 18]]))

For any square matrix $A$, writing out $A$, $A^T$, and their sum:

$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \quad A^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{n1}\\ a_{12} & a_{22} & \cdots & a_{n2}\\ \vdots & \vdots & \ddots & \vdots\\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{pmatrix}, \quad A^T + A = \begin{pmatrix} a_{11}+ a_{11} & a_{21}+ a_{12} & \cdots & a_{n1} + a_{1n}\\ a_{12}+ a_{21} & a_{22}+ a_{22} & \cdots & a_{n2} + a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{1n} + a_{n1} & a_{2n}+ a_{n2} & \cdots & a_{nn} + a_{nn} \end{pmatrix}$

The sum $A^T + A$ is symmetric, because its $(i,j)$ entry $a_{ji}+a_{ij}$ equals its $(j,i)$ entry $a_{ij}+a_{ji}$.

 # 4.
 Y = torch.arange(24.0).reshape(2,3,4)
 len(Y)

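Yes: len always returns the size of the first axis (axis 0). For a 1-D tensor that is the length of the vector, for a 2-D tensor it is the number of rows, and for a 3-D tensor it is the outermost ("height") dimension. In other words, it is x1 in the shape (x1, x2, ..., xn).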

 # 6.
 A = torch.arange(9.0).reshape(3,3)
 A,A.sum(axis = 1),A/A.sum(axis = 1)

 (tensor([[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]),
  tensor([ 3., 12., 21.]),
  tensor([[0.0000, 0.0833, 0.0952],
          [1.0000, 0.3333, 0.2381],
          [2.0000, 0.5833, 0.3810]]))

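The intent here is to get each element's share of its row. For that, A.sum(axis=1) would need to be a column vector, but it is a 1-D tensor of shape (3,), so broadcasting divides column by column instead. Reshaping it to (3,1) gives the desired result. The transpose .T does not help, because transposing a 1-D tensor leaves it 1-D.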

 A,A.sum(axis = 1).reshape(3,1),A/A.sum(axis = 1).reshape(3,1)

 (tensor([[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]),
  tensor([[ 3.],
          [12.],
          [21.]]),
  tensor([[0.0000, 0.3333, 0.6667],
          [0.2500, 0.3333, 0.4167],
          [0.2857, 0.3333, 0.3810]]))
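An alternative to the reshape, sketched below, is to keep the reduced axis with keepdims=True, as in the non-reduction sum earlier in these notes. The names row_sums and shares are illustrative.

```python
import torch

A = torch.arange(9.0).reshape(3, 3)

row_sums = A.sum(axis=1, keepdims=True)   # shape (3, 1) instead of (3,)
shares = A / row_sums                     # each row is divided by its own sum

print(row_sums.shape)
print(shares.sum(axis=1))                 # every row's shares add up to 1
```

This avoids hard-coding the number of rows in a reshape call, which may be convenient when the shape of A is not known in advance.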

7. My guess: axis=0 -> (3,4), axis=1 -> (2,4), axis=2 -> (2,3). The code below checks it.

 B = torch.arange(24.0).reshape(2,3,4)
 B.sum(axis=0).shape,B.sum(axis=1).shape,B.sum(axis=2).shape
 # the guess is correct

 (torch.Size([3, 4]), torch.Size([2, 4]), torch.Size([2, 3]))

 # 8.
 import math
 B,torch.norm(B),math.sqrt((B*B).sum(axis=[0,1,2]))

 (tensor([[[ 0.,  1.,  2.,  3.],
           [ 4.,  5.,  6.,  7.],
           [ 8.,  9., 10., 11.]],
  
          [[12., 13., 14., 15.],
           [16., 17., 18., 19.],
           [20., 21., 22., 23.]]]),
  tensor(65.7571),
  65.75712889109438)

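The result is the square root of the sum of the squares of all elements, which is the Frobenius norm discussed earlier, applied to a tensor with more axes.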

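Automatic Differentiation

Differentials

Differential formula: $dy=f'(x)dx$. Definition: the differential is the linear part of the change in a function at a point as the increment of the independent variable tends to zero. Derivation:

$y = f(x) \\ \Delta y = f(x + \Delta x) - f(x)$

If $f$ is differentiable at $x$, then $\lim_{\Delta x \to 0} \Delta y / \Delta x = f'(x)$, so the remainder $\Delta y - f'(x)\Delta x$ is $o(\Delta x)$, a higher-order infinitesimal of $\Delta x$:

$\Delta y = f'(x)\Delta x + o(\Delta x)$

As $\Delta x$ becomes infinitely small, the remainder is negligible compared with the linear term:

$\Delta y \approx f'(x)\Delta x$

The linear part is called the differential $dy$. Writing $dx = \Delta x$ for the increment of the independent variable gives $dy=f'(x)dx$.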
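A small numerical check can make the role of the higher-order term tangible. The sketch below uses $f(x) = x^3$ at $x = 2$, an example chosen here purely for illustration, and shows that the remainder divided by the step shrinks as the step shrinks.

```python
def f(x):
    return x ** 3          # example function, chosen for illustration

def f_prime(x):
    return 3 * x ** 2      # its derivative, written out by hand

x = 2.0
ratios = []
for dx in (1e-1, 1e-2, 1e-3):
    delta_y = f(x + dx) - f(x)       # actual change in y
    dy = f_prime(x) * dx             # differential: the linear part
    remainder = delta_y - dy         # the higher-order term
    ratios.append(remainder / dx)    # tends to 0 as dx tends to 0
    print(dx, delta_y, dy, remainder / dx)
```

The printed ratio decreasing toward zero is what "higher-order infinitesimal" means: the remainder vanishes faster than the step itself.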

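Gradient

Concept: the gradient is a vector. It points in the direction along which the directional derivative of the function at that point is largest. In other words, at that point the function changes fastest along the gradient direction, and the maximum rate of change equals the magnitude of the gradient. The formulas are fairly involved and I have not fully worked through them, so I refer to the cnblogs (博客园) post "Directional Derivatives and Gradients".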
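For reference, the standard definitions behind this description can be written compactly. This is a sketch for a differentiable function $f$ of $n$ variables, where $u$ is a unit vector and $\theta$ is the angle between $u$ and the gradient; these symbols are introduced here and are not taken from the linked post.

```latex
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)^T

D_u f(x) = \nabla f(x) \cdot u = \|\nabla f(x)\| \cos\theta, \qquad \|u\| = 1
```

The directional derivative $D_u f(x)$ is largest when $\theta = 0$, that is, when $u$ points along the gradient, and its maximum value is $\|\nabla f(x)\|$.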

Automatic differentiation

Suppose we want to differentiate the function $y = 2x^Tx$ with respect to the column vector $x$

 import torch
 x = torch.arange(4.0)
 x

 tensor([0., 1., 2., 3.])

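Before computing the gradient of y with respect to x, we need a place to store it

 x.requires_grad_(True)  # equivalent to x = torch.arange(4.0, requires_grad=True)
 x.grad  # None until a backward pass has run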

Now compute y

 y = 2*torch.dot(x,x)
 y

 tensor(28., grad_fn=<MulBackward0>)

Now call the backpropagation function to automatically compute the gradient of y with respect to each component of x

 y.backward()
 x.grad

 tensor([ 0.,  4.,  8., 12.])

 x.grad == 4*x

 tensor([True, True, True, True])

Now let's compute another function of x

 # By default PyTorch accumulates gradients, so we need to clear the previous values
 x.grad.zero_()
 y = x.sum()
 y.backward()
 x.grad

 tensor([1., 1., 1., 1.])
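To see why the zero_() call matters, the following minimal sketch runs the same backward pass twice without clearing the gradient, then once more after clearing it. The names first, accumulated, and after_reset are illustrative.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

x.sum().backward()
first = x.grad.clone()         # gradient of sum(x): all ones

x.sum().backward()             # no zero_() in between
accumulated = x.grad.clone()   # the new gradient was added to the old one

x.grad.zero_()                 # reset before the next computation
x.sum().backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
```

The clone() calls take snapshots, since x.grad is updated in place by each backward pass.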

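In deep learning, our goal is usually not to compute the full derivative (Jacobian) matrix, but the sum of the partial derivatives computed individually for each sample in the batch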

 x.grad.zero_()
 y = x*x
 y.sum().backward()
 x.grad

 tensor([0., 2., 4., 6.])
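Calling backward on a non-scalar tensor requires either reducing it to a scalar first, as above, or supplying a gradient argument. The sketch below suggests that passing a vector of ones gives the same result as summing first; grad_via_sum and grad_via_ones are illustrative names.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# Option 1: reduce to a scalar first, then call backward
y = x * x
y.sum().backward()
grad_via_sum = x.grad.clone()

# Option 2: pass a vector of ones as the `gradient` argument of backward()
x.grad.zero_()
y = x * x
y.backward(gradient=torch.ones_like(y))
grad_via_ones = x.grad.clone()

print(grad_via_sum, grad_via_ones)
```

Both options weight every element of y equally, which is why they agree.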

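Moving some computations outside of the recorded computational graph: detach() returns a tensor u with the same values as y that is treated as a constant, so the gradient of z = u * x with respect to x is u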

 x.grad.zero_()
 y = x * x
 u = y.detach()
 z = u * x
 
 z.sum().backward()
 x.grad == u

 tensor([True, True, True, True])

 x.grad.zero_()
 y.sum().backward()
 x.grad == 2 * x

 tensor([True, True, True, True])

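Even when building the computational graph of a function requires passing through Python control flow (for example conditionals, loops, or arbitrary function calls), we can still compute the gradient of the resulting variable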

 def f(a):
     b = a * 2
     while b.norm() < 1000:
         b = b * 2
     if b.sum() > 0:
         c = b
     else:
         c = 100 * b
     return c
 
 a = torch.randn(size=(), requires_grad=True)
 d = f(a)
 d.backward()
 
 a.grad ,d / a

 (tensor(1024.), tensor(1024., grad_fn=<DivBackward0>))

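Differentiating $y = 2x^3$ with part of the computation detached: below, x is the tensor v and $r = x^2$ is detached from the graph, so it is treated as a constant and the resulting gradient is $2r = 2x^2$ rather than the full derivative $6x^2$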

 v = torch.arange(5.0)
 v

 tensor([0., 1., 2., 3., 4.])

 v.requires_grad_(True)

 tensor([0., 1., 2., 3., 4.], requires_grad=True)

 if v.grad is not None:  # v.grad is None until the first backward call
     v.grad.zero_()
 w = v * v
 r = w.detach()
 u = 2 * r * v
 u

 tensor([  0.,   2.,  16.,  54., 128.], grad_fn=<MulBackward0>)

 u.sum().backward()

 v.grad == 2 * r

 tensor([True, True, True, True, True])

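Summary

Deep learning frameworks can compute derivatives automatically: we first attach gradients to the variables with respect to which we want partial derivatives, then record the computation of the target value, run its backpropagation function, and access the resulting gradient.

Exercises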

 # 4.
 def f(a):
     if a >= 0:
         return a * a
     else:
         return -1 * (a * a)
 a = torch.randn(size=(),requires_grad = True)    
 b = f(a)
 b.backward()
 
 a.grad == 2*(a//abs(a))*a

 tensor(True)

Given the piecewise function $y = \begin{cases}a^2, & a\geq0\\-a^2, & a<0 \end{cases}$, the gradient is $2a$ when $a \geq 0$ and $-2a$ when $a<0$, which is $2|a|$ in both cases

 # 5.
 import math
 x = torch.arange(-10.0,10.0,0.5,requires_grad=True)
 y = torch.sin(x)
 y.sum().backward()
 y

 tensor([ 0.5440,  0.0752, -0.4121, -0.7985, -0.9894, -0.9380, -0.6570, -0.2151,
          0.2794,  0.7055,  0.9589,  0.9775,  0.7568,  0.3508, -0.1411, -0.5985,
         -0.9093, -0.9975, -0.8415, -0.4794,  0.0000,  0.4794,  0.8415,  0.9975,
          0.9093,  0.5985,  0.1411, -0.3508, -0.7568, -0.9775, -0.9589, -0.7055,
         -0.2794,  0.2151,  0.6570,  0.9380,  0.9894,  0.7985,  0.4121, -0.0752],
        grad_fn=<SinBackward0>)

 x.grad

 tensor([-0.8391, -0.9972, -0.9111, -0.6020, -0.1455,  0.3466,  0.7539,  0.9766,
          0.9602,  0.7087,  0.2837, -0.2108, -0.6536, -0.9365, -0.9900, -0.8011,
         -0.4161,  0.0707,  0.5403,  0.8776,  1.0000,  0.8776,  0.5403,  0.0707,
         -0.4161, -0.8011, -0.9900, -0.9365, -0.6536, -0.2108,  0.2837,  0.7087,
          0.9602,  0.9766,  0.7539,  0.3466, -0.1455, -0.6020, -0.9111, -0.9972])

 from matplotlib import pyplot as plt

 # Detach from the graph and convert the whole tensors to NumPy arrays at once
 x_np = x.detach().numpy()
 y_np = y.detach().numpy()
 x_grad_np = x.grad.numpy()

 plt.plot(x_np,y_np,'r',label='y=f(x)')
 plt.plot(x_np,x_grad_np,'b',label='y=df(x)/dx')
 plt.legend()
 plt.show()
