1 Tensors
A tensor (Tensor) is the basic computational unit in PyTorch. Like NumPy's ndarray, it represents a multi-dimensional array. The biggest difference from ndarray is that a PyTorch Tensor can run on the GPU, while a NumPy ndarray can only run on the CPU; running on the GPU greatly speeds up computation.
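For example, a minimal sketch of moving a tensor onto the GPU (this assumes a CUDA-capable GPU and a CUDA build of PyTorch; otherwise it falls back to the CPU):
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.ones(2, 3)   # created on the CPU by default
y = x.to(device)       # copied to the GPU when one is available
print(y.device)        # prints cuda:0 on a GPU machine, cpu otherwise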
1.1 Data types
Torch defines eight CPU tensor types and eight GPU tensor types:
Data type | dtype | CPU tensor | GPU tensor |
---|---|---|---|
32-bit floating point | torch.float32/float | torch.FloatTensor | torch.cuda.FloatTensor |
64-bit floating point | torch.float64/double | torch.DoubleTensor | torch.cuda.DoubleTensor |
16-bit floating point | torch.float16/half | torch.HalfTensor | torch.cuda.HalfTensor |
8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |
8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |
16-bit integer (signed) | torch.int16 | torch.ShortTensor | torch.cuda.ShortTensor |
32-bit integer (signed) | torch.int32 | torch.IntTensor | torch.cuda.IntTensor |
64-bit integer (signed) | torch.int64 | torch.LongTensor | torch.cuda.LongTensor |
torch.Tensor is an alias for the default tensor type (torch.FloatTensor).
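A quick check of the default type:
>>> torch.Tensor([1, 2, 3]).dtype
torch.float32
>>> torch.tensor([1, 2, 3], dtype = torch.float64).dtype
torch.float64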
1.2 Creating tensors
1. torch.tensor(data, dtype = None, device = None, requires_grad = False)
This function is the constructor for tensors; it can build a tensor from a Python list or an array:
import torch
import numpy as np

a = torch.tensor([[1., -1.], [1., -1.]]) # from list
b = torch.tensor(np.array([[1, 2, 3], [4, 5, 6]])) # from np.array
print(a, b, sep = '\n')
tensor([[ 1., -1.],
[ 1., -1.]])
tensor([[1, 2, 3],
[4, 5, 6]], dtype=torch.int32)
2. torch.as_tensor(data, dtype = None, device = None)
Converts data into a torch.Tensor. If data is already a Tensor with the requested dtype and device, no copy is made and the result shares memory with data (like a shallow copy); otherwise a copy of the data is returned:
>>> a = np.array([1, 2, 3])
>>> t = torch.as_tensor(a)
>>> t
tensor([1, 2, 3], dtype=torch.int32)
>>> t[0] = -1
>>> a
array([-1, 2, 3])
>>> a = np.array([1, 2, 3])
>>> t = torch.as_tensor(a, device = torch.device('cuda'))
>>> t[0] = -1
>>> a
array([1, 2, 3])
3. torch.from_numpy(ndarray)
Creates a Tensor from a numpy.ndarray. The two share memory, so modifications to the Tensor are reflected in the ndarray, and the returned Tensor cannot be resized:
>>> a = np.array([1, 2, 3])
>>> t = torch.from_numpy(a)
>>> t
tensor([1, 2, 3], dtype=torch.int32)
>>> t[0] = -1
>>> a
array([-1, 2, 3])
4. torch.zeros(*size, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Returns a tensor filled with the scalar 0, with the shape given by size. Unlike numpy.zeros, size does not have to be a tuple:
>>> a = torch.zeros(2, 3)
>>> a
tensor([[0., 0., 0.],
[0., 0., 0.]])
>>> b = np.zeros((2, 3))
>>> b
array([[0., 0., 0.],
[0., 0., 0.]])
Note: torch.ones(), torch.empty() and torch.full() are used in the same way.
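For example (the exact output formatting may differ slightly between versions):
>>> torch.ones(2, 3)
tensor([[1., 1., 1.],
        [1., 1., 1.]])
>>> torch.full((2, 3), 7.)
tensor([[7., 7., 7.],
        [7., 7., 7.]])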
5. torch.zeros_like(input, dtype = None, layout = None, device = None, requires_grad = False)
Returns a tensor filled with the scalar 0, with the same shape as input:
>>> input = torch.empty(2, 3)
>>> torch.zeros_like(input)
tensor([[0., 0., 0.],
[0., 0., 0.]])
Note: torch.ones_like(), torch.empty_like() and torch.full_like() are used in the same way.
6. torch.arange(start = 0, end, step = 1, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Returns a 1-D tensor of length $\lceil \frac{end - start}{step} \rceil$, whose values are taken from the interval $[start, end)$ with common difference step:
>>> torch.arange(5)
tensor([0, 1, 2, 3, 4])
>>> torch.arange(1, 5)
tensor([1, 2, 3, 4])
>>> torch.arange(1, 2.5, 0.5)
tensor([1.0000, 1.5000, 2.0000])
7. torch.range(start = 0, end, step = 1, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Returns a 1-D tensor of length $\lfloor \frac{end - start}{step} \rfloor + 1$, with values from start to end inclusive:
>>> torch.range(1, 5)
__main__:1: UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].
tensor([1., 2., 3., 4., 5.])
Note: torch.arange() is recommended over torch.range().
8. torch.linspace(start, end, steps = 100, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Returns a 1-D tensor of steps points evenly spaced from start to end (the endpoint is included):
>>> a = torch.linspace(1, 100, 10)
>>> a
tensor([ 1., 12., 23., 34., 45., 56., 67., 78., 89., 100.])
9. torch.logspace(start, end, steps = 100, base = 10.0, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Returns a 1-D tensor of points evenly spaced on a logarithmic scale with the given base:
>>> torch.logspace(start = -10, end = 10, steps = 5)
tensor([1.0000e-10, 1.0000e-05, 1.0000e+00, 1.0000e+05, 1.0000e+10])
10. torch.eye(n, m = None, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Returns an identity matrix (ones on the diagonal, zeros elsewhere):
>>> torch.eye(2, 3)
tensor([[1., 0., 0.],
[0., 1., 0.]])
>>> torch.eye(3)
tensor([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
1.3 Tensor operations
1. torch.cat(tensors, dim = 0, out = None)
Concatenates tensors along dimension dim; all other dimensions must have the same size:
>>> x = torch.randn(2, 3)
>>> torch.cat((x, x, x), 0)
tensor([[ 1.0102, -0.4265, 0.1250],
[-0.4309, -0.2024, -2.1630],
[ 1.0102, -0.4265, 0.1250],
[-0.4309, -0.2024, -2.1630],
[ 1.0102, -0.4265, 0.1250],
[-0.4309, -0.2024, -2.1630]])
>>> torch.cat((x, x, x), 1)
tensor([[ 1.0102, -0.4265, 0.1250, 1.0102, -0.4265, 0.1250, 1.0102, -0.4265,
0.1250],
[-0.4309, -0.2024, -2.1630, -0.4309, -0.2024, -2.1630, -0.4309, -0.2024,
-2.1630]])
2. torch.masked_select(input, mask, out = None)
Selects elements of input according to the boolean mask:
>>> x = torch.randn(3, 4)
>>> x
tensor([[-0.3417, -0.1746, 1.6381, -0.5856],
[ 0.0387, 0.6801, -0.7249, -1.8611],
[-0.4696, 0.1779, 1.7757, 0.3188]])
>>> mask = x.ge(0.5)
>>> mask
tensor([[0, 0, 1, 0],
[0, 1, 0, 0],
[0, 0, 1, 0]], dtype=torch.uint8)
>>> torch.masked_select(x, mask)
tensor([1.6381, 0.6801, 1.7757])
3. torch.narrow(input, dimension, start, length)
Returns a narrowed slice of input along the given dimension, starting at start with the given length; the returned tensor shares memory with the source tensor:
>>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> x
tensor([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> torch.narrow(x, 0, 0, 2)
tensor([[1, 2, 3],
[4, 5, 6]])
>>> torch.narrow(x, 1, 1, 2)
tensor([[2, 3],
[5, 6],
[8, 9]])
4. torch.nonzero(input, out = None)
Returns a tensor containing the indices of all non-zero elements of input:
>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
tensor([[0],
[1],
[2],
[4]])
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
... [0.0, 0.4, 0.0, 0.0],
... [0.0, 0.0, 1.2, 0.0],
... [0.0, 0.0, 0.0, -0.4]]))
tensor([[0, 0],
[1, 1],
[2, 2],
[3, 3]])
5. torch.reshape(input, shape)
Returns a tensor with the same data and number of elements as input but with the given shape:
>>> a = torch.arange(4)
>>> torch.reshape(a, (2, 2))
tensor([[0, 1],
[2, 3]])
>>> b = torch.tensor([[0, 1], [2, 3]])
>>> torch.reshape(b, (-1,))
tensor([0, 1, 2, 3])
6. torch.split(tensor, split_size_or_sections, dim = 0)
If split_size_or_sections is an integer, the tensor is split into chunks of that size (the last chunk may be smaller); if it is a list, the tensor is split into chunks with the sizes given by the list:
>>> a = torch.arange(10)
>>> torch.split(a, 3)
(tensor([0, 1, 2]), tensor([3, 4, 5]), tensor([6, 7, 8]), tensor([9])) # the last chunk may be shorter
>>> torch.split(a, [1, 2, 3, 4])
(tensor([0]), tensor([1, 2]), tensor([3, 4, 5]), tensor([6, 7, 8, 9]))
7. torch.squeeze(input, dim = None, out = None)
Removes dimensions of size 1. If dim is not given, all size-1 dimensions are removed; otherwise only the specified dimension is removed (and only if its size is 1).
>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0) # dimension 0 has size 2, so nothing is removed
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1) # dimension 1 has size 1, so it is removed
>>> y.size()
torch.Size([2, 2, 1, 2])
Note: torch.unsqueeze() does the opposite of torch.squeeze(): it inserts a dimension of size 1 at the given position.
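For example:
>>> x = torch.tensor([1, 2, 3])
>>> torch.unsqueeze(x, 0).size()
torch.Size([1, 3])
>>> torch.unsqueeze(x, 1).size()
torch.Size([3, 1])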
8. torch.stack(seq, dim = 0, out = None)
Stacks a sequence of tensors of the same size along a new dimension, increasing the number of dimensions by one.
>>> x = torch.randn(3, 2)
>>> y = torch.randn(3, 2)
>>> z = torch.randn(3, 2)
>>> torch.stack((x, y, z))
tensor([[[-1.9550, -0.0732],
[-0.7633, 0.9965],
[ 1.2510, -0.0338]],
[[-1.0724, 0.4955],
[ 0.4289, 0.9300],
[-1.0098, 0.0298]],
[[ 2.3160, -0.8827],
[-1.8079, 0.9960],
[ 0.8421, 2.0886]]])
9. torch.t(input)
Transposes a tensor with at most 2 dimensions; the returned tensor shares memory with the source tensor:
>>> x = torch.randn(3, 2)
>>> y = torch.t(x)
>>> y.size()
torch.Size([2, 3])
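A quick check that the two share memory: modifying the transposed tensor y also changes x:
>>> x = torch.zeros(2, 3)
>>> y = torch.t(x)
>>> y[0, 0] = 1.
>>> x
tensor([[1., 0., 0.],
        [0., 0., 0.]])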
10. torch.transpose(input, dim0, dim1)
Swaps dimensions dim0 and dim1:
>>> x = torch.empty(3, 2, 1, 4)
>>> x.size()
torch.Size([3, 2, 1, 4])
>>> y = torch.transpose(x, 1, 2)
>>> y.size()
torch.Size([3, 1, 2, 4])
11. torch.take(input, indices)
Selects values from input (treated as if it were flattened into a 1-D tensor) at the given indices; the result has the same shape as indices:
>>> x = torch.randn(3, 2)
>>> x
tensor([[ 1.1186, -0.5884],
[ 0.0556, -0.2917],
[-0.7200, 0.1229]])
>>> torch.take(x, torch.tensor([[0, 1], [4, 5]]))
tensor([[ 1.1186, -0.5884],
[-0.7200, 0.1229]])
12. torch.unbind(tensor, dim = 0)
Removes the given dimension and returns a tuple of all slices along it:
>>> torch.unbind(torch.tensor([[1, 2, 3],
... [2, 3, 4],
... [3, 4, 5]]))
(tensor([1, 2, 3]), tensor([2, 3, 4]), tensor([3, 4, 5]))
13. torch.where(condition, x, y)
Selects elements from x or y depending on condition:
>>> x = torch.randn(3, 2)
>>> y = torch.ones(3, 2)
>>> x
tensor([[-0.7097, -0.0090],
[-0.4766, -0.1324],
[ 1.1632, -1.7332]])
>>> torch.where(x > 0, x, y)
tensor([[1.0000, 1.0000],
[1.0000, 1.0000],
[1.1632, 1.0000]])
1.4 Random numbers
1. torch.manual_seed(seed)
Sets the seed for generating random numbers.
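Re-setting the same seed reproduces the same random sequence, for example:
>>> _ = torch.manual_seed(0)
>>> a = torch.rand(3)
>>> _ = torch.manual_seed(0)
>>> b = torch.rand(3)
>>> torch.equal(a, b)
True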
2. torch.bernoulli(input, *, generator = None, out = None)
Draws binary (0/1) random numbers from Bernoulli distributions; input holds the probabilities:
>>> a = torch.empty(3, 3).uniform_(0, 1)
>>> a
tensor([[0.6334, 0.8580, 0.1572],
[0.7853, 0.1458, 0.4177],
[0.3047, 0.0382, 0.5805]])
>>> torch.bernoulli(a)
tensor([[1., 1., 0.],
[1., 0., 0.],
[0., 0., 0.]])
>>> a = torch.ones(3, 3)
>>> torch.bernoulli(a)
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
3. torch.multinomial(input, num_samples, replacement = False, out = None)
Multinomial sampling: returns a tensor in which each row contains num_samples indices sampled from the multinomial distribution defined by the weights in the corresponding row of input:
>>> weights = torch.tensor([0, 10, 3, 0], dtype = torch.float)
>>> torch.multinomial(weights, 2)
tensor([2, 1])
>>> torch.multinomial(weights, 4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: invalid multinomial distribution (with replacement=False, not enough non-negative category to sample) at ..\aten\src\TH/generic/THTensorRandom.cpp:347
>>> torch.multinomial(weights, 4, replacement = True)
tensor([1, 1, 1, 1])
4. torch.normal(mean, std, out = None)
Draws random numbers from normal distributions with the given means and standard deviations:
>>> torch.normal(mean = torch.arange(1., 11.), std = torch.arange(1, 0, -0.1))
tensor([-0.2438, 0.2975, 5.2781, 3.6607, 4.8195, 5.7053, 6.9796, 7.8922,
8.7863, 10.1156])
5. torch.rand(*sizes, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Uniform distribution on [0, 1):
>>> torch.rand(4)
tensor([0.2407, 0.6577, 0.7282, 0.9323])
>>> torch.rand(2, 3)
tensor([[0.9123, 0.3024, 0.3010],
[0.0817, 0.8514, 0.4296]])
Note: torch.rand_like() works similarly, returning a uniform random tensor with the same shape as its input.
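For example:
>>> x = torch.empty(2, 3)
>>> torch.rand_like(x).size()
torch.Size([2, 3])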
6. torch.randint(low = 0, high, size, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Generates random integers in the range [low, high):
>>> torch.randint(3, 5, (3, ))
tensor([3, 4, 4])
>>> torch.randint(10, (2, 2))
tensor([[3, 6],
[2, 4]])
>>> torch.randint(3, 10, (2, 2))
tensor([[3, 8],
[4, 9]])
Note: torch.randint_like() works similarly.
7. torch.randn(*sizes, out = None, dtype = None, layout = torch.strided, device = None, requires_grad = False)
Standard normal distribution N(0, 1):
>>> torch.randn(4)
tensor([-1.4879, -0.5092, -2.2113, 0.4307])
>>> torch.randn(2, 3)
tensor([[-0.8514, -0.4009, 0.1524],
[ 0.9868, -0.1758, 1.6272]])
Note: torch.randn_like() works similarly.
8. torch.randperm(n, out = None, dtype = torch.int64, layout = torch.strided, device = None, requires_grad = False)
Generates a random permutation of the integers 0 to n-1:
>>> torch.randperm(10)
tensor([3, 7, 5, 8, 6, 4, 9, 0, 2, 1])
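A common use is shuffling the rows of a tensor by indexing with a random permutation (the permutation below is just one possible outcome):
>>> x = torch.arange(12).reshape(3, 4)
>>> x[torch.randperm(3)]
tensor([[ 8,  9, 10, 11],
        [ 0,  1,  2,  3],
        [ 4,  5,  6,  7]])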
Other
For other functions of the Tensor class, see tensor-creation-ops.
2 Automatic differentiation
PyTorch's automatic differentiation package, autograd, provides automatic differentiation for all operations on tensors. This means that when backpropagating through a neural network we do not have to derive the gradients by hand; the whole process is carried out automatically by the .backward() method.
Below is the simplest possible example, which computes the slope of the line y = 3x + 4:
>>> x = torch.tensor(1., requires_grad = True)
>>> y = 3 * x + 4
>>> y.backward()
>>> x.grad
tensor(3.)
The example above first creates a scalar x, sets y = 3x + 4, and then uses y.backward() to compute the derivative of y at x = 1 automatically; x.grad then holds $\frac{\partial y}{\partial x}\big|_{x=1} = 3$.
Here is a slightly more involved example:
>>> a = torch.tensor([3., 4.], requires_grad = True)
>>> d = torch.norm(a)
>>> d
tensor(5., grad_fn=<NormBackward0>)
>>> d.backward()
>>> a.grad
tensor([0.6000, 0.8000])
This computes the norm of a point $a = (x, y)$, i.e. $d = |a| = \sqrt{x^2 + y^2}$, whose partial derivatives at $(x, y) = (3, 4)$ are
$$\frac{\partial d}{\partial x} = \frac{2x}{2\sqrt{x^2 + y^2}} = 0.6, \qquad \frac{\partial d}{\partial y} = \frac{2y}{2\sqrt{x^2 + y^2}} = 0.8$$
Note that in the examples above the output is a scalar. The official description of autograd reads: torch.autograd provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. In fact, torch.autograd also offers a scheme for vector-valued outputs:
When the input is a vector and the output is a scalar, i.e. $y = f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n)$, the partial derivatives form the gradient
$$\nabla = \left(\frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots, \frac{\partial y}{\partial x_n}\right)$$
When both the input and the output are vectors, i.e. $\mathbf{y} = (y_1, y_2, \ldots, y_m) = f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n)$, the derivative of $\mathbf{y}$ with respect to $\mathbf{x}$ is the Jacobian matrix
$$J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \ldots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \ldots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}$$
We supply a vector $v = (v_1, v_2, \ldots, v_m)^T$ and pass it to the .backward() function; torch.autograd then computes
$$J^T \cdot v = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \ldots & \frac{\partial y_m}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \ldots & \frac{\partial y_m}{\partial x_n} \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_m \end{pmatrix} = \begin{pmatrix} \frac{\partial y_1}{\partial x_1}v_1 + \ldots + \frac{\partial y_m}{\partial x_1}v_m \\ \vdots \\ \frac{\partial y_1}{\partial x_n}v_1 + \ldots + \frac{\partial y_m}{\partial x_n}v_m \end{pmatrix}$$
Note: the vector $v$ must have the same dimension as the output $\mathbf{y}$.
Here is a simple example:
>>> x = torch.randn(3, requires_grad = True)
>>> x
tensor([-0.2983, 0.3566, -0.2247], requires_grad=True)
>>> y = x * 2
>>> y
tensor([-0.5967, 0.7132, -0.4495], grad_fn=<MulBackward0>)
>>> y.backward(torch.tensor([1.0, 1.0, 1.0]))
>>> x.grad
tensor([2., 2., 2.])
Here $\mathbf{y} = (y_1, y_2, y_3)$ and $\mathbf{x} = (x_1, x_2, x_3)$, so the Jacobian matrix is
$$J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \frac{\partial y_1}{\partial x_3} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_2}{\partial x_3} \\ \frac{\partial y_3}{\partial x_1} & \frac{\partial y_3}{\partial x_2} & \frac{\partial y_3}{\partial x_3} \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
Since $v = (1.0, 1.0, 1.0)^T$, we have
$$J^T v = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix}$$
which is exactly the result the program prints.
Finally, if you temporarily do not need gradients for a tensor, you can wrap the computation in a torch.no_grad() block:
>>> print(x.requires_grad)
True
>>> print((x**2).requires_grad)
True
>>> with torch.no_grad():
... print((x**2).requires_grad)
...
False
Note: torch.no_grad() is commonly used in evaluation and test code.