Hugging Face 的 Transformers 库快速入门（三）必要的 Pytorch 知识

最新推荐文章于 2025-05-26 19:21:45 发布

liu_chengwei

最新推荐文章于 2025-05-26 19:21:45 发布

阅读量2.3k

点赞数

CC 4.0 BY-SA版权

文章标签： pytorch 人工智能 python nlp

本文链接：https://blog.youkuaiyun.com/liu_chengwei/article/details/127432916

注：本系列教程仅供学习使用, 由原作者授权, 均转载自小昇的博客。

前言

在上一篇《模型与分词器》中，我们介绍了 Model 类和 Tokenizers 类，尤其是如何运用分词器对文本进行预处理。

Transformers 库建立在 Pytorch 框架之上（Tensorflow 的版本功能并不完善），虽然官方宣称使用 Transformers 库并不需要掌握 Pytorch 知识，但是实际上我们还是需要通过 Pytorch 的 DataLoader 类来加载数据、使用 Pytorch 的优化器对模型参数进行调整等等。

因此，本文将介绍 Pytorch 的一些基础概念以及后续可能会使用到的类，让大家可以快速上手使用 Transformers 库建立模型。

Pytorch 基础

Pytorch 由 Facebook 人工智能研究院于 2017 年推出，具有强大的 GPU 加速张量计算功能，并且能够自动进行微分计算，从而可以使用基于梯度的方法对模型参数进行优化。截至 2022 年 8 月，PyTorch 已经和 Linux 内核、Kubernetes 等并列成为世界上增长最快的 5 个开源社区之一。现在在 NeurIPS、ICML 等等机器学习顶会中，有超过 80% 研究人员用的都是 PyTorch。

为了确保商业化和技术治理之间的相互独立，2022 年 9 月 12 日 PyTorch 官方宣布正式加入 Linux 基金会，PyTorch 基金会董事会包括 Meta、AMD、亚马逊、谷歌、微软、Nvidia 等公司。

张量

张量 (Tensor) 是深度学习的基础，例如常见的 0 维张量称为标量 (scalar)、1 维张量称为向量 (vector)、2 维张量称为矩阵 (matrix)。Pytorch 本质上就是一个基于张量的数学计算工具包，它提供了多种方式来创建张量：

>>> import torch
>>> torch.empty(2, 3) # empty tensor (uninitialized), shape (2,3)
tensor([[2.7508e+23, 4.3546e+27, 7.5571e+31],
        [2.0283e-19, 3.0981e+32, 1.8496e+20]])
>>> torch.rand(2, 3) # random tensor, each value taken from [0,1)
tensor([[0.8892, 0.2503, 0.2827],
        [0.9474, 0.5373, 0.4672]])
>>> torch.randn(2, 3) # random tensor, each value taken from standard normal distribution
tensor([[-0.4541, -1.1986,  0.1952],
        [ 0.9518,  1.3268, -0.4778]])
>>> torch.zeros(2, 3, dtype=torch.long) # long integer zero tensor
tensor([[0, 0, 0],
        [0, 0, 0]])
>>> torch.zeros(2, 3, dtype=torch.double) # double float zero tensor
tensor([[0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64)
>>> torch.arange(10)
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

也可以通过 torch.tensor() 或者 torch.from_numpy() 基于已有的数组或 Numpy 数组创建张量：

>>> array = [[1.0, 3.8, 2.1], [8.6, 4.0, 2.4]]
>>> torch.tensor(array)
tensor([[1.0000, 3.8000, 2.1000],
        [8.6000, 4.0000, 2.4000]])
>>> import numpy as np
>>> array = np.array([[1.0, 3.8, 2.1], [8.6, 4.0, 2.4]])
>>> torch.from_numpy(array)
tensor([[1.0000, 3.8000, 2.1000],
        [8.6000, 4.0000, 2.4000]], dtype=torch.float64)

注意： 上面这些方式创建的张量会存储在内存中并使用 CPU 进行计算，如果想要调用 GPU 计算，需要直接在 GPU 中创建张量或者将张量送入到 GPU 中

>>> torch.rand(2, 3).cuda()
tensor([[0.0405, 0.1489, 0.8197],
        [0.9589, 0.0379, 0.5734]], device='cuda:0')
>>> torch.rand(2, 3, device="cuda")
tensor([[0.0405, 0.1489, 0.8197],
        [0.9589, 0.0379, 0.5734]], device='cuda:0')
>>> torch.rand(2, 3).to("cuda")
tensor([[0.9474, 0.7882, 0.3053],
        [0.6759, 0.1196, 0.7484]], device='cuda:0')

在后续章节中，我们经常会将编码后的文本张量通过 to(device) 送入到指定的 GPU 或 CPU 中。

张量计算

张量的加减乘除是按元素进行计算的，例如：

>>> x = torch.tensor([1, 2, 3], dtype=torch.double)
>>> y = torch.tensor([4, 5, 6], dtype=torch.double)
>>> print(x + y)
tensor([5., 7., 9.], dtype=torch.float64)
>>> print(x - y)
tensor([-3., -3., -3.], dtype=torch.float64)
>>> print(x * y)
tensor([ 4., 10., 18.], dtype=torch.float64)
>>> print(x / y)
tensor([0.2500, 0.4000, 0.5000], dtype=torch.float64)

Pytorch 还提供了许多常用的计算函数，如 torch.dot() 计算向量点积、torch.mm() 计算矩阵相乘、三角函数和各种数学函数等：

>>> x.dot(y)
tensor(32., dtype=torch.float64)
>>> x.sin()
tensor([0.8415, 0.9093, 0.1411], dtype=torch.float64)
>>> x.exp()
tensor([ 2.7183,  7.3891, 20.0855], dtype=torch.float64)

除了数学运算，Pytorch 还提供了多种张量操作函数，如聚合 (aggregation)、拼接 (concatenation)、比较、随机采样、序列化等，详细使用方法可以参见 Pytorch 官方文档。

对张量进行聚合（如求平均、求和、最大值和最小值等）或拼接操作时，可以指定进行操作的维度 (dim)。例如，计算张量的平均值，在默认情况下会计算所有元素的平均值：

>>> x = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.double)
>>> x.mean()
tensor(3.5000, dtype=torch.float64)

更常见的情况是需要计算某一行或某一列的平均值，此时就需要设定计算的维度，例如分别对第 0 维和第 1 维计算平均值：

>>> x.mean(dim=0)
tensor([2.5000, 3.5000, 4.5000], dtype=torch.float64)
>>> x.mean(dim=1)
tensor([2., 5.], dtype=torch.float64)

注意，上面的计算自动去除了多余的维度，因此结果从矩阵变成了向量，如果要保持维度不变，可以设置 keepdim=True：

>>> x.mean(dim=0, keepdim=True)
tensor([[2.5000, 3.5000, 4.5000]], dtype=torch.float64)
>>> x.mean(dim=1, keepdim=True)
tensor([[2.],
        [5.]], dtype=torch.float64)

拼接 torch.cat 操作类似，通过指定拼接维度，可以获得不同的拼接结果：

>>> x = torch.tensor([[1, 2, 3], [ 4,  5,  6]], dtype=torch.double)
>>> y = torch.tensor([[7, 8, 9], [10, 11, 12]], dtype=torch.double)
>>> torch.cat((x, y), dim=0)
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.],
        [ 7.,  8.,  9.],
        [10., 11., 12.]], dtype=torch.float64)
>>> torch.cat((x, y), dim=1)
tensor([[ 1.,  2.,  3.,  7.,  8.,  9.],
        [ 4.,  5.,  6., 10., 11., 12.]], dtype=torch.float64)

组合使用这些操作就可以写出复杂的数学计算表达式。例如对于：
$(x+y)\times(y-2)$
当 $x = 2$

最低0.47元/天解锁文章

Hugging Face 的 Transformers 库快速入门 （三）必要的 Pytorch 知识

文章目录

前言

Pytorch 基础

张量

张量计算

Hugging Face 的 Transformers 库快速入门（三）必要的 Pytorch 知识