PyTorch Tutorial: torch.mm, torch.bmm, torch.matmul, and masked_fill

Tensor and Matrix Operations in PyTorch: torch.mm, torch.bmm, torch.matmul, and masked_fill

Latest recommended article published on 2025-10-15 00:13:07


License: CC 4.0 BY-SA


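This article introduces the PyTorch matrix-multiplication functions torch.mm, torch.bmm, and torch.matmul, along with the tensor fill operation masked_fill. torch.mm multiplies two 2-D matrices, torch.bmm multiplies batches of matrices, and torch.matmul covers both cases and additionally handles 1-D inputs and broadcasting. masked_fill fills selected positions of a tensor according to a boolean mask. These functions are widely used in deep learning, particularly in attention mechanisms in natural language processing (NLP).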


1. Introduction

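I have been studying attention mechanisms in NLP recently, and the code relies on several tensor matrix-multiplication and masked-fill operations, so I am collecting notes on them here. The main reference is the official PyTorch 2.0 documentation. The functions covered are:
① torch.mm(input, mat2, *, out=None)
② torch.bmm(input, mat2, *, out=None)
③ torch.matmul(input, other, *, out=None)
④ Tensor.masked_fill(mask, value)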

2. torch.mm

The signature of torch.mm is:

torch.mm(input, mat2, *, out=None) → Tensor

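This is ordinary matrix multiplication of two 2-D tensors. If input has shape (n, m) and mat2 has shape (m, p), the output has shape (n, p). torch.mm does not broadcast, so both arguments must be exactly 2-D.
Example: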

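import torch

mat1 = torch.randn(2, 3)  # shape (n, m) = (2, 3)
mat2 = torch.randn(3, 3)  # shape (m, p) = (3, 3)
torch.mm(mat1, mat2)      # shape (n, p) = (2, 3); values are random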
-->tensor([[ 0.4851,  0.5037, -0.3633],
        [-0.0760, -3.6705,  2.4784]])
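The shape rule above can be checked directly. The following is a minimal sketch rather than a definitive recipe: the variable names (`a`, `b`, `rejected_3d`) and the fixed seed are illustrative assumptions, not part of the original example. It also shows that, for 2-D inputs, the `@` operator gives the same product, and that torch.mm refuses inputs that are not 2-D.

```python
import torch

torch.manual_seed(0)  # fixed seed only so repeated runs give the same numbers

a = torch.randn(2, 3)  # (n, m)
b = torch.randn(3, 4)  # (m, p)

out = torch.mm(a, b)   # (n, p)
print(out.shape)

# For two 2-D tensors, the @ operator computes the same matrix product.
same_as_operator = torch.allclose(out, a @ b)
print(same_as_operator)

# torch.mm only accepts 2-D tensors; a 3-D input raises a RuntimeError.
try:
    torch.mm(torch.randn(5, 2, 3), torch.randn(5, 3, 4))
    rejected_3d = False
except RuntimeError:
    rejected_3d = True
print(rejected_3d)
```

For batched (3-D) inputs, torch.bmm or torch.matmul is the appropriate call.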

3. torch.bmm

The signature of torch.bmm is:

torch.bmm(input, mat2, *, out=None) → Tensor

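Purpose: performs a batch matrix-matrix product of the matrices stored in input and mat2.
Requirements: input and mat2 must both be 3-D tensors, and their first dimension (the batch dimension) must have the same size.
Shapes: if input is a (b, n, m) tensor and mat2 is a (b, m, p) tensor, the output has shape (b, n, p).

Example: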

input = torch.randn(10, 3, 4)
mat2 = torch.randn(10, 4, 5)
res = torch.bmm(input, mat2)
res.size()
-->torch.Size([10, 3, 5])

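Interpretation: torch.bmm multiplies one group of matrices by another group, pair by pair. The number of matrices in each group is given by the first dimension of input and mat2. In the code above that dimension is 10, so 10 matrices of shape (3, 4) are each multiplied by the corresponding one of 10 matrices of shape (4, 5), producing 10 matrices of shape (3, 5).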
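The pairwise interpretation can be made concrete by comparing torch.bmm with an explicit loop of torch.mm calls. This is a small sketch under assumed names (`x`, `y`, `looped`); it is meant to illustrate the semantics, not to suggest that the loop is how one should compute the product in practice.

```python
import torch

torch.manual_seed(0)  # fixed seed only for repeatability

x = torch.randn(10, 3, 4)  # 10 matrices of shape (3, 4)
y = torch.randn(10, 4, 5)  # 10 matrices of shape (4, 5)

batched = torch.bmm(x, y)  # shape (10, 3, 5)

# Multiply the i-th matrix of x by the i-th matrix of y, then stack the results.
looped = torch.stack([torch.mm(x[i], y[i]) for i in range(x.shape[0])])

matches = torch.allclose(batched, looped, atol=1e-5)
print(batched.shape, matches)
```

The loop and the batched call should agree up to floating-point tolerance; the batched call avoids the Python-level loop.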

4. torch.matmul

The signature of torch.matmul is:

torch.matmul(input, other, *, out=None) → Tensor

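This function computes the matrix product of two tensors, and its behavior depends on the number of dimensions of each argument.

① If both tensors are 1-D, the result is their dot product, returned as a scalar (a 0-dimensional tensor).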

a = torch.tensor([1,2,4])
b = torch.tensor([2,5,6])
print(torch.matmul(a, b))
print(a.shape)
--> tensor(36)
-->torch.Size([3])

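Note: a.shape is torch.Size([3]), meaning the tensor has a single dimension, and 3 is the number of elements along that dimension.
② If both tensors are 2-D, ordinary matrix multiplication is performed.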

a = torch.tensor([
    [1,2,4],
    [6,2,1]
])
b = torch.tensor([
    [2,5],
    [1,2],
    [6,8]
])
print(a.shape)
print(b.shape)
print(torch.matmul(a, b))
-->torch.Size([2, 3])
-->torch.Size([3, 2])
-->tensor([[28, 41],
        [20, 42]])

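As the example shows, when both tensors are 2-D the operation is the same as torch.mm.
③ If the first argument is 1-D and the second is 2-D, a dimension of size 1 is prepended to the first argument for the multiplication, and that dimension is removed from the result afterwards.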

a = torch.tensor([1,2,4])
b = torch.tensor([
    [2,5],
    [1,2],
    [6,8]
])

print(a.shape)
print(b.shape)
print(torch.matmul(a, b))
-->torch.Size([3])
-->torch.Size([3, 2])
-->tensor([28, 41])

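As shown above, a has only one dimension. For the computation it is treated as shape (1, 3), so the product is (1, 3) times (3, 2), which gives (1, 2), and the size-1 dimension is then removed, leaving shape (2,).
④ If the first argument is 2-D and the second is 1-D, the matrix-vector product is returned.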

a = torch.tensor([1,2])
b = torch.tensor([
    [2,5],
    [1,2],
    [6,8]
])

print(b.shape)
print(a.shape)
print(torch.matmul(b, a))
-->torch.Size([3, 2])
-->torch.Size([2])
-->tensor([12,  5, 22])

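Multiplying a matrix by a 1-D tensor (a vector) takes the dot product of each row of the matrix with the vector, giving a 1-D result of size 3 here.
⑤ More than two dimensions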
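Cases ③ and ④ can be reproduced by hand with unsqueeze and squeeze. The sketch below reuses the numbers from the examples above but with float tensors (an assumption made here for simplicity), and the names `vec_mat_manual` and `mat_vec_manual` are illustrative. For case ④, appending a size-1 dimension to the vector is offered as an equivalent way to view the matrix-vector product.

```python
import torch

a = torch.tensor([1., 2., 4.])   # shape (3,)
b = torch.tensor([[2., 5.],
                  [1., 2.],
                  [6., 8.]])     # shape (3, 2)
c = torch.tensor([1., 2.])       # shape (2,)

# Case 3: 1-D @ 2-D. Prepend a size-1 dimension, multiply, then drop it.
vec_mat = torch.matmul(a, b)                             # shape (2,)
vec_mat_manual = torch.mm(a.unsqueeze(0), b).squeeze(0)  # (1, 3) @ (3, 2) -> (1, 2) -> (2,)

# Case 4: 2-D @ 1-D. Append a size-1 dimension, multiply, then drop it.
mat_vec = torch.matmul(b, c)                             # shape (3,)
mat_vec_manual = torch.mm(b, c.unsqueeze(1)).squeeze(1)  # (3, 2) @ (2, 1) -> (3, 1) -> (3,)

print(vec_mat, mat_vec)
```

The results match the integer examples above: [28, 41] for the vector-matrix product and [12, 5, 22] for the matrix-vector product.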

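If both arguments are at least 1-D and at least one of them has N > 2 dimensions, a batched matrix product is returned. The last two dimensions are treated as the matrix dimensions, and all leading dimensions are treated as batch dimensions, which are broadcast against each other.
If the first argument is 1-D, a 1 is prepended to its shape for the batched matrix multiply and removed afterwards. If the second argument is 1-D, a 1 is appended to its shape for the batched matrix multiply and removed afterwards.
Example: if input has shape (j, 1, n, n) and other has shape (k, n, n), the output has shape (j, k, n, n).
If input has shape (j, 1, n, m) and other has shape (k, m, p), the output has shape (j, k, n, p).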

tensor1 = torch.randn(10, 3, 4, 5)
tensor2 = torch.randn(5, 4)
torch.matmul(tensor1, tensor2).size()
-->torch.Size([10, 3, 4, 4])

tensor1 = torch.randn(10, 3, 4, 5)
tensor2 = torch.randn(1, 5, 4)
torch.matmul(tensor1, tensor2).size()
-->torch.Size([10, 3, 4, 4])

tensor1 = torch.randn(10, 3, 4, 5)
tensor2 = torch.randn(1, 1, 5, 4)
torch.matmul(tensor1, tensor2).size()
-->torch.Size([10, 3, 4, 4])

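Comparing the three code blocks above, the result shape is the same in each case. A simple way to remember this: when the two tensors have different numbers of dimensions, the extra leading dimensions are treated as batch dimensions, which is equivalent to prepending size-1 dimensions to the lower-dimensional tensor and broadcasting.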
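The (j, 1, n, m) by (k, m, p) rule can be checked with concrete sizes. This is a sketch under assumed sizes (j=2, k=3, n=4, m=5, p=6) and assumed names (`left`, `right`, `vec`). It uses torch.broadcast_shapes, available in recent PyTorch releases, to compute the broadcast batch shape separately from the matrix dimensions.

```python
import torch

j, k, n, m, p = 2, 3, 4, 5, 6

left = torch.randn(j, 1, n, m)   # batch dims (j, 1), matrix dims (n, m)
right = torch.randn(k, m, p)     # batch dims (k,),   matrix dims (m, p)

out = torch.matmul(left, right)
print(out.shape)                 # batch dims broadcast to (j, k), matrix dims (n, p)

# The batch dimensions follow the usual broadcasting rules.
batch_shape = torch.broadcast_shapes(left.shape[:-2], right.shape[:-2])
expected_shape = tuple(batch_shape) + (n, p)

# 1-D second argument: a size-1 dimension is appended, then removed from the result.
vec = torch.randn(m)
out_vec = torch.matmul(left, vec)
print(out_vec.shape)             # (j, 1, n)
```

Only the batch dimensions are broadcast; the inner matrix dimensions (m here) must still match exactly.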

5. masked_fill

The signature is:

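Tensor.masked_fill(mask, value) → Tensor

Parameters:

mask (BoolTensor): the boolean mask.
value (float): the value to fill with.

mask is a PyTorch tensor of boolean values and value is the fill value. Wherever mask is True, the corresponding position in the tensor being filled is set to value; positions where mask is False are left unchanged. The shape of mask must be broadcastable to the shape of the tensor. Tensor.masked_fill returns a new tensor, while Tensor.masked_fill_ (with a trailing underscore) is the in-place version that modifies the tensor itself.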

a = torch.tensor([
    [0, 8],
    [6, 8],
    [7, 1]
])

mask = torch.tensor([
    [ True, False],
    [False, False],
    [False,  True]
])
b = a.masked_fill(mask, -1e9)
print(b)
-->tensor([[-1000000000,           8],
        [          6,           8],
        [          7, -1000000000]])
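Since the motivation here is attention, the following sketch shows how masked_fill is commonly combined with softmax: masked positions are filled with a large negative number so that they receive effectively zero weight. The score values, the causal (upper-triangular) mask, and the names `scores`, `weights`, and `in_place` are illustrative assumptions, not taken from any particular model.

```python
import torch

# Hypothetical raw attention scores for 3 query positions over 3 key positions.
scores = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 2.0, 3.0],
                       [1.0, 2.0, 3.0]])

# True marks positions to hide: strictly above the diagonal ("future" positions).
mask = torch.ones(3, 3).triu(diagonal=1).bool()

# Out-of-place fill: scores itself is left unchanged.
masked_scores = scores.masked_fill(mask, -1e9)

# After softmax, the masked positions get a weight of 0 and each row sums to 1.
weights = torch.softmax(masked_scores, dim=-1)
print(weights)

# The in-place variant modifies the tensor it is called on.
in_place = scores.clone()
in_place.masked_fill_(mask, -1e9)
```

A float tensor is used here because softmax requires a floating-point dtype; in the integer example above, the fill value appears as the integer -1000000000 in the output.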