torch.mm
torch.mm multiplies two matrices, i.e., two 2-D tensors.
For example:
import torch

mat1 = torch.randn(2, 3)
print("mat1=", mat1)
mat2 = torch.randn(3,2)
print("mat2=", mat2)
mat3 = torch.mm(mat1, mat2)
print("mat3=", mat3)
However, if a tensor has more than two dimensions, it raises an error: RuntimeError: self must be a matrix
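A minimal sketch (shapes here are just illustrative) showing that torch.mm rejects tensors with more than two dimensions:

import torch

mat1 = torch.randn(10, 2, 3)   # 3-D tensor
mat2 = torch.randn(10, 3, 2)
try:
    torch.mm(mat1, mat2)       # mm only accepts 2-D tensors
except RuntimeError as e:
    print(e)                   # something like "self must be a matrix"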
torch.bmm
It is simply mm with an extra batch dimension: the first dimension is the batch, and the two tensors' batch sizes must be equal.
The second and third dimensions then go through the same mm computation as above.
Sample code:
mat1 = torch.randn(10, 2, 4)
# print("mat1=", mat1)
mat2 = torch.randn(10, 4, 1)
# print("mat2=", mat2)
mat3 = torch.bmm(mat1, mat2)
print("mat3=", mat3, mat3.shape)
torch.matmul
torch.mm only handles matrix-matrix products, so its scope is fairly narrow,
while torch.matmul covers many more situations.
As the official documentation describes, there are the following cases:
- If both tensors are 1-dimensional, the dot product (scalar) is returned.
If both tensors are 1-D, this is a dot product: corresponding elements are multiplied and summed.
For example:
mat1 = torch.Tensor([1,2])
print("mat1=", mat1)
mat2 = torch.Tensor([1,2])
print("mat2=", mat2)
mat3 = torch.matmul(mat1, mat2)
print("mat3=", mat3)
The formula is:
A=[a_1, a_2, \dots, a_n] \newline B=[b_1, b_2, \dots, b_n] \newline \mathrm{matmul}(A,B)=\sum_{i} a_i b_i
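A quick check of this formula (values chosen for illustration): the matmul of two 1-D tensors equals the elementwise product summed up.

A = torch.Tensor([1, 2])
B = torch.Tensor([3, 4])
print(torch.matmul(A, B))      # tensor(11.)
print((A * B).sum())           # tensor(11.), matching the formula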
- If both arguments are 2-dimensional, the matrix-matrix product is returned.
If both tensors are 2-D, this behaves exactly like torch.mm.
- If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.
If the first argument is 1-D and the second is 2-D, a 1 is prepended to the first argument's shape, the matrix multiply is performed, and the prepended dimension is then removed from the result.
For example, if the first argument A has shape (2) and the second argument B has shape (2, 2):
A is first expanded by one dimension to shape (1, 2), then the matrix product is computed,
giving a (1, 2) result; the prepended dimension is removed, leaving shape (2).
Sample code:
mat1 = torch.Tensor([1,-1])
print("mat1=", mat1)
mat2 = torch.Tensor([[1,2],[-1,-2]])
print("mat2=", mat2)
mat3 = torch.matmul(mat1, mat2)
print("mat3=", mat3, mat3.shape)
- If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned.
If the first argument is 2-D and the second is 1-D, a 1 is appended to the second argument's shape and the matrix multiply is performed.
For example, if the first argument A has shape (2, 2) and the second argument B has shape (2):
B is first expanded by one dimension to shape (2, 1), then the matrix product is computed,
giving a (2, 1) result; the appended dimension is then removed, leaving shape (2).
Sample code:
mat1 = torch.Tensor([[1,2],[-1,-2]])
print("mat1=", mat1)
mat2 = torch.Tensor([1,-1])
print("mat2=", mat2)
mat3 = torch.matmul(mat1, mat2)
print("mat3=", mat3, mat3.shape)
- If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiply and removed after. The non-matrix (i.e. batch) dimensions are broadcasted (and thus must be broadcastable). For example, if input is a (j×1×n×n) tensor and other is a (k×n×n) tensor, out will be a (j×k×n×n) tensor.
The last case looks complicated, but it really comes down to two points:
1. The last two dimensions of each operand are what go into the mm computation; the remaining leading dimensions are batch dimensions, and (after broadcasting) both shapes end up with the same rank.
2. If the ranks differ, the shorter shape is broadcast, i.e. padded with size-1 dimensions on the left. For example, shape (10, 4, 2) matmul shape (2, 3) becomes shape (10, 4, 2) matmul shape (1, 2, 3).
But note (see the sketch after this list):
- shape (10, 4, 2) matmul shape (2, 2, 3) raises an error, because the batch sizes (10 vs 2) don't match
- shape (2, 3, 2, 4) matmul shape (2, 4, 1) also raises an error, because batch (2, 3) doesn't match (2,)
- but shape (2, 3, 2, 4) matmul shape (3, 4, 1) works, because (3, 4, 1) can be broadcast to (1, 3, 4, 1)
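Here is a sketch exercising the cases above (shapes copied from the list; exact error messages vary by PyTorch version):

a = torch.randn(10, 4, 2)
print(torch.matmul(a, torch.randn(2, 3)).shape)     # (10, 4, 3): (2, 3) is treated as (1, 2, 3)

try:
    torch.matmul(a, torch.randn(2, 2, 3))           # batch 10 vs 2: not broadcastable
except RuntimeError as e:
    print(e)

x = torch.randn(2, 3, 2, 4)
try:
    torch.matmul(x, torch.randn(2, 4, 1))           # batch (2, 3) vs (2,): 3 vs 2 fails
except RuntimeError as e:
    print(e)

print(torch.matmul(x, torch.randn(3, 4, 1)).shape)  # (2, 3, 2, 1): (3, 4, 1) broadcasts to (1, 3, 4, 1)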
Comparison with NumPy
NumPy has the similar function tensordot (see my other post on it for details), but personally I find NumPy's implementation more flexible.
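For reference, a minimal tensordot sketch (2-D case only; axes=1 contracts A's last axis with B's first):

import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 2)
C = np.tensordot(A, B, axes=1)   # same as the ordinary matrix product here
print(np.allclose(C, A @ B))     # True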
========================== October 2024 update ==========================
In real-world code, these three functions actually don't appear all that often; what's popular now is torch.einsum.
Below is a practical example for reference:
import torch

a = torch.rand(3, 4, 2)
b = torch.rand(3, 2, 5)
c = torch.matmul(a, b)                    # batched matmul: (3,4,2) x (3,2,5) -> (3,4,5)
print('c.shape = ', c.shape)
d = torch.einsum('ijk,ikm->ijm', a, b)    # the same contraction written as einsum
print('d.shape = ', d.shape)
print(torch.equal(c, d))                  # the two results should match
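For completeness, the earlier cases can also be written as einsum expressions (illustrative equivalences only):

x = torch.rand(3)
y = torch.rand(3)
print(torch.allclose(torch.einsum('i,i->', x, y), torch.matmul(x, y)))    # 1-D dot product

m = torch.rand(2, 3)
n = torch.rand(3, 4)
print(torch.allclose(torch.einsum('ij,jk->ik', m, n), torch.mm(m, n)))    # same as torch.mm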