Matrix operations: view & permute

License: CC 4.0 BY-SA

Tags: #pytorch #python #deep-learning


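This post shows how to partition a 2D feature map into windows with PyTorch. Using `view` followed by `permute`, a 20x20 feature map becomes a 5x5 grid of 4x4 windows. `view` first splits each spatial axis into a (window index, offset within window) pair, then `permute` reorders the dimensions so that the window indices come first. Both operations are explained step by step with example code, followed by a note on how `view` differs from `reshape` and a list of references.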


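1. Problem

Given a 2D feature map $x \in\mathbb R^{M\times N \times d}$ with spatial size $M \times N$, partition it into windows of size $s_p \times s_p$. How can this be expressed in code?
(This question comes from the paper "Focal Self-attention for Local-Global Interactions in Vision Transformers".)
Example: the figure below shows a 20x20 2D tensor. The goal is to partition it into 4x4 windows, so the expected output is a 5x5 grid of 4x4 windows, i.e. a 5x5x4x4 tensor (a 5D tensor of shape 5x5x4x4xd once the channel dimension is included).
[Figure: a 20x20 tensor partitioned into windows]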

2. Solution

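import torch

# x: feature map of shape (M, N, C)
# win_size: side length of a window, e.g. 2 for 2x2 windows
# M and N must be divisible by win_size
def getWindow(x, win_size):
  M, N, C = x.shape
  # split each spatial axis into (number of windows, window size)
  x = x.view(M // win_size, win_size, N // win_size, win_size, C)
  # move the two window-index axes to the front
  windows = x.permute(0, 2, 1, 3, 4).contiguous()
  return windows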
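
As a quick sanity check, the sketch below applies the same view + permute recipe to the 20x20 example with 4x4 windows. The function name `partition_windows` and the channel count `C = 3` are chosen here for illustration only, and the input sizes are assumed to be divisible by the window size.

```python
import torch

def partition_windows(x, win_size):
    # x: (M, N, C); M and N are assumed to be divisible by win_size
    M, N, C = x.shape
    x = x.view(M // win_size, win_size, N // win_size, win_size, C)
    return x.permute(0, 2, 1, 3, 4).contiguous()

M, N, C, s = 20, 20, 3, 4  # C = 3 is an arbitrary illustrative channel count
x = torch.arange(M * N * C).view(M, N, C)
windows = partition_windows(x, s)
print(windows.shape)  # torch.Size([5, 5, 4, 4, 3])

# Every window must equal the corresponding spatial slice of the input
for i in range(M // s):
    for j in range(N // s):
        assert torch.equal(windows[i, j], x[i * s:(i + 1) * s, j * s:(j + 1) * s])
```

Comparing each window against a plain slice of the input is a simple way to confirm that the axis order passed to `permute` is the intended one.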

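3. Discussion

First, torch's tensor.view rearranges the input $x$ into $\hat{x} \in\mathbb R^{\frac{M}{s_p}\times s_p \times \frac{N}{s_p} \times s_p \times d}$
Then, torch's tensor.permute swaps the two middle axes of $\hat{x}$, giving the windows $\in\mathbb R^{\frac{M}{s_p}\times \frac{N}{s_p} \times s_p \times s_p \times d}$
At first this looked like magic, and it was not obvious why these two steps produce the right result. On reflection, view splits $x$ into the required "granularity": each spatial axis becomes a (window index, offset within window) pair, without moving any data. permute then rearranges those pieces into the desired order, with the two window indices first and the two in-window offsets after them.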
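
The two steps can also be read as an index mapping. With window size $s$, view sends element $x[i, j]$ to $\hat{x}[\lfloor i/s \rfloor,\ i \bmod s,\ \lfloor j/s \rfloor,\ j \bmod s]$, and permute moves it to position $[\lfloor i/s \rfloor,\ \lfloor j/s \rfloor,\ i \bmod s,\ j \bmod s]$. The following minimal sketch checks this mapping on a small channel-free tensor; the variable names are illustrative.

```python
import torch

H, W, s = 8, 6, 2
x = torch.arange(H * W).view(H, W)
x_hat = x.view(H // s, s, W // s, s)   # split both axes
windows = x_hat.permute(0, 2, 1, 3)    # window indices first

for i in range(H):
    for j in range(W):
        # view: i -> (i // s, i % s), j -> (j // s, j % s)
        assert x_hat[i // s, i % s, j // s, j % s] == x[i, j]
        # permute: the two middle axes trade places
        assert windows[i // s, j // s, i % s, j % s] == x[i, j]
```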
Here is a test: the input is 8x6 2D data and win_size is 2

import numpy as np
import torch

# prepare the data
H = 8
W = 6
data = np.arange(H*W)
print(data)
"""
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
"""
data = torch.tensor(data)
data = data.reshape(H,W)
print(data)
"""
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35],
        [36, 37, 38, 39, 40, 41],
        [42, 43, 44, 45, 46, 47]], dtype=torch.int32)
"""
data_view = data.view(H // 2, 2, W // 2, 2)
print(data_view)
"""
tensor([[[[ 0,  1],
          [ 2,  3],
          [ 4,  5]],

         [[ 6,  7],
          [ 8,  9],
          [10, 11]]],


        [[[12, 13],
          [14, 15],
          [16, 17]],

         [[18, 19],
          [20, 21],
          [22, 23]]],


        [[[24, 25],
          [26, 27],
          [28, 29]],

         [[30, 31],
          [32, 33],
          [34, 35]]],


        [[[36, 37],
          [38, 39],
          [40, 41]],

         [[42, 43],
          [44, 45],
          [46, 47]]]], dtype=torch.int32)
"""
windows = data_view.permute(0, 2, 1, 3).contiguous()
print(windows)
"""
tensor([[[[ 0,  1],
          [ 6,  7]],

         [[ 2,  3],
          [ 8,  9]],

         [[ 4,  5],
          [10, 11]]],


        [[[12, 13],
          [18, 19]],

         [[14, 15],
          [20, 21]],

         [[16, 17],
          [22, 23]]],


        [[[24, 25],
          [30, 31]],

         [[26, 27],
          [32, 33]],

         [[28, 29],
          [34, 35]]],


        [[[36, 37],
          [42, 43]],

         [[38, 39],
          [44, 45]],

         [[40, 41],
          [46, 47]]]], dtype=torch.int32)
"""
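
Why the call to `.contiguous()` after `permute`? A minimal sketch, rebuilding the same 8x6 data with `torch.arange` for brevity, shows that `view` and `permute` only change the strides over the same storage, while `.contiguous()` copies the elements into a fresh buffer laid out in the new order.

```python
import torch

data = torch.arange(8 * 6).view(8, 6)
data_view = data.view(4, 2, 3, 2)
permuted = data_view.permute(0, 2, 1, 3)
windows = permuted.contiguous()

print(data_view.stride(), data_view.is_contiguous())  # (12, 6, 2, 1) True
print(permuted.stride(), permuted.is_contiguous())    # (12, 2, 6, 1) False
print(windows.stride(), windows.is_contiguous())      # (12, 4, 2, 1) True

# view and permute share storage with data; contiguous() made a copy
print(permuted.data_ptr() == data.data_ptr())  # True
print(windows.data_ptr() == data.data_ptr())   # False
```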

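A side note on the difference between view and reshape in PyTorch:
https://stackoverflow.com/questions/49643225/whats-the-difference-between-reshape-and-view-in-pytorch
In general, view is the recommended choice: it never copies data and raises an error when the requested shape is incompatible with the tensor's memory layout, whereas reshape returns a view when it can and silently falls back to a copy when it cannot.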
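
The difference shows up on a non-contiguous tensor, such as the result of `permute`. The sketch below uses a small illustrative 3x4 tensor: on contiguous input both calls share storage with the original, while on permuted input `view` raises a `RuntimeError` and `reshape` returns a copy.

```python
import torch

x = torch.arange(12).view(3, 4)

# Contiguous input: view and reshape both share storage with x
assert x.view(2, 6).data_ptr() == x.data_ptr()
assert x.reshape(2, 6).data_ptr() == x.data_ptr()

t = x.permute(1, 0)  # shape (4, 3), non-contiguous
try:
    t.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

flat = t.reshape(-1)  # reshape falls back to a copy here
print(flat.data_ptr() == x.data_ptr())  # False
print(flat.tolist())  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

Calling `.contiguous()` before `view`, as the window-partition code does after `permute`, makes the copy explicit instead of leaving it to `reshape`.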

4. References

https://stackoverflow.com/questions/48915810/pytorch-what-does-contiguous-do
https://jdhao.github.io/2019/07/10/pytorch_view_reshape_transpose_permute/
https://clay-atlas.com/us/blog/2021/08/11/pytorch-en-view-permute-change-dimensions/
https://stackoverflow.com/questions/49643225/whats-the-difference-between-reshape-and-view-in-pytorch
