【学习笔记】torch.nn.MaxPool2d参数解释

最新推荐文章于 2025-04-03 08:17:39 发布

原创最新推荐文章于 2025-04-03 08:17:39 发布 · 2.7k 阅读

2 ·

CC 4.0 BY-SA版权

文章标签：

#python

本文详细介绍了PyTorch中MaxPool2d模块的工作原理，包括其在2D输入上的作用方式，参数如kernel_size、stride、padding和dilation的影响，以及如何计算输出尺寸。通过实例展示了不同参数设置下的池化操作，并强调了在需要最大值索引时return_indices参数的重要性。

部署运行你感兴趣的模型镜像

日常学习，给自己挖坑，and造轮子

class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
如下是MaxPool2d的解释：

class MaxPool2d(_MaxPoolNd):
    r"""Applies a 2D max pooling over an input signal composed of several input
    planes.

    In the simplest case, the output value of the layer with input size :math:`(N, C, H, W)`,output :math:`(N, C, H_{out}, W_{out})` and :attr:`kernel_size` :math:`(kH, kW)`can be precisely described as:

    .. math::
        \begin{aligned}
            out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\
                                    & \text{input}(N_i, C_j, \text{stride[0]} \times h + m,
                                                   \text{stride[1]} \times w + n)
        \end{aligned}

    If :attr:`padding` is non-zero, then the input is implicitly zero-padded on both sides 
    for :attr:`padding` number of points. :attr:`dilation` controls the spacing between the kernel points.
    It is harder to describe, but this `link`_ has a nice visualization of what :attr:`dilation` does.

    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

        - a single ``int`` -- in which case the same value is used for the height and width dimension
        - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension

    Args:
        kernel_size: the size of the window to take a max over
        stride: the stride of the window. Default value is :attr:`kernel_size`
        padding: implicit zero padding to be added on both sides
        dilation: a parameter that controls the stride of elements in the window
        return_indices: if ``True``, will return the max indices along with the outputs.
                        Useful for :class:`torch.nn.MaxUnpool2d` later
        ceil_mode: when True, will use `ceil` instead of `floor` to compute the output shape

    Shape:
        - Input: :math:`(N, C, H_{in}, W_{in})`
        - Output: :math:`(N, C, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]}
                    \times (\text{kernel\_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]}
                    \times (\text{kernel\_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor

    Examples::

        >>> # pool of square window of size=3, stride=2
        >>> m = nn.MaxPool2d(3, stride=2)
        >>> # pool of non-square window
        >>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
        >>> input = torch.randn(20, 16, 50, 32)
        >>> output = m(input)

    .. _link:
        https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
    """

    def forward(self, input):
        return F.max_pool2d(input, self.kernel_size, self.stride,
                            self.padding, self.dilation, self.ceil_mode,
                            self.return_indices)

大致解释为：
在由多个输入通道组成的输入信号上应用2D max池。

在最简单的情况下，具有输入大小的层的输出值：(N, C, H, W)，
输出：(N, C, H_{out}, W_{out})， kernel_size，和 (kH, kW)可以准确地描述为：

out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\
                                    & \text{input}(N_i, C_j, \text{stride[0]} \times h + m,
                                                   \text{stride[1]} \times w + n)

如果 padding 非零，那么输入的两边都隐式地被0填充；dilation用于控制内核点之间的距离
但这里很好地展示了 diagration 的作用。

这些参数：kernel_size，stride，padding, ，dilation 可以为：
-单个int值–在这种情况下，高度和宽度标注使用相同的值
-两个整数组成的数组——在这种情况下，第一个int用于高度维度，第二个int表示宽度

参数：

kernel_size(int or tuple) ：max pooling的窗口大小
stride(int or tuple, optional)：max pooling的窗口移动的步长。默认值是kernel_size
padding(int or tuple, optional) ：输入的每一条边补充0的层数
dilation(int or tuple, optional)：一个控制窗口中元素步幅的参数
return_indices ：如果等于True，会返回输出最大值的序号，对于上采样操作会有帮助
ceil_mode ：如果等于True，计算输出信号大小的时候，会使用向上取整，代替默认的向下取整的操作

Shape:
        - Input: :math:`(N, C, H_{in}, W_{in})`
        - Output: :math:`(N, C, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]}
                    \times (\text{kernel\_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]}
                    \times (\text{kernel\_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor

    Examples::

        >>> # pool of square window of size=3, stride=2
        >>> m = nn.MaxPool2d(3, stride=2)
        >>> # pool of non-square window
        >>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
        >>> input = torch.randn(20, 16, 50, 32)
        >>> output = m(input)