Upsampling in PyTorch, and the various inverse ("un-") operations

This post surveys the upsampling techniques commonly used in deep learning (nearest-neighbour, linear, bilinear, and trilinear interpolation) and how they apply to inputs of different dimensionality. It also explains how the transposed-convolution layer works and how its parameters are set, and how max unpooling partially inverts a max-pooling operation.


import torch.nn.functional as F

import torch.nn as nn

F.upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=None)

(Note: in recent PyTorch releases `F.upsample` is deprecated in favour of `F.interpolate`, which takes the same arguments.)

    r"""Upsamples the input to either the given :attr:`size` or the given
    :attr:`scale_factor`

    The algorithm used for upsampling is determined by :attr:`mode`.

    Currently temporal, spatial and volumetric upsampling are supported, i.e.
    expected inputs are 3-D, 4-D or 5-D in shape.

    The input dimensions are interpreted in the form:
    `mini-batch x channels x [optional depth] x [optional height] x width`.

    The modes available for upsampling are: `nearest`, `linear` (3D-only),
    `bilinear` (4D-only), `trilinear` (5D-only)

    Args:
        input (Tensor): the input tensor
        size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]):
            output spatial size.
        scale_factor (int): multiplier for spatial size. Has to be an integer.
        mode (string): algorithm used for upsampling:
            'nearest' | 'linear' | 'bilinear' | 'trilinear'. Default: 'nearest'
        align_corners (bool, optional): if True, the corner pixels of the input
            and output tensors are aligned, and thus preserving the values at
            those pixels. This only has effect when :attr:`mode` is `linear`,
            `bilinear`, or `trilinear`. Default: False

    .. warning::
        With ``align_corners = True``, the linearly interpolating modes
        (`linear`, `bilinear`, and `trilinear`) don't proportionally align the
        output and input pixels, and thus the output values can depend on the
        input size. This was the default behavior for these modes up to version
        0.3.1. Since then, the default behavior is ``align_corners = False``.
        See :class:`~torch.nn.Upsample` for concrete examples on how this
        affects the outputs.

    """

nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)

"""
Parameters:	
    in_channels (int) – Number of channels in the input image
    out_channels (int) – Number of channels produced by the convolution
    kernel_size (int or tuple) – Size of the convolving kernel
    stride (int or tuple, optional) – Stride of the convolution. Default: 1
    padding (int or tuple, optional) – kernel_size - 1 - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
    output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0
    groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
    bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
    dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1


"""

Output-size computation (per spatial dimension, from the parameters above):

H_out = (H_in - 1) × stride - 2 × padding + dilation × (kernel_size - 1) + output_padding + 1
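As a quick sanity check of the transposed-convolution output-size formula, a minimal sketch (layer parameters chosen purely for illustration):

```python
import torch
import torch.nn as nn

# (H_in - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1
# = (4 - 1)*2 - 2*1 + 1*(3 - 1) + 1 + 1 = 8
deconv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3,
                            stride=2, padding=1, output_padding=1)

x = torch.randn(1, 1, 4, 4)
y = deconv(x)
print(y.shape)  # torch.Size([1, 1, 8, 8])
```

This kernel_size=3, stride=2, padding=1, output_padding=1 combination is the common "upsample by exactly 2×" configuration seen in encoder-decoder networks.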

Definition: nn.MaxUnpool2d(kernel_size, stride=None, padding=0)

Invocation (forward):

def forward(self, input, indices, output_size=None):
    return F.max_unpool2d(input, indices, self.kernel_size, self.stride,
                          self.padding, output_size)

    r"""Computes a partial inverse of :class:`MaxPool2d`.

    :class:`MaxPool2d` is not fully invertible, since the non-maximal values are lost.

    :class:`MaxUnpool2d` takes in as input the output of :class:`MaxPool2d`
    including the indices of the maximal values and computes a partial inverse
    in which all non-maximal values are set to zero.

    .. note:: `MaxPool2d` can map several input sizes to the same output sizes.
              Hence, the inversion process can get ambiguous.
              To accommodate this, you can provide the needed output size
              as an additional argument `output_size` in the forward call.
              See the Inputs and Example below.

    Args:
        kernel_size (int or tuple): Size of the max pooling window.
        stride (int or tuple): Stride of the max pooling window.
            It is set to ``kernel_size`` by default.
        padding (int or tuple): Padding that was added to the input

    Inputs:
        - `input`: the input Tensor to invert
        - `indices`: the indices given out by `MaxPool2d`
        - `output_size` (optional) : a `torch.Size` that specifies the targeted output size

    Shape:
        - Input: :math:`(N, C, H_{in}, W_{in})`
        - Output: :math:`(N, C, H_{out}, W_{out})`, where

          .. math::
              H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]

              W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1]

    Example: see the code below


    """

F.max_unpool2d(input, indices, kernel_size, stride=None, padding=0, output_size=None)

The usage is the same as the module form above.

def max_unpool2d(input, indices, kernel_size, stride=None, padding=0,
                 output_size=None):
    r"""Computes a partial inverse of :class:`MaxPool2d`.

    See :class:`~torch.nn.MaxUnpool2d` for details.
    """
    pass  # stub as shown in the docs; the real implementation is in torch.nn.functional

 

### Interpolation methods for upsampling and their implementation

#### 1. Point-cloud upsampling

For 3-D point-cloud data, upsampling can be performed with Moving Least Squares (MLS). The core idea is local surface fitting: new points are generated in sparse regions from the estimated surface model, raising density and resolution, with interpolation filling the gaps[^1].

An example using the PCL Python bindings:

```python
import pcl

def mls_upsampling(cloud):
    # Create the MLS object
    mls = cloud.make_moving_least_squares()
    # Set parameters
    mls.set_search_radius(0.03)   # search radius
    mls.set_polynomial_order(2)   # polynomial order
    # Run MLS and return the result
    return mls.process()
```

This snippet shows how to run MLS-based point-cloud upsampling with the `pcl` library.

---

#### 2. Bicubic interpolation

Bicubic interpolation is an efficient upsampling algorithm for images and feature maps. It interpolates pixel values with cubic polynomials to produce a finer result. Compared with simple nearest-neighbour and bilinear interpolation, it is more accurate and produces fewer jagged artifacts[^2].

A PyTorch implementation of bicubic interpolation:

```python
import torch.nn.functional as F

def bicubic_interpolation(input_tensor, scale_factor=2):
    # Use PyTorch's interpolate function with bicubic mode
    return F.interpolate(
        input_tensor,
        scale_factor=scale_factor,
        mode='bicubic',
        align_corners=True
    )
```

This function enlarges the input tensor by the given factor, using bicubic interpolation to recover detail.

---

#### 3. Other common image-upsampling methods

Besides bicubic interpolation, several other mainstream techniques are used for image upsampling, including[^3]:

- **Bilinear interpolation**: suitable for simple 2-D smoothing; each output pixel is a weighted average of the four neighbouring input pixels.
- **Transposed convolution (deconvolution)**: mainly used for upsampling inside neural networks; it learns the mapping for a given task and can therefore generate higher-quality target images.
- **Unpooling**: the partial inverse of a downsampling operation such as max pooling; it restores the original size while keeping the important values.

A basic transposed-convolution layer in TensorFlow:

```python
import tensorflow as tf

def transposed_convolution(input_tensor, filters, kernel_size, strides):
    # Define the transposed-convolution layer
    output = tf.keras.layers.Conv2DTranspose(
        filters=filters,
        kernel_size=kernel_size,
        strides=strides,
        padding="same",
        activation="relu"
    )(input_tensor)
    return output
```

This shows a basic configuration and invocation of a transposed-convolution layer.

---

#### Summary

Pick the upsampling strategy to match the scenario. MLS is recommended for point clouds; for ordinary images there are more options, such as bicubic, bilinear, or deep-learning-driven methods (transposed convolution and unpooling). Each has its own strengths and range of applicability, so choose according to the needs of the actual project.