Denoising review:
The experiments described here can be found on GitHub.
Evaluation metrics:
PSNR: peak signal-to-noise ratio
$$PSNR = 10\log_{10}\left(\frac{(2^n-1)^2}{MSE}\right) = 20\log_{10}\left(\frac{MAX}{\sqrt{MSE}}\right)$$
Note that MSE is the mean squared error (already averaged over all pixels), and MAX is the maximum pixel value of the image, i.e. $2^n-1$; for an 8-bit image this is 255.
PSNR is measured in dB.
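A minimal NumPy sketch of the formula (the function name and signature are my own):

```python
import numpy as np

def psnr(clean, noisy, max_val=255.0):
    # MSE over all pixels, then the log formula above.
    mse = np.mean((clean.astype(np.float64) - noisy.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10 * np.log10(max_val ** 2 / mse)
```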
SSIM: structural similarity index
$$L(X,Y) = \frac{2u_Xu_Y+C_1}{u_X^2+u_Y^2+C_1},\quad C(X,Y) = \frac{2\sigma_X\sigma_Y+C_2}{\sigma_X^2+\sigma_Y^2+C_2},\quad S(X,Y) = \frac{\sigma_{XY}+C_3}{\sigma_X\sigma_Y+C_3}$$
where $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, $C_3 = C_2/2$, and generally
$K_1 = 0.01$, $K_2 = 0.03$, $L = 255$.
$SSIM = L \cdot C \cdot S$.
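A minimal sketch of these formulas, computed globally over the whole image (the reference implementation slides an 11×11 Gaussian window instead; the names here are my own):

```python
import numpy as np

def ssim_global(x, y, K1=0.01, K2=0.03, L=255.0):
    x, y = x.astype(np.float64), y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    ux, uy = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - ux) * (y - uy)).mean()
    lum = (2 * ux * uy + C1) / (ux**2 + uy**2 + C1)   # luminance term L
    con = (2 * sx * sy + C2) / (sx**2 + sy**2 + C2)   # contrast term C
    st = (sxy + C3) / (sx * sy + C3)                  # structure term S
    return lum * con * st
```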
Note some basic np operations:
np.concatenate((a,b), axis=1)
np.reshape(a,newshape=(2,2,2,2))
np.split(a,4,0)
np.vstack
np.hstack, etc.
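A runnable demo of those calls (the array shapes are my own choice for illustration):

```python
import numpy as np

a = np.arange(8).reshape(2, 4)   # [[0 1 2 3], [4 5 6 7]]
b = np.ones((2, 4), dtype=int)

np.concatenate((a, b), axis=1)   # shape (2, 8): join along columns
np.reshape(a, (2, 2, 2))         # 8 elements regrouped as (2, 2, 2)
np.split(a, 4, axis=1)           # four sub-arrays, each of shape (2, 1)
np.vstack((a, b))                # shape (4, 4): stack rows
np.hstack((a, b))                # shape (2, 8): same as the axis=1 concat
```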
parameters & memory calculation
Note:
Total GPU memory = memory for the model + memory for layer outputs.
At training time, layer outputs are kept for both the forward and the backward pass, hence the ×2 below.
e.g. VGG16: params take 528 MB; layer outputs take 58.12 MB/image.
When training with SGD+momentum, batch size = 128:
- Memory for the model: 528 MB × 3 = 1.54 GB (one copy for the params, one for the gradients, one for the momentum buffer; with Adam, multiply by 4 instead).
- Memory for outputs: 128 × 58.12 MB × 2 = 14.53 GB.
- Total: 1.54 + 14.53 = 16.07 GB, as the sketch below works through.
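The same arithmetic written out in Python (the ×3-copies assumption follows the note above):

```python
params_mb = 528          # VGG16 parameter memory
acts_mb_per_img = 58.12  # stored layer outputs per image
batch = 128

model_gb = params_mb * 3 / 1024               # params + gradients + momentum ≈ 1.55 GB
acts_gb = batch * acts_mb_per_img * 2 / 1024  # forward + backward ≈ 14.53 GB
print(round(model_gb + acts_gb, 2))           # ≈ 16.08 GB (the ~16.07 above, up to rounding)
```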
FLOPs:
Conv layer ≈ #params × H_out × W_out
Linear layer ≈ #params
TFLOPS (sometimes abbreviated T/s) is a throughput unit meaning "one trillion floating-point operations per second"; it is a standard measure of a machine's compute capability. 1 TFLOPS = 1000 GFLOPS (FLOPS prefixes are decimal, unlike binary memory units).
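Those rules of thumb as tiny helpers (the names and the bias-free simplification are mine):

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    # Conv rule above: (#params) * H_out * W_out multiply-adds, ignoring bias.
    return (c_in * c_out * k * k) * h_out * w_out

def linear_flops(n_in, n_out):
    # Linear rule above: one multiply-add per weight.
    return n_in * n_out

# e.g. VGG16's first conv (3 -> 64 channels, 3x3 kernel, 224x224 output):
print(conv_flops(3, 64, 3, 224, 224))  # 86704128, i.e. ~86.7M multiply-adds
```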
A useful tool: `from torchstat import stat`.
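Assuming torchstat is installed, a minimal usage example:

```python
import torchvision.models as models
from torchstat import stat

# Prints a per-layer table of params, memory, and FLOPs
# for a single (3, 224, 224) input.
stat(models.vgg16(), (3, 224, 224))
```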
Models:
1. Residual Dense Network for Image Super-Resolution paper

Here, LR denotes low-resolution images.
upscale:
sub-pixel convolution
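A minimal sketch of a sub-pixel convolution upscale block (channel count and factor are my own illustrative choices): a conv expands channels by $r^2$, then `nn.PixelShuffle` rearranges them into an $r\times$ larger spatial grid.

```python
import torch
import torch.nn as nn

r = 2  # upscale factor
up = nn.Sequential(
    nn.Conv2d(64, 64 * r * r, kernel_size=3, padding=1),  # expand channels by r^2
    nn.PixelShuffle(r),  # (C*r^2, H, W) -> (C, H*r, W*r)
)
x = torch.randn(1, 64, 32, 32)
print(up(x).shape)  # torch.Size([1, 64, 64, 64])
```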
Empirically, the best position for CBAM is after the up-sampling layer (a gain of about 0.0x dB); increasing the patch size makes performance drop.
GRDN:
![GRDN architecture](https://i-blog.csdnimg.cn/blog_migrate/658f218c635e347ace2d7bfa9b3a38b9.png#pic_center)
… yeah, concatenate, concatenate, concatenate.
CBAM: paper


The channel attention is computed as:
$$M_C(F) = \sigma(MLP(AvgPool(F)) + MLP(MaxPool(F)))$$
where $\sigma$ denotes the sigmoid function (as in the `nn.Sigmoid` in the code below).
The spatial attention is computed as:
$$M_s(F) = \sigma(f^{7\times 7}([AvgPool(F);MaxPool(F)])) = \sigma(f^{7\times 7}([F_{avg}^s;F_{max}^s]))$$
where $F_{avg}^s, F_{max}^s \in \mathbb{R}^{1\times H \times W}$ and $f^{7\times 7}$ denotes a convolution with a $7\times 7$ kernel.
```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared MLP as 1x1 convs; use the ratio argument rather than
        # a hard-coded 16 so the reduction is actually configurable.
        self.fc1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Both pooled descriptors go through the same MLP, then are summed.
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)
```
Channel attention pools the $H \times W$ spatial map down to $1 \times 1$, leaving one descriptor per channel.
```python
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1
        # 2 input channels: the channel-wise mean map and max map
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)    # (N, 1, H, W)
        max_out, _ = torch.max(x, dim=1, keepdim=True)  # (N, 1, H, W)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)
```
For the spatial attention, `torch.mean` and `torch.max` along the channel dimension implement the average- and max-pooling of the channel information.
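Gluing the two modules together in the usual CBAM order (channel first, then spatial, each applied multiplicatively; the glue code is my own sketch):

```python
import torch

ca = ChannelAttention(in_planes=64)
sa = SpatialAttention(kernel_size=7)

x = torch.randn(1, 64, 32, 32)
x = x * ca(x)   # reweight channels: ca(x) has shape (1, 64, 1, 1)
x = x * sa(x)   # reweight positions: sa(x) has shape (1, 1, 32, 32)
print(x.shape)  # torch.Size([1, 64, 32, 32])
```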

This post reviews image-denoising techniques, focusing on the Residual Dense Network for super-resolution. Evaluation metrics covered include PSNR and SSIM; the parameter and memory calculations work through model and layer-output memory at training time, with VGG16 as the example. It also looks at how CBAM's placement affects performance, introduces FLOPS as a measure of compute, touches on sub-pixel convolution and GRDN, and explains CBAM's channel and spatial attention mechanisms in detail.