Denoising Algorithms: Evaluation Metrics, Parameter Count Calculation, and a CBAM Review

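This post reviews image denoising techniques, focusing on the Residual Dense Network used for super-resolution. The evaluation metrics are PSNR and SSIM. The parameter and memory calculation covers the memory taken by the model and by layer outputs during training, with VGG16 as the example. The post also discusses how the position of the CBAM module affects performance and introduces FLOPS as a measure of computing capability. Sub-pixel convolution and GRDN are mentioned, and the channel and spatial attention mechanisms of CBAM are explained in detail.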

Denoising review:

The experiments can be found on GitHub.

Evaluation metrics:

PSNR: peak signal-to-noise ratio
$$PSNR = 10\log_{10}\left(\frac{(2^n-1)^2}{MSE}\right) = 20\log_{10}\left(\frac{MAX}{\sqrt{MSE}}\right)$$
Note that MSE is the mean squared error (already averaged over all pixels), $n$ is the bit depth, and $MAX = 2^n-1$ is the maximum pixel value, which is 255 for a typical 8-bit image.

PSNR is measured in dB.
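As a minimal sketch of the PSNR formula, assuming NumPy arrays of equal shape and a hypothetical helper name `psnr` (not taken from the original experiments):

```python
import numpy as np

def psnr(reference, estimate, bit_depth=8):
    """PSNR in dB between two images of the same shape."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    max_value = 2 ** bit_depth - 1  # 255 for 8-bit images
    return 10.0 * np.log10(max_value ** 2 / mse)

clean = np.full((4, 4), 100, dtype=np.uint8)
noisy = clean.copy()
noisy[0, 0] = 116  # one pixel off by 16, so MSE = 16**2 / 16 = 16
print(round(psnr(clean, noisy), 2))  # 36.09
```

The two forms of the formula agree here: 20*log10(255/4) gives the same value.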

SSIM: structural similarity index
$$L(X,Y) = \frac{2\mu_X\mu_Y+C_1}{\mu_X^2+\mu_Y^2+C_1},\quad C(X,Y) = \frac{2\sigma_X\sigma_Y+C_2}{\sigma_X^2+\sigma_Y^2+C_2},\quad S(X,Y)=\frac{\sigma_{XY}+C_3}{\sigma_X\sigma_Y+C_3}$$
where $\mu_X, \mu_Y$ are the means, $\sigma_X, \sigma_Y$ the standard deviations, and $\sigma_{XY}$ the covariance of the two images. The constants are $C_1 = (K_1 L)^2$, $C_2=(K_2 L)^2$, $C_3=C_2/2$, with $L$ here denoting the dynamic range of the pixel values; generally $K_1=0.01$, $K_2=0.03$, $L=255$.

$$SSIM = L \cdot C \cdot S$$
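The three terms can be sketched directly in NumPy. This is a simplified version that assumes statistics computed over the whole image; practical SSIM implementations usually compute them in local windows and average the result. The function name `ssim_global` is hypothetical.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM from whole-image statistics (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = np.mean((x - mu_x) * (y - mu_y))  # covariance
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return luminance * contrast * structure

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8))
print(ssim_global(img, img))        # identical images give a value of 1 (up to rounding)
print(ssim_global(img, 255 - img))  # an inverted image scores lower
```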

Some basic NumPy operations worth remembering:

np.concatenate((a, b), axis=1)

np.reshape(a, (2, 2, 2, 2))

np.split(a,4,0)

np.vstack

np.hstack, etc.
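A short sketch of what these calls do to array shapes, using small made-up arrays:

```python
import numpy as np

a = np.arange(16)
b = np.reshape(a, (2, 2, 2, 2))      # 16 elements -> shape (2, 2, 2, 2)

c = np.arange(8).reshape(2, 4)
d = np.concatenate((c, c), axis=1)   # join along columns -> shape (2, 8)

# Split a (4, 4) array into 4 equal parts along axis 0 -> four (1, 4) arrays
parts = np.split(np.arange(16).reshape(4, 4), 4, 0)

v = np.vstack((c, c))                # stack rows -> shape (4, 4)
h = np.hstack((c, c))                # stack columns -> shape (2, 8), same as d

print(b.shape, d.shape, len(parts), parts[0].shape, v.shape, h.shape)
```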

parameters & memory calculation

calculation

Note:

Total GPU memory = memory for the model + memory for layer outputs

During training, layer outputs are needed for both the forward and the backward pass, so their memory is multiplied by 2.

e.g. VGG16: memory of params: 528MB, memory of layer outputs: 58.12MB/image

When training with SGD+momentum, batch size = 128:

  1. Memory for the model:

    528MB*3 ≈ 1.55GB

    1 copy for the params, 1 for the gradients used by SGD, 1 for the momentum

    With Adam, multiply by 4 instead

  2. Memory for outputs:

    128*58.12MB*2 = 14.53GB

  3. Total:

    14.53+1.55 = 16.08GB
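The arithmetic above can be wrapped in a small helper. This is only a sketch: the function name and the `model_copies` argument are hypothetical, and 1 GB is taken as 1024 MB.

```python
def training_memory_gb(param_mb, output_mb_per_image, batch_size, model_copies=3):
    """Rough training memory estimate, returned as (model, outputs, total) in GB.

    model_copies: 3 for SGD with momentum (params, gradients, momentum),
                  4 for Adam.
    """
    model_mb = param_mb * model_copies
    # Layer outputs are counted twice: forward and backward pass.
    output_mb = output_mb_per_image * batch_size * 2
    return model_mb / 1024, output_mb / 1024, (model_mb + output_mb) / 1024

# VGG16 figures from the text: 528 MB of params, 58.12 MB of outputs per image
model_gb, outputs_gb, total_gb = training_memory_gb(528, 58.12, 128)
print(f"model {model_gb:.2f} GB, outputs {outputs_gb:.2f} GB, total {total_gb:.2f} GB")

# Same network with Adam: only the model part grows
adam_model_gb, _, adam_total_gb = training_memory_gb(528, 58.12, 128, model_copies=4)
```

With these numbers the layer outputs dominate, so reducing the batch size saves far more memory than changing the optimizer.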

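FLOPs:
Conv ≈ params*H*W, where H and W are the height and width of the layer's output feature map
Linear ≈ params

Note the difference between FLOPs, a count of floating-point operations that measures the cost of a model as above, and FLOPS, floating-point operations per second, which measures the computing capability of hardware. 1 TFLOPS means one trillion (10^12) floating-point operations per second, so 1 TFLOPS = 1000 GFLOPS.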
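A rough sketch of the two rules of thumb, assuming each weight use is counted as one operation and that biases are included in the parameter count. All function names are hypothetical.

```python
def conv2d_params(c_in, c_out, kernel_size, bias=True):
    """Number of weights (and biases) in a square-kernel 2D convolution."""
    return c_out * (c_in * kernel_size * kernel_size + (1 if bias else 0))

def conv2d_flops(c_in, c_out, kernel_size, h_out, w_out, bias=True):
    """Conv rule: params * H * W, with H and W of the output feature map."""
    return conv2d_params(c_in, c_out, kernel_size, bias) * h_out * w_out

def linear_params(n_in, n_out, bias=True):
    """Number of weights (and biases) in a fully connected layer."""
    return n_out * (n_in + (1 if bias else 0))

def linear_flops(n_in, n_out, bias=True):
    """Linear rule: one operation per parameter."""
    return linear_params(n_in, n_out, bias)

# First VGG16 convolution: 3 -> 64 channels, 3x3 kernel, 224x224 output
print(conv2d_params(3, 64, 3))            # 1792
print(conv2d_flops(3, 64, 3, 224, 224))   # 89915392
# Last VGG16 fully connected layer: 4096 -> 1000
print(linear_flops(4096, 1000))           # 4097000
```

The example shows why convolutions dominate the operation count while fully connected layers dominate the parameter count.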

A useful tool for getting these numbers is torchstat: from torchstat import stat

model


1. Residual Dense Network for Image Super-Resolution paper

In the architecture figure of the paper, LR denotes the low-resolution input image.

upscale:

sub-pixel convolution
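Sub-pixel convolution first expands the channel count by the square of the scale factor with an ordinary convolution, then rearranges those channels into a larger spatial grid. A minimal PyTorch sketch follows; the class name `SubPixelUpsample` and the 3x3 kernel are assumptions for illustration, not the exact layer used in the paper.

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Upscale by `scale` using a convolution followed by a pixel shuffle."""

    def __init__(self, channels, scale=2):
        super().__init__()
        # Produce scale**2 times as many channels at the low resolution.
        self.conv = nn.Conv2d(channels, channels * scale ** 2,
                              kernel_size=3, padding=1)
        # Rearrange (C * scale**2, H, W) into (C, H * scale, W * scale).
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))

x = torch.randn(1, 64, 16, 16)
y = SubPixelUpsample(64, scale=2)(x)
print(tuple(y.shape))  # (1, 64, 32, 32)
```

All convolutions run at the low resolution, which is the main reason this upscaling is cheaper than interpolating first and convolving afterwards.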

In these trials the best position for CBAM was after the up-sampling layer, which improved PSNR by about 0.0x dB. Increasing the patch size made performance worse.

GRDN:

[Figure: GRDN architecture (image unavailable)]

… yeah, concatenate, concatenate, concatenate.

CBAM: paper

The channel attention is computed as:
$$M_C(F) = \sigma(MLP(AvgPool(F))+MLP(MaxPool(F)))$$
where $\sigma$ is the sigmoid function and the MLP is shared between the two pooled inputs. ReLU is used only inside the MLP.

The spatial attention is computed as:
$$M_s(F) = \sigma(f^{7\times 7}([AvgPool(F);MaxPool(F)])) = \sigma(f^{7\times 7}([F_{avg}^s;F_{max}^s]))$$
where $F_{avg}^s, F_{max}^s \in \mathbb{R}^{1\times H \times W}$ are obtained by pooling along the channel axis, and $f^{7\times 7}$ is a convolution with a $7\times 7$ kernel.

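import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        # Squeeze each channel to a single value: C x H x W -> C x 1 x 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Shared MLP written as 1x1 convolutions; ratio sets the bottleneck width
        self.fc1   = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # The same MLP is applied to both pooled descriptors
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        # Per-channel weights in (0, 1), shape N x C x 1 x 1
        return self.sigmoid(out)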

Channel attention: the $C\times H\times W$ input is pooled over the spatial dimensions to $C\times 1\times 1$ before the shared MLP.


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        # Padding that keeps the spatial size unchanged
        padding = 3 if kernel_size == 7 else 1

        # 2 input channels: the channel-wise average map and max map
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool along the channel axis: N x C x H x W -> N x 1 x H x W
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        # Per-position weights in (0, 1), shape N x 1 x H x W
        return self.sigmoid(x)

Spatial attention pools along the channel axis: torch.mean and torch.max with dim=1 implement the average pooling and max pooling over channels.
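In the CBAM paper the two attention maps are applied one after the other, each by element-wise multiplication: first $F' = M_C(F) \otimes F$, then $F'' = M_s(F') \otimes F'$. The following is a compact, self-contained sketch of that composition. The class name `CBAM` and its layout are assumptions for illustration, not the code used in the experiments.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""

    def __init__(self, channels, ratio=16, kernel_size=7):
        super().__init__()
        # Shared MLP for channel attention, written as 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // ratio, channels, 1, bias=False),
        )
        # Convolution over the 2-channel (avg, max) map for spatial attention.
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention: pool over H and W, giving N x C x 1 x 1 weights.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: pool over channels, giving N x 1 x H x W weights.
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

torch.manual_seed(0)
x = torch.randn(2, 32, 8, 8)
y = CBAM(32)(x)
print(tuple(y.shape))  # (2, 32, 8, 8)
```

Because both maps pass through a sigmoid, the module only rescales features by factors between 0 and 1 and keeps the input shape, so it can be placed after any layer, for example after the up-sampling layer.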
