Denoising Algorithms: Evaluation Metrics, Parameter/Memory Counting, and a CBAM Review

This post reviews image denoising techniques, with a focus on the Residual Dense Network for super-resolution. It covers the PSNR and SSIM evaluation metrics; parameter and memory accounting for a model and its layer outputs at training time, using VGG16 as an example; the effect of CBAM's placement on performance; and FLOPS as a measure of compute capability. Sub-pixel convolution and GRDN are mentioned, and CBAM's channel and spatial attention mechanisms are explained in detail.

Denoising review:

The experiments can be found on GitHub.

Evaluation metrics:

PSNR: peak signal to noise ratio
$$PSNR = 10 \cdot \log_{10}\left(\frac{(2^n-1)^2}{MSE}\right) = 20 \cdot \log_{10}\left(\frac{MAX}{\sqrt{MSE}}\right)$$
Note that MSE is the mean squared error (already averaged over pixels), and MAX is the maximum pixel value, i.e. 255 for an 8-bit image.

PSNR is measured in dB.

SSIM: structural similarity index
$$L(X,Y) = \frac{2\mu_X \mu_Y + C_1}{\mu_X^2 + \mu_Y^2 + C_1}, \quad C(X,Y) = \frac{2\sigma_X \sigma_Y + C_2}{\sigma_X^2 + \sigma_Y^2 + C_2}, \quad S(X,Y) = \frac{\sigma_{XY} + C_3}{\sigma_X \sigma_Y + C_3}$$
where $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, $C_3 = C_2 / 2$, and generally $K_1 = 0.01$, $K_2 = 0.03$, $L = 255$.

$$SSIM(X,Y) = L(X,Y) \cdot C(X,Y) \cdot S(X,Y)$$
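A minimal NumPy sketch of the PSNR formula above (the toy 8-bit images are illustrative; for SSIM, a windowed library implementation such as scikit-image's structural_similarity is the practical choice):

import numpy as np

def psnr(x, y, max_val=255.0):
    # PSNR in dB between two same-shape images, per the formula above
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

clean = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
noisy = np.clip(clean + np.random.normal(0, 10, clean.shape), 0, 255)
print(f"PSNR: {psnr(clean, noisy):.2f} dB")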

Some basic NumPy operations worth noting (a quick demo follows the list):

np.concatenate((a, b), axis=1)

np.reshape(a,newshape=(2,2,2,2))

np.split(a,4,0)

np.vstack

np.hstack, etc.
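A quick demo of these with toy arrays:

import numpy as np

a = np.arange(8).reshape(2, 4)
b = np.ones((2, 4), dtype=int)

print(np.concatenate((a, b), axis=1).shape)  # (2, 8)
print(np.reshape(a, (2, 2, 2)).shape)        # (2, 2, 2)
print(len(np.split(a, 2, axis=0)))           # 2 sub-arrays along axis 0
print(np.vstack((a, b)).shape)               # (4, 4)
print(np.hstack((a, b)).shape)               # (2, 8)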

Parameters & memory calculation

Note:

Total GPU memory = memory for the model + memory for layer outputs.

At training time, layer outputs are kept for both the forward and the backward pass, so their memory cost roughly doubles (×2).

e.g. VGG16: parameter memory is 528 MB; layer-output memory is 58.12 MB per image.

When training with SGD + momentum and a batch size of 128:

Memory breakdown:

  1. Memory for the model: 528 MB × 3 ≈ 1.54 GB
     (1 copy for the parameters, 1 for the gradients, 1 for the momentum buffer; with Adam this would be ×4)

  2. Memory for layer outputs: 128 × 58.12 MB × 2 ≈ 14.53 GB

  3. Total: 1.54 + 14.53 ≈ 16.07 GB
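The same arithmetic as a short sketch (numbers taken from the VGG16 figures above):

# Rough GPU-memory estimate: VGG16, SGD + momentum, batch size 128
param_mem_mb = 528.0        # parameter memory
act_mem_mb = 58.12          # layer-output memory per image
batch_size = 128

model_gb = param_mem_mb * 3 / 1024               # params + gradients + momentum
output_gb = batch_size * act_mem_mb * 2 / 1024   # forward + backward activations
print(f"model:   {model_gb:.2f} GB")             # ~1.55 GB
print(f"outputs: {output_gb:.2f} GB")            # ~14.53 GB
print(f"total:   {model_gb + output_gb:.2f} GB") # ~16.08 GB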

FLOPs (rough per-layer rules):
Conv: FLOPs ≈ #params × H_out × W_out
Linear: FLOPs ≈ #params
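A worked instance of these rules (the layer sizes are illustrative, roughly VGG-like):

# FLOPs ≈ params × spatial size of the output (conv), or just params (linear)
def conv_flops(c_in, c_out, k, h_out, w_out):
    params = c_in * c_out * k * k       # ignoring bias
    return params * h_out * w_out

def linear_flops(n_in, n_out):
    return n_in * n_out

print(conv_flops(64, 64, 3, 56, 56))    # 115605504 (~0.12 GFLOPs)
print(linear_flops(4096, 1000))         # 4096000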

TFLOPS, sometimes abbreviated T/s, is a unit of floating-point throughput: one trillion (10^12) floating-point operations per second. It is a standard measure of a machine's compute capability; 1 TFLOPS = 1000 GFLOPS.

A useful tool for this kind of accounting is torchstat.
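A minimal usage sketch (assuming a torchvision VGG16 as the model):

from torchstat import stat
from torchvision.models import vgg16

model = vgg16()
stat(model, (3, 224, 224))  # input size (C, H, W); prints per-layer params, memory, FLOPs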


1. Residual Dense Network for Image Super-Resolution (paper)

[Figure: RDN architecture]

Here, LR denotes the low-resolution input image.

Upscaling: sub-pixel convolution.
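A minimal sketch of sub-pixel upscaling via PyTorch's nn.PixelShuffle (the channel count and scale factor here are illustrative):

import torch
import torch.nn as nn

class SubPixelUpscale(nn.Module):
    def __init__(self, channels=64, scale=2):
        super().__init__()
        # conv expands channels by scale^2; PixelShuffle rearranges them into space
        self.conv = nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))

x = torch.randn(1, 64, 32, 32)
print(SubPixelUpscale()(x).shape)  # torch.Size([1, 64, 64, 64])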

In these trials, the best position for CBAM was after the up-sampling layer, giving a gain of about 0.0x dB; increasing the patch size made performance worse.

GRDN:

[Figure: GRDN architecture]

… yeah, concatenate, concatenate, concatenate.

CBAM: paper

The channel attention is computed as:
$$M_C(F) = \sigma(MLP(AvgPool(F)) + MLP(MaxPool(F)))$$
where $\sigma$ denotes the sigmoid function.

The spatial attention is computed as:
$$M_s(F) = \sigma\left(f^{7\times 7}([AvgPool(F); MaxPool(F)])\right) = \sigma\left(f^{7\times 7}([F_{avg}^s; F_{max}^s])\right)$$
where $F_{avg}^s, F_{max}^s \in \mathbb{R}^{1\times H \times W}$, and $f^{7\times 7}$ denotes a convolution with a 7×7 kernel.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        # Squeeze spatial dims to a 1x1 descriptor per channel
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Shared MLP, implemented as 1x1 convs with a reduction ratio
        self.fc1   = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # The same MLP is applied to both pooled descriptors, then summed
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)

Channel attention: the spatial dimensions $H \times W$ are pooled away, leaving one $C \times 1 \times 1$ descriptor per pooling type.


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1

        # 2 input channels: the channel-wise avg and max maps
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool across the channel dimension to get two H x W maps
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)

For the spatial branch, torch.mean and torch.max along the channel dimension implement the average- and max-pooling over channel information.
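Putting the two modules together in CBAM's channel-then-spatial order (a sketch with an illustrative feature map):

ca = ChannelAttention(in_planes=64)
sa = SpatialAttention(kernel_size=7)

x = torch.randn(1, 64, 32, 32)  # illustrative feature map
x = ca(x) * x                   # channel attention first...
x = sa(x) * x                   # ...then spatial attention
print(x.shape)                  # torch.Size([1, 64, 32, 32])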
