Denoising review:
The experiments described here can be found on GitHub.
Evaluation metrics:
PSNR: peak signal-to-noise ratio
$$PSNR = 10\log_{10}\left(\frac{(2^n-1)^2}{MSE}\right) = 20\log_{10}\left(\frac{MAX}{\sqrt{MSE}}\right)$$
Note that MSE is the mean squared error (already averaged over all pixels), and MAX is the maximum pixel value of the image, i.e. $2^n-1$; for an 8-bit image this is 255.
PSNR is measured in dB.
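A minimal NumPy sketch of the formula (the function name and signature are my own):

```python
import numpy as np

def psnr(clean, noisy, max_val=255.0):
    # MSE over all pixels, then the log formula above.
    mse = np.mean((clean.astype(np.float64) - noisy.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10 * np.log10(max_val ** 2 / mse)
```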
SSIM: structural similarity index
$$L(X,Y) = \frac{2u_Xu_Y+C_1}{u_X^2+u_Y^2+C_1},\quad C(X,Y) = \frac{2\sigma_X\sigma_Y+C_2}{\sigma_X^2+\sigma_Y^2+C_2},\quad S(X,Y) = \frac{\sigma_{XY}+C_3}{\sigma_X\sigma_Y+C_3}$$
where $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, $C_3 = C_2/2$, and generally
$K_1 = 0.01$, $K_2 = 0.03$, $L = 255$.
$SSIM = L \cdot C \cdot S$.
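A minimal sketch of these formulas, computed globally over the whole image (the reference implementation slides an 11×11 Gaussian window instead; the names here are my own):

```python
import numpy as np

def ssim_global(x, y, K1=0.01, K2=0.03, L=255.0):
    x, y = x.astype(np.float64), y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    ux, uy = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - ux) * (y - uy)).mean()
    lum = (2 * ux * uy + C1) / (ux**2 + uy**2 + C1)   # luminance term L
    con = (2 * sx * sy + C2) / (sx**2 + sy**2 + C2)   # contrast term C
    st = (sxy + C3) / (sx * sy + C3)                  # structure term S
    return lum * con * st
```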
Note some basic np operations:
np.concatenate((a,b), axis=1)
np.reshape(a,newshape=(2,2,2,2))
np.split(a,4,0)
np.vstack
np.hstack, etc.
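A runnable demo of those calls (the array shapes are my own choice for illustration):

```python
import numpy as np

a = np.arange(8).reshape(2, 4)   # [[0 1 2 3], [4 5 6 7]]
b = np.ones((2, 4), dtype=int)

np.concatenate((a, b), axis=1)   # shape (2, 8): join along columns
np.reshape(a, (2, 2, 2))         # 8 elements regrouped as (2, 2, 2)
np.split(a, 4, axis=1)           # four sub-arrays, each of shape (2, 1)
np.vstack((a, b))                # shape (4, 4): stack rows
np.hstack((a, b))                # shape (2, 8): same as the axis=1 concat
```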
parameters & memory calculation
Note:
Total GPU memory = memory for the model + memory for layer outputs.
At training time, layer outputs are kept for both the forward and the backward pass, hence the ×2 below.
e.g. VGG16: params take 528 MB; layer outputs take 58.12 MB/image.
When training with SGD+momentum, batch size = 128:
- Memory for the model: 528 MB × 3 = 1.54 GB (one copy for the params, one for the gradients, one for the momentum buffer; with Adam, multiply by 4 instead).
- Memory for outputs: 128 × 58.12 MB × 2 = 14.53 GB.
- Total: 1.54 + 14.53 = 16.07 GB, as the sketch below works through.
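The same arithmetic written out in Python (the ×3-copies assumption follows the note above):

```python
params_mb = 528          # VGG16 parameter memory
acts_mb_per_img = 58.12  # stored layer outputs per image
batch = 128

model_gb = params_mb * 3 / 1024               # params + gradients + momentum ≈ 1.55 GB
acts_gb = batch * acts_mb_per_img * 2 / 1024  # forward + backward ≈ 14.53 GB
print(round(model_gb + acts_gb, 2))           # ≈ 16.08 GB (the ~16.07 above, up to rounding)
```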
FLOPs:
Conv layer ≈ #params × H_out × W_out
Linear layer ≈ #params
TFLOPS (sometimes abbreviated T/s) is a throughput unit meaning "one trillion floating-point operations per second"; it is a standard measure of a machine's compute capability. 1 TFLOPS = 1000 GFLOPS (FLOPS prefixes are decimal, unlike binary memory units).
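Those rules of thumb as tiny helpers (the names and the bias-free simplification are mine):

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    # Conv rule above: (#params) * H_out * W_out multiply-adds, ignoring bias.
    return (c_in * c_out * k * k) * h_out * w_out

def linear_flops(n_in, n_out):
    # Linear rule above: one multiply-add per weight.
    return n_in * n_out

# e.g. VGG16's first conv (3 -> 64 channels, 3x3 kernel, 224x224 output):
print(conv_flops(3, 64, 3, 224, 224))  # 86704128, i.e. ~86.7M multiply-adds
```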
A useful tool: `from torchstat import stat`.
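Assuming torchstat is installed, a minimal usage example:

```python
import torchvision.models as models
from torchstat import stat

# Prints a per-layer table of params, memory, and FLOPs
# for a single (3, 224, 224) input.
stat(models.vgg16(), (3, 224, 224))
```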
Models:
1. Residual Dense Network for Image Super-Resolution paper

Here, LR denotes low-resolution images.
upscale:
sub-pixel convolution
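A minimal sketch of a sub-pixel convolution upscale block (channel count and factor are my own illustrative choices): a conv expands channels by $r^2$, then `nn.PixelShuffle` rearranges them into an $r\times$ larger spatial grid.

```python
import torch
import torch.nn as nn

r = 2  # upscale factor
up = nn.Sequential(
    nn.Conv2d(64, 64 * r * r, kernel_size=3, padding=1),  # expand channels by r^2
    nn.PixelShuffle(r),  # (C*r^2, H, W) -> (C, H*r, W*r)
)
x = torch.randn(1, 64, 32, 32)
print(up(x).shape)  # torch.Size([1, 64, 64, 64])
```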
Empirically, the best position for CBAM is after the up-sampling layer (a gain of about 0.0x dB); increasing the patch size makes performance drop.
GRDN:
![GRDN architecture](https://i-blog.csdnimg.cn/blog_migrate/658f218c635e347ace2d7bfa9b3a38b9.png#pic_center)
… yeah, concatenate, concatenate, concatenate.
CBAM: paper


The channel attention is computed as:
$$M_C(F) = \sigma(MLP(AvgPool(F)) + MLP(MaxPool(F)))$$
where $\sigma$ denotes the sigmoid function (as in the `nn.Sigmoid` in the code below).
The spatial attention is computed as:
$$M_s(F) = \sigma(f^{7\times 7}([AvgPool(F);MaxPool(F)])) = \sigma(f^{7\times 7}([F_{avg}^s;F_{max}^s]))$$
where $F_{avg}^s, F_{max}^s \in \mathbb{R}^{1\times H \times W}$ and $f^{7\times 7}$ denotes a convolution with a $7\times 7$ kernel.
```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared MLP as 1x1 convs; use the ratio argument rather than
        # a hard-coded 16 so the reduction is actually configurable.
        self.fc1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Both pooled descriptors go through the same MLP, then are summed.
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)
```
Channel attention pools the $H \times W$ spatial map down to $1 \times 1$, leaving one descriptor per channel.
```python
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1
        # 2 input channels: the channel-wise mean map and max map
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)    # (N, 1, H, W)
        max_out, _ = torch.max(x, dim=1, keepdim=True)  # (N, 1, H, W)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)
```
For the spatial attention, `torch.mean` and `torch.max` along the channel dimension implement the average- and max-pooling of the channel information.
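Gluing the two modules together in the usual CBAM order (channel first, then spatial, each applied multiplicatively; the glue code is my own sketch):

```python
import torch

ca = ChannelAttention(in_planes=64)
sa = SpatialAttention(kernel_size=7)

x = torch.randn(1, 64, 32, 32)
x = x * ca(x)   # reweight channels: ca(x) has shape (1, 64, 1, 1)
x = x * sa(x)   # reweight positions: sa(x) has shape (1, 1, 32, 32)
print(x.shape)  # torch.Size([1, 64, 32, 32])
```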

This post reviews image-denoising techniques, focusing on the Residual Dense Network for super-resolution. Evaluation metrics covered include PSNR and SSIM; the parameter and memory calculations work through model and layer-output memory at training time, with VGG16 as the example. It also looks at how CBAM's placement affects performance, introduces FLOPS as a measure of compute, touches on sub-pixel convolution and GRDN, and explains CBAM's channel and spatial attention mechanisms in detail.