- Full paper title: Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
  Published at CVPR 2016.
- The downsampling operation is deterministic and known: to produce I^LR from I^HR, we first convolve I^HR using a Gaussian filter - thus simulating the camera’s point spread function - then downsample the image by a factor of r
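  The blurred LR image is not produced by simply keeping one pixel out of every four. The HR image is first convolved with a Gaussian filter and then downsampled, so each retained pixel also carries information from its neighbours.
- The deconvolution layer proposed in [50] can be seen as multiplication of each input pixel by a filter element-wise with stride r, and sums over the resulting output windows also known as backwards convolution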
  The deconv layer can be viewed as multiplying every LR pixel by the same kernel and placing the results at stride r in the HR image, summing wherever the windows overlap.
- The other way to upscale a LR image is convolution with fractional stride of 1/r in the LR space as mentioned by [24], which can be naively implemented by interpolation, perforate [27] or un-pooling [49] from LR space to HR space followed by a convolution with a stride of 1 in HR space. These implementations increase the computational cost by a factor of r^2, since convolution happens in HR space
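  There are other ways to upscale, such as interpolation, perforate or un-pooling: they first produce an HR-sized image and then convolve it.
- The upsampling method chosen in the paper is different: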
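  First convolve in LR space to produce a feature map with r^2 channels (C·r^2 for C output colour channels), then rearrange positions: at each location, the r^2 channel values are laid out in an r×r square, which turns the map into a single HR image (the paper calls this periodic shuffling). There is no non-linear layer after this last conv. During training the reverse is done: the ground-truth HR image is cut into a grid of r×r cells and rearranged into a 3-D map with r^2 channels, and the per-pixel loss is computed on that, which is much faster.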
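
The rearrangement described in the last bullet, and its reverse used on the ground truth during training, can be sketched in NumPy as below. This is a minimal single-channel sketch, not the paper's code: the function names `pixel_shuffle` and `pixel_unshuffle` and the channel ordering (channel `a*r + b` goes to offset `(a, b)` inside each r×r square) are assumptions made for illustration.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange an (r*r, H, W) LR feature map into one (H*r, W*r) HR image.

    Assumed ordering: channel a*r + b lands at offset (a, b) of each r x r square.
    """
    c, h, w = x.shape
    assert c == r * r, "expected r*r channels"
    x = x.reshape(r, r, h, w)        # axes: (a, b, i, j)
    x = x.transpose(2, 0, 3, 1)      # axes: (i, a, j, b)
    return x.reshape(h * r, w * r)   # out[i*r + a, j*r + b] = x[a*r + b, i, j]


def pixel_unshuffle(y, r):
    """Inverse step: split an (H*r, W*r) HR image into an (r*r, H, W) map."""
    big_h, big_w = y.shape
    assert big_h % r == 0 and big_w % r == 0, "HR size must be divisible by r"
    h, w = big_h // r, big_w // r
    y = y.reshape(h, r, w, r)        # axes: (i, a, j, b)
    y = y.transpose(1, 3, 0, 2)      # axes: (a, b, i, j)
    return y.reshape(r * r, h, w)


# Round trip on a toy 4-channel, 2x3 feature map with r = 2.
x = np.arange(4 * 2 * 3).reshape(4, 2, 3)
y = pixel_shuffle(x, 2)
assert np.array_equal(pixel_unshuffle(y, 2), x)
```

Both functions only move memory around, so the expensive convolutions all stay in LR space, and the loss can be computed either on the shuffled HR output or on the unshuffled ground truth.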
ESPCN