Video Super-Resolution with Recurrent Structure-Detail Network: Reading Notes

Paper title: Video Super-Resolution with Recurrent Structure-Detail Network
Venue: ECCV 2020
Paper: https://arxiv.org/abs/2008.00455
Code: https://github.com/junpan19/RSDN


These notes read through and analyze the whole paper from beginning to end. If you are only interested in the model itself, jump straight to Part (4).


(1) Abstract


A brief statement of the paper's contributions: it proposes a new recurrent video super-resolution algorithm that is both effective and efficient compared with previous sliding-window methods, which super-resolve a single reference frame with the help of its neighboring frames in a temporal window, and with earlier recurrence-based methods. A link to the code is given at the end.


(2) Introduction


The introduction moves from single-image to multi-frame processing to motivate video super-resolution (VSR). VSR methods can be roughly divided into explicit and implicit approaches. Explicit methods perform motion estimation and motion compensation: the input frames are warped into alignment, and the high-resolution target frame is reconstructed from these aligned observations. Implicit methods exploit motion information for flexible compensation while avoiding an explicit motion-estimation step; two representative designs are dynamic upsampling filters and progressive fusion residual blocks.
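To make the explicit pipeline concrete, here is a minimal PyTorch sketch of flow-based warping, the core of explicit motion compensation. The flow field is assumed to come from some external motion estimator (not shown), and the function name is illustrative, not taken from the RSDN codebase.

```python
import torch
import torch.nn.functional as F

def flow_warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a neighboring frame (N, C, H, W) toward the reference frame
    using a dense optical-flow field (N, 2, H, W) in pixel units."""
    n, _, h, w = frame.shape
    # Base grid of (x, y) pixel coordinates for every output location.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(frame.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                       # displaced coordinates
    # Normalize coordinates to [-1, 1], the range grid_sample expects.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=3)                     # (N, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

# Usage: align frame t-1 to frame t before fusing them for reconstruction.
# aligned_prev = flow_warp(lr_prev, flow_prev_to_t)
```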
The proposed algorithm decomposes every frame into a structure component and a detail component, and super-resolves the target frame using the structure and detail information produced at the previous step. Over time, the hidden state of the network captures different typical appearances of the scene; by computing the correlation between the reference frame and each channel of the hidden state, outdated information can be suppressed and useful information highlighted, which makes the fusion of information more robust.
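A minimal sketch of the structure-detail split. The assumption here is that the low-pass operator is a bicubic downsample-upsample pair, so the structure part keeps the low frequencies and the detail part is the residual; the official code may use a different smoothing kernel.

```python
import torch.nn.functional as F

def decompose_sd(frame, scale: int = 2):
    """Split a frame (N, C, H, W) into a low-frequency structure component
    and a high-frequency detail residual."""
    h, w = frame.shape[-2:]
    low = F.interpolate(frame, scale_factor=1.0 / scale,
                        mode="bicubic", align_corners=False)
    structure = F.interpolate(low, size=(h, w),
                              mode="bicubic", align_corners=False)
    detail = frame - structure   # whatever the low-pass filter removed
    return structure, detail
```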


(3) Related Work


Single-image super-resolution methods (only loosely related, so these notes skip them).
Video super-resolution methods: the principles and representative approaches of explicit and implicit motion compensation.
A brief introduction to the recurrent neural networks that this paper builds on.


(4) Method

The overall pipeline of the proposed recurrent network is shown in the figure below. Although it looks like a recursive/recurrent neural network, the recurrence is simply that the outputs of the previous time step are fed in as part of the next step's input.

Meaning of the symbols in the figure:

- the low-resolution image of frame t
- the hidden-state information of frame t
- the Structure component of frame t (carries the low-frequency information of the image and the inter-frame motion)
- the Detail component of frame t (carries the high-frequency information and subtle appearance changes)
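To sketch how one recurrent step and the hidden-state adaptation might look, the snippet below reweights each channel of the previous hidden state by its cosine similarity to a single-channel view of the current reference frame, so channels that no longer match the scene get small weights. The exact similarity measure, and the `cell` and `reconstruct` modules in the unrolling loop, are assumptions for illustration, not the official RSDN implementation.

```python
import torch
import torch.nn.functional as F

def adapt_hidden_state(h_prev: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Suppress outdated channels of the hidden state (N, C, H, W) via their
    cosine similarity to a single-channel reference view (N, 1, H, W)."""
    n, c, _, _ = h_prev.shape
    h_flat = F.normalize(h_prev.reshape(n, c, -1), dim=2)   # unit-norm channels
    r_flat = F.normalize(ref.reshape(n, 1, -1), dim=2)      # unit-norm reference
    corr = (h_flat * r_flat).sum(dim=2)                     # (N, C) cosine scores
    weights = torch.sigmoid(corr).view(n, c, 1, 1)          # map scores to (0, 1)
    return h_prev * weights

# Illustrative unrolling: the previous step's outputs feed the next step.
# `cell` and `reconstruct` stand in for the structure-detail blocks and the
# upsampling head; they are hypothetical placeholders here.
# for t in range(T):
#     s_t, d_t = decompose_sd(lr_frames[:, t])
#     h = adapt_hidden_state(h, s_t.mean(dim=1, keepdim=True))
#     s_prev, d_prev, h = cell(s_t, d_t, s_prev, d_prev, h)
#     sr_t = reconstruct(s_prev, d_prev)
```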
