VOS Paper Reading: Kernelized Memory Network for Video Object Segmentation (ECCV 2020)

License: CC 4.0 BY-SA

[Paper Reading, ECCV 2020] Kernelized Memory Network for Video Object Segmentation

Semi-Supervised. KMN
Problems with STM in the VOS Task
KMN Contribution
Kernelized Memory Net
Two Stage Training
Result
- Performance
- Ablation study
Qualitative Results

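This is not a full translation of the paper. I only recorded the points I found inspiring or important while reading, with a detailed account of the paper's motivation and contributions. Many of my own thoughts are mixed in, so comments and corrections are welcome.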

Semi-Supervised. KMN

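Video object segmentation (VOS) is split into semi-supervised and unsupervised settings, depending on whether the precisely annotated mask of the first frame of the video sequence is used at test time.
STM (Video Object Segmentation Using Space-Time Memory Networks, ICCV 2019) achieved very strong performance. KMN is an improvement built on STM. It proposes a pre-training method (already used in image inpainting and other tasks, and applied to VOS here for the first time to improve robustness to occlusion and blurry boundaries) and a Kernelized Memory Read that is used at test time.
arXiv:2007.08270
Link: arXiv paper page.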
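The pre-training method mentioned above is named Hide-and-Seek in the paper's contribution summary. Its core idea, randomly hiding grid patches of a training image so the network must cope with missing regions, can be pictured with a short NumPy sketch. This is only an illustration under stated assumptions: the function name `hide_patches`, the 4x4 grid, the hide probability, and the zero fill value are all hypothetical choices, and how KMN handles the corresponding labels is not covered here.

```python
import numpy as np


def hide_patches(image, grid=4, hide_prob=0.5, rng=None):
    """Return a copy of `image` with random grid cells zeroed out.

    image:     array of shape (H, W, C)
    grid:      number of cells per side (assumed value, for illustration)
    hide_prob: probability of hiding each cell (assumed value)
    Also returns a boolean (H, W) mask marking the hidden pixels.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    out = image.copy()
    hidden = np.zeros((h, w), dtype=bool)

    # Cell boundaries along each axis.
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)

    for i in range(grid):
        for j in range(grid):
            if rng.random() < hide_prob:
                # Zero fill is a simplification chosen for this sketch.
                out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = 0
                hidden[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = True
    return out, hidden


# Toy usage: hide patches of an all-ones "image".
toy = np.ones((8, 8, 3))
occluded, hidden_mask = hide_patches(toy, rng=np.random.default_rng(0))
print("fraction of pixels hidden:", hidden_mask.mean())
```

Hiding patches simulates occlusion on static images, which is the effect the notes above attribute to the pre-training stage.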

Problems with STM in the VOS Task

The solution (STM) is non-local, but the problem (VOS) is predominantly local.
STM is a non-local method, but target matching in VOS is usually local, because many similar-looking objects can exist across the whole frame.

The memory read operation of STM has two inherent problems. First, every grid in the query frame searches the memory frames for a target object. There is only Query-to-Memory matching in the STM. Thus, when multiple objects in the query frame look like a target object, all of them can be matched with the same target object in the memory frames. Second, the non-local matching in the STM can be ineffective in VOS, because it overlooks the fact that the target object in the query should appear where it previously was in the memory frames.
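In other words, STM's memory read has two inherent problems. First, matching runs only in the Query-to-Memory direction: every grid cell of the query frame searches the memory frames for the target, so several look-alike objects in the query frame can all be matched to the same target in memory. Second, non-local matching ignores the fact that the target in the query frame should appear near where it was in the memory frames. (My note: because of speed and memory limits, STM stores only every fifth frame in memory, whereas ideally every pixel of the target object in the current frame would already have appeared in memory so that matching works better.)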
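The Query-to-Memory matching described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the STM code: the function name `stm_style_read`, the flattened `(positions, channels)` layouts, and the toy keys are assumptions made for the example. Each query position takes a softmax over all memory positions, so two query positions that look alike receive nearly the same read-out regardless of where they sit in the frame.

```python
import numpy as np


def stm_style_read(key_m, val_m, key_q):
    """Non-local Query-to-Memory read (illustrative sketch).

    key_m: (P, d) memory keys, P = flattened memory positions
    val_m: (P, c) memory values
    key_q: (Q, d) query keys, Q = flattened query positions
    Returns an array of shape (Q, c).
    """
    sim = key_m @ key_q.T                          # (P, Q) dot-product similarity
    sim = sim - sim.max(axis=0, keepdims=True)     # numerical stability
    weights = np.exp(sim)
    weights /= weights.sum(axis=0, keepdims=True)  # softmax over memory positions
    return weights.T @ val_m


# Toy example: memory holds one "target" key and one "background" key.
key_m = np.array([[1.0, 0.0], [0.0, 1.0]])
val_m = np.array([[1.0], [0.0]])                   # 1 = target, 0 = background
# Query positions 0 and 1 both look like the target (e.g. two similar cars).
key_q = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
print(stm_style_read(key_m, val_m, key_q).ravel())
```

Nothing in this read depends on position, which is why both look-alike query positions retrieve the target value.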

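As the figure shows, earlier VOS work on datasets such as DAVIS assumed that motion between the current frame and the previous frame is limited. VOS is therefore close to a local problem (except in cases such as fast motion): the target object stays in roughly the same region in adjacent frames. When Memory is matched against Query, the target object in Memory is fixed, but the Query frame may contain several similar objects (for example, two cars). Because the whole Query frame is matched against Memory, the matching is non-local and produces invalid matches.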

KMN Contribution

Contributions of this paper can be summarized as follows. First, KMN is developed to reduce the non-locality of the STM and make the memory network more effective for VOS. Second, Hide-and-Seek is used to pre-train the KMN on static images. In the kernelized memory read of KMN, however, both Query-to-Memory matching and Memory-to-Query matching are conducted.
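KMN makes two contributions: the kernelized read reduces the non-locality of STM and makes the memory network more effective for VOS, and the Hide-and-Seek strategy is used for pre-training on static image datasets. KMN's kernelized memory read performs both Query-to-Memory matching and Memory-to-Query matching.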
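A two-way read of this kind can be sketched as follows. This is a minimal sketch under my own assumptions, not the authors' implementation. For each memory position, Memory-to-Query matching picks the best-matching query position, and a 2D Gaussian kernel centered there down-weights query positions far from it. The usual Query-to-Memory softmax is then multiplied by that kernel. The function name `kernelized_memory_read`, the flattened layouts, the scaling by the square root of the key dimension, and the default `sigma` are assumptions for illustration.

```python
import numpy as np


def kernelized_memory_read(key_m, val_m, key_q, query_shape, sigma=7.0):
    """Kernel-weighted memory read (illustrative sketch).

    key_m: (P, d) memory keys, val_m: (P, c) memory values
    key_q: (H*W, d) query keys, flattened row by row
    query_shape: (H, W) of the query feature grid
    """
    h, w = query_shape
    d = key_m.shape[1]
    sim = key_m @ key_q.T                          # (P, Q) similarity

    # Memory-to-Query: best-matching query position for each memory position.
    best_y, best_x = np.unravel_index(sim.argmax(axis=1), (h, w))

    # Gaussian kernel on the query grid, centered at each best match.
    qy, qx = np.unravel_index(np.arange(h * w), (h, w))
    dist2 = (qy[None, :] - best_y[:, None]) ** 2 + (qx[None, :] - best_x[:, None]) ** 2
    kernel = np.exp(-dist2 / (2.0 * sigma ** 2))   # (P, Q)

    # Query-to-Memory: softmax over memory positions, re-weighted by the kernel.
    logits = sim / np.sqrt(d)
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits) * kernel
    weights /= weights.sum(axis=0, keepdims=True)
    return weights.T @ val_m


# Toy 1x4 query grid: position 0 is the true target, position 3 a look-alike.
key_m = np.array([[1.0, 0.0], [0.0, 1.0]])
val_m = np.array([[1.0], [0.0]])                   # 1 = target, 0 = background
key_q = np.array([[5.0, 0.0], [0.0, 5.0], [0.0, 5.0], [4.9, 0.0]])
print("narrow kernel:", kernelized_memory_read(key_m, val_m, key_q, (1, 4), sigma=0.5).ravel())
print("very wide kernel:", kernelized_memory_read(key_m, val_m, key_q, (1, 4), sigma=1e6).ravel())
```

With a very wide kernel the read behaves like a plain non-local softmax, while a narrow kernel suppresses the distant look-alike, which is the locality effect the notes above describe.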
