Video Object Detection: Flow-Guided Feature Aggregation for Video Object Detection

License: CC 4.0 BY-SA

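The paper proposes a video object detection framework that improves per-frame detection accuracy by using feature information from neighboring frames. Optical flow fields spatially align the feature maps, and adaptive weights fuse the features from multiple frames.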

Flow-Guided Feature Aggregation for Video Object Detection
https://arxiv.org/abs/1703.10025
Our framework is principled, and on par with the best engineered systems winning the ImageNet VID challenges 2016

The code would be released

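The paper's main idea is to use feature information from the preceding and following frames of a video to improve object detection accuracy on the current frame.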
we propose to improve the per-frame feature learning by temporal aggregation

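Why is information from neighboring frames needed? Because in video, the object's appearance in an individual frame is sometimes poorly suited to detection.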

Note that the features of the same object instance are usually not spatially aligned across frames due to video motion.
How are features from neighboring frames matched to the current frame? Two modules: 1) motion-guided spatial warping, and 2) feature aggregation.
Two modules are necessary for such feature propagation and enhancement:
1) motion-guided spatial warping. It estimates the motion between frames and warps the feature maps accordingly.
2) feature aggregation module. It figures out how to properly fuse the features from multiple frames.
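The two modules above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `bilinear_warp` and `aggregate` are hypothetical, the flow convention (a current-frame location `(x, y)` samples the neighbor frame at `(x + dx, y + dy)`) is an assumption, and the adaptive weights are taken here to be a softmax over per-location cosine similarities computed on the raw features, which is one plausible choice rather than something stated in the text above.

```python
import numpy as np


def bilinear_warp(feat, flow):
    """Warp a neighbor-frame feature map toward the current frame.

    feat: (C, H, W) features of the neighbor frame.
    flow: (2, H, W) displacement (dx, dy). Assumed convention: location
          (x, y) in the current frame samples the neighbor at (x+dx, y+dy).
    """
    _, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling positions, clamped to the feature-map border.
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = sx - x0  # horizontal interpolation weight
    wy = sy - y0  # vertical interpolation weight
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy


def aggregate(current, warped_neighbors, eps=1e-8):
    """Fuse the current features with warped neighbor features.

    Assumption: the weight of each frame at each location is the softmax
    (over frames) of its cosine similarity to the current-frame feature.
    """
    frames = np.stack([current] + list(warped_neighbors))  # (K, C, H, W)
    dot = (frames * current[None]).sum(axis=1)              # (K, H, W)
    norms = np.linalg.norm(frames, axis=1) * np.linalg.norm(current, axis=0)
    sim = dot / (norms + eps)
    sim = sim - sim.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(sim)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Weighted sum over frames, broadcasting weights across channels.
    return (weights[:, None] * frames).sum(axis=0)
```

A typical call would warp each neighbor's feature map with its own flow field, then pass the list of warped maps to `aggregate` together with the current frame's features; the result has the same `(C, H, W)` shape as the input and can be fed to the detection head.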
