Action Recognition Paper Notes | TSN | Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, ECCV 2016, Amsterdam, Netherlands.
Temporal Segment Networks for Action Recognition in Videos, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, TPAMI, 2018.
Motivations
- Modeling long-range temporal structure is crucial for human activity recognition.
- Frames in a video are highly redundant.
- Modeling long-range temporal structure is not simply a matter of stacking tons of frames. Frames are dense, but content is sparse!
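- Existing CNNs can only handle fairly short temporal sequences because of the very high computational cost, and they need a large number of training samples. Annotation cost limits how much data can be labeled, so many existing datasets are small.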
Solutions
-
Existing approaches by others:
-
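Category 1: convolution-based methods. Karpathy: fusion on Sports-1M; Simonyan: two-stream; Tran: C3D; Sun: FSTCN (factorizes the 3D convolution kernels to speed up computation). Some other papers handle longer videos with a CNN+RNN structure, for example: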
-