Action Recognition Paper Notes: ARTNet (Appearance-and-Relation Networks for Video Classification)
Wang, Limin, et al. “Appearance-and-relation networks for video classification.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
Motivation
Three kinds of architectures for video classification: (1) two-stream CNNs (time-consuming, since optical flow must be computed in advance); (2) 3D CNNs (less accurate than two-stream CNNs); and (3) 2D CNNs with temporal models on top, such as LSTM, temporal convolution, sparse sampling and aggregation, and attention modeling (weaker at local spatiotemporal representation).
Multiplicative interactions to model the relation between different views: Gated Boltzmann machines, Energy models, Independent Subspace Analysis (ISA) (similar to Energy mod

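These notes cover ARTNet, a video classification architecture proposed by Wang et al. that combines an appearance branch and a relation branch to strengthen spatiotemporal representation. Compared with two-stream CNNs and 3D CNNs, ARTNet uses SMART blocks to improve accuracy while reducing computational cost, especially when modeling local features. Experiments show that it performs well when trained on Kinetics and tested on UCF101 and HMDB. However, it may be less efficient than 3D CNNs at temporal modeling, and because it does not use a residual structure, temporal information may weaken in the deeper layers of the network.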
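To make the link between the relation branch and the multiplicative-interaction models listed under Motivation more concrete, here is a minimal sketch of a square-pooling ("energy") unit applied to two frame patches. It is a toy illustration under my own assumptions, not the authors' implementation: the function names (`energy_unit`, `cross_term`, `self_terms`), the filter values, and the patch sizes are all hypothetical. The point is only that squaring a sum of two filter responses yields a cross term in which the two frames interact multiplicatively.

```python
# Toy square-pooling ("energy") unit over two flattened frame patches.
# All names and numbers are illustrative assumptions, not taken from the paper's code.

def dot(w, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(w, v))

def energy_unit(x, y, filters):
    """Sum over filter pairs of (w_x . x + w_y . y) ** 2.

    x, y    : flattened patches from two consecutive frames
    filters : list of (w_x, w_y) pairs, one pair per pooled filter
    """
    return sum((dot(wx, x) + dot(wy, y)) ** 2 for wx, wy in filters)

def cross_term(x, y, filters):
    """Multiplicative part of the expanded square: 2 * (w_x . x) * (w_y . y)."""
    return sum(2 * dot(wx, x) * dot(wy, y) for wx, wy in filters)

def self_terms(x, y, filters):
    """Per-frame parts of the expanded square: (w_x . x) ** 2 + (w_y . y) ** 2."""
    return sum(dot(wx, x) ** 2 + dot(wy, y) ** 2 for wx, wy in filters)

# Two tiny "patches" and two hypothetical filter pairs.
x = [1.0, 2.0]
y = [0.5, -1.0]
filters = [([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [1.0, 1.0])]

total = energy_unit(x, y, filters)
cross = cross_term(x, y, filters)
selfs = self_terms(x, y, filters)
print(total, selfs, cross)
```

The energy splits exactly into `selfs + cross`, and only `cross` depends on both frames at once. In a real network the inner products would be convolutions over a short clip and the sum would be a pooling over channels, but the algebra is the same.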