Two-Stream SR-CNNs for Action Recognition in Videos

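Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.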

Paper: http://www.bmva.org/bmvc/2016/papers/paper108/index.html
Code: https://github.com/yifita/action.sr_cnn
Third author's homepage: http://wanglimin.github.io/

Dataset       | UCF101 | JHMDB (split 1)
Accuracy (%)  | 92.6   | 53.77

Framework

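The input is still two-stream. However, both the RGB frames and the optical flow are passed through Faster R-CNN. The resulting regions are grouped into three categories (scene, person, and object), and each category is fed into the network for training.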
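The sketch below illustrates one way detections could be split into the three semantic channels. The following are all hypothetical assumptions for illustration and are not taken from the paper or the released code:

- the function `build_channel_rois`;
- the 0.5 score threshold;
- the grouping rules: the whole frame as the scene box, the single best person box, and every other confident detection as an object box.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def build_channel_rois(
    detections: List[Tuple[str, float, Box]],
    frame_w: float,
    frame_h: float,
    score_thresh: float = 0.5,  # hypothetical threshold
) -> Dict[str, List[Box]]:
    """Group (label, score, box) detections into scene / person / object RoIs.

    Hypothetical rules: the scene channel sees the whole frame, the person
    channel keeps only the highest-scoring 'person' box, and all other
    detections above the threshold go to the object channel.
    """
    rois: Dict[str, List[Box]] = {
        "scene": [(0.0, 0.0, frame_w, frame_h)],
        "person": [],
        "object": [],
    }
    persons = [(s, b) for label, s, b in detections
               if label == "person" and s >= score_thresh]
    if persons:
        best_score, best_box = max(persons, key=lambda p: p[0])
        rois["person"].append(best_box)
    rois["object"] = [b for label, s, b in detections
                      if label != "person" and s >= score_thresh]
    return rois


if __name__ == "__main__":
    dets = [("person", 0.9, (10, 10, 50, 100)),
            ("cup", 0.8, (60, 60, 70, 70)),
            ("dog", 0.3, (1, 1, 2, 2))]  # made-up detections
    print(build_channel_rois(dets, 320.0, 240.0))
```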

The inputs are first passed through standard convolutional and pooling layers. We replace the last pooling layer with a RoiPooling [2] layer, which separates features for different semantic cues into parallel fully connected layers (called channels) using bounding boxes proposed from a Faster R-CNN [18] object detector (see subsection 3.2).
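The RoI pooling step that replaces the last pooling layer can be sketched in plain Python. This is a simplified, single-channel max-pooling version in the spirit of Fast R-CNN RoI pooling, not the authors' implementation. The following details are assumptions:

- the name `roi_max_pool`;
- the inclusive integer box convention in feature-map coordinates;
- the floor/ceil bin rounding.

Each semantic channel pools its own box from the same shared feature map. It then passes the fixed-size result to its own fully connected layers.

```python
import math
from typing import Dict, List, Tuple

FeatureMap = List[List[float]]  # one H x W feature channel, for brevity


def roi_max_pool(fmap: FeatureMap, roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> FeatureMap:
    """Max-pool roi = (x1, y1, x2, y2), inclusive feature-map coordinates,
    into a fixed out_h x out_w grid regardless of the RoI's size."""
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    out: FeatureMap = []
    for i in range(out_h):
        # floor the bin start and ceil the bin end so every bin is non-empty
        ys = y1 + math.floor(i * roi_h / out_h)
        ye = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x1 + math.floor(j * roi_w / out_w)
            xe = x1 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out


if __name__ == "__main__":
    fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    # hypothetical per-channel boxes on the shared feature map
    channel_rois: Dict[str, Tuple[int, int, int, int]] = {
        "scene": (0, 0, 3, 3), "person": (1, 1, 2, 2)}
    pooled = {name: roi_max_pool(fmap, roi, 2, 2) for name, roi in channel_rois.items()}
    print(pooled)
```

Because every RoI is pooled to the same grid, the per-channel fully connected layers can have fixed input sizes even though scene, person, and object boxes differ in size.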

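Each channel produces its own class scores. Because a frame can contain multiple objects, the authors use MIL (Multiple Instance Learning) to combine the most useful information. Finally, all scores pass through a fusion layer to produce the final prediction.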
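For the object channel, one common way to realize MIL is to score every box independently and then keep, for each class, the highest score over all boxes. The sketch below assumes this max-pooling form. `mil_max` is a hypothetical helper, and the paper's exact MIL aggregation may differ.

```python
from typing import List


def mil_max(instance_scores: List[List[float]]) -> List[float]:
    """MIL max-pooling (assumed aggregation).

    instance_scores[i][c] is the score of class c for box i; the result keeps,
    per class, the score of whichever box is most confident for that class.
    """
    if not instance_scores:
        raise ValueError("need at least one instance (box)")
    num_classes = len(instance_scores[0])
    return [max(box[c] for box in instance_scores) for c in range(num_classes)]


if __name__ == "__main__":
    # two detected objects, three action classes (made-up scores)
    boxes = [[0.1, 0.7, 0.2],
             [0.6, 0.3, 0.1]]
    print(mil_max(boxes))  # [0.6, 0.7, 0.2]
```

With max aggregation, only the top-scoring box for each class receives gradient during training. This lets the channel learn which object is informative without box-level action labels.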

Fusion

The authors propose four fusion strategies:

Max fusion takes the maximum score among all channels for each class, essentially picking the strongest channel.
Sum fusion directly adds up the scores from different channels, i.e. each channel is treated equally.
Category-wise weighted fusion (Weighted-1) combines channel scores via a weighted sum, using per-class weights to represent how much each channel contributes to each class.
Correlation-wise weighted fusion (Weighted-2) also takes the scores of the other classes into consideration.
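The four fusion rules can be written directly on a matrix of channel scores. In this sketch, `scores[k][c]` is the score of class c from channel k. The weights `w` (Weighted-1) and `W` (Weighted-2) would be learned in practice but are given by hand here.

Reading Weighted-2 as a per-channel class-by-class weight matrix is an assumption about the paper's formulation. Under that reading, it is equivalent to a bias-free fully connected layer over the concatenated channel scores.

```python
from typing import List

Scores = List[List[float]]  # scores[k][c]: score of class c from channel k


def max_fusion(scores: Scores) -> List[float]:
    # per class, keep the strongest channel
    return [max(col) for col in zip(*scores)]


def sum_fusion(scores: Scores) -> List[float]:
    # per class, every channel counts equally
    return [sum(col) for col in zip(*scores)]


def weighted1_fusion(scores: Scores, w: List[List[float]]) -> List[float]:
    # Weighted-1: w[k][c] is the weight of channel k for class c
    num_classes = len(scores[0])
    return [sum(w[k][c] * scores[k][c] for k in range(len(scores)))
            for c in range(num_classes)]


def weighted2_fusion(scores: Scores, W: List[List[List[float]]]) -> List[float]:
    # Weighted-2 (assumed form): W[k][c][j] maps class j's score in channel k
    # to output class c, so other classes' scores contribute as well
    num_classes = len(scores[0])
    return [sum(W[k][c][j] * scores[k][j]
                for k in range(len(scores)) for j in range(num_classes))
            for c in range(num_classes)]


if __name__ == "__main__":
    scores = [[1.0, 2.0],   # e.g. scene channel (made-up scores)
              [3.0, 0.5],   # person channel
              [0.0, 1.0]]   # object channel
    print(max_fusion(scores))  # [3.0, 2.0]
    print(sum_fusion(scores))  # [4.0, 3.5]
```

Each rule generalizes the previous one:

- An all-ones `w` reduces Weighted-1 to sum fusion.
- A `W` whose only non-zero entries are the diagonal `W[k][c][c] = w[k][c]` reduces Weighted-2 to Weighted-1.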
