Videos
Because of memory constraints, videos have to be down-sampled in both length and frame rate
Raw video: long, high fps
Training: short clips at low fps
Testing: run the model on different clips and average the predictions
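The sampling scheme above can be sketched in plain Python. This is only an illustration: the helper names (`sample_clip_indices`, `average_predictions`), the 30 fps source rate, the 5 fps target rate and the 16-frame clip length are all assumptions made for the example, not values from these notes.

```python
def sample_clip_indices(num_frames, clip_len, stride, start=0):
    """Return indices of `clip_len` frames, one every `stride` frames from `start`."""
    last = start + (clip_len - 1) * stride
    if last >= num_frames:
        raise ValueError("clip does not fit inside the video")
    return [start + i * stride for i in range(clip_len)]


def average_predictions(per_clip_probs):
    """Element-wise mean of the per-clip class-probability vectors."""
    n = len(per_clip_probs)
    return [sum(column) / n for column in zip(*per_clip_probs)]


# Assumed example: a 10 s video at 30 fps (300 frames), sampled at 5 fps.
stride = 30 // 5  # keep every 6th frame
train_clip = sample_clip_indices(300, clip_len=16, stride=stride, start=0)

# Test time: take clips at several start offsets; each clip would be passed
# through the model and the resulting predictions averaged.
test_starts = [0, 100, 200]
test_clips = [sample_clip_indices(300, 16, stride, s) for s in test_starts]
```

At test time, `average_predictions` would receive one probability vector per entry of `test_clips`.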
Single-frame CNN
Train a normal 2D CNN to classify each frame independently
Simple, but a very strong baseline!
Late fusion
Takes the time axis into account, but only at the end of the network
Run a 2D CNN on each frame, combine the per-frame features (flatten and concatenate, or average-pool), and feed the result to an MLP to get a classification score
Problem: hard to compare low-level motion between frames
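A minimal sketch of the two fusion variants, using nested lists in place of real CNN feature maps. The function names and the sizes (8 frames, 512 channels, 7 x 7 maps) are assumptions chosen for illustration. The last lines show one way to see the problem with the pooling variant: reversing the frame order leaves the pooled vector unchanged, so motion direction is lost.

```python
def late_fusion_flatten_dim(T, D, Hf, Wf):
    """Input size of the MLP when all per-frame feature maps are flattened and concatenated."""
    return T * D * Hf * Wf


def late_fusion_pool(frame_features):
    """Average-pool per-frame features over time and space.

    frame_features: T frames, each a list of D channels,
    each channel a flat list of spatial values.
    Returns a vector of length D.
    """
    T = len(frame_features)
    D = len(frame_features[0])
    pooled = []
    for d in range(D):
        values = [v for t in range(T) for v in frame_features[t][d]]
        pooled.append(sum(values) / len(values))
    return pooled


# Assumed sizes: 8 frames, 512 channels, 7x7 feature maps per frame.
mlp_input_dim = late_fusion_flatten_dim(8, 512, 7, 7)

# Toy features: 2 frames, 1 channel, 2 spatial positions.
features = [[[1.0, 3.0]], [[5.0, 7.0]]]
pooled_forward = late_fusion_pool(features)
pooled_reversed = late_fusion_pool(features[::-1])  # same video played backwards
```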
Early fusion
Compare frames in the very first conv layer, which fuses all temporal information at once
After that, a normal 2D CNN produces the class score
Problem: a single layer of temporal processing may not be enough
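The shape bookkeeping for early fusion can be sketched as follows. It assumes the common formulation in which the T frames are stacked along the channel axis (T x 3 x H x W becomes 3T x H x W) before the first 2D convolution; the notes do not spell this out, and the helper names and sizes below are hypothetical.

```python
def stack_frames_on_channels(clip):
    """Reshape a T x C x H x W clip (nested lists) into (T*C) x H x W."""
    return [channel for frame in clip for channel in frame]


def conv2d_out_shape(in_shape, out_channels, kernel, stride=1, padding=0):
    """Output shape (channels, height, width) of a standard 2D convolution."""
    _, h, w = in_shape
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return (out_channels, h_out, w_out)


# Assumed toy clip: T=4 RGB frames of size 8x8.
T, C, H, W = 4, 3, 8, 8
clip = [[[[0.0] * W for _ in range(H)] for _ in range(C)] for _ in range(T)]

stacked = stack_frames_on_channels(clip)  # 12 x 8 x 8, no time axis left
first_conv_shape = conv2d_out_shape((len(stacked), H, W), out_channels=64,
                                    kernel=3, padding=1)

# Each of the 64 filters spans all 3T input channels, so this one layer
# is the only place where frames are compared with each other.
num_weights = 64 * len(stacked) * 3 * 3
```

Everything after this first layer has the shape of an ordinary 2D CNN activation, which is why all temporal reasoning has to happen in that single layer.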
