Overview
| Dataset | Year | Actions | Videos | Video Type | SOTA |
|---|---|---|---|---|---|
| HMDB51 | ICCV2011 | 51 | 6,849 | movie & web video | ≥0.82 |
| UCF101 | ICCV2013 | 101 | 13,320 | web video | ≥0.98 |
| Sports-1M | CVPR2014 | 487 | 1,000,000 | web video | ≥0.91 |
| ActivityNet | CVPR2015 | 200 | 20,000 | web video | ≥0.40 |
| Charades | ECCV2016 | 157 | 9,848 | controlled settings | mAP≥0.58 |
| YouTube-8M (2019) | 2016 | 1,000 | 237,000 | movie & web video | mAP≥0.83 |
| AVA | CVPR2018 | 80 | 57,600 | movie | mAP≥0.27 |
| Kinetics-400 | 2017 | 400 | 306,245 | web video | ≥0.82 |
| Something-Something V1 | 2017 | 174 | 108,499 | controlled settings | ≥0.52 |
| Something-Something V2 | 2018 | 174 | 220,847 | controlled settings | ≥0.67 |
| Kinetics-600 | 2018 | 600 | 495,547 | web video | ≥0.71 |
| Kinetics-700 | 2019 | 700 | 650,317 | web video | ≥0.57 |
| Epic-Kitchens | ECCV2018 | 149 | 432 | controlled settings | ≥0.36 |
| Jester | ICCVW2019 | 27 | 148,092 | controlled settings | ≥0.96 |
| Moments in Time | TPAMI 2019 | 339 | 1,000,000 | web video | ≥0.34 |
| Multi-Moments in Time | 2019 | 339 | 1,000,000 | web video | ≥0.59 |
By task
Video classification
Fully supervised, whole-clip, forced-choice video classifiers.
Trimmed clips, each containing a single action; well suited to training classifiers.
- KTH
- Weizmann
- Hollywood-2
- HMDB
- UCF101
Large-scale video classification
Also single-action samples, but at a much larger scale and usually with noisier labels; well suited to pre-training.
- TrecVid MED
- Sports-1M
- YouTube-8M
- Something-Something
- SLAC
- Moments in Time
- Kinetics
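Temporal localization
Large-scale untrimmed videos. Each sample contains multiple actions, and the temporal position of every action is annotated.
The first three are all sourced from YouTube videos.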
- ActivityNet
- THUMOS
- MultiTHUMOS
- Charades
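
As a rough illustration of what "untrimmed, with a temporal position for each action" means in practice, the sketch below models one video carrying several labeled segments and slices it into trimmed single-action samples of the kind used for classification. The `Segment` and `UntrimmedVideo` types, their field names, and the example values are hypothetical and do not correspond to any dataset's actual annotation format. The `temporal_iou` helper is included only as an assumption about how a predicted interval would typically be compared with an annotated one.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One annotated action instance (hypothetical schema)."""
    label: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


@dataclass
class UntrimmedVideo:
    """An untrimmed video carrying several action segments."""
    video_id: str
    duration: float
    segments: List[Segment] = field(default_factory=list)


def to_trimmed_samples(video: UntrimmedVideo) -> List[Tuple[str, float, float, str]]:
    """Turn each annotated segment into a (video_id, start, end, label) sample."""
    samples = []
    for seg in video.segments:
        # Clamp to the video bounds so an annotation cannot run past the end.
        start = max(0.0, seg.start)
        end = min(video.duration, seg.end)
        if end > start:
            samples.append((video.video_id, start, end, seg.label))
    return samples


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


video = UntrimmedVideo(
    video_id="example_video",  # made-up identifier
    duration=60.0,
    segments=[Segment("open door", 2.0, 5.5), Segment("sit down", 40.0, 65.0)],
)
samples = to_trimmed_samples(video)
```

Each entry in `samples` is effectively a trimmed clip, which is why temporal-localization datasets can also be reused to build classification training sets.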
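Spatio-temporal localization
These datasets provide both temporal annotations (when the action happens) and spatial annotations (bounding boxes). Compared with the last one, the first three are smaller in scale and shorter in duration, and their actions are composite actions, which are often not clearly defined. AVA is large-scale, and its annotations are atomic actions, roughly a single verb each. Some work also performs spatio-temporal localization on untrimmed video using UCF101, DALY, and Hollywood2Tubes.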
- CMU
- MSR Actions
- UCF Sports
- JHMDB
- AVA
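
To make the "time plus box" annotation style concrete, here is a minimal sketch of how such labels might be held in memory and grouped per keyframe. The `BoxAnnotation` schema, the normalized `(x1, y1, x2, y2)` box convention, and all example values are assumptions made for illustration and are not taken from any dataset's release files. The example also assumes that one box may carry more than one atomic action.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to [0, 1]


@dataclass(frozen=True)
class BoxAnnotation:
    """One box at one timestamp with one action label (hypothetical schema)."""
    video_id: str
    timestamp: float  # seconds; the "when"
    box: Box          # the "where"
    action: str       # an atomic action, roughly a single verb


def group_by_keyframe(
    annos: List[BoxAnnotation],
) -> Dict[Tuple[str, float], List[BoxAnnotation]]:
    """Collect all annotations that share a (video_id, timestamp) keyframe."""
    frames = defaultdict(list)
    for a in annos:
        frames[(a.video_id, a.timestamp)].append(a)
    return dict(frames)


def actions_by_box(frame_annos: List[BoxAnnotation]) -> Dict[Box, List[str]]:
    """Within one keyframe, list the action labels attached to each box."""
    per_box = defaultdict(list)
    for a in frame_annos:
        per_box[a.box].append(a.action)
    return dict(per_box)


annos = [
    BoxAnnotation("clip_a", 902.0, (0.1, 0.2, 0.4, 0.9), "stand"),
    BoxAnnotation("clip_a", 902.0, (0.1, 0.2, 0.4, 0.9), "talk"),
    BoxAnnotation("clip_a", 902.0, (0.5, 0.3, 0.8, 0.9), "sit"),
    BoxAnnotation("clip_a", 903.0, (0.1, 0.2, 0.4, 0.9), "walk"),
]
frames = group_by_keyframe(annos)
labels_at_902 = actions_by_box(frames[("clip_a", 902.0)])
```

Keying on the keyframe first and the box second mirrors the two axes of the task: the timestamp answers when, the box answers where.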