mmaction2 official GitHub: https://github.com/open-mmlab/mmaction2
GPU platform: https://cloud.videojj.com/auth/register?inviter=18452&activityChannel=student_invite
mmaction2 documentation: https://mmaction2.readthedocs.io/zh_CN/latest/faq.html?highlight=start_index#id3
Links in this series
00 [mmaction2 action recognition, commercial grade] Quickly set up mmaction2 with pytorch 1.6.0 and pytorch 1.8.0
03 [mmaction2 action recognition, commercial grade] Use mmaction to build faster rcnn, batch-detect images, and export the results in VIA format
04 [mmaction2 action recognition, commercial grade] The slowfast detection pipeline uses yolov3 to detect people
!!! To be released after the paper is published !!! 05 [mmaction2 action recognition, commercial grade] Fusing slowfast with yolov5 (i.e. the detection part uses yolov5)
!!! To be released after the paper is published !!! 06 [mmaction2 action recognition, commercial grade] Fusing slowfast with yolov5 and deepsort (i.e. the tracking part uses deepsort)
!!! To be released after the paper is published !!! 07 [mmaction2 action recognition, commercial grade] yolov5 replaced by yolov5-crowdhuman
08 [mmaction2 action recognition, commercial grade] Custom AVA dataset: cutting videos into frames
!!! To be released after the paper is published !!! 10-1 [mmaction2 action recognition, commercial grade] A shrunk AVA dataset and slowfast training
!!! To be released after the paper is published !!! 10-2 [mmaction2 action recognition, commercial grade] Modifying the frame IDs of the AVA dataset and analyzing how the training data is fed into slowfast
12 [mmaction2 action recognition, commercial grade] Reproducing X3D, running the demo on your own videos (X3D: Expanding Architectures for Efficient Video Recognition)
AVA Annotation Explained
Source: [Doc] AVA annotations explained: https://github.com/open-mmlab/mmaction2/pull/1097/commits/d7a61f7ed6fdd9326affe8d8bca04cd15610a931
The content below is copied directly from that PR.
In this section, we explain the annotation format of AVA in detail:
mmaction2
├── data
│   ├── ava
│   │   ├── annotations
│   │   │   ├── ava_dense_proposals_train.FAIR.recall_93.9.pkl
│   │   │   ├── ava_dense_proposals_val.FAIR.recall_93.9.pkl
│   │   │   ├── ava_dense_proposals_test.FAIR.recall_93.9.pkl
│   │   │   ├── ava_train_v2.1.csv
│   │   │   ├── ava_val_v2.1.csv
│   │   │   ├── ava_train_excluded_timestamps_v2.1.csv
│   │   │   ├── ava_val_excluded_timestamps_v2.1.csv
│   │   │   ├── ava_action_list_v2.1_for_activitynet_2018.pbtxt
The proposals generated by human detectors
In the annotation folder, ava_dense_proposals_[train/val/test].FAIR.recall_93.9.pkl are human proposals generated by a human detector. They are used in training, validation and testing respectively. Take ava_dense_proposals_train.FAIR.recall_93.9.pkl as an example: it is a dictionary of size 203626. The key consists of the videoID and the timestamp. For example, the key -5KQ66BBWC4,0902 means the values are the detection results for the frame at the $902_{nd}$ second in the video -5KQ66BBWC4. The values in the dictionary are numpy arrays with shape $N \times 5$, where $N$ is the number of detected human bounding boxes in the corresponding frame. The format of a bounding box is $[x_1, y_1, x_2, y_2, score]$, with $0 \le x_1, y_1, x_2, y_2, score \le 1$. $(x_1, y_1)$ indicates the top-left corner of the bounding box and $(x_2, y_2)$ the bottom-right corner; $(0, 0)$ indicates the top-left corner of the image, while $(1, 1)$ indicates the bottom-right corner of the image.
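If you want to see this structure for yourself, the pickle can be loaded and one entry inspected. This is only a small sketch; the path comes from the directory tree above, and the encoding argument mirrors the one used in the scripts later in this post.
import pickle

# Load the dense proposals pickle and inspect a single "videoID,timestamp" entry
with open('data/ava/annotations/ava_dense_proposals_train.FAIR.recall_93.9.pkl', 'rb') as f:
    proposals = pickle.load(f, encoding='iso-8859-1')

print(len(proposals))            # number of "videoID,timestamp" keys (203626 for the train split)
entry = proposals['-5KQ66BBWC4,0902']
print(entry.shape)               # (N, 5): N person boxes, each [x1, y1, x2, y2, score]
print(entry.min(), entry.max())  # every value is normalized to [0, 1]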
The ground-truth labels for spatio-temporal action detection
In the annotation folder, ava_[train/val]_v[2.1/2.2].csv are ground-truth labels for spatio-temporal action detection, which are used during training & validation. Take ava_train_v2.1.csv as an example: it is a csv file with 837318 lines, and each line is the annotation for one human instance in one frame. For example, the first line in ava_train_v2.1.csv is '-5KQ66BBWC4,0902,0.077,0.151,0.283,0.811,80,1': the first two items -5KQ66BBWC4 and 0902 indicate that it corresponds to the $902_{nd}$ second in the video -5KQ66BBWC4. The next four items, $[0.077 (x_1), 0.151 (y_1), 0.283 (x_2), 0.811 (y_2)]$, indicate the location of the bounding box; the bbox format is the same as for the human proposals. The next item, 80, is the action label. The last item, 1, is the ID of this bounding box.
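To make the field layout concrete, here is a small sketch that splits a row of the csv into its eight fields with the standard csv module (only the first line is printed):
import csv

with open('data/ava/annotations/ava_train_v2.1.csv') as f:
    for video_id, timestamp, x1, y1, x2, y2, action_label, entity_id in csv.reader(f):
        # e.g. '-5KQ66BBWC4', '0902', '0.077', '0.151', '0.283', '0.811', '80', '1'
        box = [float(x1), float(y1), float(x2), float(y2)]  # same normalized format as the proposals
        print(video_id, int(timestamp), box, int(action_label), int(entity_id))
        break  # only the first annotation line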
Excluded timestamps
ava_[train/val]_excluded_timestamps_v[2.1/2.2].csv contains excluded timestamps which are not used during training or validation. The format is video_id, second_idx.
Label map
ava_action_list_v[2.1/2.2]_for_activitynet_[2018/2019].pbtxt
contains the label map of the AVA dataset, which maps the action name to the label index.
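The pbtxt is plain text, so it can be read without protobuf. The sketch below assumes each entry has a name: "..." line followed by an id: or label_id: line, which is how the released AVA label lists look as far as I know; treat the exact field names as an assumption.
def read_labelmap(path):
    """Rough sketch: map label index -> action name from the AVA pbtxt label list."""
    labelmap = {}
    name = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('name:'):
                name = line.split('"')[1]
            elif line.startswith(('id:', 'label_id:')):  # assumed field names, see note above
                labelmap[int(line.split(':')[1])] = name
    return labelmap

# e.g. read_labelmap('data/ava/annotations/ava_action_list_v2.1_for_activitynet_2018.pbtxt')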
Some issues
It provides the human proposal boxes for AVA videos; during training, the proposal boxes and RoIAlign are used to obtain instance-level features.
Q: The proposal box is already provided in train.csv. Is the proposal box provided by the pkl file again to improve the recognition accuracy?
A: Not exactly. Boxes provided in the CSV files are ground truth (annotated by humans); boxes provided in the pickle files are proposals (predicted by detectors).
Q: So what is the specific role of this pkl file?
A: The pkl files contain the proposals; we use the proposal boxes and RoIAlign to obtain instance-level features in training and testing. We cannot assume ground-truth bounding boxes are available, so we need to use proposal boxes for training and testing.
The proposals were generated by a person detector with a 93.9% recall.
AVA is a spatio-temporal detection dataset centered on human behavior; the proposals are used to determine the spatial position of the people.
Q: But in the AVA dataset there is already a person box in the csv, which looks similar to the proposals, so I'm confused about it. Can you explain it with specific code? It would help to understand the function of the proposals.
A: The person boxes in the AVA dataset are the ground truth; we cannot use those to test accuracy.
On this spatio-temporal dataset, our task is to find the position of the persons on each keyframe and recognize the actions they are doing.
Many current methods split this into two tasks: detect the person and recognize the human behavior; the former is done by a person detector, and the latter by a video-understanding model.
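To make "use the proposal boxes and RoIAlign to obtain instance-level features" a bit more concrete, here is a toy sketch using torchvision's roi_align on a single 2D feature map. The real pipeline uses SingleRoIExtractor3D on 3D features (see the config later in this post), so the feature map and shapes here are only illustrative; the two boxes are taken from the pkl dump shown later.
import torch
from torchvision.ops import roi_align

# Pretend the backbone produced a (1, C, H, W) feature map for one keyframe
feat = torch.randn(1, 256, 16, 16)

# Two proposal boxes in normalized [x1, y1, x2, y2] form, as stored in the pkl
proposals = torch.tensor([[0.003, 0.125, 0.119, 0.837],
                          [0.626, 0.153, 0.797, 0.838]])

# Scale the normalized boxes to feature-map coordinates and prepend the batch index
boxes = proposals * torch.tensor([16.0, 16.0, 16.0, 16.0])
rois = torch.cat([torch.zeros(len(boxes), 1), boxes], dim=1)

# One fixed-size feature per person box -> instance-level features for the action head
instance_feats = roi_align(feat, rois, output_size=(8, 8))
print(instance_feats.shape)  # torch.Size([2, 256, 8, 8])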
Modifying the video frame IDs of the AVA dataset
In this part, I modify the video frame IDs to get a deeper understanding of how the AVA dataset is put together.
Modifying ava_train
Below is a portion of the AVA dataset annotations.
Column 1: the video name
Column 2: the frame (timestamp) ID; for example, the frame at 15:02 is written as 902 and the frame at 15:03 as 903 (this is what I was wondering about: when building a custom dataset, can this 902 be changed to 2?)
Columns 3 to 6: the person bounding-box coordinates (x1, y1, x2, y2)
Column 7: the action class ID
Column 8: the person ID
When we build our own AVA-style dataset, we will not start from the 900th second like the official dataset; we usually start from second 0.
So, to answer the question about column 2, I used some code to subtract 900 from every ID in the second column. The code and result are as follows.
Code:
import csv

# Read the shrunk annotation file and shift every timestamp (column 2) back by 900
minCsv2 = []
with open('./data/ava/annotations/ava_train_v2.2_mini.csv', 'r') as db01:
    reader = csv.reader(db01)
    for row in reader:
        temp = row
        temp[1] = str(int(temp[1]) - 900)
        minCsv2.append(temp)

# Write the shifted rows out as ava_train_v2.2_mini2.csv
with open('./data/ava/annotations/ava_train_v2.2_mini2.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(minCsv2)
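An optional sanity check: read the new file back and look at the range of the timestamps; they should now start near 2 instead of 902.
import csv

with open('./data/ava/annotations/ava_train_v2.2_mini2.csv') as f:
    timestamps = [int(row[1]) for row in csv.reader(f) if row]
print(min(timestamps), max(timestamps))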
Shrinking ava_dense_proposals_train
Below is part of the content of ava_dense_proposals_train.FAIR.recall_93.9.pkl:
-5KQ66BBWC4,0902 [[0.003 0.125 0.119 0.837 0.742486]
[0.626 0.153 0.797 0.838 0.987177]
[0.326 0.185 0.47 0.887 0.996382]
[0.508 0.117 0.648 0.777 0.903317]
[0.222 0.031 0.362 0.529 0.983264]
[0.108 0.143 0.283 0.805 0.547549]
[0.773 0.143 0.862 0.351 0.82769 ]
[0.706 0.105 0.787 0.31 0.108642]
[0.805 0.289 0.997 0.991 0.983301]
[0.852 0.175 0.929 0.335 0.178122]]
-5KQ66BBWC4,0903 [[0.516 0.134 0.659 0.788 0.995238]
[0.628 0.163 0.781 0.84 0.996272]
[0.326 0.172 0.489 0.895 0.999214]
[0.145 0.161 0.301 0.831 0.9853 ]
[0.876 0.157 0.993 0.443 0.871956]
[0.736 0.113 0.815 0.357 0.716372]
[0.8 0.293 0.997 0.961 0.94542 ]
[0.791 0.16 0.879 0.501 0.503257]
[0.522 0.137 0.656 0.373 0.083969]
[0.838 0.13 0.995 0.574 0.090968]
[0.552 0.115 0.669 0.304 0.106534]
[0.233 0.024 0.362 0.522 0.9284 ]
[0.009 0.183 0.147 0.84 0.991572]
[0.781 0.146 0.865 0.381 0.271116]
[0.592 0.062 0.682 0.265 0.34697 ]]
-5KQ66BBWC4,0904 [[0.215 0.018 0.988 0.991 0.999776]]
-5KQ66BBWC4,0905 [[0.192 0.072 0.396 0.971 0.990042]
[0.391 0.033 0.552 0.625 0.994892]
[0.607 0.062 0.814 0.976 0.995895]
[0.852 0.079 0.998 0.892 0.950123]
[0.059 0.076 0.227 0.893 0.976499]
[0.287 0.086 0.389 0.302 0.41169 ]]
As before, we first shrink ava_dense_proposals_train.FAIR.recall_93.9.pkl to keep only 2 of the videos. The code is as follows:
# Extract the proposals of the specified videos into a smaller pkl
import pickle

videos = ["053oq2xB3oU", "Ytga8ciKWJc"]
minPkl = {}
with open('ava_dense_proposals_train.FAIR.recall_93.9.pkl', 'rb') as f:
    info = pickle.load(f, encoding='iso-8859-1')
for i in info:
    name, vID = i.split(',')
    if name in videos:
        minPkl[i] = info[i]  # keep the entry unchanged; the key is "videoID,timestamp"
with open('ava_dense_proposals_train_mini.pkl', 'wb') as pklfile:
    pickle.dump(minPkl, pklfile)
After running this, we get a pkl containing only the proposals of "053oq2xB3oU" and "Ytga8ciKWJc".
Modifying ava_dense_proposals_train
What we modify here is again the video frame ID: subtract 900 from every frame ID so that the IDs start from 2. The code is as follows:
import pickle

# Subtract 900 from the timestamp part of every key, e.g. "053oq2xB3oU,0902" -> "053oq2xB3oU,2"
minPkl = {}
with open('ava_dense_proposals_train_mini.pkl', 'rb') as f:
    info = pickle.load(f, encoding='iso-8859-1')
for i in info:
    name, vID = i.split(',')
    vID = str(int(vID) - 900)
    key = name + ',' + vID
    minPkl[key] = info[i]
with open('ava_dense_proposals_train_mini2.pkl', 'wb') as pklfile:
    pickle.dump(minPkl, pklfile)
Below is part of the content of ava_dense_proposals_train_mini2.pkl:
053oq2xB3oU,2 [[0.498 0.357 0.586 0.543 0.238802]
[0.495 0.229 0.864 0.792 0.876006]
[0.711 0.233 0.947 0.838 0.839101]
[0.509 0.218 0.766 0.622 0.221814]
[0.711 0.254 0.861 0.619 0.397694]
[0.009 0.125 0.657 0.887 0.993884]
[0.006 0.229 0.22 0.839 0.160506]]
053oq2xB3oU,3 [[0.811 0.234 0.98 0.856 0.948427]
[0.373 0.264 0.564 0.604 0.992532]
[0.481 0.226 0.785 0.825 0.845053]
[0.709 0.247 0.893 0.876 0.893365]
[0.332 0.203 0.893 0.896 0.300915]
[0.409 0.231 0.677 0.713 0.142233]
[0.008 0.126 0.538 0.907 0.98535 ]]
053oq2xB3oU,4 [[0.446 0.205 0.818 0.872 0.901434]
[0.016 0.13 0.504 0.875 0.783437]
[0.711 0.231 0.918 0.862 0.943851]
[0.277 0.247 0.566 0.677 0.939575]
[0.215 0.212 0.612 0.865 0.274805]
[0.367 0.238 0.692 0.798 0.168288]
[0.839 0.222 0.992 0.856 0.973397]
[0.006 0.163 0.264 0.844 0.073463]]
Modifying the config file
The config file is modified in 2 places: one is which annotation files are loaded, and the other is adding start_index.
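To make the changes easy to spot, these are the settings that differ from the stock AVA config (each of them also appears in the full file below):
ann_file_train = f'{anno_root}/ava_train_v2.2_mini2.csv'
ann_file_val = f'{anno_root}/ava_train_v2.2_mini2.csv'
proposal_file_train = f'{anno_root}/ava_dense_proposals_train_mini2.pkl'
proposal_file_val = f'{anno_root}/ava_dense_proposals_train_mini2.pkl'
# ... plus start_index=1 added to both data['train'] and data['val']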
The full config file, my_slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb2.py, is as follows:
# model setting
model = dict(
    type='FastRCNN',
    backbone=dict(
        type='ResNet3dSlowFast',
        pretrained=None,
        resample_rate=8,
        speed_ratio=8,
        channel_ratio=8,
        slow_pathway=dict(
            type='resnet3d',
            depth=50,
            pretrained=None,
            lateral=True,
            conv1_kernel=(1, 7, 7),
            dilations=(1, 1, 1, 1),
            conv1_stride_t=1,
            pool1_stride_t=1,
            inflate=(0, 0, 1, 1),
            spatial_strides=(1, 2, 2, 1)),
        fast_pathway=dict(
            type='resnet3d',
            depth=50,
            pretrained=None,
            lateral=False,
            base_channels=8,
            conv1_kernel=(5, 7, 7),
            conv1_stride_t=1,
            pool1_stride_t=1,
            spatial_strides=(1, 2, 2, 1))),
    roi_head=dict(
        type='AVARoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor3D',
            roi_layer_type='RoIAlign',
            output_size=8,
            with_temporal_pool=True),
        bbox_head=dict(
            type='BBoxHeadAVA',
            in_channels=2304,
            num_classes=81,
            multilabel=True,
            dropout_ratio=0.5)),
    train_cfg=dict(
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssignerAVA',
                pos_iou_thr=0.9,
                neg_iou_thr=0.9,
                min_pos_iou=0.9),
            sampler=dict(
                type='RandomSampler',
                num=32,
                pos_fraction=1,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=1.0,
            debug=False)),
    test_cfg=dict(rcnn=dict(action_thr=0.002)))
dataset_type = 'AVADataset'
data_root = 'data/ava/rawframes'
anno_root = 'data/ava/annotations'
#ann_file_train = f'{anno_root}/ava_train_v2.1.csv'
ann_file_train = f'{anno_root}/ava_train_v2.2_mini2.csv'
#ann_file_val = f'{anno_root}/ava_val_v2.1.csv'
ann_file_val = f'{anno_root}/ava_train_v2.2_mini2.csv'
#exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv'
#exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv'
exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.2.csv'
exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.2.csv'
#label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt'
label_file = f'{anno_root}/ava_action_list_v2.2_for_activitynet_2019.pbtxt'
#proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.'
#                       'recall_93.9.pkl')
proposal_file_train = (f'{anno_root}/ava_dense_proposals_train_mini2.pkl')
#proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl'
proposal_file_val = f'{anno_root}/ava_dense_proposals_train_mini2.pkl'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=2),
    dict(type='RawFrameDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=256),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    # Rename is needed to use mmdet detectors
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False)
        ]),
    dict(
        type='Collect',
        keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
        meta_keys=['scores', 'entity_ids'])
]
# The testing is w/o. any cropping / flipping
val_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=2),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    # Rename is needed to use mmdet detectors
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals']),
    dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]),
    dict(
        type='Collect',
        keys=['img', 'proposals'],
        meta_keys=['scores', 'img_shape'],
        nested=True)
]
data = dict(
    #videos_per_gpu=9,
    #workers_per_gpu=2,
    videos_per_gpu=5,
    workers_per_gpu=2,
    val_dataloader=dict(videos_per_gpu=1),
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        exclude_file=exclude_file_train,
        pipeline=train_pipeline,
        label_file=label_file,
        proposal_file=proposal_file_train,
        person_det_score_thr=0.9,
        data_prefix=data_root,
        start_index=1),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        exclude_file=exclude_file_val,
        pipeline=val_pipeline,
        label_file=label_file,
        proposal_file=proposal_file_val,
        person_det_score_thr=0.9,
        data_prefix=data_root,
        start_index=1))
data['test'] = data['val']
optimizer = dict(type='SGD', lr=0.1125, momentum=0.9, weight_decay=0.00001)
# this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    step=[10, 15],
    warmup='linear',
    warmup_by_epoch=True,
    warmup_iters=5,
    warmup_ratio=0.1)
total_epochs = 20
checkpoint_config = dict(interval=1)
workflow = [('train', 1)]
evaluation = dict(interval=1, save_best='mAP@0.5IOU')
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = ('./work_dirs/ava/'
            'slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb')
load_from = ('https://download.openmmlab.com/mmaction/recognition/slowfast/'
             'slowfast_r50_4x16x1_256e_kinetics400_rgb/'
             'slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth')
resume_from = None
find_unused_parameters = False
Training
The command is as follows:
python tools/train.py configs/detection/ava/my_slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb2.py --validate
Then we check again whether training runs normally.
Modifying the training config file
Config file:
# model setting
model = dict(
    type='FastRCNN',
    backbone=dict(
        type='ResNet3dSlowFast',
        pretrained=None,
        resample_rate=8,
        speed_ratio=8,
        channel_ratio=8,
        slow_pathway=dict(
            type='resnet3d',
            depth=50,
            pretrained=None,
            lateral=True,
            conv1_kernel=(1, 7, 7),
            dilations=(1, 1, 1, 1),
            conv1_stride_t=1,
            pool1_stride_t=1,
            inflate=(0, 0, 1, 1),
            spatial_strides=(1, 2, 2, 1)),
        fast_pathway=dict(
            type='resnet3d',
            depth=50,
            pretrained=None,
            lateral=False,
            base_channels=8,
            conv1_kernel=(5, 7, 7),
            conv1_stride_t=1,
            pool1_stride_t=1,
            spatial_strides=(1, 2, 2, 1))),
    roi_head=dict(
        type='AVARoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor3D',
            roi_layer_type='RoIAlign',
            output_size=8,
            with_temporal_pool=True),
        bbox_head=dict(
            type='BBoxHeadAVA',
            in_channels=2304,
            num_classes=81,
            multilabel=True,
            dropout_ratio=0.5)),
    train_cfg=dict(
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssignerAVA',
                pos_iou_thr=0.9,
                neg_iou_thr=0.9,
                min_pos_iou=0.9),
            sampler=dict(
                type='RandomSampler',
                num=32,
                pos_fraction=1,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=1.0,
            debug=False)),
    test_cfg=dict(rcnn=dict(action_thr=0.002)))
dataset_type = 'AVADataset'
data_root = 'data/ava/rawframes'
anno_root = 'data/ava/annotations'
#ann_file_train = f'{anno_root}/ava_train_v2.1.csv'
ann_file_train = f'{anno_root}/ava_train_v2.2_mini2.csv'
#ann_file_val = f'{anno_root}/ava_val_v2.1.csv'
ann_file_val = f'{anno_root}/ava_train_v2.2_mini2.csv'
#exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv'
#exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv'
exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.2.csv'
exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.2.csv'
#label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt'
label_file = f'{anno_root}/ava_action_list_v2.2_for_activitynet_2019.pbtxt'
proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.'
                       'recall_93.9.pkl')
proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=2),
    dict(type='RawFrameDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=256),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    # Rename is needed to use mmdet detectors
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False)
        ]),
    dict(
        type='Collect',
        keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
        meta_keys=['scores', 'entity_ids'])
]
# The testing is w/o. any cropping / flipping
val_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=2),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    # Rename is needed to use mmdet detectors
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals']),
    dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]),
    dict(
        type='Collect',
        keys=['img', 'proposals'],
        meta_keys=['scores', 'img_shape'],
        nested=True)
]
data = dict(
    #videos_per_gpu=9,
    #workers_per_gpu=2,
    videos_per_gpu=5,
    workers_per_gpu=2,
    val_dataloader=dict(videos_per_gpu=1),
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        exclude_file=exclude_file_train,
        pipeline=train_pipeline,
        label_file=label_file,
        proposal_file=proposal_file_train,
        person_det_score_thr=0.9,
        data_prefix=data_root),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        exclude_file=exclude_file_val,
        pipeline=val_pipeline,
        label_file=label_file,
        proposal_file=proposal_file_val,
        person_det_score_thr=0.9,
        data_prefix=data_root))
data['test'] = data['val']
optimizer = dict(type='SGD', lr=0.1125, momentum=0.9, weight_decay=0.00001)
# this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    step=[10, 15],
    warmup='linear',
    warmup_by_epoch=True,
    warmup_iters=5,
    warmup_ratio=0.1)
total_epochs = 20
checkpoint_config = dict(interval=1)
workflow = [('train', 1)]
evaluation = dict(interval=1, save_best='mAP@0.5IOU')
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = ('./work_dirs/ava/'
            'slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb')
load_from = ('https://download.openmmlab.com/mmaction/recognition/slowfast/'
             'slowfast_r50_4x16x1_256e_kinetics400_rgb/'
             'slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth')
resume_from = None
find_unused_parameters = False
Training command:
python tools/train.py configs/detection/ava/my_slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb2.py --validate
Problems I ran into during my implementation and how I solved them (can be skipped)
But I found that the following problem appeared again (searching online, the explanation was that the dataset annotations do not match the frames).
I guessed the error came from these three files:
ava_dense_proposals_test.FAIR.recall_93.9.pkl
ava_dense_proposals_train.FAIR.recall_93.9.pkl
ava_dense_proposals_val.FAIR.recall_93.9.pkl
So I printed them out and found that they contain the person boxes, including the video frame IDs and the person coordinates, so these files also need to be shrunk and have their frame IDs start from 2.
After running it again, however, the same error still appeared. Tracing through the code, I found:
In /home/mmaction2/mmaction/datasets/pipelines/loading.py
the printed result is:
results {'frame_dir': '/home/mmaction2/data/ava/rawframes/053oq2xB3oU', 'video_id': '053oq2xB3oU', 'timestamp': 319, 'img_key': '053oq2xB3oU,0319', 'shot_info': (0, 27000), 'fps': 30, 'filename_tmpl': 'img_{:05}.jpg', 'modality': 'RGB', 'start_index': 0, 'timestamp_start': 900, 'timestamp_end': 1800, 'proposals': array([[0, 0, 1, 1]]), 'scores': array([1]), 'gt_bboxes': array([[0.024, 0.167, 0.702, 0.861]]), 'gt_labels': array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.]], dtype=float32), 'entity_ids': array([393]), 'frame_inds': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 'clip_len': 32, 'frame_interval': 2, 'num_clips': 1, 'crop_quadruple': array([0., 0., 1., 1.], dtype=float32)}
The key part is 'timestamp_start': 900, 'timestamp_end': 1800; we need to change this to 'timestamp_start': 0, 'timestamp_end': 900.
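As far as I can tell from mmaction2's dataset code, timestamp_start and timestamp_end are constructor arguments of AVADataset (defaulting to 900 and 1800), so in principle they could be overridden directly in the dataset config, something like the sketch below. I did not verify this route myself (I eventually solved the problem via start_index, as described next), so treat it as an assumption.
# hypothetical override inside data['train'] / data['val'] of the config
train=dict(
    type=dataset_type,
    # ... keep the existing keys as they are ...
    timestamp_start=0,   # assumed AVADataset kwarg, default 900
    timestamp_end=900)   # assumed AVADataset kwarg, default 1800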
Tracing the code backwards, we find that results is defined in /home/mmaction2/mmaction/datasets/ava_dataset.py:
But the value is not set directly there; it is just a parameter being passed along: self.start_index.
Tracing further back to /home/mmaction2/mmaction/datasets/base.py, there is still no direct answer, only a clue.
I tried tracing it in several ways but (probably due to my limited skill) did not manage to find where this start_index value is passed from. By inspection, I found that /home/mmaction2/mmaction/datasets/ava_dataset.py also contains this start_index:
But that still did not answer where start_index comes from, so I turned to the mmaction2 documentation:
https://mmaction2.readthedocs.io/zh_CN/latest/faq.html?highlight=start_index#id3
The documentation says:
FileNotFound errors such as No such file or directory: xxx/xxx/img_00300.jpg
In MMAction2, the default value of start_index is 1 for rawframe datasets and 0 for video datasets. If a FileNotFound error occurs on the first or last frame of a video, you need to modify the start_index value in the data pipeline of the config file according to the filename offset of the first frame (i.e. xxx_00000.jpg or xxx_00001.jpg).
So I tried making the change in the config, located at /home/mmaction2/configs/detection/ava/my_slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb2.py.
I added start_index=1, as shown below.
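Concretely, the addition is just one extra key in the train and val dataset dicts, exactly as in the my_slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb2.py config shown earlier:
train=dict(
    type=dataset_type,
    ann_file=ann_file_train,
    exclude_file=exclude_file_train,
    pipeline=train_pipeline,
    label_file=label_file,
    proposal_file=proposal_file_train,
    person_det_score_thr=0.9,
    data_prefix=data_root,
    start_index=1),  # <- added, because the raw frames are named img_00001.jpg, img_00002.jpg, ...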
Then running the training code again worked without problems.
These three files come from Step 5. Fetch Proposal Files in the AVA data preparation part of mmaction2: https://github.com/open-mmlab/mmaction2/blob/master/tools/data/ava/README.md
fetch_ava_proposals.sh:
#!/usr/bin/env bash
set -e
DATA_DIR="../../../data/ava/annotations"
wget https://download.openmmlab.com/mmaction/dataset/ava/ava_dense_proposals_train.FAIR.recall_93.9.pkl -P ${DATA_DIR}
wget https://download.openmmlab.com/mmaction/dataset/ava/ava_dense_proposals_val.FAIR.recall_93.9.pkl -P ${DATA_DIR}
wget https://download.openmmlab.com/mmaction/dataset/ava/ava_dense_proposals_test.FAIR.recall_93.9.pkl -P ${DATA_DIR}
So if we want to know how these three files were generated, we need to look at Long-Term Feature Banks for the answer.
I will explore Long-Term Feature Banks in a separate post:
xxxxxxxxx
The links below describe what these three files are for:
https://github.com/open-mmlab/mmaction2/issues/1336
https://github.com/open-mmlab/mmaction2/issues/729