Preface

The previous post covered the overall design of how mmdet builds a dataset. This post goes into more detail on how mmdet actually reads the data.
1. Instantiating the CustomDataset class

mmdet/datasets/custom.py defines the class CustomDataset, which most concrete Dataset implementations inherit from (the COCO dataset, for example). Since the subclasses all initialize in much the same way, this article starts from the parent class: first the config used to initialize CustomDataset, then the initialization itself:
# config snippet
train=dict(
    type=dataset_type,
    imgset=data_root + 'ImageSets/trainval.txt',
    classwise=False,
    ann_file=data_root + 'FullDataSet/Annotations/',
    img_prefix=data_root + 'FullDataSet/AllImages/',
    pipeline=train_pipeline),
@DATASETS.register_module()
class CustomDataset(Dataset):

    CLASSES = None

    def __init__(self,
                 ann_file,               # path to the annotations
                 pipeline,               # list of dicts, e.g. {'loadImg', 'loadAnno'}
                 classes=None,
                 data_root=None,         # dataset root, e.g. /home/dataset/coco
                 img_prefix='',          # image directory: /images
                 seg_prefix=None,
                 proposal_file=None,
                 test_mode=False,
                 filter_empty_gt=True):  # filter out samples with empty GT
        # processing pipeline
        self.pipeline = Compose(pipeline)  # the data-processing pipeline
After the Dataset is instantiated, iterating over it calls CustomDataset's __getitem__ method, so it is worth quoting here. Pay particular attention to the results dict:
def __getitem__(self, idx):
    if self.test_mode:
        return self.prepare_test_img(idx)
    while True:
        data = self.prepare_train_img(idx)  # taking training as the example
        if data is None:
            idx = self._rand_another(idx)
            continue
        return data

def prepare_train_img(self, idx):
    img_info = self.data_infos[idx]
    ann_info = self.get_ann_info(idx)
    results = dict(img_info=img_info, ann_info=ann_info)  # results starts with these two entries
    if self.proposals is not None:
        results['proposals'] = self.proposals[idx]
    self.pre_pipeline(results)  # adds more keys to results
    return self.pipeline(results)  # hand results over to the pipeline

def pre_pipeline(self, results):
    """Prepare results dict for pipeline"""
    results['img_prefix'] = self.img_prefix  # adds the image prefix and related fields
    results['seg_prefix'] = self.seg_prefix
    results['proposal_file'] = self.proposal_file
    results['bbox_fields'] = []
    results['mask_fields'] = []
    results['seg_fields'] = []
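The retry logic in __getitem__ above can be sketched with a toy dataset. This is a simplified stand-in, not mmdet code: a None entry plays the role of a sample whose ground truth failed to load, and a new random index is drawn until a valid sample comes back.

```python
import random

class ToyDataset:
    """Minimal sketch of CustomDataset.__getitem__'s retry loop
    (hypothetical data; None marks a sample that failed to load)."""

    def __init__(self, samples):
        self.samples = samples

    def _rand_another(self, idx):
        # draw a replacement index, as CustomDataset does
        return random.randint(0, len(self.samples) - 1)

    def __getitem__(self, idx):
        while True:
            data = self.samples[idx]
            if data is None:          # loading failed -> resample
                idx = self._rand_another(idx)
                continue
            return data

ds = ToyDataset([None, {"img": "a.jpg"}, None, {"img": "b.jpg"}])
print(ds[0])  # always returns a valid (non-None) sample
```

The loop terminates with probability 1 as long as at least one sample is valid; this is why a bad index never reaches the training loop.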
Here is an example of what results looks like at this point.
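The original post showed a screenshot here. As a stand-in, a representative results dict right after pre_pipeline might look like the following; every value is made up for illustration, not taken from a real dataset:

```python
# Illustrative `results` dict after prepare_train_img + pre_pipeline.
results = {
    # filled in by prepare_train_img:
    'img_info':  {'filename': '000001.jpg', 'width': 1024, 'height': 768},
    'ann_info':  {'bboxes': [[10, 20, 110, 220]], 'labels': [3]},
    # filled in by pre_pipeline:
    'img_prefix': '/home/dataset/coco/images',
    'seg_prefix': None,
    'proposal_file': None,
    'bbox_fields': [],
    'mask_fields': [],
    'seg_fields': [],
}
print(sorted(results))
```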
2. Pipeline

Once the results dict is assembled, CustomDataset's __getitem__ sends it through self.pipeline(results). This is the figure you often see in the official docs:

Green fields are keys newly added to results; orange fields are keys whose values the current pipeline step may change. Take Resize as an example (the inline comments spell it out):
def __call__(self, results):
    """Call function to resize images, bounding boxes, masks, semantic
    segmentation map.

    Args:
        results (dict): Result dict from loading pipeline.

    Returns:
        dict: Resized results, 'img_shape', 'pad_shape', 'scale_factor',
            'keep_ratio' keys are added into result dict.
    """
    if 'scale' not in results:
        if 'scale_factor' in results:
            img_shape = results['img'].shape[:2]
            scale_factor = results['scale_factor']  # read the scale_factor field
            assert isinstance(scale_factor, float)
            results['scale'] = tuple(  # add the scale field
                [int(x * scale_factor) for x in img_shape][::-1])
        else:
            self._random_scale(results)
    else:
        assert 'scale_factor' not in results, (
            'scale and scale_factor cannot be both set.')
    self._resize_img(results)
    self._resize_bboxes(results)
    self._resize_masks(results)
    self._resize_seg(results)
    return results
Because keys added earlier may be consumed later, be very careful about the keys and values you modify or add when building your own pipeline: if you accidentally overwrite or alter an existing entry, the code may not raise any error, and the resulting bug is hard to track down.
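This key-bookkeeping hazard is easier to see with a toy transform chain. The classes below are simplified stand-ins for mmdet's Compose and transforms, not the real implementations: one transform adds a key, a later one depends on it.

```python
class Compose:
    """Simplified stand-in for mmdet's Compose: call transforms in order,
    each receiving and returning the shared `results` dict."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, results):
        for t in self.transforms:
            results = t(results)
            if results is None:  # a transform may veto the sample
                return None
        return results

class AddScale:
    """Toy transform: adds a new key, mimicking how Resize adds 'scale'."""
    def __call__(self, results):
        results['scale'] = (800, 1333)
        return results

class CheckShape:
    """Toy transform that *reads* a key added upstream -- renaming or
    reordering the earlier transform silently breaks it."""
    def __call__(self, results):
        assert 'scale' in results, "an upstream transform must set 'scale'"
        results['img_shape'] = results['scale']
        return results

pipeline = Compose([AddScale(), CheckShape()])
out = pipeline({'img_info': {}})
print(out['img_shape'])  # (800, 1333)
```

Swapping the two transforms, or having AddScale write to a different key, would only surface as a failure inside CheckShape, far from the actual mistake.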
3. DefaultFormatBundle

This pipeline step puzzled me when I first saw it, so it deserves a separate analysis. The code first:
from mmcv.parallel import DataContainer as DC  # for now, think of DataContainer simply as a container

@PIPELINES.register_module()
class DefaultFormatBundle(object):
    """
    # packs the results keys below, applying the steps in order (1) -> (2) -> ...
    - img: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
    - proposals: (1)to tensor, (2)to DataContainer
    - gt_bboxes: (1)to tensor, (2)to DataContainer
    - gt_bboxes_ignore: (1)to tensor, (2)to DataContainer
    - gt_labels: (1)to tensor, (2)to DataContainer
    - gt_masks: (1)to tensor, (2)to DataContainer (cpu_only=True)
    - gt_semantic_seg: (1)unsqueeze dim-0 (2)to tensor,
                       (3)to DataContainer (stack=True)
    """

    def __call__(self, results):
        """Call function to transform and format common fields in results.

        Args:
            results (dict): Result dict contains the data to convert.

        Returns:
            dict: The result dict contains the data that is formatted with
                default bundle.
        """
        if 'img' in results:
            img = results['img']
            # add default meta keys
            results = self._add_default_meta_keys(results)
            if len(img.shape) < 3:
                img = np.expand_dims(img, -1)
            img = np.ascontiguousarray(img.transpose(2, 0, 1))
            results['img'] = DC(to_tensor(img), stack=True)  # convert img to a tensor, then wrap it in a DC
        for key in ['proposals', 'gt_bboxes', 'gt_bboxes_ignore', 'gt_labels']:
            if key not in results:
                continue
            results[key] = DC(to_tensor(results[key]))  # convert to a tensor, then wrap in a DC
        if 'gt_masks' in results:
            results['gt_masks'] = DC(results['gt_masks'], cpu_only=True)
        if 'gt_semantic_seg' in results:
            results['gt_semantic_seg'] = DC(
                to_tensor(results['gt_semantic_seg'][None, ...]), stack=True)
        return results

    # set default fields such as pad_shape and scale_factor, so a pipeline
    # without Resize does not end up missing these keys
    def _add_default_meta_keys(self, results):
        """Add default meta keys."""
        img = results['img']
        results.setdefault('pad_shape', img.shape)
        results.setdefault('scale_factor', 1.0)
        num_channels = 1 if len(img.shape) < 3 else img.shape[2]
        results.setdefault(
            'img_norm_cfg',
            dict(
                mean=np.zeros(num_channels, dtype=np.float32),
                std=np.ones(num_channels, dtype=np.float32),
                to_rgb=False))
        return results
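The image branch above (add a channel axis for grayscale, then HWC -> CHW, made contiguous) can be reproduced with NumPy alone; the to_tensor and DataContainer steps are deliberately omitted in this sketch.

```python
import numpy as np

def format_img(img):
    """Mirror the image branch of DefaultFormatBundle up to the tensor
    step (to_tensor / DataContainer omitted)."""
    if img.ndim < 3:
        img = np.expand_dims(img, -1)                     # (H, W) -> (H, W, 1)
    return np.ascontiguousarray(img.transpose(2, 0, 1))   # HWC -> CHW

gray = np.zeros((4, 5), dtype=np.uint8)
rgb = np.zeros((4, 5, 3), dtype=np.uint8)
print(format_img(gray).shape)  # (1, 4, 5)
print(format_img(rgb).shape)   # (3, 4, 5)
```

The ascontiguousarray call matters: transpose only changes strides, and downstream tensor conversion expects contiguous memory.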
In one sentence: this class converts certain keys in results (img, gt_bboxes, gt_labels, ...) into tensors and wraps them in DataContainer objects. So the natural next step is to introduce DataContainer itself.
3.1 The DataContainer class

It is implemented in mmcv/parallel/data_container.py; see the comments below.
class DataContainer:
    """(Paraphrasing the original docstring:)

    PyTorch's collate function batches data by stacking tensors along a new
    dimension, which has two drawbacks:
    1. all tensors must have the same size;
    2. the accepted types are limited to tensors and numpy arrays.

    DataContainer was designed for more flexible batching; but since PyTorch
    cannot collate or compute on DataContainer objects directly,
    MMDataParallel was designed to unwrap and process the contents of a DC.
    """

    def __init__(self,
                 data,
                 stack=False,
                 padding_value=0,
                 cpu_only=False,
                 pad_dims=2):
        self._data = data
        self._cpu_only = cpu_only
        self._stack = stack
        self._padding_value = padding_value
        assert pad_dims in [None, 1, 2, 3]
        self._pad_dims = pad_dims
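Why batching variable-size data needs padding can be shown without mmcv at all. The helper below is only an intuition-building sketch using plain lists in place of tensors; mmcv's collate applies the same idea per dimension, using DC's padding_value:

```python
def pad_and_stack(arrays, padding_value=0):
    """Rows of different lengths cannot be stacked directly, so pad every
    row to the longest length first (toy analogue of padded collation)."""
    max_len = max(len(a) for a in arrays)
    return [a + [padding_value] * (max_len - len(a)) for a in arrays]

batch = pad_and_stack([[1, 2], [3, 4, 5], [6]])
print(batch)  # [[1, 2, 0], [3, 4, 5], [6, 0, 0]]
```

This is exactly the situation with images of different resolutions in one batch: either pad to a common shape (stack=True with padding) or keep them unstacked, which is what DC makes configurable.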
4. Collect
@PIPELINES.register_module()
class Collect(object):

    def __init__(self,
                 keys,
                 meta_keys=('filename', 'ori_filename', 'ori_shape',
                            'img_shape', 'pad_shape', 'scale_factor', 'flip',
                            'flip_direction', 'img_norm_cfg')):
        self.keys = keys
        self.meta_keys = meta_keys

    def __call__(self, results):
        """Call function to collect keys in results. The keys in ``meta_keys``
        will be converted to :obj:`mmcv.DataContainer`.

        Args:
            results (dict): Result dict contains the data to collect.

        Returns:
            dict: The result dict contains the following keys

                - keys in ``self.keys``
                - ``img_metas``
        """
        data = {}
        img_meta = {}
        for key in self.meta_keys:
            img_meta[key] = results[key]
        data['img_metas'] = DC(img_meta, cpu_only=True)  # adds the img_metas key
        for key in self.keys:
            data[key] = results[key]
        return data
In short, it gathers the meta fields under one new key, 'img_metas', and passes through the keys listed in self.keys.
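The split into data keys versus img_metas can be shown with plain dicts. The function below is a simplified re-implementation of Collect.__call__ with the DataContainer wrapping left out; all the values are made up:

```python
def collect(results, keys, meta_keys):
    """Simplified Collect.__call__: meta keys are gathered under
    'img_metas', requested data keys are copied through."""
    data = {'img_metas': {k: results[k] for k in meta_keys}}
    for k in keys:
        data[k] = results[k]
    return data

results = {'img': 'img-tensor', 'gt_bboxes': 'bbox-tensor',
           'filename': '0001.jpg', 'ori_shape': (768, 1024, 3)}
out = collect(results, keys=['img', 'gt_bboxes'],
              meta_keys=['filename', 'ori_shape'])
print(sorted(out))  # ['gt_bboxes', 'img', 'img_metas']
```

Everything not named in keys or meta_keys is dropped here, which is why Collect is typically the last step of a training pipeline.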
Summary

To wrap up, go over the figure above one more time.