Reposted from:
http://blog.youkuaiyun.com/u010668907/article/details/51946021
http://m.blog.youkuaiyun.com/u010668907/article/details/51945844
http://blog.youkuaiyun.com/iamzhangzhuping/article/details/51355497
Blog 1
1. The initial data goes through the imdb class and ends up in its roidb attribute.
2. roidb itself is just a plain Python structure (a list with one dict per image), so it can be pulled out and used on its own, independent of imdb.
3. The layer under roi_data_layer is the input-data layer. Its forward() loads the data and feeds the network one image at a time. It passes three blobs on to rpn-data:
gt_boxes: shape (number of boxes in the image's xml, 5); the box coordinates plus the class label of every box in the image
data: shape (1, 3, height, width); the image data itself
im_info: shape (1, 3); (height, width, the scale factor mentioned below)
The image fed to the network is not the original size: the shorter side of each image is rescaled to 600 (unless that would push the longer side past 1000) and the other side is scaled by the same factor (see the sketch after this list).
4. AnchorTargetLayer is the rpn-data layer. It generates the anchors, checks whether each anchor is reasonable (size, overlap), and assigns each anchor a label based on its overlap with the gt_boxes. The number of anchors is the height x width of the feature map coming out of the conv layers times 9 (i.e., 9 anchors per position). It finally produces four outputs (let k = len(anchors)):
labels: shape (k, 1); foreground = 1, background = 0, otherwise -1
rpn_bbox_targets: shape (k, 4)
bbox_inside_weights: shape (k, 4); 1 for foreground anchors, 0 otherwise
bbox_outside_weights: shape (k, 4); 1/(num foreground + num background) for foreground and background anchors, 0 otherwise
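As a concrete illustration of the rescaling mentioned in item 3, here is a minimal sketch of the rule that prep_im_for_blob in lib/utils/blob.py applies with the usual defaults TRAIN.SCALES = (600,) and TRAIN.MAX_SIZE = 1000 (the helper name compute_im_scale below is mine, not from the repo):

import numpy as np

def compute_im_scale(height, width, target_size=600, max_size=1000):
    """Scale factor that brings the shorter side to target_size,
    capped so the longer side does not exceed max_size."""
    im_size_min = min(height, width)
    im_size_max = max(height, width)
    im_scale = float(target_size) / float(im_size_min)
    if np.round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    return im_scale

h, w = 375, 500                      # a typical PASCAL VOC image
scale = compute_im_scale(h, w)       # -> 1.6
print(int(round(h * scale)), int(round(w * scale)), scale)   # 600 800 1.6
# the im_info row for this image would then be (600, 800, 1.6)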
Blog 2:
3.1 setup() is called when caffe.SGDSolver is constructed; each entry of setup's top (a list, presumably wrapping the C++ vector) is a caffe._caffe.Blob. (Presumably, the "Top shape" lines printed at startup are exactly these tops as shaped in setup: top[0] is 1 3 600 1000, top[1] is 1 3, top[2] is 1 4.) (Open question: in forward the blob shapes are reset, and they can even vary from iteration to iteration.)
3.2 name_to_top: {'gt_boxes': 2, 'data': 0, 'im_info': 1}; the values of this dict are the corresponding indices into top.
3.3 solver.step(1) triggers the layer's reshape() and then forward().
3.4 self._perm: the roidb indices in shuffled order; this is what shuffles the images and where the shuffled order is kept.
3.5 cfg.TRAIN.IMS_PER_BATCH: (presumably) the number of images taken per iteration.
3.6 self._cur: effectively a cursor into _perm; it advances every time a batch of images is taken.
3.7 db_inds: the image indices picked for the current iteration.
3.8 _get_next_minibatch_inds(): returns the image indices for this iteration, i.e. db_inds.
3.9 minibatch_db: the roidb entries for this iteration.
3.10 _num_classes: the number of classes the network predicts, 21 (20 PASCAL VOC classes plus background).
3.11 forward(): builds the blobs, processes them, and writes them into top.
Call chain: solver.step(1) -> reshape -> forward -> _get_next_minibatch -> _get_next_minibatch_inds -> (everything up to here lives in layers; from here minibatch.py builds the actual blobs) get_minibatch. A rough sketch of the training-script code that drives this chain follows below.
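For context on when setup() and forward() actually run, this is a rough sketch (not the verbatim tools/train_net.py / lib/fast_rcnn/train.py code) of how the training script typically wires the layer up; solver_prototxt, roidb and max_iters are assumed to exist in the caller's scope:

import caffe

solver = caffe.SGDSolver(solver_prototxt)   # RoIDataLayer.setup() runs during net construction
solver.net.layers[0].set_roidb(roidb)       # input-data is layer 0: hand it the training roidb
for _ in range(max_iters):
    solver.step(1)                          # each step: reshape() -> forward() on this layer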
Blog 3:
def setup(self, bottom, top): this method is called when the RoIDataLayer is created and initializes self._name_to_top_map (a mapping from blob name to top index). Judging from .def("setup", &Layer<Dtype>::LayerSetUp) in _caffe.cpp, setup(self, bottom, top) is presumably what gets invoked as the underlying Layer::LayerSetUp, with bottom and top corresponding to const vector<Blob<Dtype>*>& bottom and const vector<Blob<Dtype>*>& top respectively.
Recall from src/caffe/net.cpp that caffe only calls Layer::SetUp to set a layer up after "Creating Layer", AppendTop and AppendBottom have completed.
def setup(self, bottom, top):
    """Setup the RoIDataLayer."""
    layer_params = yaml.load(self.param_str_)
    self._num_classes = layer_params['num_classes']
    self._name_to_top_map = {}
    idx = 0
    top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3,
                     max(cfg.TRAIN.SCALES), cfg.TRAIN.MAX_SIZE)
    self._name_to_top_map['data'] = idx
    idx += 1
    if cfg.TRAIN.HAS_RPN:
        top[idx].reshape(1, 3)
        self._name_to_top_map['im_info'] = idx
        idx += 1
        top[idx].reshape(1, 4)
        self._name_to_top_map['gt_boxes'] = idx
        idx += 1
    else:
        top[idx].reshape(1, 5)
        self._name_to_top_map['rois'] = idx
        idx += 1
        top[idx].reshape(1)
        self._name_to_top_map['labels'] = idx
        idx += 1
        if cfg.TRAIN.BBOX_REG:
            top[idx].reshape(1, self._num_classes * 4)
            self._name_to_top_map['bbox_targets'] = idx
            idx += 1
            top[idx].reshape(1, self._num_classes * 4)
            self._name_to_top_map['bbox_inside_weights'] = idx
            idx += 1
            top[idx].reshape(1, self._num_classes * 4)
            self._name_to_top_map['bbox_outside_weights'] = idx
            idx += 1
    print 'RoiDataLayer: name_to_top:', self._name_to_top_map
    assert len(top) == len(self._name_to_top_map)
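A small sketch of the yaml.load(self.param_str_) line above: param_str comes from the python_param block of the train prototxt, and for the end-to-end VOC models it carries the number of classes. The literal string below is an assumption about that prototxt value:

import yaml

param_str = "'num_classes': 21"          # what python_param typically passes in
layer_params = yaml.load(param_str)      # -> {'num_classes': 21}
print(layer_params['num_classes'])       # 21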
def _shuffle_roidb_inds(self): shuffles the order of the training roidb.
def _shuffle_roidb_inds(self):
    """Randomly permute the training roidb."""
    if cfg.TRAIN.ASPECT_GROUPING:
        widths = np.array([r['width'] for r in self._roidb])
        heights = np.array([r['height'] for r in self._roidb])
        horz = (widths >= heights)
        vert = np.logical_not(horz)
        horz_inds = np.where(horz)[0]
        vert_inds = np.where(vert)[0]
        inds = np.hstack((
            np.random.permutation(horz_inds),
            np.random.permutation(vert_inds)))
        inds = np.reshape(inds, (-1, 2))
        row_perm = np.random.permutation(np.arange(inds.shape[0]))
        inds = np.reshape(inds[row_perm, :], (-1,))
        self._perm = inds
    else:
        self._perm = np.random.permutation(np.arange(len(self._roidb)))
    self._cur = 0
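A toy run of the ASPECT_GROUPING branch above (the widths/heights below are made up): with cfg.TRAIN.IMS_PER_BATCH = 2, the indices are arranged so that each consecutive pair is two landscape images or two portrait images, which keeps the padded image blob of a minibatch as small as possible:

import numpy as np

widths  = np.array([500, 353, 500, 500, 334, 480])
heights = np.array([375, 500, 333, 375, 500, 360])
horz_inds = np.where(widths >= heights)[0]          # landscape: [0 2 3 5]
vert_inds = np.where(widths <  heights)[0]          # portrait:  [1 4]
inds = np.hstack((np.random.permutation(horz_inds),
                  np.random.permutation(vert_inds)))
inds = np.reshape(inds, (-1, 2))                    # rows = same-orientation pairs
row_perm = np.random.permutation(np.arange(inds.shape[0]))
print(np.reshape(inds[row_perm, :], (-1,)))         # e.g. [3 0 1 4 5 2]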
def _get_next_minibatch_inds(self): why does this method check self._cur + cfg.TRAIN.IMS_PER_BATCH >= len(self._roidb)? Because training iterates over the whole training set several times; when the cursor nears the end, the roidb order is reshuffled and the cursor reset, starting the next pass.
def _get_next_minibatch_inds(self):
    """Return the roidb indices for the next minibatch."""
    if self._cur + cfg.TRAIN.IMS_PER_BATCH >= len(self._roidb):
        self._shuffle_roidb_inds()
    db_inds = self._perm[self._cur:self._cur + cfg.TRAIN.IMS_PER_BATCH]
    self._cur += cfg.TRAIN.IMS_PER_BATCH
    return db_inds
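A stripped-down illustration of this cursor logic (assuming IMS_PER_BATCH = 2 and a roidb of only 5 entries): once fewer than a full batch of unseen indices remains, the permutation is rebuilt and the cursor resets, so training just keeps cycling through the dataset epoch after epoch:

import numpy as np

perm, cur = np.random.permutation(5), 0
ims_per_batch, num_images = 2, 5
for step in range(6):
    if cur + ims_per_batch >= num_images:
        perm, cur = np.random.permutation(5), 0    # reshuffle = start of a new epoch
    db_inds = perm[cur:cur + ims_per_batch]
    cur += ims_per_batch
    print(step, db_inds)                           # two indices per step, reshuffled every 2 steps here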
def _get_next_minibatch(self): builds the blob dict for the next minibatch (a sketch of what that dict looks like follows the code).
def _get_next_minibatch(self):
    """Return the blobs to be used for the next minibatch.

    If cfg.TRAIN.USE_PREFETCH is True, then blobs will be computed in a
    separate process and made available through self._blob_queue.
    """
    if cfg.TRAIN.USE_PREFETCH:
        return self._blob_queue.get()
    else:
        db_inds = self._get_next_minibatch_inds()
        minibatch_db = [self._roidb[i] for i in db_inds]
        return get_minibatch(minibatch_db, self._num_classes)
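The dict that get_minibatch() returns (in the RPN / end-to-end case) is keyed by the same names used in _name_to_top_map. The following is a hand-written sketch of its shape for a single 600x800 image with two ground-truth boxes, not real output:

import numpy as np

blobs = {
    'data':     np.zeros((1, 3, 600, 800), dtype=np.float32),   # preprocessed image pixels
    'im_info':  np.array([[600, 800, 1.6]], dtype=np.float32),  # (height, width, scale)
    'gt_boxes': np.array([[48, 240, 195, 371, 12],              # (x1, y1, x2, y2, class)
                          [ 8,  12, 352, 498, 15]], dtype=np.float32),
}
# forward() below copies each entry into top[self._name_to_top_map[blob_name]].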
def forward(self, bottom, top): the forward pass. For this layer the forward pass is just a copy: depending on the training stage, it copies whatever blobs the network structure defined in the corresponding prototxt expects.
One thing to keep in mind: the forward function of the Layer template class calls Reshape() again, so it does not matter if the images (or features) in a minibatch have a different shape at every iteration, because the tops are reshaped before forward_cpu / forward_gpu is actually invoked; the Reshape done in SetUp only sets the initial shape of the top blobs.
def forward(self, bottom, top):
    """Get blobs and copy them into this layer's top blob vector."""
    blobs = self._get_next_minibatch()
    for blob_name, blob in blobs.iteritems():
        top_ind = self._name_to_top_map[blob_name]
        # reshape the top to match this minibatch, then copy the data in
        top[top_ind].reshape(*(blob.shape))
        top[top_ind].data[...] = blob.astype(np.float32, copy=False)
def backward(self, top, propagate_down, bottom): this layer produces data, so there is nothing to back-propagate.
def backward(self, top, propagate_down, bottom):
    """This layer does not propagate gradients."""
    pass
def reshape(self, bottom, top): reshaping is deferred to forward(), so this is a no-op.
def reshape(self, bottom, top):
    """Reshaping happens during the call to forward."""
    pass
def set_roidb(self, roidb): main work: 1. give the RoIDataLayer its roidb; 2. shuffle it. If prefetching is enabled it also starts the BlobFetcher process (a sketch of what that process does follows the code).
def set_roidb(self, roidb):
    """Set the roidb to be used by this layer during training."""
    self._roidb = roidb
    self._shuffle_roidb_inds()
    if cfg.TRAIN.USE_PREFETCH:
        self._blob_queue = Queue(10)
        self._prefetch_process = BlobFetcher(self._blob_queue,
                                             self._roidb,
                                             self._num_classes)
        self._prefetch_process.start()
        # terminate the prefetch process when the parent process exits
        def cleanup():
            print 'Terminating BlobFetcher'
            self._prefetch_process.terminate()
            self._prefetch_process.join()
        import atexit
        atexit.register(cleanup)
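A condensed, hedged sketch of the BlobFetcher process started above (the real class is defined further down in roi_data_layer/layer.py and keeps its own shuffled cursor; the random sampling here is a simplification): it runs in a separate process, endlessly builds minibatch blobs, and pushes them into the queue that _get_next_minibatch() reads from.

import numpy as np
from multiprocessing import Process

from fast_rcnn.config import cfg
from roi_data_layer.minibatch import get_minibatch   # same helper the layer uses

class BlobFetcherSketch(Process):
    """Simplified stand-in for BlobFetcher: keep self._queue filled with minibatch blobs."""
    def __init__(self, queue, roidb, num_classes):
        super(BlobFetcherSketch, self).__init__()
        self._queue = queue
        self._roidb = roidb
        self._num_classes = num_classes

    def run(self):
        while True:
            # the real fetcher walks a shuffled permutation; random choice is a shortcut
            db_inds = np.random.choice(len(self._roidb), size=cfg.TRAIN.IMS_PER_BATCH,
                                       replace=False)
            minibatch_db = [self._roidb[i] for i in db_inds]
            blobs = get_minibatch(minibatch_db, self._num_classes)
            self._queue.put(blobs)      # blocks once the queue (maxsize 10) is full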