Faster R-CNN Keras Source Code, the Most Detailed Walkthrough Series: RPN Training Data Processing, Part 2


License: CC 4.0 BY-SA

Tags: #Faster R-CNN Keras source code walkthrough series #Faster R-CNN Keras #Computer Vision #Object Detection #Deep Learning



Training data processing


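Continuing with data preprocessing: last time we reached the method calc_rpn in data_generators.py. This method produces the inputs and outputs (targets) of the RPN model. The input is simple, just a processed image. The outputs are what the network's predictions are compared against to compute the error; the annotation we have for an image (its ground-truth boxes) is not in the RPN's output format, so it has to be converted first. Let's look at the source. It is fairly involved, so read it a few times and step through it in a debugger; with my comments it should be manageable. I'll post the complete code at the end and go through it piece by piece first.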

Initializing parameters

First, a scaling function, get_img_output_length in vgg.py, which computes the size of the feature map and is used later:

# feature-map size: the VGG base downsamples the image by a factor of 16
def get_img_output_length(width, height):
    def get_output_length(input_length):
        return input_length//16

    return get_output_length(width), get_output_length(height)
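Why 16? A minimal check, assuming the backbone is a VGG16 base without its last pooling layer, so that its "same"-padded convolutions keep the size and four 2×2, stride-2 max-pooling layers each halve it with truncation. Halving four times gives the same result as a single integer division by 2**4 = 16. output_length_by_pooling is a hypothetical helper written only for this comparison, and 400×300 is an assumed resized image size that matches the (25, 18) in the comments below:

```python
def output_length_by_pooling(input_length, num_pools=4):
    # Hypothetical helper: apply num_pools stride-2 poolings, truncating each time.
    length = input_length
    for _ in range(num_pools):
        length //= 2
    return length


def get_output_length(input_length):
    # The rule from vgg.py: overall stride 2**4 = 16.
    return input_length // 16


# Assumed resized image: 400 pixels wide, 300 pixels high.
print(output_length_by_pooling(400), output_length_by_pooling(300))  # 25 18
print(get_output_length(400), get_output_length(300))                # 25 18
```

The downscale value (C.rpn_stride) used in calc_rpn has to agree with this function, because it maps feature-map cells back to pixels on the resized image.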

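Let's start from the beginning: data preparation and initialization. Since many variables are involved, I have written their values into the comments; otherwise it is hard to know what all that data looks like: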

	# downsampling factor from the resized image to the feature map
	downscale = float(C.rpn_stride) # 16
	# anchor sizes
	anchor_sizes = C.anchor_box_scales # [128, 256, 512]
	# anchor aspect ratios
	anchor_ratios = C.anchor_box_ratios # [[1, 1], [1./math.sqrt(2), 2./math.sqrt(2)], [2./math.sqrt(2), 1./math.sqrt(2)]]
	# number of anchors per feature-map location
	num_anchors = len(anchor_sizes) * len(anchor_ratios) # 3x3=9

	# calculate the output map size based on the network architecture
	# map the resized image size to the feature-map size
	(output_width, output_height) = img_length_calc_function(resized_width, resized_height) # (25,18)

	n_anchratios = len(anchor_ratios) # 3
	
	# initialise empty output objectives

	# y_rpn_overlap: 3-D array over the feature map's output_height*output_width positions,
	# marking whether each anchor contains an object
	y_rpn_overlap = np.zeros((output_height, output_width, num_anchors)) # (18,25,9)

	# whether each anchor is valid, i.e. whether it is used in the loss
	y_is_box_valid = np.zeros((output_height, output_width, num_anchors)) # (18,25,9)

	# regression targets of each anchor, 4 values per anchor
	y_rpn_regr = np.zeros((output_height, output_width, num_anchors * 4)) # (18,25,36)

	# number of GT boxes in the image; each is a dict such as {'class': 'chair', 'x1': x1, 'x2': x2, 'y1': y1, 'y2': y2, 'difficult': o}
	num_bboxes = len(img_data['bboxes']) 

	# one counter per GT box: how many positive anchors it has (one GT box can match many anchors), initially [0 0 0 0 0 0 0 0 0 0...]
	num_anchors_for_bbox = np.zeros(num_bboxes).astype(int) # (num_bboxes,)

	# one row per GT box with the indices of its best anchor: feature-map row, column, ratio index, size index; initialized to -1
	best_anchor_for_bbox = -1*np.ones((num_bboxes, 4)).astype(int) # (num_bboxes,4)

	# best IoU found so far for each GT box, initially [0 0 0 0 0 0 0 0 0 0...]
	best_iou_for_bbox = np.zeros(num_bboxes).astype(np.float32) # (num_bboxes,)

	# coordinates x1, x2, y1, y2 (on the resized image) of each GT box's best anchor
	best_x_for_bbox = np.zeros((num_bboxes, 4)).astype(int) # (num_bboxes,4)

	# regression targets tx, ty, tw, th between each GT box and its best anchor
	best_dx_for_bbox = np.zeros((num_bboxes, 4)).astype(np.float32) # (num_bboxes,4)

	# get the GT box coordinates, and resize to account for image resizing
	# GT boxes as a (num_bboxes, 4) matrix
	gta = np.zeros((num_bboxes, 4)) # (num_bboxes,4)
	for bbox_num, bbox in enumerate(img_data['bboxes']):
		# get the GT box coordinates, and resize to account for image resizing
		# GT coordinates on the resized image, not yet mapped to the feature-map scale
		gta[bbox_num, 0] = bbox['x1'] * (resized_width / float(width))
		gta[bbox_num, 1] = bbox['x2'] * (resized_width / float(width))
		gta[bbox_num, 2] = bbox['y1'] * (resized_height / float(height))
		gta[bbox_num, 3] = bbox['y2'] * (resized_height / float(height))
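To make the anchor configuration concrete, here is a small sketch that enumerates the 9 anchor shapes from the config values quoted in the comments above (anchor_box_scales and anchor_box_ratios are copied from those comments, so treat them as assumptions about your own config). Because each ratio pair multiplies to 1, every anchor of a given size has the same area, size²; only the aspect changes. The index k = anchor_ratio_idx + n_anchratios * anchor_size_idx is the channel index the anchor loop uses:

```python
import math

# Values copied from the comments above; treat them as assumptions about C.
anchor_sizes = [128, 256, 512]
anchor_ratios = [[1, 1],
                 [1. / math.sqrt(2), 2. / math.sqrt(2)],
                 [2. / math.sqrt(2), 1. / math.sqrt(2)]]
n_anchratios = len(anchor_ratios)

anchor_shapes = {}
for anchor_size_idx, size in enumerate(anchor_sizes):
    for anchor_ratio_idx, (rx, ry) in enumerate(anchor_ratios):
        # Same channel index that calc_rpn uses for y_rpn_overlap / y_is_box_valid.
        k = anchor_ratio_idx + n_anchratios * anchor_size_idx
        anchor_shapes[k] = (size * rx, size * ry)   # (anchor_x = width, anchor_y = height)

for k in sorted(anchor_shapes):
    w, h = anchor_shapes[k]
    print(k, round(w, 1), round(h, 1), round(w * h))
```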

Anchor filtering

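That is a lot of initialization, but it is straightforward. Most of what follows happens on the feature map; the next step filters out unsuitable anchors and labels the rest: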

	# cost: 3*3*18*25*num_bboxes, i.e. 9 anchor shapes x feature-map width x feature-map height x number of GT boxes
	for anchor_size_idx in range(len(anchor_sizes)):
		for anchor_ratio_idx in range(n_anchratios):
			# anchor dimensions: anchor_x = width, anchor_y = height
			anchor_x = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][0]
			anchor_y = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][1]
			# each feature-map point maps to a point on the resized image and the anchor is centered there; anchors crossing the border are dropped
			for ix in range(output_width):
				# x-coordinates of the current anchor box
				# x1, x2 of the anchor on the resized image; the +0.5 puts the anchor at the center of the feature-map cell
				x1_anc = downscale * (ix + 0.5) - anchor_x / 2
				x2_anc = downscale * (ix + 0.5) + anchor_x / 2

				# ignore boxes that go across image boundaries
				# drop anchors whose x1 or x2 lies outside the image
				if x1_anc < 0 or x2_anc > resized_width:
					continue
					
				for jy in range(output_height):

					# y-coordinates of the current anchor box

					y1_anc = downscale * (jy + 0.5) - anchor_y / 2
					y2_anc = downscale * (jy + 0.5) + anchor_y / 2

					# ignore boxes that go across image boundaries
					# drop anchors whose y1 or y2 lies outside the image
					if y1_anc < 0 or y2_anc > resized_height:
						continue

					# bbox_type indicates whether an anchor should be a target
					# start as a negative sample
					bbox_type = 'neg'

					# this is the best IOU for the (x,y) coord and the current anchor
					# note that this is different from the best IOU for a GT bbox
					# best IoU of this anchor over all GT boxes, initialized to 0
					best_iou_for_loc = 0.0

					# compute the IoU of this anchor with each GT box and decide its label; bbox_num is the GT index
					for bbox_num in range(num_bboxes):

						# get IOU of the current GT box and the current anchor box

						# IoU between GT box and anchor; gta rows are (x1, x2, y1, y2) but iou expects (x1, y1, x2, y2)
						curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2], gta[bbox_num, 1], gta[bbox_num, 3]], [x1_anc, y1_anc, x2_anc, y2_anc])

						# calculate the regression targets if they will be needed
						# compute them if this IoU beats the GT box's best IoU so far, or exceeds the positive threshold
						if curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap:
							# center of the GT box
							cx = (gta[bbox_num, 0] + gta[bbox_num, 1]) / 2.0
							cy = (gta[bbox_num, 2] + gta[bbox_num, 3]) / 2.0

							# center of the anchor
							cxa = (x1_anc + x2_anc)/2.0
							cya = (y1_anc + y2_anc)/2.0

							# center offsets, normalized by the anchor's width and height to remove the effect of scale
							tx = (cx - cxa) / (x2_anc - x1_anc)
							ty = (cy - cya) / (y2_anc - y1_anc)

							# log ratios of the GT width/height to the anchor width/height
							tw = np.log((gta[bbox_num, 1] - gta[bbox_num, 0]) / (x2_anc - x1_anc))
							th = np.log((gta[bbox_num, 3] - gta[bbox_num, 2]) / (y2_anc - y1_anc))

						# only GT boxes that are not background are matched
						if img_data['bboxes'][bbox_num]['class'] != 'bg':

							# all GT boxes should be mapped to an anchor box, so we keep track of which anchor box was best
							# if this IoU beats the best IoU so far for this GT box
							if curr_iou > best_iou_for_bbox[bbox_num]:
								# record the anchor's feature-map position (jy, ix) and its ratio and size indices
								best_anchor_for_bbox[bbox_num] = [jy, ix, anchor_ratio_idx, anchor_size_idx]

								# update the GT box's best IoU
								best_iou_for_bbox[bbox_num] = curr_iou

								# update the best anchor's coordinates x1, x2, y1, y2 on the resized image
								best_x_for_bbox[bbox_num,:] = [x1_anc, x2_anc, y1_anc, y2_anc]

								# update the offsets between the anchor and the GT box
								best_dx_for_bbox[bbox_num,:] = [tx, ty, tw, th]

							# we set the anchor to positive if the IOU is >0.7 (it does not matter if there was another better box, it just indicates overlap)
							# IoU above 0.7: positive sample
							if curr_iou > C.rpn_max_overlap:
								bbox_type = 'pos'

								# one more positive anchor for this GT box
								num_anchors_for_bbox[bbox_num] += 1
								# we update the regression layer target if this IOU is the best for the current (x,y) and anchor position
								# if this IoU is the best over all GT boxes for this anchor, update: each anchor keeps its best GT box
								if curr_iou > best_iou_for_loc:
									# best IoU at this location so far
									best_iou_for_loc = curr_iou

									# take the regression targets from this better-matching GT box
									best_regr = (tx, ty, tw, th)

							# if the IOU is >0.3 and <0.7, it is ambiguous and not included in the objective
							# IoU between 0.3 and 0.7: neutral, ignored
							if C.rpn_min_overlap < curr_iou < C.rpn_max_overlap:
								# gray zone between neg and pos
								if bbox_type != 'pos':
									bbox_type = 'neutral'

					# turn on or off outputs depending on IOUs
					# the valid and object flags depend on the sample type, i.e. on the IoU
					if bbox_type == 'neg':
						# negative: valid bit 1
						y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
						# negative: object bit 0
						y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
					elif bbox_type == 'neutral':
						# neutral: both bits 0
						y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
						y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
					elif bbox_type == 'pos':
						# positive: both bits 1
						y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
						y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
						# start channel of this anchor's regression targets: 4 values per anchor, so start = 4 x anchor index, e.g. anchor 3 -> channels 12, 13, 14, 15
						start = 4 * (anchor_ratio_idx + n_anchratios * anchor_size_idx)
						# store the regression targets: 4 values per anchor, 9 anchors -> 36 channels, grouped in fours
						y_rpn_regr[jy, ix, start:start+4] = best_regr

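We first loop over all anchor sizes and aspect ratios and compute each anchor's box on the resized image (called the resized image from here on), centered on the point that its feature-map cell maps to. Anchors that cross the border of the resized image are discarded; the test is on the box edges, not just on the center. Each remaining anchor is then compared with every GT box through their IoU (intersection over union), curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2], gta[bbox_num, 1], gta[bbox_num, 3]], [x1_anc, y1_anc, x2_anc, y2_anc]). The indices are reordered because gta stores (x1, x2, y1, y2) while iou expects (x1, y1, x2, y2). IoU is not hard to follow; here are the functions: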

# union
def union(au, bu, area_intersection):
	area_a = (au[2] - au[0]) * (au[3] - au[1])
	area_b = (bu[2] - bu[0]) * (bu[3] - bu[1])
	area_union = area_a + area_b - area_intersection
	return area_union

# intersection; the origin (0, 0) is the top-left corner
def intersection(ai, bi):
	x = max(ai[0], bi[0])
	y = max(ai[1], bi[1])
	w = min(ai[2], bi[2]) - x
	h = min(ai[3], bi[3]) - y
	if w < 0 or h < 0:
		return 0
	return w*h

# intersection over union (IoU)
def iou(a, b):
	# a and b should be (x1,y1,x2,y2)
	# boxes are given as top-left and bottom-right corners; degenerate boxes give IoU 0
	if a[0] >= a[2] or a[1] >= a[3] or b[0] >= b[2] or b[1] >= b[3]:
		return 0.0

	area_i = intersection(a, b)
	area_u = union(a, b, area_i)

	return float(area_i) / float(area_u + 1e-6)
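Since the inner loop calls iou() once per anchor and GT box, a vectorized NumPy variant can be handy for experiments. The function below, iou_one_to_many, is a hypothetical helper and not part of the project; it uses the same formula, including the 1e-6 in the denominator. The example also shows why calc_rpn reorders gta's columns before calling iou:

```python
import numpy as np


def iou_one_to_many(box, anchors):
    """IoU of one box (x1, y1, x2, y2) with each row of an (N, 4) anchor array.

    Follows iou() above: intersection / (union + 1e-6), and a degenerate
    box (x1 >= x2 or y1 >= y2) gets IoU 0.
    """
    box = np.asarray(box, dtype=np.float64)
    anchors = np.asarray(anchors, dtype=np.float64)
    if box[0] >= box[2] or box[1] >= box[3]:
        return np.zeros(len(anchors))
    # Overlap rectangle; negative widths/heights are clipped to 0 (no overlap).
    iw = np.clip(np.minimum(box[2], anchors[:, 2]) - np.maximum(box[0], anchors[:, 0]), 0, None)
    ih = np.clip(np.minimum(box[3], anchors[:, 3]) - np.maximum(box[1], anchors[:, 1]), 0, None)
    inter = iw * ih
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anc = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    ious = inter / (area_box + area_anc - inter + 1e-6)
    valid = (anchors[:, 0] < anchors[:, 2]) & (anchors[:, 1] < anchors[:, 3])
    return np.where(valid, ious, 0.0)


gta_row = np.array([0.0, 10.0, 0.0, 10.0])   # hypothetical GT box stored as (x1, x2, y1, y2)
gt_xyxy = gta_row[[0, 2, 1, 3]]               # reorder to (x1, y1, x2, y2), as calc_rpn does
anchors = np.array([[5.0, 5.0, 15.0, 15.0],   # hypothetical anchors as (x1, y1, x2, y2)
                    [0.0, 0.0, 10.0, 10.0],
                    [20.0, 20.0, 30.0, 30.0]])
ious = iou_one_to_many(gt_xyxy, anchors)      # roughly 25/175 = 0.1429, 1.0 and 0.0
# Forgetting the reorder: read as (x1, y1, x2, y2) the box has x1 == x2, so every IoU is 0.
wrong = iou_one_to_many(gta_row, anchors)
```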

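If an anchor's IoU with a GT box exceeds the positive threshold, or exceeds the best IoU recorded so far for that GT box (if curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap), the regression targets (offsets) between the anchor and the GT box are computed. These are the supervised labels that the RPN's regression output is compared against to compute the error.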
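In formula form, with (x, y, w, h) the GT box's center and size and (x_a, y_a, w_a, h_a) those of the anchor, the code computes t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a). The sketch below repeats that encoding and adds a decode function, a hypothetical inverse not taken from this part of the source, to show that the four numbers fully determine the GT box given the anchor. Both boxes use gta's (x1, x2, y1, y2) layout, and the box values are made up:

```python
import numpy as np


def encode(gt, anchor):
    """Regression targets (tx, ty, tw, th) as computed in calc_rpn; boxes are (x1, x2, y1, y2)."""
    cx, cy = (gt[0] + gt[1]) / 2.0, (gt[2] + gt[3]) / 2.0
    cxa, cya = (anchor[0] + anchor[1]) / 2.0, (anchor[2] + anchor[3]) / 2.0
    wa, ha = anchor[1] - anchor[0], anchor[3] - anchor[2]
    tx = (cx - cxa) / wa
    ty = (cy - cya) / ha
    tw = np.log((gt[1] - gt[0]) / wa)
    th = np.log((gt[3] - gt[2]) / ha)
    return tx, ty, tw, th


def decode(t, anchor):
    """Hypothetical inverse: recover the GT box (x1, x2, y1, y2) from the targets and the anchor."""
    tx, ty, tw, th = t
    cxa, cya = (anchor[0] + anchor[1]) / 2.0, (anchor[2] + anchor[3]) / 2.0
    wa, ha = anchor[1] - anchor[0], anchor[3] - anchor[2]
    cx, cy = cxa + tx * wa, cya + ty * ha
    w, h = wa * np.exp(tw), ha * np.exp(th)
    return cx - w / 2.0, cx + w / 2.0, cy - h / 2.0, cy + h / 2.0


anchor = (72.0, 200.0, 72.0, 200.0)   # hypothetical 128x128 anchor centered at (136, 136), i.e. ix = jy = 8
gt = (100.0, 180.0, 90.0, 210.0)      # hypothetical GT box, 80 wide and 120 high
t = encode(gt, anchor)
print(np.round(t, 4))
print(np.round(decode(t, anchor), 4))  # recovers the GT box
```

Dividing the center offsets by the anchor size and taking logs of the size ratios keeps the targets in a similar numeric range for small and large anchors.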

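Next, if the GT box is not background (if img_data['bboxes'][bbox_num]['class'] != 'bg'), we check whether the IoU beats that GT box's best IoU so far (curr_iou > best_iou_for_bbox[bbox_num]). If it does, we record the anchor's feature-map position and its size and aspect-ratio indices, best_anchor_for_bbox[bbox_num] = [jy, ix, anchor_ratio_idx, anchor_size_idx], and update the GT box's best IoU, best_iou_for_bbox[bbox_num] = curr_iou;
we update the coordinates of the GT box's best anchor on the resized image, best_x_for_bbox[bbox_num,:] = [x1_anc, x2_anc, y1_anc, y2_anc];
and we update the offsets between anchor and GT box, best_dx_for_bbox[bbox_num,:] = [tx, ty, tw, th].
Then the sample type is decided. If the IoU exceeds the upper threshold, the anchor is positive and the GT box's count of matching anchors is incremented; if this IoU is also the largest among all GT boxes for this anchor (if curr_iou > best_iou_for_loc), the best offsets are updated, best_regr = (tx, ty, tw, th). If the IoU lies between the lower and upper thresholds (if C.rpn_min_overlap < curr_iou < C.rpn_max_overlap) and the anchor is not already positive, it becomes a neutral sample.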

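The flags are then set according to the sample type. For a negative, the anchor's valid bit y_is_box_valid is set to 1 and its object bit y_rpn_overlap to 0; for a neutral sample both are 0; for a positive both are 1, and the best regression targets are stored, y_rpn_regr[jy, ix, start:start+4] = best_regr. An anchor may overlap several GT boxes with different IoUs; its regression targets come from the one with the highest IoU.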
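The labeling rule for a single anchor can be isolated into a small function. label_anchor is a hypothetical sketch, assuming the thresholds 0.3 and 0.7 from the comments above; it leaves out the background check and the later step that forces a best anchor to be positive, and only mirrors the order of the comparisons in the loop:

```python
def label_anchor(ious, rpn_min_overlap=0.3, rpn_max_overlap=0.7):
    """Sketch of calc_rpn's per-anchor decision.

    ious: IoUs of one anchor with every (non-background) GT box.
    Returns (bbox_type, valid_bit, object_bit) as written into y_is_box_valid / y_rpn_overlap.
    """
    bbox_type = 'neg'
    for curr_iou in ious:
        if curr_iou > rpn_max_overlap:
            bbox_type = 'pos'
        # The gray zone only downgrades an anchor that is not already positive.
        if rpn_min_overlap < curr_iou < rpn_max_overlap and bbox_type != 'pos':
            bbox_type = 'neutral'
    flags = {'neg': (1, 0), 'neutral': (0, 0), 'pos': (1, 1)}
    return (bbox_type,) + flags[bbox_type]


print(label_anchor([0.1, 0.2]))   # ('neg', 1, 0)
print(label_anchor([0.5, 0.1]))   # ('neutral', 0, 0)
print(label_anchor([0.5, 0.8]))   # ('pos', 1, 1)
print(label_anchor([0.7]))        # ('neg', 1, 0)
```

The last case is a side effect of the strict comparisons: an IoU of exactly 0.7 (or exactly 0.3) falls into neither the positive nor the neutral branch, so such an anchor stays negative.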

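The loop repeats until every anchor has a classification label y_rpn_overlap and a valid flag y_is_box_valid, and the positive anchors have regression targets y_rpn_regr. Anchors that cross the border are skipped and keep their initial zeros, so they count as invalid.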
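The geometry of that loop (centers at downscale * (ix + 0.5), the border test on x1, x2, y1, y2) can also be written with NumPy broadcasting. anchor_grid below is a hypothetical helper, not part of the original code, assuming a 400×300 resized image and the anchor settings quoted earlier:

```python
import math
import numpy as np


def anchor_grid(resized_width, resized_height, downscale=16.0,
                anchor_sizes=(128, 256, 512),
                anchor_ratios=((1, 1),
                               (1. / math.sqrt(2), 2. / math.sqrt(2)),
                               (2. / math.sqrt(2), 1. / math.sqrt(2)))):
    """Hypothetical vectorized version of the anchor geometry in calc_rpn.

    Returns boxes of shape (H, W, A, 4) as (x1, y1, x2, y2) on the resized image and a
    boolean mask (H, W, A) that is False exactly where calc_rpn would `continue`.
    """
    out_w = int(resized_width // downscale)
    out_h = int(resized_height // downscale)
    n_ratios = len(anchor_ratios)
    num_anchors = len(anchor_sizes) * n_ratios
    widths = np.empty(num_anchors)
    heights = np.empty(num_anchors)
    for s, size in enumerate(anchor_sizes):
        for r, (rx, ry) in enumerate(anchor_ratios):
            widths[r + n_ratios * s] = size * rx     # anchor_x
            heights[r + n_ratios * s] = size * ry    # anchor_y
    # Centers of the feature-map cells on the resized image: downscale * (index + 0.5).
    cx = (downscale * (np.arange(out_w) + 0.5))[None, :, None]   # shape (1, W, 1)
    cy = (downscale * (np.arange(out_h) + 0.5))[:, None, None]   # shape (H, 1, 1)
    shape = (out_h, out_w, num_anchors)
    x1 = np.broadcast_to(cx - widths / 2, shape)
    x2 = np.broadcast_to(cx + widths / 2, shape)
    y1 = np.broadcast_to(cy - heights / 2, shape)
    y2 = np.broadcast_to(cy + heights / 2, shape)
    boxes = np.stack([x1, y1, x2, y2], axis=-1)
    # Same test as calc_rpn: drop anchors with x1 < 0, x2 > width, y1 < 0 or y2 > height.
    inside = (x1 >= 0) & (x2 <= resized_width) & (y1 >= 0) & (y2 <= resized_height)
    return boxes, inside


boxes, inside = anchor_grid(400, 300)   # assumed 400x300 resized image
print(boxes.shape, inside.shape)        # (18, 25, 9, 4) (18, 25, 9)
print(inside.sum(axis=(0, 1)))          # number of usable positions per anchor shape
```

On an image this small, none of the 512-pixel anchors fits, so all of them end up invalid, which is worth knowing when reading the debugger output.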

Making sure every GT box has a matching anchor

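Next we make sure every GT box has at least one positive anchor. If a box truly overlaps no anchor there is nothing to be done, but with so many anchors that is unlikely: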

	# we ensure that every bbox has at least one positive RPN region
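	# make sure every GT box has at least one positive anchor: if none exceeded 0.7, use best_anchor_for_bbox,
	# so a negative or neutral anchor can be turned into a positive here;
	# if even that is missing, no anchor overlaps the box and nothing can be done
	for idx in range(num_anchors_for_bbox.shape[0]):
		# no anchor with IoU above 0.7 was found for this GT box
		if num_anchors_for_bbox[idx] == 0:

			# no box with an IOU greater than zero ...
			# no overlapping anchor at all: skip it (very unlikely)
			if best_anchor_for_bbox[idx, 0] == -1:
				continue

			# there is a best anchor; even though its IoU is not above 0.7, make it a positive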
			y_is_box_valid[
				best_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], best_anchor_for_bbox[idx,2] + n_anchratios *
				best_anchor_for_bbox[idx,3]] = 1
			y_rpn_overlap[
				best_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], best_anchor_for_bbox[idx,2] + n_anchratios *
				best_anchor_for_bbox[idx,3]] = 1
			start = 4 * (best_anchor_for_bbox[idx,2] + n_anchratios * best_anchor_for_bbox[idx,3])
			y_rpn_regr[
				best_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], start:start+4] = best_dx_for_bbox[idx, :]

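Some GT boxes do overlap anchors, just never above the threshold. So that every GT box has a positive sample (otherwise that object would never be learned), we take its best anchor, best_anchor_for_bbox, as a positive and update the corresponding labels and regression targets. best_anchor_for_bbox[bbox_num] has the form (row index, column index, aspect-ratio index, size index), whereas y_is_box_valid is indexed by (row index, column index, anchor index); that is why the anchor index is computed above from the size and aspect-ratio indices.
These three lines show the correspondence: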

best_anchor_for_bbox[bbox_num] = [jy, ix, anchor_ratio_idx, anchor_size_idx]

y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1

y_is_box_valid[
				best_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], best_anchor_for_bbox[idx,2] + n_anchratios *
				best_anchor_for_bbox[idx,3]] = 1
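The same index arithmetic as a few tiny functions, assuming n_anchratios = 3 as in the config above. anchor_channel and regr_slice reproduce the expressions from the code; split_channel is a hypothetical inverse added only to show that the mapping is one-to-one:

```python
n_anchratios = 3  # number of aspect ratios, as in the config above


def anchor_channel(anchor_ratio_idx, anchor_size_idx):
    # Channel of this anchor in y_is_box_valid / y_rpn_overlap (0..8).
    return anchor_ratio_idx + n_anchratios * anchor_size_idx


def regr_slice(anchor_ratio_idx, anchor_size_idx):
    # Its 4 regression channels in y_rpn_regr (0..35).
    start = 4 * anchor_channel(anchor_ratio_idx, anchor_size_idx)
    return slice(start, start + 4)


def split_channel(k):
    # Hypothetical inverse: recover (anchor_ratio_idx, anchor_size_idx) from channel k.
    anchor_size_idx, anchor_ratio_idx = divmod(k, n_anchratios)
    return anchor_ratio_idx, anchor_size_idx


best_anchor = [7, 12, 2, 1]   # hypothetical best_anchor_for_bbox row: [jy, ix, ratio_idx, size_idx]
k = anchor_channel(best_anchor[2], best_anchor[3])
print(k, regr_slice(best_anchor[2], best_anchor[3]))  # 5 slice(20, 24, None)
```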

Transposing dimensions for consistency

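To simplify later computations, the anchor (channel) axis is moved to the front and a batch axis is added in front of it, giving a uniform format: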

	# transpose to (anchor index, jy, ix), then add a leading batch axis
	y_rpn_overlap = np.transpose(y_rpn_overlap, (2, 0, 1)) # (9,18,25)
	y_rpn_overlap = np.expand_dims(y_rpn_overlap, axis=0) # (1,9,18,25)

	y_is_box_valid = np.transpose(y_is_box_valid, (2, 0, 1)) # (9,18,25)
	y_is_box_valid = np.expand_dims(y_is_box_valid, axis=0) # (1,9,18,25)

	y_rpn_regr = np.transpose(y_rpn_regr, (2, 0, 1)) # (36,18,25)
	y_rpn_regr = np.expand_dims(y_rpn_regr, axis=0) # (1,36,18,25)
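A quick way to convince yourself that the transpose only reorders axes: mark one hypothetical positive anchor and find it again at the permuted index. The sizes are the example values from the comments:

```python
import numpy as np

output_height, output_width, num_anchors = 18, 25, 9   # example sizes from the comments
y_rpn_overlap = np.zeros((output_height, output_width, num_anchors))
y_rpn_overlap[7, 12, 5] = 1   # hypothetical positive anchor at (jy=7, ix=12, channel 5)

chw = np.expand_dims(np.transpose(y_rpn_overlap, (2, 0, 1)), axis=0)
print(chw.shape)          # (1, 9, 18, 25)
print(chw[0, 5, 7, 12])   # 1.0: the same element, now indexed (batch, anchor, jy, ix)

# With the TensorFlow backend the generator later applies (0, 2, 3, 1),
# which moves the anchor axis back to the end.
hwc = np.transpose(chw, (0, 2, 3, 1))
print(hwc.shape)          # (1, 18, 25, 9)
```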

Balancing positive and negative samples

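Next we pick out the anchors that are valid and contain an object (positives), and those that are valid without an object (negatives):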

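	# indices of positives and negatives: np.logical_and is an element-wise AND and np.where returns the indices where the condition holds, one index array per dimension (three arrays here, since the sliced arrays are 3-D)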
	pos_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 1, y_is_box_valid[0, :, :, :] == 1))
	neg_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 0, y_is_box_valid[0, :, :, :] == 1))
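What np.where returns here is easiest to see on a tiny made-up array with layout (anchor, jy, ix) = (2, 2, 3): a tuple of three index arrays, where position i across the three arrays names one selected anchor:

```python
import numpy as np

# Tiny hypothetical label arrays with layout (1, anchor, jy, ix) = (1, 2, 2, 3).
y_rpn_overlap = np.zeros((1, 2, 2, 3))
y_is_box_valid = np.zeros((1, 2, 2, 3))
y_is_box_valid[0, 0, 1, 2] = 1
y_rpn_overlap[0, 0, 1, 2] = 1      # positive
y_is_box_valid[0, 1, 0, 0] = 1     # negative
y_is_box_valid[0, 1, 1, 1] = 1     # negative
# All other anchors are neutral or out of bounds: valid bit 0.

pos_locs = np.where(np.logical_and(y_rpn_overlap[0] == 1, y_is_box_valid[0] == 1))
neg_locs = np.where(np.logical_and(y_rpn_overlap[0] == 0, y_is_box_valid[0] == 1))
print(pos_locs)                             # one index array per axis: (anchor, jy, ix)
print(len(pos_locs[0]), len(neg_locs[0]))   # 1 2
```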

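Then positives and negatives are balanced so that negatives never outnumber positives once the cap is reached. Note, though, that num_regions = 256 is a fairly specific choice: with, say, 20 positives and 200 negatives, positives plus negatives stay under 256, so nothing is dropped and negatives still dominate. This may need tuning; perhaps the value was chosen for VOC data: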

	# number of positives
	num_pos = len(pos_locs[0])

	# one issue is that the RPN has many more negative than positive regions, so we turn off some of the negative
	# regions. We also limit it to 256 regions.
	# negatives far outnumber positives, so some are switched off; at most 256 regions are used
	num_regions = 256

	# more than 128 positives: randomly switch off the extras, keeping at most 128
	# (integer division: under Python 3, random.sample needs an int sample size)
	if len(pos_locs[0]) > num_regions // 2:
		val_locs = random.sample(range(len(pos_locs[0])), len(pos_locs[0]) - num_regions // 2)
		y_is_box_valid[0, pos_locs[0][val_locs], pos_locs[1][val_locs], pos_locs[2][val_locs]] = 0
		num_pos = num_regions // 2

	# if negatives + positives exceed 256, switch off negatives until there are as many negatives as positives (1:1)
	if len(neg_locs[0]) + num_pos > num_regions:
		val_locs = random.sample(range(len(neg_locs[0])), len(neg_locs[0]) - num_pos)
		y_is_box_valid[0, neg_locs[0][val_locs], neg_locs[1][val_locs], neg_locs[2][val_locs]] = 0
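A self-contained sketch of the same balancing logic, written with integer division (num_regions // 2) because under Python 3 num_regions/2 is a float, which random.sample rejects as a sample size. balance is a hypothetical wrapper around the lines above, and the label arrays in the demo are made up:

```python
import random
import numpy as np


def balance(y_is_box_valid, y_rpn_overlap, num_regions=256):
    """Sketch of calc_rpn's positive/negative balancing; arrays are (1, A, H, W).

    y_is_box_valid is modified in place and also returned.
    """
    pos_locs = np.where(np.logical_and(y_rpn_overlap[0] == 1, y_is_box_valid[0] == 1))
    neg_locs = np.where(np.logical_and(y_rpn_overlap[0] == 0, y_is_box_valid[0] == 1))
    num_pos = len(pos_locs[0])
    half = num_regions // 2                     # int, so random.sample accepts it
    if num_pos > half:
        off = random.sample(range(num_pos), num_pos - half)
        y_is_box_valid[0, pos_locs[0][off], pos_locs[1][off], pos_locs[2][off]] = 0
        num_pos = half
    if len(neg_locs[0]) + num_pos > num_regions:
        off = random.sample(range(len(neg_locs[0])), len(neg_locs[0]) - num_pos)
        y_is_box_valid[0, neg_locs[0][off], neg_locs[1][off], neg_locs[2][off]] = 0
    return y_is_box_valid


random.seed(0)
valid = np.ones((1, 9, 18, 25))          # hypothetical: all 4050 anchors valid
overlap = np.zeros((1, 9, 18, 25))
overlap[0, 0, :10, :20] = 1              # 200 hypothetical positives
balance(valid, overlap)
n_pos = int((valid * overlap).sum())
n_neg = int((valid * (1 - overlap)).sum())
print(n_pos, n_neg)                      # 128 128

# Edge case that follows from the logic: with no positives, every negative is switched off too.
valid_none = balance(np.ones((1, 9, 18, 25)), np.zeros((1, 9, 18, 25)))
print(int(valid_none.sum()))             # 0
```

The edge case matters for images where no anchor matches any GT box: the classification target then has no valid positions at all, so that image contributes nothing to the classification loss.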

Prepending the valid bits

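Finally, y_is_box_valid and y_rpn_overlap are concatenated along the second axis (axis=1), like placing two matrices side by side. Likewise, y_rpn_overlap is concatenated with y_rpn_regr; to make the sizes match, y_rpn_overlap is first repeated element-wise 4 times. Copies of both arrays are returned: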

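	# concatenate: the first 9 channels are the valid bits (positives and negatives enter the classification loss, neutral anchors do not); the last 9 say whether there is an object, i.e. the class label
	y_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=1) # (1,18,18,25)

	# to match y_rpn_regr, repeat y_rpn_overlap 4 times element-wise, e.g. 1,2,3 -> 1,1,1,1,2,2,2,2,3,3,3,3, then concatenate with y_rpn_regr: 36+36=72
	# so every regression target gets its own valid bit: the first 36 channels say whether the anchor is an object; only objects are regressed
	y_rpn_regr = np.concatenate([np.repeat(y_rpn_overlap, 4, axis=1), y_rpn_regr], axis=1) # (1,72,18,25)

	# return independent copies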
	return np.copy(y_rpn_cls), np.copy(y_rpn_regr)
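The shapes and the channel alignment can be checked with a single hypothetical positive anchor. Because np.repeat copies each channel consecutively, anchor channel 5 becomes mask channels 20 to 23, exactly the slots that hold its (tx, ty, tw, th) in the second half; np.tile would interleave the channels instead and misalign the mask:

```python
import numpy as np

num_anchors, H, W = 9, 18, 25
y_is_box_valid = np.zeros((1, num_anchors, H, W))
y_rpn_overlap = np.zeros((1, num_anchors, H, W))
y_rpn_regr = np.zeros((1, 4 * num_anchors, H, W))
y_is_box_valid[0, 5, 7, 12] = 1   # hypothetical positive: anchor channel 5 at (jy=7, ix=12)
y_rpn_overlap[0, 5, 7, 12] = 1

y_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=1)
y_rpn_regr_full = np.concatenate([np.repeat(y_rpn_overlap, 4, axis=1), y_rpn_regr], axis=1)
print(y_rpn_cls.shape, y_rpn_regr_full.shape)   # (1, 18, 18, 25) (1, 72, 18, 25)

# Mask channels 18..25 at that location: only 20..23 (anchor 5) are on.
print(y_rpn_regr_full[0, 18:26, 7, 12])         # [0. 0. 1. 1. 1. 1. 0. 0.]
# Tiling interleaves instead:
print(np.tile(np.array([1, 2, 3]), 4))          # [1 2 3 1 2 3 1 2 3 1 2 3]
```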

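I suspect the concatenation is there to make the loss easy to compute. The first half directly determines which positions count: invalid positions are 0 and drop out, valid positions are 1 and can be multiplied element-wise with the per-position loss. As we will see when the loss is computed, the losses at valid positions are summed and divided by the number of positions used, and that number is simply the sum of the valid bits, since each of them is 1. That may be what the author intended, but it is my guess.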
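To illustrate that guess, here is a NumPy sketch of a masked binary cross-entropy in this spirit, assuming the channels-last layout used with the TensorFlow backend. It is not the project's Keras loss function (which the series covers later and which may differ in details such as where the epsilon is added); the eps value and the clipping are assumptions. It shows both effects: invalid positions are multiplied by 0, and the sum of the valid bits is the number of terms being averaged:

```python
import numpy as np


def rpn_cls_loss_sketch(y_true, y_pred, num_anchors=9, eps=1e-4):
    """Masked binary cross-entropy for the RPN objectness output (channels-last).

    y_true: (1, H, W, 2 * num_anchors) = [valid bits | object labels]
    y_pred: (1, H, W, num_anchors)     = predicted object probabilities
    """
    valid = y_true[..., :num_anchors]
    labels = y_true[..., num_anchors:]
    p = np.clip(y_pred, 1e-7, 1 - 1e-7)             # avoid log(0)
    bce = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    # Invalid anchors are multiplied by 0; the sum of valid bits is the number of terms.
    return np.sum(valid * bce) / (eps + np.sum(valid))


y_true = np.zeros((1, 18, 25, 18))
y_pred = np.full((1, 18, 25, 9), 0.5)
y_true[0, 7, 12, 5] = 1; y_true[0, 7, 12, 9 + 5] = 1; y_pred[0, 7, 12, 5] = 0.9   # a positive
y_true[0, 3, 4, 0] = 1;                              y_pred[0, 3, 4, 0] = 0.2      # a negative
loss = rpn_cls_loss_sketch(y_true, y_pred)
y_pred[0, 10, 10, 3] = 0.999   # a confident prediction at an invalid (e.g. neutral) anchor...
print(loss == rpn_cls_loss_sketch(y_true, y_pred))   # True: ...does not change the loss
```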

Normalizing pixel values and the final dimension reordering

Now back to the last part of the generator in data_generators.py:

				try:
					# compute the RPN classification and regression targets
					y_rpn_cls, y_rpn_regr = calc_rpn(C, img_data_aug, width, height, resized_width, resized_height, img_length_calc_function)
				except:
					continue

				# Zero-center by mean pixel, and preprocess image

				# reorder the channels to RGB (OpenCV loads images as BGR)
				x_img = x_img[:,:, (2, 1, 0)]  # BGR -> RGB
				x_img = x_img.astype(np.float32)
				# custom normalization: subtract the per-channel mean, then divide by the scaling factor
				x_img[:, :, 0] -= C.img_channel_mean[0]
				x_img[:, :, 1] -= C.img_channel_mean[1]
				x_img[:, :, 2] -= C.img_channel_mean[2]
				x_img /= C.img_scaling_factor

				# transpose: channels first
				x_img = np.transpose(x_img, (2, 0, 1)) # (3,300,400)
				x_img = np.expand_dims(x_img, axis=0) # (1,3,300,400)

				# scale the second half of y_rpn_regr (the regression targets, not the valid bits) by C.std_scaling
				y_rpn_regr[:, y_rpn_regr.shape[1]//2:, :, :] *= C.std_scaling

				# with the TensorFlow backend, channels go last
				if backend == 'tf':
					x_img = np.transpose(x_img, (0, 2, 3, 1))
					y_rpn_cls = np.transpose(y_rpn_cls, (0, 2, 3, 1))
					y_rpn_regr = np.transpose(y_rpn_regr, (0, 2, 3, 1))

				yield np.copy(x_img), [np.copy(y_rpn_cls), np.copy(y_rpn_regr)], img_data_aug

			except Exception as e:
				print(e)
				continue

At this point you have the classification targets y_rpn_cls:
[Figure: debugger view of y_rpn_cls]
and the regression targets y_rpn_regr.

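What remains is processing the image pixels: a custom normalization, then a transpose that moves the channels to the front, plus a leading batch axis to keep the format consistent. Next the regression targets are scaled; note that the scaling starts at channel 36, because the first 36 channels are the bits marking whether there is an object. Then, with the TensorFlow backend, the channel axis is moved to the end. Finally the data is packaged and yielded.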
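The same steps as a standalone function, useful for checking shapes without running the whole generator. preprocess is a hypothetical helper; the mean values, scaling factor 1.0 and std_scaling 4.0 in the demo are placeholders of the kind found in the project's config, so check C in your own setup. Unlike the generator, it copies y_rpn_regr instead of scaling it in place:

```python
import numpy as np


def preprocess(x_img_bgr, y_rpn_regr, channel_mean, scaling_factor, std_scaling, backend='tf'):
    """Sketch of the generator's final steps (all parameter values are assumptions).

    x_img_bgr: (H, W, 3) uint8 image as loaded by OpenCV (BGR order).
    y_rpn_regr: (1, 8*A, h, w) targets with the repeated object bits in the first half.
    """
    x = x_img_bgr[:, :, (2, 1, 0)].astype(np.float32)        # BGR -> RGB
    x -= np.asarray(channel_mean, dtype=np.float32)           # per-channel mean, broadcast over H, W
    x /= scaling_factor
    x = np.expand_dims(np.transpose(x, (2, 0, 1)), axis=0)    # (1, 3, H, W)
    y_rpn_regr = y_rpn_regr.copy()                            # the generator scales in place
    y_rpn_regr[:, y_rpn_regr.shape[1] // 2:, :, :] *= std_scaling   # targets only, not the mask
    if backend == 'tf':
        x = np.transpose(x, (0, 2, 3, 1))                     # channels last
        y_rpn_regr = np.transpose(y_rpn_regr, (0, 2, 3, 1))
    return x, y_rpn_regr


img = np.full((300, 400, 3), 128, dtype=np.uint8)   # hypothetical gray 400x300 image
regr = np.ones((1, 72, 18, 25))                     # hypothetical targets: mask and values all 1
x, r = preprocess(img, regr, channel_mean=[103.939, 116.779, 123.68],
                  scaling_factor=1.0, std_scaling=4.0)
print(x.shape, r.shape)   # (1, 300, 400, 3) (1, 18, 25, 72)
```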

Running one training step

Now we can train, one image at a time:

            # returns three losses: the total loss, rpn_loss_cls and rpn_loss_regr
            loss_rpn = model_rpn.train_on_batch(X, Y)

Here is what X and Y look like:
[Figure: debugger view of X and Y]

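Now you can see why the filtering uses so many flag arrays instead of simply deleting anchors: this keeps the targets aligned with the dimensions of the RPN's predicted classification and regression outputs.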

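As shown, the RPN's classification output has shape (1, feature-map height, feature-map width, 9) and its regression output (1, feature-map height, feature-map width, 36); the targets have twice as many channels, (1, height, width, 18) and (1, height, width, 72), because the valid bits are prepended. Computing the loss is then simple: as mentioned above, summing the valid bits gives the number of positions used, which makes averaging the loss easy.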
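For the regression head, the mask half of the 72-channel target lines up with the 36 predicted channels. Below is a NumPy sketch of a masked smooth-L1 loss in that spirit, assuming the channels-last layout; it is an illustration, not the project's loss function, and eps is an assumption:

```python
import numpy as np


def rpn_regr_loss_sketch(y_true, y_pred, num_anchors=9, eps=1e-4):
    """Masked smooth-L1 loss for the RPN regression output (channels-last).

    y_true: (1, H, W, 8 * num_anchors) = [object bits repeated 4x | scaled targets]
    y_pred: (1, H, W, 4 * num_anchors)
    """
    mask = y_true[..., :4 * num_anchors]
    diff = y_true[..., 4 * num_anchors:] - y_pred
    abs_diff = np.abs(diff)
    smooth_l1 = np.where(abs_diff <= 1.0, 0.5 * diff ** 2, abs_diff - 0.5)
    return np.sum(mask * smooth_l1) / (eps + np.sum(mask))


y_true = np.zeros((1, 18, 25, 72))    # target layout after the TensorFlow transpose
y_pred = np.zeros((1, 18, 25, 36))    # network output: 4 values x 9 anchors
y_true[0, 7, 12, 20:24] = 1                          # anchor channel 5 is positive
y_true[0, 7, 12, 36 + 20:36 + 24] = [0.5, -0.5, 2.0, 0.0]
print(round(rpn_regr_loss_sketch(y_true, y_pred), 4))   # 0.4375
```

Only the four channels of the positive anchor contribute (0.125 + 0.125 + 1.5 + 0), and the denominator is the four mask bits, so the result is their mean up to eps.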