I recently worked on two projects:
one was a cigarette detection task, where the detection results were decent, the boxes were framed quite accurately, and data quality was not a concern;
the other was the Tianchi big-data competition (the Jinnan security-inspection challenge).
While tuning the code I ran into several issues I had never paid attention to before.
一、 The code is a Faster R-CNN modified from the Mask R-CNN implementation on GitHub.
That codebase is quite well organized, but it still has a number of engineering shortcomings, and I fixed quite a few of them.
Mask R-CNN: https://github.com/matterport/Mask_RCNN
My modified Faster R-CNN: https://github.com/jiaowojiangdaye/faster-rcnn-tianchi-20190301
(This is the exact code I used for the competition; the input-data format has been simplified, but there is no detailed README, so use it with caution.)
Common parameters:
#%% important args
Mode = 'train'  # 'train' or 'evaluate' or 'retrival'
Mode = 'evaluate'

#%% evaluate
if Mode == 'evaluate':
    USING_NEGATIVE_IMG = True
    USING_POSITIVE_IMG = False
    NEGATIVE_MULT = 1
    real_test = True  # if true, load images without ground truth
    init_with = "this"  # imagenet, coco, last, or this
    EVA_LIMIT = 10000
    model_version = 'last0322_only_heads_0721_final'
    weight_base = 'jinnan'  # coco, clothes
    THIS_WEIGHT_PATH = 'logs/jinnan20190324T0119/mask_rcnn_jinnan_0721.h5'
    if weight_base == 'coco':
        NUM_CLASSES = 1000 + 1  # clothes has 80 classes
    # Adjust down if you use a smaller GPU.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.99
    # Skip detections with confidence below this threshold
    DETECTION_MIN_CONFIDENCE = 0.93
    # Non-maximum suppression threshold for detection
    DETECTION_NMS_DIFF_CLS = False
    DETECTION_NMS_THRESHOLD = 0.54
    POST_NMS_ROIS_INFERENCE = 1000
    map_iou_thr = 0.7
    arg_str = '_rn' + str(RPN_NMS_THRESHOLD)[2:4] + \
              '_ds' + str(DETECTION_MIN_CONFIDENCE)[2:4] + \
              '_dn' + str(DETECTION_NMS_THRESHOLD)[2:4]
    save_base_dir = 'test_' + model_version + '_' + str(real_test) + '_' + arg_str
    submit_path = save_base_dir + '/submit_' + arg_str + '.json'
    check_path(save_base_dir)

#%% train
if Mode == 'train':
    USING_NEGATIVE_IMG = True
    USING_POSITIVE_IMG = True
    NEGATIVE_MULT = 1
    POSITIVE_MULT = 1
    # Which weights to start with?
    init_with = "last"  # imagenet, coco, last, or this
    THIS_WEIGHT_PATH = '/media/mosay/数据/jz/tianchi/train/faster_rcnn/models/mask_rcnn_jinnan_0766.h5'
    COCO_WEIGHTS_PATH = 'models/mask_rcnn_coco.h5'
    # Learning rate and momentum.
    # The Mask R-CNN paper uses lr=0.02, but on TensorFlow it causes
    # weights to explode, likely due to differences in optimizer
    # implementation.
    LEARNING_RATE = 0.0001
    LEARNING_MOMENTUM = 0.9
    # Weight decay regularization
    WEIGHT_DECAY = 0.0001
    # Increase to train on multiple GPUs (default is 1)
    GPU_COUNT = 1
    # We use a GPU with 12GB memory, which can fit two images.
    # Adjust down if you use a smaller GPU.
    IMAGES_PER_GPU = 1
    # Number of training steps per epoch
    STEPS_PER_EPOCH = 300
    VALIDATION_STEPS = 50
    EPOCHS = 2000
    USE_RPN_ROIS = True
    rpn_fg_iou_thr = 0.5
    rpn_bg_iou_thr = 0.5
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.99
    RPN_TRAIN_ANCHORS_PER_IMAGE = 256
    POST_NMS_ROIS_TRAINING = 2000

#%% stable args
IMAGE_RESIZE_MODE = "square"
IMAGE_MIN_DIM = 1024
IMAGE_MAX_DIM = 1024
BACKBONE = "resnet101"
# Image mean (RGB)
# MEAN_PIXEL = np.array([123.7, 116.8, 103.9])
MEAN_PIXEL = np.array([211.7, 213.7, 186.8])
# Length of square anchor side in pixels
# RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)
# RPN_ANCHOR_RATIOS = [0.5, 1, 2]
RPN_ANCHOR_RATIOS = [0.25, 0.5, 1, 2, 4]
LOSS_WEIGHTS = {
    "rpn_class_loss": 1.,
    "rpn_bbox_loss": 1.,
    "mrcnn_class_loss": 1.,
    "mrcnn_bbox_loss": 1.
}
Data analysis:
1. First gather statistics on object size, count, and so on; after all, you need them to choose the anchor settings, the image-augmentation parameters, and the like.
On the cigarette dataset, object scale hardly varies, so the following parameters suffice:
IMAGE_RESIZE_MODE = "square"
IMAGE_MIN_DIM = 512
IMAGE_MAX_DIM = 512
RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)
RPN_ANCHOR_RATIOS = [0.25, 0.5, 1, 2, 4]
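A quick way to gather those statistics is to compute each ground-truth box's side length and aspect ratio and compare them against the anchor settings. A minimal numpy sketch (the `(y1, x1, y2, x2)` box format and the toy boxes are my own assumptions for illustration):

```python
import numpy as np

def box_stats(boxes):
    """Summarize object sizes and aspect ratios to guide anchor choices.

    boxes: (N, 4) array of (y1, x1, y2, x2) ground-truth boxes.
    Returns (side, ratio): sqrt-area per box and width/height per box.
    """
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    side = np.sqrt(h * w)   # compare the distribution against RPN_ANCHOR_SCALES
    ratio = w / h           # compare the distribution against RPN_ANCHOR_RATIOS
    return side, ratio

# Toy boxes: a small square object and a wide, flat one.
boxes = np.array([[0, 0, 20, 20], [10, 10, 30, 90]], dtype=float)
side, ratio = box_stats(boxes)
print(np.round(side), np.round(ratio, 2))  # [20. 40.] [1. 4.]
```

Histogramming `side` and `ratio` over the whole dataset shows whether the chosen anchor scales and ratios actually cover the objects.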
On the Tianchi X-ray images, object scale and aspect ratio both vary a lot, so the settings need more range:
IMAGE_RESIZE_MODE = "square"
IMAGE_MIN_DIM = 1024
IMAGE_MAX_DIM = 1024
BACKBONE = "resnet101"
# Length of square anchor side in pixels
RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)
# RPN_ANCHOR_RATIOS = [0.5, 1, 2]
RPN_ANCHOR_RATIOS = [0.25, 0.5, 1, 2, 4]
One very important parameter above is 'RPN_ANCHOR_SCALES'.
It has five values, one per output level of the FPN built on resnet101 (I fine-tune from COCO weights here, so this is the only backbone used).
Note that each anchor scale slides over exactly one pyramid level to generate anchors: scale 16, for example, only slides over the feature map downsampled 4x from the input image, and has no direct relation to the other feature maps. This means a shallow feature map can never predict large objects (and likewise, a deep level can never predict small ones). So in real-world settings with large scale variation, make sure to apply scale augmentation to the data (if you train for long enough this matters less, since the prediction ability of every pyramid level gets trained well).
In principle, the scale variation of these X-ray images should not be too large, but the input images have a fixed size: the original image may hold several bags laid side by side, which effectively shrinks each bag's X-ray by a factor of a few. So I use 1024x1024 inputs and scale augmentation over the range (0.4, 2), which is fairly stable.
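The one-scale-per-level mapping can be made concrete with a minimal numpy sketch of single-level anchor generation (modeled on matterport's `generate_anchors`; the feature-map shape and stride in the usage line are illustrative assumptions):

```python
import numpy as np

def generate_anchors(scale, ratios, feature_shape, feature_stride):
    """Generate anchors of a single scale on a single FPN level.

    scale: anchor side length in input-image pixels (e.g. 16)
    ratios: anchor width/height ratios (e.g. [0.25, 0.5, 1, 2, 4])
    feature_shape: (H, W) of this pyramid level's feature map
    feature_stride: downsampling factor of this level vs. the input image
    """
    ratios = np.asarray(ratios, dtype=np.float64)
    # Keep anchor area ~ scale**2 while varying the width/height ratio.
    heights = scale / np.sqrt(ratios)
    widths = scale * np.sqrt(ratios)
    # Anchor centers in input-image coordinates: one per feature-map cell.
    shifts_y = np.arange(feature_shape[0]) * feature_stride
    shifts_x = np.arange(feature_shape[1]) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    # Every (center, size) combination.
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape(-1, 2)
    sizes = np.stack([box_heights, box_widths], axis=2).reshape(-1, 2)
    # (y1, x1, y2, x2)
    return np.concatenate([centers - 0.5 * sizes, centers + 0.5 * sizes], axis=1)

# Scale 16 lives only on the stride-4 level: a 1024x1024 input gives a 256x256 map.
anchors = generate_anchors(16, [0.25, 0.5, 1, 2, 4], (256, 256), 4)
print(anchors.shape)  # (327680, 4), i.e. 256*256 positions x 5 ratios
```

The key point is visible in the signature: one scale, one feature map, one stride. An anchor of scale 16 simply never appears on the deeper levels.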
On the foreground/background sample ratio:
fg_roi_count = int(config.TRAIN_ROIS_PER_IMAGE * config.ROI_POSITIVE_RATIO)
if fg_ids.shape[0] > fg_roi_count:
    keep_fg_ids = np.random.choice(fg_ids, fg_roi_count, replace=False)
else:
    keep_fg_ids = fg_ids
# BG
remaining = config.TRAIN_ROIS_PER_IMAGE - keep_fg_ids.shape[0]
if bg_ids.shape[0] > remaining:
    keep_bg_ids = np.random.choice(bg_ids, remaining, replace=False)
else:
    keep_bg_ids = bg_ids
# Combine indices of ROIs to keep
keep = np.concatenate([keep_fg_ids, keep_bg_ids])
From this selection you can see that the only hard constraint is the total number of ROIs trained per image (say 256, giving roughly a 1:2 foreground-to-background split):
at most 256 x 0.33 positives and 256 x (1 - 0.33) negatives are trained per image; extras are thrown away, shortfalls are zero-padded.
Positives contribute both classification loss and box-regression loss; negatives contribute only classification loss, and their box-regression loss is 0.
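The sampling is easy to verify in isolation. A minimal runnable sketch with made-up candidate ROI counts (the `TRAIN_ROIS_PER_IMAGE` and `ROI_POSITIVE_RATIO` values are just illustrative settings):

```python
import numpy as np

TRAIN_ROIS_PER_IMAGE = 256
ROI_POSITIVE_RATIO = 0.33

def sample_rois(fg_ids, bg_ids):
    """Cap positives at ratio * total, then fill the rest with negatives."""
    fg_roi_count = int(TRAIN_ROIS_PER_IMAGE * ROI_POSITIVE_RATIO)
    if fg_ids.shape[0] > fg_roi_count:
        keep_fg = np.random.choice(fg_ids, fg_roi_count, replace=False)
    else:
        keep_fg = fg_ids
    remaining = TRAIN_ROIS_PER_IMAGE - keep_fg.shape[0]
    if bg_ids.shape[0] > remaining:
        keep_bg = np.random.choice(bg_ids, remaining, replace=False)
    else:
        keep_bg = bg_ids
    return keep_fg, keep_bg

# 500 candidate foreground ROIs and 2000 background ROIs get capped to 84 / 172.
keep_fg, keep_bg = sample_rois(np.arange(500), np.arange(500, 2500))
print(keep_fg.shape[0], keep_bg.shape[0])  # 84 172
```

Note that the background count is derived from the *actual* number of kept positives, so when an image has few objects the batch simply contains more negatives.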
What remains is mostly the question of reducing false detections.
1. Faster R-CNN's detection-time NMS is per-class only. On cluttered X-ray images (objects stacked every which way),
by the time we are filtering final detections it makes sense to treat intra-class and inter-class overlaps the same, so I changed it to run NMS over all detections regardless of class, which also simplifies the code.
Tuning the NMS threshold then becomes more uniform: raising it directly makes the detector less likely to miss objects, with no need to worry about whether overlapping boxes share a class.
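A class-agnostic NMS along these lines can be sketched in a few lines of numpy (the `(y1, x1, y2, x2)` box format and the example threshold of 0.54 are assumptions, the latter taken from the config above):

```python
import numpy as np

def nms_all_classes(boxes, scores, iou_threshold):
    """Greedy NMS over all detections, ignoring class labels.

    boxes: (N, 4) array of (y1, x1, y2, x2); scores: (N,)
    Returns indices of kept boxes, highest score first.
    """
    y1, x1, y2, x2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (y2 - y1) * (x2 - x1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box with the remaining boxes
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        inter = np.maximum(0, yy2 - yy1) * np.maximum(0, xx2 - xx1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only boxes that overlap the winner less than the threshold
        order = order[1:][iou < iou_threshold]
    return np.array(keep)

# Two heavily overlapping boxes from *different* classes still suppress
# each other; the third, disjoint box survives.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms_all_classes(boxes, scores, 0.54))  # [0 2]
```

The per-class variant would run this same loop once per class label; dropping the class grouping is the entire change.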