The previous article summarized the training steps; if you are not familiar with the SSD framework, the code can be hard to follow. There are plenty of introductions to this kind of model online, but very few that tie the explanation to the code. Here I summarize my own understanding, hoping it helps readers who are just getting started. Corrections are welcome if anything is wrong.
When learning a new model, the first things to study are its architecture and the ideas behind it, and there are already many articles explaining the SSD model. Rather than repeating them, let's understand the framework of the model directly from the code:
x = Input(shape=(img_height, img_width, img_channels))
# The following identity layer is only needed so that the subsequent lambda layers can be optional.
x1 = Lambda(identity_layer, output_shape=(img_height, img_width, img_channels), name='identity_layer')(x)
if not (subtract_mean is None):
    x1 = Lambda(input_mean_normalization, output_shape=(img_height, img_width, img_channels), name='input_mean_normalization')(x1)
if not (divide_by_stddev is None):
    x1 = Lambda(input_stddev_normalization, output_shape=(img_height, img_width, img_channels), name='input_stddev_normalization')(x1)
if swap_channels:
    x1 = Lambda(input_channel_swap, output_shape=(img_height, img_width, img_channels), name='input_channel_swap')(x1)
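# The Lambda layers above wrap small preprocessing functions that are defined elsewhere in
# the model-building function and are therefore not shown in this excerpt. A minimal sketch
# of what they typically look like in ssd_keras (illustrative only, assuming subtract_mean,
# divide_by_stddev and swap_channels are the usual arguments):
#
#     def identity_layer(tensor):
#         return tensor                                  # no-op, just anchors the optional layers
#
#     def input_mean_normalization(tensor):
#         return tensor - np.array(subtract_mean)        # per-channel mean subtraction
#
#     def input_stddev_normalization(tensor):
#         return tensor / np.array(divide_by_stddev)     # per-channel scaling
#
#     def input_channel_swap(tensor):
#         # e.g. swap_channels = [2, 1, 0] to turn RGB input into BGR
#         return K.stack([tensor[..., swap_channels[0]],
#                         tensor[..., swap_channels[1]],
#                         tensor[..., swap_channels[2]]], axis=-1)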
conv1 = Conv2D(32, (5, 5), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv1')(x1)
conv1 = BatchNormalization(axis=3, momentum=0.99, name='bn1')(conv1) # Tensorflow uses filter format [filter_height, filter_width, in_channels, out_channels], hence axis = 3
conv1 = ELU(name='elu1')(conv1)
pool1 = MaxPooling2D(pool_size=(2, 2), name='pool1')(conv1)
conv2 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv2')(pool1)
conv2 = BatchNormalization(axis=3, momentum=0.99, name='bn2')(conv2)
conv2 = ELU(name='elu2')(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2), name='pool2')(conv2)
conv3 = Conv2D(64, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv3')(pool2)
conv3 = BatchNormalization(axis=3, momentum=0.99, name='bn3')(conv3)
conv3 = ELU(name='elu3')(conv3)
pool3 = MaxPooling2D(pool_size=(2, 2), name='pool3')(conv3)
conv4 = Conv2D(64, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv4')(pool3)
conv4 = BatchNormalization(axis=3, momentum=0.99, name='bn4')(conv4)
conv4 = ELU(name='elu4')(conv4)
pool4 = MaxPooling2D(pool_size=(2, 2), name='pool4')(conv4)
conv5 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv5')(pool4)
conv5 = BatchNormalization(axis=3, momentum=0.99, name='bn5')(conv5)
conv5 = ELU(name='elu5')(conv5)
pool5 = MaxPooling2D(pool_size=(2, 2), name='pool5')(conv5)
conv6 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv6')(pool5)
conv6 = BatchNormalization(axis=3, momentum=0.99, name='bn6')(conv6)
conv6 = ELU(name='elu6')(conv6)
pool6 = MaxPooling2D(pool_size=(2, 2), name='pool6')(conv6)
conv7 = Conv2D(32, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv7')(pool6)
conv7 = BatchNormalization(axis=3, momentum=0.99, name='bn7')(conv7)
conv7 = ELU(name='elu7')(conv7)
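# Note (my own illustration, not part of the original code): each MaxPooling2D layer halves
# the spatial resolution, so with an assumed 256x256 input the four layers that feed the
# predictors below would produce feature maps of roughly
#   conv4: 32x32, conv5: 16x16, conv6: 8x8, conv7: 4x4
# i.e. the earlier, higher-resolution maps are responsible for smaller objects and the later,
# coarser maps for larger ones.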
# The next part is to add the convolutional predictor layers on top of the base network that we defined above.
# Note that I use the term "base network" differently than the paper does.
# To me, the base network is everything that is not convolutional predictor layers or anchor
# box layers. In this case we'll have four predictor layers, but of course you could
# easily rewrite this into an arbitrarily deep base network and add an arbitrary number of predictor layers on top of the base network by simply following the pattern shown here.
# Build the convolutional predictor layers on top of conv layers 4, 5, 6, and 7.
# We build two predictor layers on top of each of these layers: One for class prediction (classification), one for box coordinate prediction (localization)
# We predict `n_classes` confidence values for each box, hence the `classes` predictors have depth `n_boxes * n_classes`
# We predict 4 box coordinates for each box, hence the `boxes` predictors have depth `n_boxes * 4`
# Output shape of `classes`: `(batch, height, width, n_boxes * n_classes)`
classes4 = Conv2D(n_boxes[0] * n_classes, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes4')(conv4)
classes5 = Conv2D(n_boxes[1] * n_classes, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes5')(conv5)
classes6 = Conv2D(n_boxes[2] * n_classes, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes6')(conv6)
classes7 = Conv2D(n_boxes[3] * n_classes, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes7')(conv7)
# Output shape of `boxes`: `(batch, height, width, n_boxes * 4)`
boxes4 = Conv2D(n_boxes[0] * 4, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes4')(conv4)
boxes5 = Conv2D(n_boxes[1] * 4, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes5')(conv5)
boxes6 = Conv2D(n_boxes[2] * 4, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes6')(conv6)
boxes7 = Conv2D(n_boxes[3] * 4, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes7')(conv7)
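# How `n_boxes` is obtained is not shown in this excerpt; in ssd_keras it is typically derived
# from the per-layer aspect ratios, roughly like this (sketch, not the original code):
#
#     n_boxes = []
#     for ar in aspect_ratios:                  # one list of aspect ratios per predictor layer
#         if (1 in ar) and two_boxes_for_ar1:
#             n_boxes.append(len(ar) + 1)       # one extra box for aspect ratio 1
#         else:
#             n_boxes.append(len(ar))
#
# With aspect_ratios = [0.5, 1, 2] and two_boxes_for_ar1=True, each cell predicts 4 boxes.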
# Generate the anchor boxes
# Output shape of `anchors`: `(batch, height, width, n_boxes, 8)`
# Here aspect_ratios = [0.5, 1, 2]; two_boxes_for_ar1 controls whether two boxes with aspect ratio 1 are generated.
anchors4 = AnchorBoxes(img_height, img_width, this_scale=scales[0], next_scale=scales[1], aspect_ratios=aspect_ratios[0],
                       two_boxes_for_ar1=two_boxes_for_ar1, this_steps=steps[0], this_offsets=offsets[0],
                       clip_boxes=clip_boxes, variances=variances, coords=coords, normalize_coords=normalize_coords, name='anchors4')(boxes4)
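To make the AnchorBoxes step more concrete, below is a minimal sketch of how such a layer typically derives the per-cell anchor widths and heights from this_scale, next_scale, and the aspect ratios. The helper name anchor_sizes is my own, for illustration only; it is not part of the model code. The trailing dimension of 8 in the anchors output corresponds to the four box coordinates plus the four variance values.

import numpy as np

def anchor_sizes(img_height, img_width, this_scale, next_scale, aspect_ratios, two_boxes_for_ar1=True):
    """Return one (width, height) pair per anchor box for a single feature-map cell."""
    size = min(img_height, img_width)  # scales are given as fractions of the shorter image side
    wh = []
    for ar in aspect_ratios:
        if ar == 1:
            wh.append((this_scale * size, this_scale * size))
            if two_boxes_for_ar1:
                # The second aspect-ratio-1 box uses the geometric mean of this scale and the next.
                s = np.sqrt(this_scale * next_scale)
                wh.append((s * size, s * size))
        else:
            wh.append((this_scale * size * np.sqrt(ar), this_scale * size / np.sqrt(ar)))
    return wh

# Example: anchor_sizes(300, 480, 0.08, 0.16, [0.5, 1.0, 2.0]) returns 4 (width, height) pairs,
# matching n_boxes = 4 for this layer.

These (width, height) pairs are then tiled over every cell of the conv4 feature map to form the anchor tensor for this predictor layer.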