Image Semantic Segmentation: SegNet Study Notes + TensorFlow

qq_41576083

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/qq_41576083/article/details/84973380

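This article introduces the principle and function of the SegNet network, a model for pixel-level semantic segmentation. SegNet classifies every pixel through 4 downsampling layers and 4 upsampling layers, with softmax applied per pixel. The article covers model construction, the loss definition, and the training method, including variable declarations, the convolution and deconvolution layer wrappers, and the optimizer and sample loading used during training. It also touches on testing and visualizing the model.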

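SegNet is a pixel-level semantic segmentation model: it classifies every pixel of the input image to identify the class that pixel belongs to. The network consists of 4 downsampling stages followed by 4 upsampling stages and turns a [W, H, 3] input image into a [W, H, NUM_CLASSES] tensor of per-pixel class scores. Softmax converts the scores into per-pixel class probabilities, and taking the most probable class for each pixel yields a [W, H, 1] label map. Painting each class in its own color then produces a [W, H, 3] image again, in which the objects are marked in different colors. That is what the SegNet network does.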
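The post-processing described above (scores, then softmax, then one label per pixel, then a color image) can be sketched in NumPy. This is only an illustration under stated assumptions: `NUM_CLASSES = 3`, the `PALETTE` colors, and the helper names `softmax` and `colorize` are hypothetical and are not taken from the repository. The label map here has shape [W, H]; the trailing axis of size 1 is left out.

```python
import numpy as np

NUM_CLASSES = 3  # hypothetical class count for this sketch

# Hypothetical palette: one RGB color per class index.
PALETTE = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0]], dtype=np.uint8)


def softmax(logits):
    """Numerically stable softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def colorize(logits):
    """Map [W, H, NUM_CLASSES] scores to a [W, H] label map and a [W, H, 3] image."""
    probs = softmax(logits)          # [W, H, NUM_CLASSES], sums to 1 per pixel
    labels = probs.argmax(axis=-1)   # [W, H], one class index per pixel
    return labels, PALETTE[labels]   # integer indexing looks up one color per pixel


# Random scores stand in for the network output.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5, NUM_CLASSES))
labels, image = colorize(logits)
print(labels.shape, image.shape)  # (4, 5) (4, 5, 3)
```

Because softmax is monotonic, taking the argmax of the raw scores would give the same labels; the softmax step matters mainly for the loss during training.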

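Code: https://github.com/tkuanlun350/Tensorflow-SegNet

model.py in that repository contains the model construction, the loss definition, and the training procedure. The notes below go through the implementation block by block.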

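1. Model Construction

The network itself is not very complicated. The author builds it in three levels, from the bottom up: variable definitions wrapped with L2 regularization, convolution and deconvolution layers wrapped with a BN (batch normalization) layer, and finally the whole network. They are covered one at a time below.

1.1 Variable declaration wrappers

Variable declarations come in two kinds, vector-like variables (such as biases) and matrix-like variables (such as kernels). Both are defined in Utils.py as follows: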

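import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable, tf.add_to_collection)

# Simply create a variable of the given shape, placed on the CPU
def _variable_on_cpu(name, shape, initializer):
  with tf.device('/cpu:0'):
    var = tf.get_variable(name, shape, initializer=initializer)
  return var

# Create a matrix variable with optional L2 regularization
def _variable_with_weight_decay(name, shape, initializer, wd):
  # First create the variable with the requested shape
  var = _variable_on_cpu(
      name,
      shape,
      initializer)
  # Then, if a weight wd is given, scale the L2 loss of the variable by wd
  # and add it to the 'losses' collection so it becomes part of the total loss
  if wd is not None:
    weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
    tf.add_to_collection('losses', weight_decay)
  return var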
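To make the regularization term concrete, here is a small NumPy sketch of the quantity that ends up in the 'losses' collection. `tf.nn.l2_loss(var)` computes `sum(var ** 2) / 2`, and the wrapper multiplies it by `wd`. The helper names, the sample kernel, `wd=5e-4`, and `data_loss` are hypothetical values chosen for illustration only.

```python
import numpy as np


def l2_loss(var):
    """Same quantity as tf.nn.l2_loss: sum(var ** 2) / 2."""
    return 0.5 * np.sum(np.square(var))


def weight_decay_term(var, wd):
    """Penalty for one variable; None mirrors the `wd is None` branch (no penalty)."""
    if wd is None:
        return None
    return wd * l2_loss(var)


# Hypothetical 2x2 kernel and decay weight.
kernel = np.array([[1.0, -2.0], [3.0, 0.5]])
penalty = weight_decay_term(kernel, wd=5e-4)

# Hypothetical data loss: the total loss is the data loss plus all penalties.
data_loss = 0.8
total_loss = data_loss + penalty
```

Collecting the penalties in a named collection lets the training code sum every term in 'losses' without knowing which variables asked for weight decay.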

1.2 Convolution and deconvolution layer wrappers

Convolution layer definition:

# Convolution wrapper: for each input window s the result is s*w + b,
# computed with tf.nn.conv2d and tf.nn.bias_add, followed by BN and an optional ReLU
def conv_layer_with_bn(inputT, shape, train_phase, activation=True, name=None):
    # The number of output channels determines the size of the bias vector
    out_channel = shape[3]
    with tf.variable_scope(name) as scope:
      # Create the kernel and the bias, then run the convolution
      # (wd=None, so no L2 term is added for this kernel)
      kernel = _variable_with_weight_decay('ort_weights', shape=shape, initializer=orthogonal_initializer(), wd=None)
      conv = tf.nn.conv2d(inputT, kernel, [1, 1, 1, 1], padding='SAME')
      biases = _variable_on_cpu('biases', [out_channel], tf.constant_initializer(0.0))
      bias = tf.nn.bias_add(conv, biases)
      # Depending on the activation flag, return either the BN output
      # or the BN output passed through ReLU
      # (the BN layer is defined below)
      if activation is True:
        conv_out = tf.nn.relu(batch_norm_layer(bias, train_phase, scope.name))
      else:
        conv_out = batch_norm_layer(bias, train_phase, scope.name)
    return conv_out
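The convolution step in this wrapper uses stride 1 with 'SAME' padding, so the spatial size is preserved and only the channel count changes to `shape[3]`. The NumPy sketch below illustrates that behavior for a single image. It is a slow reference loop for illustration, not how TensorFlow computes it, and the function name `conv2d_same` and the sample inputs are hypothetical.

```python
import numpy as np


def conv2d_same(x, kernel, bias):
    """Stride-1 'SAME' convolution (cross-correlation) plus bias.

    x: [H, W, C_in], kernel: [kh, kw, C_in, C_out], bias: [C_out].
    Returns an array of shape [H, W, C_out].
    """
    kh, kw, _, c_out = kernel.shape
    h, w, _ = x.shape
    # For stride 1 the total padding is k - 1; any odd leftover goes after.
    pad_top, pad_left = (kh - 1) // 2, (kw - 1) // 2
    pad_bottom, pad_right = kh - 1 - pad_top, kw - 1 - pad_left
    padded = np.pad(x, ((pad_top, pad_bottom), (pad_left, pad_right), (0, 0)),
                    mode="constant")
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + kh, j:j + kw, :]  # window s: [kh, kw, C_in]
            # s*w + b: contract the window with the kernel, then add the bias.
            out[i, j] = np.tensordot(patch, kernel, axes=3) + bias
    return out


# All-ones input and kernel make the result easy to check by hand.
x = np.ones((4, 5, 3))
kernel = np.ones((3, 3, 3, 2))
bias = np.zeros(2)
out = conv2d_same(x, kernel, bias)
activated = np.maximum(out, 0.0)  # ReLU (BN is omitted in this sketch)
print(out.shape)  # (4, 5, 2)
```

An interior pixel sees a full 3x3x3 window of ones (27), while a corner pixel sees only 2x2x3 of them (12) because the rest of its window is zero padding.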

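# The BN layer must know whether the network is training, because it maintains two
# statistics (moving mean and moving variance) that are updated during training.
# Outside training it loads the stored values instead; reuse=True makes the
# inference branch share the variables created by the training branch.
def batch_norm_layer(inputT, is_training, scope):
  return tf.cond(is_training,
          lambda: tf.contrib.layers.batch_norm(inputT, is_training=True,
                           center=False, updates_collections=None, scope=scope+"_bn"),
          lambda: tf.contrib.layers.batch_norm(inputT, is_training=False,
                           updates_collections=None, center=False, scope=scope+"_bn", reuse = True))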
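The difference between the two branches can be illustrated with a small NumPy sketch. It is a simplified model of batch normalization, not the tf.contrib implementation: the class name `BatchNormSketch` and the values `decay=0.9` and `eps=1e-3` are assumptions for illustration, and there is no learned offset or scale (the offset is left out to mirror `center=False`).

```python
import numpy as np


class BatchNormSketch:
    """Simplified batch norm over an NHWC batch, without learned offset or scale."""

    def __init__(self, channels, decay=0.9, eps=1e-3):
        # decay and eps are illustrative values, not the library defaults.
        self.moving_mean = np.zeros(channels)
        self.moving_var = np.ones(channels)
        self.decay = decay
        self.eps = eps

    def __call__(self, x, is_training):
        if is_training:
            # Training: normalize with the statistics of the current batch ...
            mean = x.mean(axis=(0, 1, 2))
            var = x.var(axis=(0, 1, 2))
            # ... and update the moving averages in place.
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            # Inference: reuse the stored statistics; nothing is updated.
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(8, 4, 4, 2))  # [N, H, W, C]

bn = BatchNormSketch(channels=2)
y_train = bn(x, is_training=True)   # per-channel mean is about 0 afterwards
y_eval = bn(x, is_training=False)   # uses the moving averages instead
```

After a single training step the moving averages are still far from the batch statistics, which is why `y_eval` differs from `y_train` here; they converge only after many updates.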

Deconvolution (transposed convolution) definition:

def deconv_layer(inputT, f_shape, output_shape, stride=2, name=None):
  strides = [1, stride, stride, 1]
  with tf.variable_scope(name):
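    # The kernel needs a special initialization, defined in get_deconv_filter below
    weights = get_deconv_filter(f_shape)
    # Run the transposed convolution (look up the full definition if needed);
    # roughly, it expands the input first and then convolves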
    deconv = tf.nn.conv2d_transpose(inputT, weights, output_shape,
                                        strides=strides, padding='SAME')
  return deconv
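As a rough illustration of what the transposed convolution does for the 2*2, stride-2 case used in this network, the NumPy sketch below upsamples a single-channel map. It uses the equivalent "stamping" view, where each input pixel writes a scaled copy of the kernel into the output, rather than the expand-then-convolve view. The function name `conv2d_transpose_sketch` is hypothetical, and the sketch assumes the kernel size equals the stride so that the stamped blocks never overlap and the output is exactly `stride` times larger.

```python
import numpy as np


def conv2d_transpose_sketch(x, kernel, stride=2):
    """Single-channel transposed convolution for the case kernel size == stride.

    x: [H, W], kernel: [k, k]. Returns an array of shape [H * stride, W * stride].
    """
    k = kernel.shape[0]
    if kernel.shape != (stride, stride):
        raise ValueError("this sketch only covers kernel size == stride")
    h, w = x.shape
    out = np.zeros((h * stride, w * stride))
    for i in range(h):
        for j in range(w):
            # Each input pixel stamps a scaled copy of the kernel into the output.
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out


x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((2, 2))
up = conv2d_transpose_sketch(x, kernel)
print(up.shape)  # (4, 4)
```

With an all-ones kernel each input value is simply repeated over a 2*2 block; a kernel with other weights would spread each value according to those weights instead.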

def get_deconv_filter(f_shape):
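  # All deconvolution kernels used here are 2*2
  # The initial value at each position is determined by that position and the kernel size, using the formula below: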
  width = f_shape[0]
  heigh = f_shape[0]
  f = int(width/2.0)
  c = (2 * f - 1 - f % 2) / (2.0 * f)
  bilinear = np.zeros([f_shape[0], f_shape[1]])
  for x in range(width):
      fo