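Welcome to my personal blog: zengzeyu.com
Preface
This post works from the FCN (fully convolutional networks) paper on image semantic segmentation, Fully Convolutional Networks for Semantic Segmentation, which is where fully convolutional networks first appeared. It was the pioneering work that applied CNN architectures to image semantic segmentation with outstanding results, and it received the CVPR 2015 Best Paper Honorable Mention. In short, image semantic segmentation means classifying every pixel of an image. The figure below shows an example, where pixels of different colors represent different classes:
The GitHub repository for UCB's FCN source code: https://github.com/shelhamer/fcn.berkeleyvision.org
The source code contains four families of network models: nyud-fcn, pascalcontext-fcn, siftflow-fcn, and voc-fcn. Each family is further divided into 3 to 4 network variants, depending on which convolutional layers are used.
In practice, your own data type and format may not match the data interface (images) provided by the voc-fcn-alexnet source code. In this post, for example, the data fed to the network is point cloud data (.pcd) scanned by a LiDAR. So how do we make this work? Let's go through it step by step.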
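1. LiDAR data conversion
1.1 Introduction to LiDAR point cloud data
First, the data format produced by a mechanical rotating LiDAR. A motor inside the LiDAR rotates at a fixed angular velocity, and the laser emitters and receivers mounted on it measure the distance from the LiDAR to obstacles. Take the 16-line RS-LiDAR-16 made by RoboSense (速腾聚创) as an example: it performs ten 360° rotations per second (10 Hz), and each rotation scans the surrounding scene. Each laser line yields 2016 points per revolution, and the result is stored in a .pcd file. A .pcd file can be understood in the same way as a 2D color image (such as a .png): the 16 lines correspond to the image height and the 2016 points to the image width, for a total of 16 x 2016 = 32256 pixels. Each point carries the data [x, y, z, intensity], which plays the same role as the RGB channels of a 2D image: each value is one channel.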
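To make the image analogy concrete, here is a minimal NumPy sketch. It assumes the points of one revolution arrive as a flat list in ring-major order (all points of line 0, then line 1, and so on); that ordering and the random stand-in data are assumptions for illustration, not something the sensor or the .pcd format guarantees.

```python
import numpy as np

NUM_RINGS = 16          # laser lines -> image height
POINTS_PER_RING = 2016  # points per revolution -> image width

# Stand-in for one revolution: one row per point, columns x, y, z, intensity.
# Assumption: points are stored ring by ring (ring-major order).
points = np.random.rand(NUM_RINGS * POINTS_PER_RING, 4).astype(np.float32)

# View the flat point list as an "image" with four channels.
frame = points.reshape(NUM_RINGS, POINTS_PER_RING, 4)
print(frame.shape)  # (16, 2016, 4)
```

With this layout, frame[r, c] is the [x, y, z, intensity] vector of the c-th point on line r, just as an RGB image holds three values per pixel.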
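1.2 Point cloud preprocessing
The point cloud is preprocessed according to its characteristics. After preprocessing, each point has the features [row, column, height, range, mark], which stand for the point's [row index, column index, height, distance, label]. Here height equals the z value, range is computed as sqrt(x^2 + y^2 + z^2), and mark is the label obtained by classifying the point with a decision tree: obstacle point (obstacle mark) or ground point (ground mark). The mark plays the same role as a ground truth image: it is the reference against which predictions are compared when computing the loss. In effect, preprocessing manually adds more feature channels, which makes classification and prediction easier.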
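The feature construction described above can be sketched in NumPy as follows. The function name build_features is hypothetical, the channel order follows the list in the text, and the mark array is passed in as given: the decision tree that produces it is not shown in this post, so the example uses placeholder labels.

```python
import numpy as np

def build_features(frame, mark):
    """Build per-point features [row, column, height, range, mark].

    frame: (rows, cols, 4) array of [x, y, z, intensity]
    mark:  (rows, cols) array of labels (obstacle or ground)
    """
    rows, cols = frame.shape[:2]
    row_idx, col_idx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    x, y, z = frame[..., 0], frame[..., 1], frame[..., 2]
    height = z                             # height equals the z value
    rng = np.sqrt(x ** 2 + y ** 2 + z ** 2)  # distance from the sensor
    feats = np.stack([row_idx, col_idx, height, rng, mark], axis=-1)
    return feats.astype(np.float32)

# Tiny example frame with two non-zero points.
frame = np.zeros((2, 3, 4), dtype=np.float32)
frame[0, 0, :3] = [3.0, 4.0, 0.0]
frame[1, 2, :3] = [0.0, 0.0, 2.0]
mark = np.zeros((2, 3), dtype=np.float32)  # placeholder labels, not real decision tree output
features = build_features(frame, mark)
print(features.shape)  # (2, 3, 5)
```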
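The preprocessed data is converted to binary .npy files with the cnpy library so that NumPy can read it easily. For a cnpy tutorial, see the post "cnpy库使用笔记" and the official example. Each point cloud frame is stored as one .npy file. The simpler the naming scheme the better, since it makes reading and sorting easier; this post uses the frame index as the file name: [0.npy, 1.npy, …, n.npy].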
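One practical detail of index-based names is that a plain string sort puts 10.npy before 2.npy. The sketch below shows one way to list the frames in numeric order. The helper name list_frames is hypothetical, and np.save stands in for the cnpy writer only to create test files.

```python
import os
import tempfile

import numpy as np

def list_frames(npy_dir):
    """Return the paths of all <index>.npy files in npy_dir, sorted by index."""
    names = [n for n in os.listdir(npy_dir) if n.endswith(".npy")]
    names.sort(key=lambda n: int(os.path.splitext(n)[0]))
    return [os.path.join(npy_dir, n) for n in names]

with tempfile.TemporaryDirectory() as tmp:
    # Write a few small dummy frames (real frames would be 16 x 2016 x channels).
    for i in (10, 2, 0, 1):
        np.save(os.path.join(tmp, "%d.npy" % i), np.zeros((2, 3, 5), dtype=np.float32))
    ordered = [os.path.basename(p) for p in list_frames(tmp)]

print(ordered)  # ['0.npy', '1.npy', '2.npy', '10.npy']
```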
2. Point cloud classification with FCN-AlexNet
The FCN-AlexNet point cloud classification project contains:
- 5 Python files: pcl_data_layer.py, net.py, solve.py, surgery.py, score.py
- 3 prototxt files: train.prototxt, val.prototxt, solver.prototxt
- 1 caffemodel file: fcn-alexnet-pascal.caffemodel
2.1 FCN-AlexNet data layer
The file is named pcl_data_layer.py and contains the class PCLSegDataLayer:
import caffe
import numpy as np
import random
import os


class PCLSegDataLayer(caffe.Layer):
    """Caffe Python data layer that feeds point cloud frames stored as .npy files."""

    def setup(self, bottom, top):
        # param_str is the dict literal passed in from the prototxt
        params = eval(self.param_str)
        self.npy_dir = params["pcl_dir"]
        self.list_name = list()
        # two tops: data and label
        if len(top) != 2:
            raise Exception("Need to define two tops: data and label.")
        # data layers have no bottoms
        if len(bottom) != 0:
            raise Exception("Do not define a bottom.")
        self.load_file_name(self.npy_dir, self.list_name)
        self.idx = 0

    def reshape(self, bottom, top):
        self.data, self.label = self.load_file(self.idx)
        # reshape tops to fit (leading 1 is for batch dimension)
        top[0].reshape(1, *self.data.shape)
        top[1].reshape(1, *self.label.shape)

    def forward(self, bottom, top):
        # assign output
        top[0].data[...] = self.data
        top[1].data[...] = self.label
        # pick next input, wrapping around at the end of the file list
        self.idx += 1
        if self.idx == len(self.list_name):
            self.idx = 0

    def backward(self, top, propagate_down, bottom):
        # a data layer has nothing to backpropagate
        pass

    def load_file(self, idx):
        in_file = np.load(self.list_name[idx])  # [mark, row, col, height, range]
        # keep channels 1..3 as input; channel 0 is the label, the last channel is dropped
        in_data = in_file[:, :, 1:-1]
        in_data = in_data.transpose((2, 0, 1))  # HxWxC -> CxHxW
        in_label = in_file[:, :, 0]
        return in_data, in_label

    def load_file_name(self, path, list_name):
        # collect every file under path, descending into subdirectories
        for file in os.listdir(path):
            file_path = os.path.join(path, file)
            if os.path.isdir(file_path):
                # the original called os.listdir(file_path, list_name), which raises a TypeError
                self.load_file_name(file_path, list_name)
            else:
                list_name.append(file_path)
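- setup(): parses the layer parameters and initializes the layer when it is created
- reshape(): resizes the tops to match the current input
- forward(): forward pass; since this is a data input layer, it outputs the raw point cloud data and its classification label
- backward(): backward pass; a data layer has nothing to backpropagate, so it is a no-op
- load_file_name(): collects the .npy files in the given directory into a list
- load_file(): loads a single .npy file and splits its channels, in stored order, into data and label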
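The slicing done in load_file() can be checked on a dummy frame without Caffe. The sketch below assumes a stored frame of shape (16, 2016, 5) with the label in channel 0, as the comment in load_file() states; the zero-filled array is a stand-in for a real frame.

```python
import numpy as np

# Stand-in for one stored frame: height x width x 5 channels.
frame = np.zeros((16, 2016, 5), dtype=np.float32)

# Same slicing as load_file(): channels 1..3 become the input,
# moved to channels-first order; channel 0 is the label.
data = frame[:, :, 1:-1].transpose((2, 0, 1))
label = frame[:, :, 0]

print(data.shape, label.shape)  # (3, 16, 2016) (16, 2016)
```

These shapes agree with the "Top shape: 1 3 16 2016" and "Top shape: 1 16 2016" lines in the training log in section 3, where the leading 1 is the batch dimension added in reshape().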
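2.2 FCN-AlexNet model definition (net.py)
net.py generates the network definition prototxt (written out as train.prototxt and val.prototxt by the code below), which defines the structure of the whole model and the parameters of each layer. The network structure can either be exported from the officially trained fcn-alexnet-pascal.caffemodel or be generated with net.py. To keep things simple, this post exports the network structure from fcn-alexnet-pascal.caffemodel.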
import sys
sys.path.append('../../python')

import caffe
from caffe import layers as L, params as P
from caffe.coord_map import crop


def conv_relu(bottom, ks, nout, stride=1, pad=0, group=1):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
        num_output=nout, pad=pad, group=group)
    return conv, L.ReLU(conv, in_place=True)


def max_pool(bottom, ks, stride=1):
    return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)


def fcn(split):
    n = caffe.NetSpec()
    pydata_params = dict()
    pydata_params['pcl_dir'] = '../fcn_data_gen/data/npy' #.npy files path
    pylayer = 'PCLSegDataLayer'
    n.data, n.label = L.Python(module='pcl_data_layer', layer=pylayer,
            ntop=2, param_str=str(pydata_params))

    # the base net
    n.conv1, n.relu1 = conv_relu(n.data, 11, 96, stride=4, pad=100)
    n.pool1 = max_pool(n.relu1, 3, stride=2)
    n.norm1 = L.LRN(n.pool1, local_size=5, alpha=1e-4, beta=0.75)
    n.conv2, n.relu2 = conv_relu(n.norm1, 5, 256, pad=2, group=2)
    n.pool2 = max_pool(n.relu2, 3, stride=2)
    n.norm2 = L.LRN(n.pool2, local_size=5, alpha=1e-4, beta=0.75)
    n.conv3, n.relu3 = conv_relu(n.norm2, 3, 384, pad=1)
    n.conv4, n.relu4 = conv_relu(n.relu3, 3, 384, pad=1, group=2)
    n.conv5, n.relu5 = conv_relu(n.relu4, 3, 256, pad=1, group=2)
    n.pool5 = max_pool(n.relu5, 3, stride=2)

    # fully conv
    n.fc6, n.relu6 = conv_relu(n.pool5, 6, 4096)
    n.drop6 = L.Dropout(n.relu6, dropout_ratio=0.5, in_place=True)
    n.fc7, n.relu7 = conv_relu(n.drop6, 1, 4096)
    n.drop7 = L.Dropout(n.relu7, dropout_ratio=0.5, in_place=True)

    n.score_fr = L.Convolution(n.drop7, num_output=21, kernel_size=1, pad=0,
        param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
    n.upscore = L.Deconvolution(n.score_fr,
        convolution_param=dict(num_output=21, kernel_size=63, stride=32,
            bias_term=False),
        param=[dict(lr_mult=0)])
    n.score = crop(n.upscore, n.data)
    n.loss = L.SoftmaxWithLoss(n.score, n.label,
            loss_param=dict(normalize=True, ignore_label=255))

    return n.to_proto()


def make_net():
    with open('train.prototxt', 'w') as f:
        f.write(str(fcn('train')))

    with open('val.prototxt', 'w') as f:
        f.write(str(fcn('seg11valid')))


if __name__ == '__main__':
    make_net()
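- conv_relu(): defines the input parameters of a convolutional layer
- max_pool(): defines the input parameters of a pooling layer
- fcn(): defines the model's network structure
Detailed walkthrough of the fcn() model structure
It is recommended to read this alongside the original AlexNet paper, ImageNet Classification with Deep Convolutional Neural Networks, and to refer to a diagram of the AlexNet architecture, which makes the meaning of each parameter easier to understand.
(1). Data input layer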
n = caffe.NetSpec()
pydata_params = dict()
pydata_params['pcl_dir'] = '../fcn_data_gen/data/npy' #.npy files path
pylayer = 'PCLSegDataLayer'
n.data, n.label = L.Python(module='pcl_data_layer', layer=pylayer,
ntop=2, param_str=str(pydata_params))
This locates the PCLSegDataLayer class in pcl_data_layer.py and uses the way that class handles data as the model's data input layer.
(2). First convolutional layer
n.conv1, n.relu1 = conv_relu(n.data, 11, 96, stride=4, pad=100)
n.pool1 = max_pool(n.relu1, 3, stride=2)
n.norm1 = L.LRN(n.pool1, local_size=5, alpha=1e-4, beta=0.75)
For why pad=100 is used, see the detailed explanation in the article "FCN学习:Semantic Segmentation".
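As a small worked example of what the padding does for a 16 x 2016 input, the sketch below applies the standard convolution output-size formula, floor((size + 2*pad - kernel) / stride) + 1, to conv1 with and without the padding. The helper name conv_out is hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1

height, width = 16, 2016  # one LiDAR frame

no_pad = (conv_out(height, 11, stride=4, pad=0), conv_out(width, 11, stride=4, pad=0))
with_pad = (conv_out(height, 11, stride=4, pad=100), conv_out(width, 11, stride=4, pad=100))

print(no_pad)    # (2, 502)
print(with_pad)  # (52, 552)
```

Without padding, the conv1 output is only 2 rows high, which is already smaller than the 3 x 3 pooling window of pool1 and far too small for the 6 x 6 kernel of fc6. With pad=100 the feature map stays large enough to pass through all the later layers.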
(3). Second convolutional layer
n.conv2, n.relu2 = conv_relu(n.norm1, 5, 256, pad=2, group=2)
n.pool2 = max_pool(n.relu2, 3, stride=2)
n.norm2 = L.LRN(n.pool2, local_size=5, alpha=1e-4, beta=0.75)
(4). Third convolutional layer
n.conv3, n.relu3 = conv_relu(n.norm2, 3, 384, pad=1)
(5). Fourth convolutional layer
n.conv4, n.relu4 = conv_relu(n.relu3, 3, 384, pad=1, group=2)
(6). Fifth convolutional layer
n.conv5, n.relu5 = conv_relu(n.relu4, 3, 256, pad=1, group=2)
n.pool5 = max_pool(n.relu5, 3, stride=2)
(7). Sixth layer (fully connected)
n.fc6, n.relu6 = conv_relu(n.pool5, 6, 4096)
n.drop6 = L.Dropout(n.relu6, dropout_ratio=0.5, in_place=True)
(8). Seventh layer (fully connected)
n.fc7, n.relu7 = conv_relu(n.drop6, 1, 4096)
n.drop7 = L.Dropout(n.relu7, dropout_ratio=0.5, in_place=True)
(9). Eighth layer (fully connected)
n.score_fr = L.Convolution(n.drop7, num_output=21, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore = L.Deconvolution(n.score_fr,
convolution_param=dict(num_output=21, kernel_size=63, stride=32,
bias_term=False),
param=[dict(lr_mult=0)])
n.score = crop(n.upscore, n.data)
n.loss = L.SoftmaxWithLoss(n.score, n.label,
loss_param=dict(normalize=True, ignore_label=255))
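To see how the layers above fit together for a 16 x 2016 frame, the following sketch traces the spatial size through the network along one dimension. It assumes the usual Caffe size rules: floor rounding for convolution, ceil rounding for pooling, and stride * (size - 1) + kernel for an unpadded deconvolution. The helper names are hypothetical, and the LRN, ReLU and Dropout layers are left out because they do not change the size.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    """Convolution output size (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Pooling output size (ceil rounding)."""
    return math.ceil((size - kernel) / stride) + 1

def deconv_out(size, kernel, stride):
    """Unpadded deconvolution (transposed convolution) output size."""
    return stride * (size - 1) + kernel

def trace(size):
    """Return (score_fr size, upscore size) for one spatial dimension."""
    size = conv_out(size, 11, stride=4, pad=100)  # conv1
    size = pool_out(size, 3, 2)                   # pool1
    size = conv_out(size, 5, pad=2)               # conv2
    size = pool_out(size, 3, 2)                   # pool2
    size = conv_out(size, 3, pad=1)               # conv3
    size = conv_out(size, 3, pad=1)               # conv4
    size = conv_out(size, 3, pad=1)               # conv5
    size = pool_out(size, 3, 2)                   # pool5
    size = conv_out(size, 6)                      # fc6
    size = conv_out(size, 1)                      # fc7
    size = conv_out(size, 1)                      # score_fr
    return size, deconv_out(size, 63, 32)         # upscore

print(trace(16))    # (1, 63)
print(trace(2016))  # (64, 2079)
```

Under these assumptions the coarse score map is 1 x 64 and upscore is 63 x 2079, which is larger than the 16 x 2016 input. The crop layer then cuts it back to the input size; with the offset of 18 reported in the log in section 3, the cropped window (18 + 16 = 34 rows, 18 + 2016 = 2034 columns) fits inside upscore.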
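2.3 FCN-AlexNet solver script (solve.py)
solve.py is the entry point of the whole model: it ties the other files together, takes the external parameters, runs the solver, and outputs the results. The solver.prototxt file that solve.py loads defines the solver configuration.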
import caffe
import surgery, score

import numpy as np
import os
import sys

# setproctitle is optional: it only renames the process after the working directory
try:
    import setproctitle
    setproctitle.setproctitle(os.path.basename(os.getcwd()))
except ImportError:
    pass

weights = '../ilsvrc-nets/fcn-alexnet-pascal.caffemodel'

# init
# (sys.argv[0] is the script name, so the GPU id has to come from sys.argv[1])
# caffe.set_device(int(sys.argv[1]))
# caffe.set_mode_gpu()

solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)

# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)

# scoring
val = np.loadtxt('../data/pascal/seg11valid.txt', dtype=str)

for _ in range(25):
    solver.step(4000)
    score.seg_tests(solver, False, val, layer='score')
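- weights = '../ilsvrc-nets/fcn-alexnet-pascal.caffemodel': loads the pre-trained model. The network structure can be visualized by entering net.prototxt into [Netscope].
- caffe.set_device(...) and caffe.set_mode_gpu() (commented out): select the GPU for training. Using the GPU raised an error on my machine, so these calls are not used.
- solver = caffe.SGDSolver('solver.prototxt') and solver.net.copy_from(weights): set up the solver and copy the pre-trained weights into its network.
- # surgeries: (to be added)
- # scoring: (to be added)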
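The contents of solver.prototxt are not listed in this post, but the solver parameters are echoed at the top of the training log in section 3. As a rough sketch, the snippet below rebuilds a solver.prototxt text from those logged values. The helper to_prototxt is hypothetical and only handles the flat key/value fields seen in that log.

```python
# Values copied from the "Initializing solver from parameters" block of the log.
solver_params = {
    "train_net": "train.prototxt",
    "test_net": "val.prototxt",
    "test_iter": 736,
    "test_interval": 999999999,
    "base_lr": 0.0001,
    "display": 20,
    "max_iter": 100000,
    "lr_policy": "fixed",
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "snapshot": 4000,
    "snapshot_prefix": "snapshot/train",
    "test_initialization": False,
    "average_loss": 20,
    "iter_size": 20,
}

def to_prototxt(params):
    """Format flat key/value pairs in prototxt text syntax."""
    lines = []
    for key, value in params.items():
        if isinstance(value, bool):      # check bool before numbers
            text = "true" if value else "false"
        elif isinstance(value, str):
            text = '"%s"' % value        # strings are quoted
        else:
            text = str(value)
        lines.append("%s: %s" % (key, text))
    return "\n".join(lines) + "\n"

solver_text = to_prototxt(solver_params)
print(solver_text)  # could be written to solver.prototxt
```

Note that the loop in solve.py runs 25 rounds of 4000 steps, that is 100000 iterations in total, which matches max_iter in the log, and that the snapshot interval of 4000 lines up with one round of the loop.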
3. Point cloud segmentation experiment results
pydev debugger: process 9249 is connecting
Connected to pydev debugger (build 173.4301.16)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0313 11:41:39.369604 9249 solver.cpp:45] Initializing solver from parameters:
train_net: "train.prototxt"
test_net: "val.prototxt"
test_iter: 736
test_interval: 999999999
base_lr: 0.0001
display: 20
max_iter: 100000
lr_policy: "fixed"
momentum: 0.9
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train"
test_initialization: false
average_loss: 20
iter_size: 20
I0313 11:41:39.369671 9249 solver.cpp:92] Creating training net from train_net file: train.prototxt
I0313 11:41:39.370101 9249 net.cpp:51] Initializing net from parameters:
state {
phase: TRAIN
}
layer {
name: "data"
type: "Python"
top: "data"
top: "label"
python_param {
module: "pcl_data_layer"
layer: "PCLSegDataLayer"
param_str: "{\'pcl_dir\': \'/home/zzy/CLionProjects/ROS_Project/ws/src/fcn_data_gen/data/npy\'}"
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
pad: 100
kernel_size: 11
group: 1
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
stride: 1
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 1
stride: 1
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
stride: 1
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
stride: 1
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 6
group: 1
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
group: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "score_fr"
type: "Convolution"
bottom: "fc7"
top: "score_fr"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 21
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore"
type: "Deconvolution"
bottom: "score_fr"
top: "upscore"
param {
lr_mult: 0
}
convolution_param {
num_output: 21
bias_term: false
kernel_size: 63
stride: 32
}
}
layer {
name: "score"
type: "Crop"
bottom: "upscore"
bottom: "data"
top: "score"
crop_param {
axis: 2
offset: 18
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
loss_param {
ignore_label: 255
normalize: true
}
}
I0313 11:41:39.370163 9249 layer_factory.hpp:77] Creating layer data
I0313 11:41:39.370743 9249 net.cpp:84] Creating Layer data
I0313 11:41:39.370753 9249 net.cpp:380] data -> data
I0313 11:41:39.370759 9249 net.cpp:380] data -> label
I0313 11:41:39.372340 9249 net.cpp:122] Setting up data
I0313 11:41:39.372354 9249 net.cpp:129] Top shape: 1 3 16 2016 (96768)
I0313 11:41:39.372357 9249 net.cpp:129] Top shape: 1 16 2016 (32256)
I0313 11:41:39.372360 9249 net.cpp:137] Memory required for data: 516096
I0313 11:41:39.372364 9249 layer_factory.hpp:77] Creating layer data_data_0_split
I0313 11:41:39.372370 9249 net.cpp:84] Creating Layer data_data_0_split
I0313 11:41:39.372372 9249 net.cpp:406] data_data_0_split <- data
I0313 11:41:39.372376 9249 net.cpp: