Training FCN-32s, FCN-16s, and FCN-8s on Your Own Dataset

Preface

I previously wrote a post on making your own dataset and training the FCN-32s model, but it covered FCN-16s and FCN-8s only briefly, so this post fills in the FCN-16s and FCN-8s training procedures.

Preparation Before Training

Before using FCN you need a working Caffe environment; for setup, see the post 《win10+vs2013+caffe+gpu+python环境配置》. For making your own dataset and the FCN-32s training procedure, see the post 《FCN制作自己的数据集并训练和测试》.

Training Procedure

Training FCN-16s
Once your FCN-32s model has finished training, you will find a train_iter_100000.caffemodel file under D:\caffe\caffe-master\fcn-master\voc-fcn32s\snapshot (that is the path of my fcn-master code); this file contains the FCN-32s model weights.
Next, copy all the .py files in the fcn-master directory into the voc-fcn16s folder, and create a new folder named snapshot there.
Your folder may be missing a deploy.prototxt file; a download is available here: deploy. The download contains two files; rename deploy_fcn16.prototxt to deploy.prototxt and copy it into the voc-fcn16s folder.
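If you prefer to script these copy/rename steps, here is a minimal sketch, assuming the paths used throughout this post and that you have already placed the downloaded deploy_fcn16.prototxt in voc-fcn16s (adjust root to your own fcn-master location):

import glob
import os
import shutil

root = r'D:\caffe\caffe-master\fcn-master'   # my fcn-master path; change to yours
dst = os.path.join(root, 'voc-fcn16s')

# copy every helper .py file from fcn-master into voc-fcn16s
for py in glob.glob(os.path.join(root, '*.py')):
    shutil.copy(py, dst)

# create the snapshot folder that the solver writes checkpoints to
if not os.path.isdir(os.path.join(dst, 'snapshot')):
    os.makedirs(os.path.join(dst, 'snapshot'))

# rename the downloaded deploy_fcn16.prototxt to deploy.prototxt
shutil.copy(os.path.join(dst, 'deploy_fcn16.prototxt'),
            os.path.join(dst, 'deploy.prototxt'))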
Open solver.prototxt in VS; it contains the solver's parameter settings. Because my dataset has only a few images, I changed test_iter to 2; this parameter sets how many images are scored in one test pass (with FCN's batch size of 1, one test iteration is one image). The other parameters can be left at their defaults.
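Equivalently, you can make this edit programmatically through Caffe's protobuf definitions; a sketch, assuming pycaffe is importable from the path used elsewhere in this post and that solver.prototxt already defines test_iter:

import sys
sys.path.append('D:/caffe/caffe-master/python')  # make pycaffe importable

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# parse the existing solver config
solver_param = caffe_pb2.SolverParameter()
with open('solver.prototxt') as f:
    text_format.Merge(f.read(), solver_param)

# test_iter is a repeated field; entry 0 belongs to the single test net
solver_param.test_iter[0] = 2

# write the modified config back out
with open('solver.prototxt', 'w') as f:
    f.write(text_format.MessageToString(solver_param))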
The modifications to train.prototxt, val.prototxt, and voc_layers.py are exactly the same as for FCN-32s training, so refer to the 《FCN制作自己的数据集并训练和测试》 post mentioned above.

Open solve.py and change the code to the following:

import sys
# pycaffe must be on the path before caffe is imported
sys.path.append('D:/caffe/caffe-master/python')

import caffe
import surgery, score

import numpy as np
import os

try:
    import setproctitle
    setproctitle.setproctitle(os.path.basename(os.getcwd()))
except ImportError:
    pass

# FCN-16s is fine-tuned from the trained FCN-32s weights
# (point this at your own FCN-32s snapshot folder)
weights = 'D:/caffe/caffe-master/fcn-master/voc-fcn32s/snapshot1/train_iter_100000.caffemodel'

# init
#caffe.set_device(int(sys.argv[1]))
caffe.set_mode_gpu()
caffe.set_device(0)

solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)


# surgeries: initialize the upsampling (deconvolution) layers with bilinear kernels
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)

# scoring
val = np.loadtxt('D:/caffe/caffe-master/fcn-master/data/pascal/VOCdevkit/VOC2012/ImageSets/Segmentation/seg11valid.txt', dtype=str)

# 25 x 4000 = 100,000 iterations in total, evaluating on the val split every 4,000
for _ in range(25):
    solver.step(4000)
    score.seg_tests(solver, False, val, layer='score')

That completes the preparation, and training can begin.
In a cmd window, change into the fcn-master\voc-fcn16s directory and run the command python solve.py.
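With the paths used in this post, that is:

cd D:\caffe\caffe-master\fcn-master\voc-fcn16s
python solve.py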
After the long wait for training to finish, a train_iter_100000.caffemodel file is generated in the voc-fcn16s\snapshot directory.

Testing a Single Image
Open infer.py and modify the code as follows. Note that the data folder here is one I created under the voc-fcn16s directory, not the data folder under fcn-master.

import sys
# pycaffe must be on the path before caffe is imported
sys.path.append('D:/caffe/caffe-master/python')

import numpy as np
from PIL import Image

import caffe
import vis

# load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
im = Image.open('data/image.jpg')
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= np.array((104.00698793,116.66876762,122.67891434))
in_ = in_.transpose((2,0,1))

# load net
net = caffe.Net('deploy.prototxt', 'snapshot/train_iter_100000.caffemodel', caffe.TEST)
# shape for input (data blob is N x C x H x W), set data
net.blobs['data'].reshape(1, *in_.shape)
net.blobs['data'].data[...] = in_
# run net and take argmax for prediction
net.forward()
out = net.blobs['score'].data[0].argmax(axis=0)

# visualize segmentation in PASCAL VOC colors
# (5 is the number of classes in my dataset; change it to match yours)
voc_palette = vis.make_palette(5)
out_im = Image.fromarray(vis.color_seg(out, voc_palette))
out_im.save('data/output.png')
masked_im = Image.fromarray(vis.vis_seg(im, out, voc_palette))
masked_im.save('data/visualization.jpg')

This completes the FCN-16s training workflow.

FCN-16s is trained from the FCN-32s weights, and FCN-8s is trained from the FCN-16s weights, so training FCN-8s follows exactly the same procedure as FCN-16s and I won't repeat it here. If anything is unclear, feel free to ask me anytime; my email is 905885574@q.com, and I welcome discussion.
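For reference, the only substantive change in the voc-fcn8s copy of solve.py is the weights path, which now points at the FCN-16s snapshot trained above (a sketch, assuming the paths used in this post):

# fine-tune FCN-8s from the FCN-16s weights trained above
weights = 'D:/caffe/caffe-master/fcn-master/voc-fcn16s/snapshot/train_iter_100000.caffemodel'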
