2018.02.27 learning journal
by 赵木木
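1. Exploring caffe-segnet training
Previous post: http://blog.youkuaiyun.com/linwantian/article/details/79377588
That post already explained how to train with caffe-segnet;
this one records the problems I ran into during training.
Enter the training command: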
relaybot@ubuntu:~/mumu$ ./SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver ./SegNet/Models/segnet_solver.prototxt
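Training with caffe-segnet failed with the error: Check failed: error == cudaSuccess (2 vs. 0) out of memory
The explanation found online is that batch_size is too large: too many images are loaded at once, so GPU memory runs out. The fix is to reduce the batch_size of the layer in train.prototxt.
My laptop's GPU is rather weak: I had to set batch_size to 1 before training got past this error.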
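
The batch_size change can also be scripted instead of edited by hand. Below is a minimal Python sketch, assuming the prototxt stores the value as a plain `batch_size: N` line. The helper name `set_batch_size` and the sample fragment are made up for illustration and are not taken from the SegNet repository.

```python
import re

def set_batch_size(prototxt_text, new_size):
    """Return prototxt_text with every `batch_size: N` entry set to new_size."""
    pattern = re.compile(r"(batch_size\s*:\s*)\d+")
    return pattern.sub(lambda m: m.group(1) + str(new_size), prototxt_text)

# Hypothetical fragment for illustration only; the real data layer may differ.
sample = """
layer {
  name: "data"
  dense_image_data_param {
    source: "train.txt"
    batch_size: 4
    shuffle: true
  }
}
"""

patched = set_batch_size(sample, 1)
print(patched)
```

Note that this rewrites every batch_size entry in the text. If a file holds both train and test layers and only one should change, it is safer to edit that layer by hand.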
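The next problem was that training takes a long time, so I changed the training length in Models/segnet_solver.prototxt. Some of the parameter names were unfamiliar to me; questions like this can usually be answered by searching the Issues on github, and some of the later problems were solved the same way.
In Models/segnet_solver.prototxt I set max_iter to 1000, so training ends after 1000 iterations, and display to 20, so progress is printed every 20 iterations. This run took one hour on my laptop; a better GPU would certainly be much faster.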
I0227 09:27:29.151870 3754 solver.cpp:228] Iteration 960, loss = 0.465969
I0227 09:27:29.151957 3754 solver.cpp:244] Train net output #0: accuracy = 0.255446
I0227 09:27:29.151971 3754 solver.cpp:244] Train net output #1: loss = 0.465969 (* 1 = 0.465969 loss)
I0227 09:27:29.151979 3754 solver.cpp:244] Train net output #2: per_class_accuracy = 0
I0227 09:27:29.151986 3754 solver.cpp:244] Train net output #3: per_class_accuracy = 0.0160278
I0227 09:27:29.151993 3754 solver.cpp:244] Train net output #4: per_class_accuracy = 0
I0227 09:27:29.151998 3754 solver.cpp:244] Train net output #5: per_class_accuracy = 0.949324
I0227 09:27:29.152004 3754 solver.cpp:244] Train net output #6: per_class_accuracy = 0.106586
I0227 09:27:29.152011 3754 solver.cpp:244] Train net output #7: per_class_accuracy = 0
I0227 09:27:29.152017 3754 solver.cpp:244] Train net output #8: per_class_accuracy = 0
I0227 09:27:29.152024 3754 solver.cpp:244] Train net output #9: per_class_accuracy = 0
I0227 09:27:29.152029 3754 solver.cpp:244] Train net output #10: per_class_accuracy = 0
I0227 09:27:29.152035 3754 solver.cpp:244] Train net output #11: per_class_accuracy = 0
I0227 09:27:29.152040 3754 solver.cpp:244] Train net output #12: per_class_accuracy = 0
I0227 09:27:29.152048 3754 sgd_solver.cpp:106] Iteration 960, lr = 0.01
I0227 09:28:46.873960 3754 solver.cpp:228] Iteration 980, loss = 0.859281
I0227 09:28:46.874079 3754 solver.cpp:244] Train net output #0: accuracy = 0.473837
I0227 09:28:46.874097 3754 solver.cpp:244] Train net output #1: loss = 0.859281 (* 1 = 0.859281 loss)
I0227 09:28:46.874106 3754 solver.cpp:244] Train net output #2: per_class_accuracy = 0.978743
I0227 09:28:46.874114 3754 solver.cpp:244] Train net output #3: per_class_accuracy = 0.108868
I0227 09:28:46.874120 3754 solver.cpp:244] Train net output #4: per_class_accuracy = 0.0167785
I0227 09:28:46.874127 3754 solver.cpp:244] Train net output #5: per_class_accuracy = 0.918752
I0227 09:28:46.874135 3754 solver.cpp:244] Train net output #6: per_class_accuracy = 0.675187
I0227 09:28:46.874141 3754 solver.cpp:244] Train net output #7: per_class_accuracy = 0.00214325
I0227 09:28:46.874147 3754 solver.cpp:244] Train net output #8: per_class_accuracy = 0.915285
I0227 09:28:46.874155 3754 solver.cpp:244] Train net output #9: per_class_accuracy = 0
I0227 09:28:46.874161 3754 solver.cpp:244] Train net output #10: per_class_accuracy = 0
I0227 09:28:46.874168 3754 solver.cpp:244] Train net output #11: per_class_accuracy = 0.919846
I0227 09:28:46.874174 3754 solver.cpp:244] Train net output #12: per_class_accuracy = 0
I0227 09:28:46.874182 3754 sgd_solver.cpp:106] Iteration 980, lr = 0.01
I0227 09:30:00.935937 3754 solver.cpp:454] Snapshotting to binary proto file /home/relaybot/mumu/SegNet/Models/Training/segnet_iter_1000.caffemodel
I0227 09:30:03.815346 3754 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/relaybot/mumu/SegNet/Models/Training/segnet_iter_1000.solverstate
I0227 09:30:05.194144 3754 solver.cpp:317] Iteration 1000, loss = 0.789985
I0227 09:30:05.194195 3754 solver.cpp:337] Iteration 1000, Testing net (#0)
I0227 09:31:08.651491 3754 solver.cpp:404] Test net output #0: accuracy = 0.474696
I0227 09:31:08.651664 3754 solver.cpp:404] Test net output #1: loss = 0.772579 (* 1 = 0.772579 loss)
I0227 09:31:08.651679 3754 solver.cpp:404] Test net output #2: per_class_accuracy = 0.945665
I0227 09:31:08.651688 3754 solver.cpp:404] Test net output #3: per_class_accuracy = 0.0743196
I0227 09:31:08.651698 3754 solver.cpp:404] Test net output #4: per_class_accuracy = 0.0442652
I0227 09:31:08.651707 3754 solver.cpp:404] Test net output #5: per_class_accuracy = 0.868098
I0227 09:31:08.651716 3754 solver.cpp:404] Test net output #6: per_class_accuracy = 0.348668
I0227 09:31:08.651726 3754 solver.cpp:404] Test net output #7: per_class_accuracy = 0.0093155
I0227 09:31:08.651734 3754 solver.cpp:404] Test net output #8: per_class_accuracy = 0.809386
I0227 09:31:08.651743 3754 solver.cpp:404] Test net output #9: per_class_accuracy = 0
I0227 09:31:08.651751 3754 solver.cpp:404] Test net output #10: per_class_accuracy = 0.0136846
I0227 09:31:08.651759 3754 solver.cpp:404] Test net output #11: per_class_accuracy = 0.655417
I0227 09:31:08.651768 3754 solver.cpp:404] Test net output #12: per_class_accuracy = 0.0351428
I0227 09:31:08.651777 3754 solver.cpp:322] Optimization Done.
I0227 09:31:08.651783 3754 caffe.cpp:254] Optimization Done.
relaybot@ubuntu:~/mumu$
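
The timestamps in this log are enough to check the training speed. Below is a rough Python sketch that parses the "Iteration N, loss = ..." lines and estimates the seconds per iteration. The regular expression is written against the line format shown above and is an assumption about the log layout, not an official Caffe tool. It also assumes the log does not cross midnight.

```python
import re
from datetime import datetime

# Matches lines such as:
# I0227 09:27:29.151870 3754 solver.cpp:228] Iteration 960, loss = 0.465969
LINE_RE = re.compile(
    r"^I\d{4}\s+(\d{2}:\d{2}:\d{2}\.\d+)\s+\d+\s+solver\.cpp:\d+\]\s+"
    r"Iteration (\d+), loss = ([0-9.eE+-]+)"
)

def parse_iterations(log_text):
    """Return a list of (iteration, timestamp, loss) tuples found in log_text."""
    rows = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            rows.append((int(m.group(2)), t, float(m.group(3))))
    return rows

def seconds_per_iteration(rows):
    """Average wall-clock seconds per iteration between the first and last row."""
    (it0, t0, _), (it1, t1, _) = rows[0], rows[-1]
    return (t1 - t0).total_seconds() / (it1 - it0)

# Two lines copied from the log above.
log_text = """
I0227 09:27:29.151870 3754 solver.cpp:228] Iteration 960, loss = 0.465969
I0227 09:28:46.873960 3754 solver.cpp:228] Iteration 980, loss = 0.859281
"""

rows = parse_iterations(log_text)
sec_per_iter = seconds_per_iteration(rows)
print("seconds per iteration: %.2f" % sec_per_iter)
print("estimated minutes for 1000 iterations: %.0f" % (sec_per_iter * 1000 / 60))
```

With the two lines above this comes out to a little under 4 seconds per iteration, which matches the roughly one hour the 1000 iterations took.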
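Training is finished; the results are written to the /SegNet/Models/Training folder:
Follow the steps in the previous post: http://blog.youkuaiyun.com/linwantian/article/details/79377588 to generate the weights file;
The following folder must be created in the directory beforehand: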
/SegNet/Models/Inference/
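
If you prefer to create the folder from a script, here is a small Python sketch. It assumes it is run from the same working directory as the commands in this post (~/mumu in my case). The helper name `ensure_dir` is made up for illustration.

```python
import os

def ensure_dir(path):
    """Create path, including any missing parent folders, if it does not exist yet."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

# Relative to the current working directory.
inference_dir = ensure_dir(os.path.join("SegNet", "Models", "Inference"))
print(inference_dir)
```

Calling it a second time does nothing, so it is safe to leave in a script that gets rerun.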
Enter the command:
relaybot@ubuntu:~/mumu$ python ./SegNet/Scripts/compute_bn_statistics.py ./SegNet/Models/segnet_train.prototxt ./SegNet/Models/Training/segnet_iter_1000.caffemodel ./SegNet/Models/Inference/
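This generates the weights file:
Enter the following command to display the output images. This part is the Python script test_segmentation_camvid.py; the directories inside the script must be changed beforehand.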
relaybot@ubuntu:~/mumu$ python ./SegNet/Scripts/test_segmentation_camvid.py --model ./SegNet/Models/segnet_inference.prototxt --weights ./SegNet/Models/Inference/test_weights.caffemodel --iter 233
Three figures are generated; all three must be closed before the input and output for the next image are shown.