Caffe Learning (8): Setting up caffe-ocr on Ubuntu 18.04
Preface
Since https://github.com/senlinuc/caffe_ocr does not provide a Linux version, this post records how to port the Windows version to Linux.
Preparation
https://github.com/senlinuc/caffe_ocr
https://github.com/BVLC/caffe
Installing Caffe
This step is simply installing Caffe on Ubuntu and is not covered in detail here.
Porting process
Overview of the process
caffe_ocr mainly involves the following pieces:
1. Multi-label classification for Caffe
2. The DenseBlock layer (DenseBlock_layer.cpp)
3. The LSTM replacement
4. The CTC loss layer
Note: one point worth highlighting is that if you just want to get Caffe running quickly, you can replace DenseBlock and skip LSTM. DenseBlock can be replaced with ordinary convolution layers (DenseBlock_layer.cpp not only bundles the DenseBlock operations together, it also adds some speed optimizations), and OCR can in fact be trained end-to-end with just CNN+CTC; LSTM adds temporal modelling, which helps when the text length varies a lot.
Multi-label classification in Caffe
You can refer to
https://blog.youkuaiyun.com/mqyw29995/article/details/111541004
Note that the modification for supporting floating-point data input is not needed here.
Follow the approach of the caffe_ocr author:
https://blog.youkuaiyun.com/hubin232/article/details/50960201
Make the changes following that post; a simplified summary is given below.
Files to modify: io.hpp, io.cpp, data_layers.hpp, caffe.proto, data_layer.cpp, image_data_layer.cpp, memory_data_layer.cpp
convert_imageset is only used to generate LMDB files, which we do not need, so it can be ignored for now.
1. io.hpp
Path: caffe-master\include\caffe\util\io.hpp
Change (line 128): extend the functions to support multiple labels; the new overloads can simply be appended at the end (C++ supports function overloading):
//###
bool ReadFileToDatum(const string& filename, const vector<int> label, Datum* datum);
//###
inline bool ReadFileToDatum(const string& filename, Datum* datum) {
return ReadFileToDatum(filename, vector<int>(), datum);
}
//###
bool ReadImageToDatum(const string& filename, const vector<int> label,
const int height, const int width, const bool is_color,
const std::string & encoding, Datum* datum);
//###
inline bool ReadImageToDatum(const string& filename, const vector<int> label,
const int height, const int width, const bool is_color, Datum* datum) {
return ReadImageToDatum(filename, label, height, width, is_color,
"", datum);
}
//###
inline bool ReadImageToDatum(const string& filename, const vector<int> label,
const int height, const int width, Datum* datum) {
return ReadImageToDatum(filename, label, height, width, true, datum);
}
//###
inline bool ReadImageToDatum(const string& filename, const vector<int> label,
const bool is_color, Datum* datum) {
return ReadImageToDatum(filename, label, 0, 0, is_color, datum);
}
//###
inline bool ReadImageToDatum(const string& filename, const vector<int> label,
Datum* datum) {
return ReadImageToDatum(filename, label, 0, 0, true, datum);
}
//###
inline bool ReadImageToDatum(const string& filename, const vector<int> label,
const std::string & encoding, Datum* datum) {
return ReadImageToDatum(filename, label, 0, 0, true, encoding, datum);
}
2. io.cpp
Path: caffe-master\src\caffe\util\io.cpp
Changes (lines 119 and 200): extend the functions to support multiple labels; the new functions can be added after the existing ones (C++ supports function overloading):
bool ReadImageToDatum(const string& filename, const vector<int> label,
const int height, const int width, const bool is_color,
const std::string & encoding, Datum* datum) {
cv::Mat cv_img = ReadImageToCVMat(filename, height, width, is_color);
if (cv_img.data) {
if (encoding.size()) {
if ((cv_img.channels() == 3) == is_color && !height && !width &&
matchExt(filename, encoding))
return ReadFileToDatum(filename, label, datum);
std::vector<uchar> buf;
cv::imencode("." + encoding, cv_img, buf);
datum->set_data(std::string(reinterpret_cast<char*>(&buf[0]),
buf.size()));
//###
//datum->set_label(label);
for (int i = 0; i < label.size(); i++){
datum->add_labels(label[i]);
}
datum->set_encoded(true);
return true;
}
CVMatToDatum(cv_img, datum);
//###
//datum->set_label(label);
for (int i = 0; i < label.size(); i++){
datum->add_labels(label[i]);
}
return true;
}
else {
return false;
}
}
//###
bool ReadFileToDatum(const string& filename, const vector<int> label,
Datum* datum) {
std::streampos size;
fstream file(filename.c_str(), ios::in | ios::binary | ios::ate);
if (file.is_open()) {
size = file.tellg();
std::string buffer(size, ' ');
file.seekg(0, ios::beg);
file.read(&buffer[0], size);
file.close();
datum->set_data(buffer);
datum->clear_labels();
for (int i = 0; i < label.size(); i++){
datum->add_labels(label[i]);
}
datum->set_encoded(true);
return true;
}
else {
return false;
}
}
3. caffe.proto
Path: caffe-master\src\caffe\proto\caffe.proto
Changes: at line 41 (inside message Datum) add the multi-label field, and at line 835 add the label-count field:
repeated float labels = 8;
//###
optional uint32 labels_size = 13 [default = 1];
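For orientation, after the first change the Datum message (around line 41 of caffe.proto) reads roughly as sketched below; the neighbouring fields may differ slightly in your Caffe revision. For the labels_size field, check which parameter message sits around line 835 in your copy of caffe.proto (the line numbers refer to the revision this post was written against) and add it inside that message.
message Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  optional bytes data = 4;
  optional int32 label = 5;
  repeated float float_data = 6;
  optional bool encoded = 7 [default = false];
  repeated float labels = 8;  // new: one entry per label for multi-label samples
}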
4. data_layer.cpp
Path: caffe-master\src\caffe\layers\data_layer.cpp
Changes (lines 51 and 111): replace the label handling with the multi-label version:
vector<int> label_shape(2);
label_shape[0] = batch_size;
label_shape[1] = datum.labels_size();
int labelSize = datum.labels_size();
for (int i = 0; i < labelSize; i++){
top_label[item_id*labelSize + i] = datum.labels(i);
}
batch->data_.Reshape(top_shape);
After the above changes, run make all -j8.
A clean build means this step succeeded.
Adding the DenseBlock layer to Caffe
The DenseBlock layer can be taken from Tongcheng's repository (caffe_ocr uses the same implementation); here I use the files from caffe_ocr directly.
https://github.com/Tongcheng/caffe
1. Copy DenseBlock_layer.cpp and DenseBlock_layer.cu from src\caffe\layers into src\caffe\layers of the target tree.
2. Copy DenseBlock_layer.hpp from include\caffe\layers into include\caffe\layers of the target tree.
3. Modify caffe-master\src\caffe\proto\caffe.proto:
Line 425: inside message LayerParameter, add
optional DenseBlockParameter denseblock_param = 200;
Line 428: add
message DenseBlockParameter {
//if you don't use BC, BN-ReLU-Conv3 is one transition
//if you use BC, then BN-ReLU-Conv1-BN-ReLU-Conv3 is one transition
optional int32 numTransition = 1 [default = 40];
optional int32 initChannel = 2 [default = 16];
optional int32 growthRate = 3 [default = 12];
//Convolution related parameters
optional int32 pad_h = 4 [default = 1];
optional int32 pad_w = 5 [default = 1];
optional int32 conv_verticalStride = 6 [default = 1];
optional int32 conv_horizentalStride = 7 [default = 1];
optional int32 filter_H = 8 [default = 3];
optional int32 filter_W = 9 [default = 3];
optional FillerParameter Filter_Filler = 10;
//BN related parameters
optional FillerParameter BN_Scaler_Filler = 11;
optional FillerParameter BN_Bias_Filler = 12;
//Performance Related parameters
optional int32 gpuIdx = 15 [default = 0];
//Dropout related parameter
optional bool use_dropout = 16 [default = false];
optional float dropout_amount = 17 [default = 0];
optional bool use_BC = 18 [default = false];
//If it is not ultra space efficient, then it stores output data of conv1
optional bool BC_ultra_space_efficient = 19 [default = false];
optional int32 workspace_MB = 20 [default = 8];
optional float moving_average_fraction = 21 [default = 0.1];
}
Then recompile; a clean build means everything is in place.
Note that USE_CUDNN must be enabled in Makefile.config (uncomment USE_CUDNN := 1), otherwise DenseBlock will fail to compile; alternatively, disable the cuDNN-dependent code in DenseBlock_layer.cu and DenseBlock_layer.hpp.
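For reference, a DenseBlock layer in a train prototxt looks roughly like the sketch below. It assumes the layer is registered under the type name DenseBlock (as in Tongcheng's repository); the blob names and parameter values are placeholders, so adapt them to your own network:
layer {
  name: "DenseBlock1"
  type: "DenseBlock"
  bottom: "conv1"
  top: "DenseBlock1"
  denseblock_param {
    numTransition: 8
    growthRate: 8
    use_BC: true
    Filter_Filler { type: "msra" }
    BN_Scaler_Filler { type: "constant" value: 1 }
    BN_Bias_Filler { type: "constant" value: 0 }
  }
}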
Adding the transpose layer to Caffe
Similar to the above: use transpose_layer from caffe_ocr directly.
1. Copy transpose_layer.cpp and transpose_layer.cu from src\caffe\layers into src\caffe\layers of the target tree.
2. Copy transpose_layer.hpp from include\caffe\layers into include\caffe\layers of the target tree.
3. Modify caffe-master\src\caffe\proto\caffe.proto:
Line 247: add
optional TransposeParameter transpose_param=201;
Line 1487, at the very end of the file, add (the exact position does not matter):
message TransposeParameter {
// For example, if you want to transpose NxCxHxW into WxNxHxC,
// the parameter should be the following:
// transpose_param { dim: 3 dim: 0 dim: 2 dim: 1 }
// ie, if the i-th dim has value n, then the i-th axis of top is equal to the n-th axis of bottom.
repeated int32 dim=1;
}
Then recompile; a clean build means everything is in place.
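As a quick illustration of how the layer is used (the type name Transpose and the blob names are assumptions on my part), the NxCxHxW to WxNxHxC case described in the TransposeParameter comment above would be written as:
layer {
  name: "transpose1"
  type: "Transpose"
  bottom: "conv_features"
  top: "conv_features_t"
  transpose_param { dim: 3 dim: 0 dim: 2 dim: 1 }
}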
Adding the CTC layers to Caffe
Similar to the above: use the relevant files from caffe_ocr directly.
1. Copy ctc_decoder_layer.cpp, ctcpp_entrypoint.cpp, ctcpp_entrypoint.cu, warp_ctc_loss_layer.cpp and warp_ctc_loss_layer.cu from src\caffe\layers into src\caffe\layers of the target tree.
2. Copy ctc_decoder_layer.hpp and warp_ctc_loss_layer.hpp from include\caffe\layers into include\caffe\layers of the target tree.
Copy ctc.h, ctcpp.h, detail and contrib from include\ into include\ of the target tree.
3. Modify caffe-master\src\caffe\proto\caffe.proto:
Line 429: add
optional CTCLossParameter ctc_loss_param = 202;
optional CTCDecoderParameter ctc_decoder_param = 203;
Line 699: add
//###
message CTCDecoderParameter {
// The index of the blank index in the labels. A negative (default)
// value will use the last index
optional int32 blank_index = 1 [default = 0];
// Collapse the repeated labels during the ctc calculation
// e.g. collapse [0bbb11bb11bb0b2] to [01102] instead of [0111102],
// where b means blank label.
// The default behaviour is to merge repeated labels.
// Note: blank labels will be removed in any case.
optional bool ctc_merge_repeated = 2 [default = true];
}
//###
message CTCLossParameter {
// Adds delayed output to the CTC loss calculation (untested!)
optional int32 output_delay = 1 [default = 0];
// The index of the blank index in the labels. A negative (default)
// value will use the last index
optional int32 blank_index = 2 [default = 0];
// Collapse repeating labels of the target sequence before calculating
// the loss and the gradients (e.g. collapse [01102] to [0102])
// The default behaviour is to keep repeated labels. Elsewise the
// network will not learn to predict repetitions.
optional bool preprocess_collapse_repeated = 3 [default = false];
// Collapse the repeated labels during the ctc calculation
// e.g collapse [0bbb11bb11bb0b2] to [01102] instead of [0111102],
// where b means blank label.
// The default behaviour is to merge repeated labels.
// Note: blank labels will be removed in any case.
optional bool ctc_merge_repeated = 4 [default = true];
/// This parameter is for test cases only!
/// The time for which to calculate the loss (see Graves Eq. (7.27) )
/// Note that the result must be the same for each 0 <= t < T
/// Therefore you can chose an arbitrary value, default 0
optional int32 loss_calculation_t = 5 [default = 0];
}
Note that the CTC code needs threading support; if it is not already linked, add it to the Makefile yourself.
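To give an idea of how these layers sit at the end of a network, here is a rough sketch. The type names (WarpCTCLoss, CTCGreedyDecoder) and the bottom blobs are assumptions; check the class registrations in warp_ctc_loss_layer.cpp / ctc_decoder_layer.cpp and the prototxt files shipped with caffe_ocr before copying this:
layer {
  name: "ctc_loss"
  type: "WarpCTCLoss"
  bottom: "fc_out"   # time-major T x N x C predictions
  bottom: "label"    # fixed-length, 0-padded label sequences
  top: "ctc_loss"
  ctc_loss_param { blank_index: 0 ctc_merge_repeated: true }
}
layer {
  name: "ctc_decoded"
  type: "CTCGreedyDecoder"
  bottom: "fc_out"
  top: "ctc_decoded"
  ctc_decoder_param { blank_index: 0 ctc_merge_repeated: true }
}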
Adding other layers to Caffe
1. Copy reduce.cu from src\caffe\layers into src\caffe\layers of the target tree.
Copy interp.cpp and interp.cu into src\caffe\util of the target tree.
2. Copy interp.hpp from include\caffe\util into include\caffe\util of the target tree.
Copy common.cuh from include\ into include\ of the target tree.
3. Modify caffe-master\src\caffe\proto\caffe.proto:
Line 434: add
optional InterpParameter interp_param = 204;
Line 952: add
message InterpParameter {
optional int32 height = 1 [default = 0]; // Height of output
optional int32 width = 2 [default = 0]; // Width of output
optional int32 zoom_factor = 3 [default = 1]; // zoom factor
optional int32 shrink_factor = 4 [default = 1]; // shrink factor
optional int32 pad_beg = 5 [default = 0]; // padding at begin of input
optional int32 pad_end = 6 [default = 0]; // padding at end of input
}
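A short, hedged usage sketch for the interpolation layer (the type name Interp and the blob names are assumptions; the parameters mirror the message above):
layer {
  name: "interp1"
  type: "Interp"
  bottom: "feat"
  top: "feat_up"
  interp_param { zoom_factor: 2 }  # upsample the feature map by a factor of 2
}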
After completing all of the above, compile and build pycaffe; you can then run a first test with the densenet+lstm model.
Adding the reverse layers to Caffe
1. Copy reverse_layer.cpp, reverse_layer.cu, reverse_time_layer.cpp and reverse_time_layer.cu from src\caffe\layers into src\caffe\layers of the target tree.
2. Copy reverse_layer.hpp and reverse_time_layer.hpp from include\caffe\layers into include\caffe\layers of the target tree.
3. Modify caffe-master\src\caffe\proto\caffe.proto:
Line 435: add
optional ReverseParameter reverse_param = 205;
optional ReverseTimeParameter reverse_time_param = 206;
Line 1199: add
//###
message ReverseParameter {
// axis controls the data axis which shall be inverted.
// The layout of the content will not be inverted
//
// The default axis is 0 that means:
// data_previous[n] == data_afterwards[N - n -1]
// where N is the shape of axis(n)
//
// Usually this layer will be used with recurrent layers to invert the
// time axis which is axis 0
// This layer will therefore swap the order in time but not the
// order of the actual data.
optional int32 axis = 1 [default = 0];
}
//###
message ReverseTimeParameter {
// if true the rest of the sequence will not be reversed but copied
// if false no more operation will be performed for the reset.
// this can lead to random numbers in the blob.
optional bool copy_remaining = 1 [default = false];
}
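In a bidirectional LSTM branch this layer is typically used to flip the time axis before the backward LSTM (and flip it back afterwards). A hedged sketch, assuming the type name is Reverse and using made-up blob names:
layer {
  name: "reverse_input"
  type: "Reverse"
  bottom: "seq"
  top: "seq_rev"
  reverse_param { axis: 0 }  # axis 0 is the time axis
}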
Compile
Testing inference
Once all of the steps above have gone through, you can test the models provided by caffe_ocr.
Testing the densenet-res-blstm model
Training
Before training, Caffe needs a further round of modifications:
1. Modify image_data_layer.hpp:
vector<std::pair<std::string, std::vector<int> > > lines_;
vector<std::pair<std::string, std::vector<float> > > regression_lines_;
2. Copy group_image_data_layer.hpp to caffe-master\include\caffe\layers.
3. Modify memory_data_layer.hpp:
int batch_size_, channels_, height_, width_, size_, label_size_;
4. Modify math_functions.hpp:
template <typename Dtype>
void caffe_bound(const int n, const Dtype* a, const Dtype min,
const Dtype max, Dtype* y);
5. Modify solver.hpp:
/**
* @brief Solver that only computes gradients, used as worker
* for multi-GPU training.
*/
template <typename Dtype>
class WorkerSolver : public Solver<Dtype> {
public:
explicit WorkerSolver(const SolverParameter& param,
const Solver<Dtype>* root_solver = NULL)
: Solver<Dtype>(param, root_solver) {}
protected:
void ApplyUpdate() {}
void SnapshotSolverState(const string& model_filename) {
LOG(FATAL) << "Should not be called on worker solver.";
}
void RestoreSolverStateFromBinaryProto(const string& state_file) {
LOG(FATAL) << "Should not be called on worker solver.";
}
void RestoreSolverStateFromHDF5(const string& state_file) {
LOG(FATAL) << "Should not be called on worker solver.";
}
};
6. Copy softmax_loss_layer_multi_label.hpp to caffe-master\include\caffe\layers.
7. Replace accuracy_layer.cpp (copy accuracy_layer.cpp into caffe-master\src\caffe\layers).
8. Modify data_layer.cpp:
if (rand_skip_num_ > 0)
{
unsigned int skip = caffe_rng_rand() % rand_skip_num_;
unsigned int k = 0;
while (k<skip) {
Next();
k++;
}
LOG_IF(INFO, Caffe::root_solver())
<< "skip " << skip;
rand_skip_num_ = 0;//skip once
}
9. Modify memory_data_layer.cpp (three places):
label_size_ = this->layer_param_.memory_data_param().label_size();
···
vector<int> label_shape;
for (int item_id = 0; item_id < num; ++item_id)
{
for (int i = 0; i < label_size_;i++)
top_label[item_id*label_size_ + i] = labels_[item_id*label_size_ + i];
}
added_label_.Reshape(batch_size_, label_size_, 1, 1);
...
top[1]->Reshape(batch_size_, label_size_, 1, 1);
10. Copy softmax_loss_layer_multi_label.cpp and softmax_loss_layer_multi_label.cu to caffe-master\src\caffe\layers.
11. Modify caffe.proto (five places):
// add noise when train
optional bool add_noise = 8 [default = false];
// noise ratio
optional float noise_ratio = 9;
// If we want to do data augmentation, Scaling factor for randomly scaling input images
repeated float scale_factors = 10;
// the width for cropped region
optional uint32 crop_width = 11 [default = 0];
// the height for cropped region
optional uint32 crop_height = 12 [default = 0];
optional bool update_global_stats = 4 [default = false];
optional uint32 task_class_num = 11 [default = 1];
optional uint32 task_class_num = 14 [default = 1];
optional bool regression = 15 [default = false];
optional uint32 label_size = 5;
12. Copy blocking_queue.cpp into caffe-master\src\caffe\util, replacing the original file.
13. Copy math_functions.cpp into caffe-master\src\caffe\util, replacing the original file.
14. Copy data_reader.cpp into caffe-master\src\caffe.
15. Copy data_transformer.cpp into caffe-master\src\caffe, replacing the original file.
16. Copy lstm_layer_Junhyuk.cpp and lstm_layer_Junhyuk.cu to caffe-master\src\caffe\layers,
and copy lstm_layer_Junhyuk.hpp to caffe-master\include\caffe\layers.
Modify caffe.proto:
optional LSTMParameter lstm_param = 207;
message LSTMParameter {
optional uint32 num_output = 1; // The number of outputs for the layer
optional float clipping_threshold = 2 [default = 0.0];
optional FillerParameter weight_filler = 3; // The filler for weight
optional FillerParameter bias_filler = 4; // The filler for the bias
optional uint32 batch_size = 5 [default = 1];
}
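A hedged usage sketch for this LSTM layer (I assume it is registered under the type name Lstm, as in Junhyuk Oh's original caffe-lstm; blob names and parameter values are placeholders):
layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "seq_in"
  top: "lstm1"
  lstm_param {
    num_output: 100
    clipping_threshold: 0.1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}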
Notes:
If compile errors appear, simply replace data_layer.cpp, image_data_layer.cpp and io.cpp,
and replace data_layer.hpp and io.hpp.
Copy data_reader.hpp to caffe-master\include.
Modify the parts marked with *** below:
message Datum {
optional int32 channels = 1;
optional int32 height = 2;
optional int32 width = 3;
// the actual image data, in bytes
optional bytes data = 4;
repeated int32 label = 5; //***
//optional int32 label = 5; //### ***
// Optionally, the datum could also hold float data.
repeated float float_data = 6;
// If true data contains an encoded image that need to be decoded
optional bool encoded = 7 [default = false];
//repeated float labels = 8; //### ***
}
If builds under examples then fail, simply delete the corresponding example folders (three folders in total).
After all of the above, recompile; once the build succeeds you can try training.
Training in practice
1. Prepare the training set. Note that the OCR model handles variable-length text, but the label input must be fixed length, i.e. every label sequence is padded with 0 up to the same length (see the example after this list).
2. Train and test.
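As a made-up illustration of the padding (file names, label indices and the maximum length of 8 are all hypothetical; use whatever label map and length your data requires), a line in the training list would then look like:
images/word_000001.jpg 12 7 33 58 0 0 0 0
images/word_000002.jpg 4 19 2 0 0 0 0 0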
This completes the port.
Finally, the full project is available on GitHub:
https://github.com/mqyw/caffe-ocr-linxu-master