1. Background
Most people train their deep learning models with TensorFlow in a Python environment. So that a trained model can be reused later, TensorFlow provides the tf.train.Saver class, which offers a convenient set of functions for saving and restoring models.
In practice, however, the typical situation is: you train the model with TensorFlow in Python, but for the actual prediction task you want to implement the model in C/C++ and load the trained parameters there. TensorFlow does provide a C++ API, but several practical problems get in the way:
1. Models saved directly with tf.train.Saver are large: an AlexNet-style network with a few dozen convolution kernels already comes to two or three hundred megabytes!
2. Building TensorFlow for C++ is a hassle.
3. An unstripped TensorFlow build is huge and barely runs on mobile devices!
Side note: Google recently released TensorFlow Lite, a deep learning framework for mobile devices; reportedly even model training can run on a phone! That would mean learning and predicting at the same time; phones would be truly smart!
----
Also worth noting: OpenCV 3.3 ships a DNN module that can load TensorFlow models directly! For details, see:
https://github.com/opencv/opencv/tree/master/modules/dnn
With the DNN module you can load a trained TensorFlow model straight from C++, without re-implementing the model yourself, which is very convenient. The drawbacks are the same, though: the model file is large, and (with the default settings) you have to save the entire model from Python!
2. Saving the weight matrices W and biases B with numpy.savez() in Python
Training a model really boils down to training the weights W and the biases B. Once those values are saved, the model is effectively saved: you can rewrite the model in any language and simply load the trained W and B. The example below shows how to store W and B with NumPy's savez() function in Python.
Straight to the code:
# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf

def weight_variable(shape):
    initial = tf.truncated_normal(shape, mean=0.0, stddev=0.1, dtype=tf.float32)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.123, shape=shape)
    return tf.Variable(initial)

def conv2d(x, w):
    # x shape is [batch, image_height, image_width, image_channels]
    # w shape is [kernel_height, kernel_width, image_channels, kernel_count]
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

# Input data format
inShape = (5, 5, 2)  # (rows, cols, channels)

# For simplicity we use a single sample as input, i.e. batch = 1.
aSample = weight_variable([1, inShape[0], inShape[1], inShape[2]])

# Define the CNN model -------------------------#
# Layer 0: convolutional layer
L0_KerSize = 3
L0_KerNum = 4
L0_W = weight_variable([L0_KerSize, L0_KerSize, inShape[2], L0_KerNum])
L0_B = bias_variable([L0_KerNum])
L0_Out = tf.nn.relu(conv2d(aSample, L0_W) + L0_B)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    W = session.run(L0_W)
    print('---- L0_W.shape =', W.shape, '----')
    print('>> The 1st kernel for the 1st channel of input data:')
    print(W[:, :, 0, 0])
    print('>> The 2nd kernel for the 1st channel of input data:')
    print(W[:, :, 0, 1])
    rs = session.run(L0_Out)
    B = session.run(L0_B)
    print('---- L0_B.shape =', B.shape, '----')
    print(B)
    print('---- L0_Out.shape =', rs.shape, '----')
    print(rs[0, :, :, 0])
    print(rs[0, :, :, 1])
    print(rs[0, :, :, 2])
    print(rs[0, :, :, 3])
    # Save the model (W and B only)
    np.savez('./model.npz',
             L0_W=session.run(L0_W),
             L0_B=session.run(L0_B))
    # Save the sample
    np.savez('./sample.npz', session.run(aSample))
The code above implements one convolutional layer with 3x3 kernels. Since the defined input data (image) has two channels, and there are 4 kernels per channel, the total number of kernel slices is 2*4 = 8. W and B are filled with random values here (just pretend they are trained values!).
Running the code shows that W has shape [3, 3, 2, 4] and B has shape [4]; it also prints the first two kernels for the first input channel, and the values of B.
Note that B belongs to the output: because the output has 4 channels, B has exactly 4 values. Even though there are 8 kernel slices, B is independent of the number of input channels!
The last few lines of the code save W and B with numpy.savez() to the file model.npz.
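As a quick sanity check (a NumPy-only sketch: the key names L0_W/L0_B and the file name model.npz follow the code above, while the random arrays just stand in for the session.run() results), you can load the saved file back and verify the shapes:

```python
import numpy as np

# Stand-ins for session.run(L0_W) / session.run(L0_B) with the same shapes.
W = np.random.randn(3, 3, 2, 4).astype(np.float32)
B = np.full(4, 0.123, dtype=np.float32)
np.savez('./model.npz', L0_W=W, L0_B=B)

# Load the file back; each keyword argument becomes a named array in the .npz.
data = np.load('./model.npz')
print(data['L0_W'].shape)  # (3, 3, 2, 4)
print(data['L0_B'].shape)  # (4,)
print(data['L0_W'].dtype)  # float32
```

The dtype check matters later: the C++ side must read the arrays with exactly this element type.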
3. Loading W and B in C++ with the cnpy library
The cnpy library was written by an independent developer and is fairly simple. Source code:
https://github.com/rogersce/cnpy.git
If you don't feel like reading the source, just install it and call the relevant functions following its examples.
Installing cnpy:
0. Clone the source from GitHub.
1. If you don't have cmake yet, install cmake first.
2. Create a build directory, say $HOME/build (e.g. run mkdir build in the source directory).
3. cd $HOME/build
4. cmake /path/to/cnpy
5. make
6. make install
7. Run ldconfig to refresh the linker cache.
OK, here is the C++ source:
#include <iostream>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <dirent.h>
#include <vector>
#include <sstream>
#include <fstream>
#include <sys/io.h>
#include <sys/times.h>
#include <iomanip>
#include <tuple>
using namespace std;
/************************************************
* About cnpy, Please consult: https://github.com/rogersce/cnpy.git
*
* npz_load(fname,varname) will load and return the NpyArray for
* data varname from the specified .npz file.
*
The data structure for loaded data is below.
Data is accessed via the data<T>() method, which returns
a pointer of the specified type (which must match the underlying
datatype of the data). The array shape and
word size are read from the npy header.
struct NpyArray {
std::vector<size_t> shape;
size_t word_size;
template<typename T> T* data();
};
*/
#include "cnpy.h"
#include <complex>
#include <cstdlib>
static bool LoadModelFromFile(string strFile)
{
    if (access(strFile.c_str(), F_OK) == -1) {
        cout << ">> error. File does not exist: " << strFile << endl;
        return false;
    }
    cnpy::npz_t npzData = cnpy::npz_load(strFile);
    // W ---------------------------------------//
    {
        cnpy::NpyArray arr = npzData["L0_W"];
        cout << ">> L0_W shape (";
        for (size_t i = 0; i < arr.shape.size(); i++)
            cout << arr.shape[i] << ", ";
        cout << ")" << endl;
        // Note: if dtype = tf.float32 on the TensorFlow side, the data here
        // must be read as float; reading it as double scrambles the values.
        float *mv1 = arr.data<float>();
        // Row-major (C order) strides for a [k_h, k_w, in_ch, out_ch] array
        size_t nOffset0 = arr.shape[1] * arr.shape[2] * arr.shape[3];
        size_t nOffset1 = arr.shape[2] * arr.shape[3];
        cout << mv1[0] << endl;
        cout << ">> The 1st kernel for the 1st channel of input data:" << endl;
        for (size_t r = 0; r < arr.shape[0]; r++) {
            for (size_t c = 0; c < arr.shape[1]; c++) {
                // element [r][c][0][0], i.e. channel 0, kernel 0
                cout << setw(12) << setiosflags(ios::fixed) << setprecision(5)
                     << mv1[r * nOffset0 + c * nOffset1];
            }
            cout << endl;
        }
    }
    // B ---------------------------------------//
    {
        cnpy::NpyArray arr = npzData["L0_B"];
        cout << ">> L0_B shape (";
        for (size_t i = 0; i < arr.shape.size(); i++)
            cout << arr.shape[i] << ", ";
        cout << ")" << endl;
        float *mv1 = arr.data<float>();
        for (size_t i = 0; i < arr.shape[0]; i++)
            cout << setw(12) << setiosflags(ios::fixed) << setprecision(5) << mv1[i];
        cout << endl;
    }
    return true;
}
int main(int argc, char **argv)
{
    cout << "# STA ##############################" << endl;
    cout << "\n" << endl;
    LoadModelFromFile("./model.npz");
    cout << "\n" << endl;
    cout << "# END ##############################" << endl;
    return 0;
}
The actual output (screenshots omitted here):
First kernel for the first input channel;
Second kernel for the first input channel.
You can see that the values match! One thing to watch out for: if the data type in TensorFlow is tf.float32, you must use float in cnpy, not double; otherwise the data comes out garbled!!!
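Both pitfalls can be reproduced in NumPy (a sketch; the shapes match the example above): the flat-index arithmetic with nOffset0/nOffset1/nOffset2 in the C++ code is exactly row-major (C order) indexing, and reinterpreting float32 bytes as double really does scramble the values:

```python
import numpy as np

W = np.arange(3 * 3 * 2 * 4, dtype=np.float32).reshape(3, 3, 2, 4)
flat = W.ravel()  # row-major layout, exactly what cnpy's data<float>() walks over

# Same strides as nOffset0, nOffset1, nOffset2 in the C++ code
off0 = 3 * 2 * 4
off1 = 2 * 4
off2 = 4
r, c, chan, k = 1, 2, 0, 3
assert flat[r * off0 + c * off1 + chan * off2 + k] == W[r, c, chan, k]

# Reading float32 bytes as double (float64) yields nonsense values:
wrong = np.frombuffer(W.tobytes(), dtype=np.float64)
print(wrong[:2])  # nothing like the original 0.0, 1.0, 2.0, ...
```

The same stride formula works for any element, which is why the C++ loop can print W[:, :, 0, 0] from the flat buffer.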
I almost forgot the makefile:
CPP=g++
CPPFLAGS+=-fpermissive -Wsign-compare -Ofast -std=c++11
INCLUDE+=-I/usr/local/include/
MKDEP=gcc -E -MM
LIBINC=-L/usr/local/lib
SRCS =
DESS = 1.cpp
OBJS=$(SRCS:%.cpp=%.o)
EXES=$(DESS:%.cpp=%.exec)
LIBS= -lcnpy -lz
all: $(OBJS) $(EXES)
.cpp.o:
$(CPP) $(CPPFLAGS) $(INCLUDE) -c $< -o $@
%.exec: %.cpp $(OBJS) .depend
$(CPP) $(CPPFLAGS) $(INCLUDE) $< -o $@ $(LIBINC) $(OBJS) $(LIBS)
.depend: makefile
$(MKDEP) $(INCLUDE) $(SRCS) $(DESS) --std=c++11 > .depend
ifeq (.depend,$(wildcard .depend))
include .depend
endif
clean:
	$(RM) $(OBJS) $(EXES) .depend
Note: 1.cpp is the source file! Save the code above as a .cpp file; if you use a different name, just change it here. Nothing else normally needs to change!
The files above can be cloned with git: git@code.youkuaiyun.com:guoyunfei20/writewandb_loadincpp.git
4. What! How do you actually run a TensorFlow-trained model for prediction (forward pass) in C++?
With Eigen, of course! Code to follow in a later post!
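Until that post arrives, here is a minimal NumPy reference (a sketch, not the author's Eigen code) of what the C++ forward pass has to compute: a stride-1 'SAME'-padded convolution plus bias, followed by ReLU, matching the layer defined in the Python model above:

```python
import numpy as np

def conv2d_same_relu(x, W, B):
    """x: (H, W, Cin); W: (kh, kw, Cin, Cout); B: (Cout,).
    Stride-1 'SAME' convolution + bias + ReLU, like the TF layer above."""
    kh, kw, cin, cout = W.shape
    ph, pw = kh // 2, kw // 2                  # 'SAME' padding for odd kernels
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, Wd = x.shape[:2]
    out = np.empty((H, Wd, cout), dtype=x.dtype)
    for r in range(H):
        for c in range(Wd):
            patch = xp[r:r + kh, c:c + kw, :]  # (kh, kw, cin) window
            out[r, c] = np.tensordot(patch, W, axes=([0, 1, 2], [0, 1, 2])) + B
    return np.maximum(out, 0)                  # ReLU

# Usage with the shapes from the example: a (5, 5, 2) sample, 3x3 kernels, 4 outputs
x = np.random.randn(5, 5, 2).astype(np.float32)
W = np.random.randn(3, 3, 2, 4).astype(np.float32)
B = np.full(4, 0.123, dtype=np.float32)
print(conv2d_same_relu(x, W, B).shape)  # (5, 5, 4)
```

Porting this to C++/Eigen is then just a matter of replaying the same loops over the W and B buffers loaded with cnpy.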