This article describes how to deploy a segmentation model trained with TensorFlow 2.x into an engineering project using the ONNX Runtime inference engine. If your training framework is PyTorch instead, the deployment steps below are exactly the same once you have successfully exported a .onnx model.
The code, model, and test images can be downloaded from the repository for testing:
GitCode: https://gitcode.com/weixin_43013458/onnx_deploy/overview
1. Environment versions
Environment / Package | Version | Notes |
---|---|---|
TensorFlow | 2.5.0 | |
onnx | 1.9.0 | see *2 |
tf2onnx | 1.9.1 | see *1, *2, *3 |
onnxruntime-gpu | 1.14.1 | see *4 |
Visual Studio | 2022 | |
OpenCV | 4.5.1 | no particular version requirement |
Notes:
*1 Pick the tf2onnx version that matches the TensorFlow version you use.
For example, if tf2onnx is too old you will hit the following error:
ModuleNotFoundError: No module named 'tensorflow.tools.graph_transforms'
This happens because old tf2onnx releases still depend on TF 1.x modules.
*2 The onnx and tf2onnx versions must match.
Once the tf2onnx version is settled, find the matching onnx version. pip installs onnx automatically when installing tf2onnx, so check which version you actually got. For example:
ImportError: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed.
This error is caused by an onnx/tf2onnx version mismatch. (A snippet for checking the installed versions follows these notes.)
*3 The tf2onnx version must support the opset you need.
The conversion takes an opset parameter. A higher opset supports more operators, and a low opset can make conversion fail for models that use complex functions. Older onnx versions only support lower opsets; for example, the highest opset supported by onnx-1.9.0 is 14.
Newer tf2onnx releases can also transpose the channel order during conversion. TensorFlow's channel order is NHWC; to convert the outputs to NCHW, pass the --outputs_as_nchw parameter, but this flag only exists in tf2onnx-1.12.0 and later.
*4 Use the GPU build of onnxruntime.
Older onnxruntime releases lack the CUDA provider library, so CUDA execution fails.
Version: >= 1.14.1 (verified to work)
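Before converting, it helps to confirm the installed combination. A minimal sketch that prints the package versions, the maximum opset the installed onnx package understands (via onnx.defs.onnx_opset_version()), and the execution providers available to onnxruntime:
import onnx
import tf2onnx
import onnxruntime

print("onnx:", onnx.__version__)
print("tf2onnx:", tf2onnx.__version__)
print("onnxruntime:", onnxruntime.__version__)
# the highest opset the installed onnx package supports (14 for onnx-1.9.0)
print("max opset:", onnx.defs.onnx_opset_version())
# should include 'CUDAExecutionProvider' for the GPU build
print("providers:", onnxruntime.get_available_providers())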
2. Exporting the ONNX model
Exporting an ONNX model from TensorFlow relies on the tf2onnx package, which recognizes both checkpoints and SavedModel (saved_model.pb) input. SavedModel is recommended here: save the model with model.save("./saved_pb"), which produces a folder containing saved_model.pb together with the variables/ and assets/ subdirectories.
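For reference, a minimal sketch of the save step; the .h5 path is hypothetical, standing in for wherever your trained Keras model currently lives:
import tensorflow as tf

# hypothetical path: load an existing Keras model and re-save it as a SavedModel
model = tf.keras.models.load_model("./model.h5")
model.save("./saved_pb")  # produces the folder converted below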
Then run the following command line in that directory:
python -m tf2onnx.convert --saved-model ./saved_pb --output ./exportedModel.onnx --opset 14
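If you want NCHW outputs (see note *3), tf2onnx-1.12.0 and later add a flag for that. A hedged variant of the command, with the flag spelled as in the note above (check python -m tf2onnx.convert --help for your version) and the output name reported in the conversion log below:
python -m tf2onnx.convert --saved-model ./saved_pb --output ./exportedModel.onnx --opset 14 --outputs_as_nchw activation_22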
After exporting, check the names of the network input and output; they are needed later.
My input layer is named 'input_1' and my output layer 'activation_22':
2024-12-19 22:04:10,763 - INFO - Using tensorflow=2.5.0, onnx=1.9.0, tf2onnx=1.9.1/8e8c23
2024-12-19 22:04:10,763 - INFO - Using opset <onnx, 14>
2024-12-19 22:04:13,045 - INFO - Computed 0 values for constant folding
2024-12-19 22:04:20,579 - INFO - Optimizing ONNX model
2024-12-19 22:04:27,780 - INFO - After optimization: BatchNormalization -22 (23->1), Concat +5 (25->30), Const -295 (370->75), Gather +10 (0->10), Identity -5 (5->0), Reshape +1 (0->1), Split +5 (0->5), Squeeze -5 (15->10), Transpose -113 (114->1), Unsqueeze -9 (20->11)
2024-12-19 22:04:28,303 - INFO -
2024-12-19 22:04:28,303 - INFO - Successfully converted TensorFlow model ./model/pb to ONNX
2024-12-19 22:04:28,305 - INFO - Model inputs: ['input_1']
2024-12-19 22:04:28,305 - INFO - Model outputs: ['activation_22']
2024-12-19 22:04:28,306 - INFO - ONNX model is saved at ./exportedModel.onnx
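If the conversion log is gone, a minimal sketch that reads the input/output names directly from the exported file with the onnx package:
import onnx

m = onnx.load("./exportedModel.onnx")
print("inputs:", [i.name for i in m.graph.input])    # e.g. ['input_1']
print("outputs:", [o.name for o in m.graph.output])  # e.g. ['activation_22']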
Next, verify in Python that the exported model behaves correctly, since this is easy to do in Python but comparatively hard to debug in C++. Note that the test code below only removes the TensorFlow dependency: inference is a single session.run call, but the pre- and post-processing still have to be written by hand.
import onnxruntime as ort
import numpy as np
import cv2
# Load the ONNX model
session = ort.InferenceSession("./exportedModel.onnx", providers=['CUDAExecutionProvider'])
# Prepare an input image
dir = './test_onnx/input/'
file_name = 'test.png'
# the model input is single-channel, so read the image as grayscale
img = cv2.imread(dir + file_name, cv2.IMREAD_GRAYSCALE)
img_norm = (img / 255).astype(np.float32)
img_norm = np.expand_dims(img_norm, (0, -1))  # shape (1, H, W, 1)
# input_1 is my input layer name
outputs = session.run(None, {"input_1": img_norm})
# my output list has a single entry, so take index 0
print(outputs[0].shape)
# probability map
prob = outputs[0]
test_prediction = prob.argmax(3)
test_prediction = np.squeeze(test_prediction).astype(np.uint8)
# map class ids 1..5 to distinct gray levels for visualization
test_prediction[test_prediction == 1] = 40
test_prediction[test_prediction == 2] = 80
test_prediction[test_prediction == 3] = 120
test_prediction[test_prediction == 4] = 160
test_prediction[test_prediction == 5] = 200
cv2.imwrite(f'./test_onnx/prediction/unetONNX_{file_name}', test_prediction)
If the saved segmentation map matches the one produced by TensorFlow, the exported ONNX model is correct!
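Beyond comparing the images by eye, the raw outputs can be compared numerically. A minimal sketch, assuming the SavedModel from step 2 is still around and img_norm is the preprocessed input from the script above:
import numpy as np
import tensorflow as tf
import onnxruntime as ort

model = tf.keras.models.load_model("./saved_pb")
tf_out = model.predict(img_norm)
sess = ort.InferenceSession("./exportedModel.onnx", providers=['CPUExecutionProvider'])
onnx_out = sess.run(None, {"input_1": img_norm})[0]
# small float differences (around 1e-5) between the two runtimes are normal
print("max abs diff:", np.abs(tf_out - onnx_out).max())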
3. Visual Studio environment configuration
Visual Studio needs OpenCV, onnxruntime, and CUDA (including cuDNN) configured.
The libraries used are:
opencv_world451d.lib
onnxruntime.lib
onnxruntime_providers_cuda.lib
onnxruntime_providers_shared.lib
onnxruntime_providers_tensorrt.lib
You also need to copy the corresponding dynamic libraries (DLLs) into the folder containing the project executable; a runtime check is sketched below.
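To confirm at runtime that the CUDA provider is actually visible to your build, a minimal sketch using Ort::GetAvailableProviders() from the ONNX Runtime C++ API:
#include <iostream>
#include <string>
#include <vector>
#include "onnxruntime_cxx_api.h"

int main()
{
    // lists the execution providers compiled into this onnxruntime build,
    // e.g. TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider
    for (const std::string& provider : Ort::GetAvailableProviders())
        std::cout << provider << std::endl;
    return 0;
}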
4. C++ inference code
One point first: the flow and the library calls are the same for any model, but the pre- and post-processing must be adapted to the model you trained. What follows is the image-segmentation code example from this project.
LayerSegInference.h
#pragma once
#define RET_OK nullptr
#include <string>
#include <vector>
#include <cstdio>
#include <opencv2/opencv.hpp>
#include "onnxruntime_cxx_api.h"
class OrtLayerSeg
{
public:
OrtLayerSeg();
~OrtLayerSeg();
public:
// run function called from outside the class
char* RunSession(cv::Mat& iImg, std::vector<int>& layerSuface, cv::Mat& outputImg);
private:
// configure ONNX Runtime, CUDA, etc.
char* CreateSession();
// flatten the image data into a 1-D blob
template<typename T>
char* BlobFromImage(cv::Mat& iImg, T& iBlob);
// padding, normalization, and other preprocessing
char* PreProcess(cv::Mat& iImg, std::vector<int>& ImgSize, cv::Mat& oImg);
// model inference and output parsing
template<typename N>
char* TensorProcess(cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims,
std::vector<int>& layerSuface, cv::Mat& outputImg);
private:
Ort::Env env;
Ort::Session* session;
Ort::RunOptions options;
// the input/output layer names are known in advance, so assign them directly
std::vector<const char*> inputNodeNames = {"input_1"};
std::vector<const char*> outputNodeNames = {"activation_22"};
const ORTCHAR_T* modelPath = L"./exportedModel.onnx";
std::vector<int> imgSize;
bool cudaEnable = true;
int logSeverityLevel = 3;
int intraOpNumThreads = 1;
};
LayerSegInference.cpp
#include "LayerSegInference.h"
std::vector<cv::Vec3b> colors = {
cv::Vec3b(0, 0, 0), // Black
cv::Vec3b(255, 0, 0), // Blue
cv::Vec3b(0, 255, 0), // Green
cv::Vec3b(0, 0, 255), // Red
cv::Vec3b(255, 255, 0), // Cyan
cv::Vec3b(255, 0, 255), // Magenta
};
OrtLayerSeg::OrtLayerSeg() {
CreateSession();
}
OrtLayerSeg::~OrtLayerSeg() {
delete session;
}
template<typename T>
char* OrtLayerSeg::BlobFromImage(cv::Mat& iImg, T& iBlob) {
int channels = iImg.channels();
int imgHeight = iImg.rows;
int imgWidth = iImg.cols;
// my input images are grayscale, so there are only two loops; add a channel loop for RGB images
for (int h = 0; h < imgHeight; h++)
{
for (int w = 0; w < imgWidth; w++)
{
//std::cout << "h: " << h << " w: " << w << std::endl;
iBlob[h * imgWidth + w] = typename std::remove_pointer<T>::type(
(iImg.at<uchar>(h, w)) / 255.0f);
}
}
return RET_OK;
}
char* OrtLayerSeg::CreateSession()
{
char* Ret = RET_OK;
try
{
env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "LayerSeg");
Ort::SessionOptions sessionOption;
// CUDA is used, so configure the GPU provider
if (cudaEnable)
{
OrtCUDAProviderOptions cudaOption;
cudaOption.device_id = 0;
sessionOption.AppendExecutionProvider_CUDA(cudaOption);
}
sessionOption.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
sessionOption.SetIntraOpNumThreads(intraOpNumThreads);
sessionOption.SetLogSeverityLevel(logSeverityLevel);
session = new Ort::Session(env, modelPath, sessionOption);
// the ONNX input/output names are known, so the Get* methods are unnecessary; they also have a leak problem and are not recommended
/*Ort::AllocatorWithDefaultOptions allocator;
size_t inputNodesNum = session->GetInputCount();
for (size_t i = 0; i < inputNodesNum; i++)
{
auto input_node_name = session->GetInputNameAllocated(i, allocator);
inputNodeNames.push_back(input_node_name.get());
}
size_t OutputNodesNum = session->GetOutputCount();
for (size_t i = 0; i < OutputNodesNum; i++)
{
auto output_node_name = session->GetOutputNameAllocated(i, allocator);
outputNodeNames.push_back(output_node_name.get());
}*/
options = Ort::RunOptions{ nullptr };
return RET_OK;
}
catch (const std::exception& e)
{
const char* str1 = "[ONNX LayerSeg]:";
const char* str2 = e.what();
std::string result = std::string(str1) + std::string(str2);
std::cout << result << std::endl;
// static so the returned pointer remains valid after this function returns
static char output[] = "[ONNX LayerSeg]: Create session failed.";
Ret = output;
return Ret;
}
}
char* OrtLayerSeg::RunSession(cv::Mat& iImg, std::vector<int>& layerSuface, cv::Mat& outputImg)
{
char* Ret = RET_OK;
cv::Mat processedImg;
PreProcess(iImg, imgSize, processedImg);
float* blob = new float[processedImg.total() * 1];
BlobFromImage(processedImg, blob);
std::vector<int64_t> inputNodeDims = { 1, imgSize.at(0), imgSize.at(1), 1};
TensorProcess(iImg, blob, inputNodeDims, layerSuface, outputImg);
return Ret;
}
char* OrtLayerSeg::PreProcess(cv::Mat& iImg, std::vector<int>& ImgSize, cv::Mat& oImg)
{
char* Ret = RET_OK;
int pad_width, pad_height;
if (iImg.cols % 32 == 0) pad_width = 0;
else pad_width = 32 - iImg.cols % 32;
if (iImg.rows % 32 == 0) pad_height = 0;
else pad_height = 32 - iImg.rows % 32;
cv::copyMakeBorder(iImg, oImg, 0, pad_height, 0, pad_width, cv::BORDER_CONSTANT, 0);
ImgSize = { oImg.rows, oImg.cols };
return Ret;
}
template<typename N>
char* OrtLayerSeg::TensorProcess(cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims,
std::vector<int>& layerSuface, cv::Mat& outputImg)
{
Ort::Value inputTensor = Ort::Value::CreateTensor<typename std::remove_pointer<N>::type>(
Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, imgSize.at(0) * imgSize.at(1) * 1,
inputNodeDims.data(), inputNodeDims.size());
auto outputTensor = session->Run(options, inputNodeNames.data(), &inputTensor, 1, outputNodeNames.data(),
outputNodeNames.size());
Ort::TypeInfo typeInfo = outputTensor.front().GetTypeInfo();
auto tensor_info = typeInfo.GetTensorTypeAndShapeInfo();
std::vector<int64_t> outputNodeDims = tensor_info.GetShape(); // (b, h, w, c)
auto output = outputTensor.front().GetTensorMutableData<typename std::remove_pointer<N>::type>();
delete[] blob;
// my network output is a softmax-activated probability map, so the channel with the highest probability is taken as the segmentation class
int height = outputNodeDims[1];
int width = outputNodeDims[2];
int classNum = outputNodeDims[3];
cv::Mat prediction = cv::Mat::zeros(height, width, CV_8UC3);
// the output data is also flat: height * width * classNum
float* rawData = (float*)output;
for (int i = 0; i < height; ++i) {
for (int j = 0; j < width; ++j) {
cv::Mat prob(1, classNum, CV_32FC1, rawData);
cv::Point class_id;
double maxLabelScore;
cv::minMaxLoc(prob, 0, &maxLabelScore, 0, &class_id);
prediction.at<cv::Vec3b>(i, j) = colors[class_id.x];
rawData += classNum;
}
}
cv::Mat prediction_crop = prediction(cv::Rect(0, 0, iImg.cols, iImg.rows));
cv::cvtColor(iImg, outputImg, cv::COLOR_GRAY2BGR);
cv::addWeighted(prediction_crop, 0.3, outputImg, 0.9, 0, outputImg);
return RET_OK;
}
main.cpp
#include "LayerSegInference.h"
int main()
{
std::string image_name = "test.png";
std::string image_path = "./test_onnx/input/" + image_name;
cv::Mat image = cv::imread(image_path, cv::IMREAD_GRAYSCALE);
cv::Mat outputImg;
OrtLayerSeg* LayerSeg = new OrtLayerSeg;
std::vector<int> layerSuface;
char* ret = LayerSeg->RunSession(image, layerSuface, outputImg);
cv::imwrite("./output/" + image_name, outputImg);
delete LayerSeg;
return 0;
}
The output image is shown below: the different layers in the image are segmented.
Feel free to raise any questions!