1. Introduction
TensorRT is a high-performance deep learning inference optimizer that provides low-latency, high-throughput deployment of deep learning applications.
2. Installation
2.1 Prebuilt release: download from the NVIDIA website
CUDA: 10.0
cuDNN: 7.6.0
CMake: 3.9.2
TensorRT: 7.1
2.2 Configure environment variables
vim ~/.bashrc
# set tensorrt
export TENSORRT_ROOT=$HOME/TensorRT
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORRT_ROOT/lib
source ~/.bashrc
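A quick way to confirm the exports took effect is to check that the TensorRT lib directory appears on LD_LIBRARY_PATH. A minimal stdlib-only sketch (the function name and the example paths are illustrative assumptions):

```python
import os

def tensorrt_on_ld_path(env, trt_root):
    """Return True if <trt_root>/lib appears as an entry on LD_LIBRARY_PATH."""
    lib_dir = os.path.join(trt_root, "lib")
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return lib_dir in entries

# Fabricated environment mirroring the exports above
env = {"LD_LIBRARY_PATH": "/usr/lib:/home/user/TensorRT/lib"}
print(tensorrt_on_ld_path(env, "/home/user/TensorRT"))  # True
```

Run it against `os.environ` in a fresh shell to verify your actual session.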
2.3 Build from source
sudo apt-get install zlib1g zlib1g-dev
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git submodule sync
git submodule update --init --recursive
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_RELEASE/lib -DTRT_OUT_DIR=`pwd`/out -DCUDA_VERSION=10.2 # TRT_RELEASE points to the extracted binary release from 2.1
make -j$(nproc)
make install # builds and installs the libraries for your hardware environment
# Fetch a specific release (clone -b already checks out the tag, so no extra checkout is needed)
git clone -b 7.1.3 https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git submodule sync
git submodule update --init --recursive
Reference:
https://zhuanlan.zhihu.com/p/181274475
3. Python
3.1 Installation requirements
pip install pycuda # requires version >= 2019.1.1
pip install tensorrt # for TensorRT 7, the wheel ships in the release tarball's python/ directory
pip install uff
pip install graphsurgeon
# If dependent shared libraries are reported missing, copy them from the release into the system library path
sudo cp $TENSORRT_ROOT/lib/libnvinfer.so.7 /usr/lib
sudo cp $TENSORRT_ROOT/lib/libnvonnxparser.so.7 /usr/lib
sudo cp $TENSORRT_ROOT/lib/libnvparsers.so.7 /usr/lib
sudo cp $TENSORRT_ROOT/lib/libnvinfer_plugin.so.7 /usr/lib
sudo cp $TENSORRT_ROOT/lib/libmyelin.so.1 /usr/lib
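Rather than copying blindly, you can first list which of the required shared objects are actually absent. A small helper (file names taken from the copy commands above; the function name is an assumption):

```python
import os

# Shared objects the Python bindings depend on, per the copy commands above
REQUIRED_LIBS = [
    "libnvinfer.so.7",
    "libnvonnxparser.so.7",
    "libnvparsers.so.7",
    "libnvinfer_plugin.so.7",
    "libmyelin.so.1",
]

def missing_libs(lib_dir):
    """Return the required TensorRT shared objects not present in lib_dir."""
    try:
        present = set(os.listdir(lib_dir))
    except FileNotFoundError:
        return list(REQUIRED_LIBS)
    return [lib for lib in REQUIRED_LIBS if lib not in present]

print(missing_libs("/usr/lib"))
```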
Version compatibility (CUDA -> PyTorch -> TensorRT):
10.2 -> 1.4.0   -> 7.0
10.2 -> 1.5/1.6 -> 7.1
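The table above can be encoded as a lookup so a setup script fails fast on an unlisted pairing. A sketch with only the rows shown here, not an exhaustive support matrix:

```python
# (CUDA, PyTorch) -> TensorRT, copied from the compatibility table above
COMPAT = {
    ("10.2", "1.4.0"): "7.0",
    ("10.2", "1.5"): "7.1",
    ("10.2", "1.6"): "7.1",
}

def tensorrt_for(cuda, pytorch):
    """Look up the matching TensorRT version, or None if the pairing is unlisted."""
    return COMPAT.get((cuda, pytorch))

print(tensorrt_for("10.2", "1.5"))  # 7.1
```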
3.2 Code
python
>>> import tensorrt
>>> tensorrt.__version__
'7.1.3.4'
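If a script needs to gate on the version reported above, note that plain string comparison is unreliable ('7.10' sorts before '7.2'); comparing numeric tuples is safer. A minimal helper (the function name is an assumption):

```python
def version_tuple(v):
    """Convert a dotted version string like '7.1.3.4' into (7, 1, 3, 4)."""
    return tuple(int(part) for part in v.split("."))

# Tuples compare element-wise, so this works across digit counts
print(version_tuple("7.1.3.4") >= version_tuple("7.0"))  # True
```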
3.4 Dynamic dimensions
import time

import torch
from torchvision import models

# Load a pretrained ResNet-18 on the GPU in eval mode
resnet18 = models.resnet18(pretrained=True).eval().cuda()
x = torch.randn((1, 3, 224, 224), dtype=torch.float32).cuda()

# Time 10 PyTorch inference passes; synchronize before starting the clock
# and before stopping it, so the timings measure GPU work rather than
# just the asynchronous kernel launch
with torch.no_grad():
    for i in range(10):
        torch.cuda.synchronize()
        t1 = time.time()
        out = resnet18(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print("pytorch {} inference: {:.4f}s".format(i, t2 - t1))
result = out.data.cpu().numpy()

# Export to ONNX with a dynamic batch dimension on both input and output
onnx_file = "resnet18.onnx"
input_names = ["input"]
output_names = ["output"]
dynamic_axes = {"input": {0: "batch_size"},
                "output": {0: "batch_size"}}
torch.onnx.export(resnet18,
                  x,
                  onnx_file,
                  export_params=True,
                  opset_version=10,
                  do_constant_folding=True,
                  input_names=input_names,
                  output_names=output_names,
                  dynamic_axes=dynamic_axes)
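When a TensorRT engine is built from this ONNX model, the dynamic batch dimension requires an optimization profile giving min/opt/max shapes for the input. A small helper that produces those shape tuples for the (3, 224, 224) input above; the batch sizes and the function name are illustrative assumptions:

```python
def profile_shapes(chw, min_batch=1, opt_batch=8, max_batch=32):
    """Build (min, opt, max) shape tuples for a dynamic-batch input.

    chw: the fixed (channels, height, width) part of the input shape.
    The batch sizes are illustrative; tune them to your workload.
    """
    return tuple((b,) + tuple(chw) for b in (min_batch, opt_batch, max_batch))

min_s, opt_s, max_s = profile_shapes((3, 224, 224))
print(min_s, opt_s, max_s)  # (1, 3, 224, 224) (8, 3, 224, 224) (32, 3, 224, 224)
```

In the TensorRT Python API these tuples would be passed to an optimization profile via `profile.set_shape("input", min_s, opt_s, max_s)` before building the engine.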
Docker image
docker pull nvcr.io/nvidia/tensorrt:21.05-py3