TensorRT Learning Notes - 1

TensorRT is a high-performance library for deep learning inference. It takes a network definition, applies platform-specific and network-specific optimizations, and generates an efficient inference engine. For deployment, TensorRT supports custom plugins to implement layers that are not natively supported. Optimization options include batch size, workspace size, mixed precision, and bounds on dynamic shapes. The build phase eliminates dead computations, folds constants, and reorganizes operations to run more efficiently on the GPU; it can also automatically reduce computation precision and quantize values. An application typically builds the engine once and serializes it as a plan file for later inference.

The TensorRT API (https://docs.nvidia.com/deeplearning/tensorrt/api/index.html) includes implementations for the most common deep learning layers.

The IPlugin interface (https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html) lets you provide implementations for infrequently used or more innovative layers that are not supported natively by TensorRT.

Alternatives to using TensorRT include:
‣ Using the training framework itself to perform inference.
‣ Writing a custom application that is designed specifically to execute the network using low-level libraries and math operations.

Generally, the workflow for developing and deploying a deep learning model goes through three phases.
‣ Phase 1 is training,
‣ Phase 2 is developing a deployment solution, and
‣ Phase 3 is the deployment of that solution.
TensorRT is generally not used during any part of the training phase.
After the network is parsed, consider your optimization options – batch size, workspace size, mixed precision, and bounds on dynamic shapes. These options are chosen and specified as part of the TensorRT build step, where you build an optimized inference engine based on your network.
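As a rough illustration, here is how those options might be specified with the TensorRT 8.x C++ builder API. This is a minimal sketch, not the full build step: the tensor name `"input"` and the shape bounds are hypothetical, and the network is assumed to be populated elsewhere (e.g. by a parser).

```cpp
#include <NvInfer.h>

// Sketch: specifying build options (assumes TensorRT 8.x).
void configureBuild(nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config)
{
    // Workspace size: scratch GPU memory that layer tactics may use (1 GiB here).
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1ULL << 30);

    // Mixed precision: allow FP16 kernels where they are faster.
    config->setFlag(nvinfer1::BuilderFlag::kFP16);

    // Bounds on dynamic shapes: min/opt/max dimensions for a hypothetical
    // input tensor named "input" with a dynamic batch dimension.
    auto* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN,
                           nvinfer1::Dims4{1, 3, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT,
                           nvinfer1::Dims4{8, 3, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX,
                           nvinfer1::Dims4{32, 3, 224, 224});
    config->addOptimizationProfile(profile);
}
```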

To initialize the inference engine, the application first deserializes the model from the plan file into an inference engine. TensorRT is usually used asynchronously; therefore, when the input data arrives, the program calls an enqueue function with the input buffer and the buffer in which TensorRT should put the result.
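A minimal sketch of that deployment path, assuming the TensorRT 8.x API: the plan file name, input/output shapes, and buffer sizes are hypothetical placeholders, and error handling and cleanup are omitted for brevity.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iterator>
#include <vector>

// Sketch: deserialize a plan file and run asynchronous inference.
void runInference(nvinfer1::ILogger& logger)
{
    // Deserialize the model from the plan file into an inference engine.
    std::ifstream file("engine.plan", std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());
    auto* runtime = nvinfer1::createInferRuntime(logger);
    auto* engine = runtime->deserializeCudaEngine(plan.data(), plan.size());
    auto* context = engine->createExecutionContext();

    // Device buffers for one input and one output binding
    // (sizes assume a hypothetical 1x3x224x224 input and 1x1000 output).
    void* bindings[2]{};
    cudaMalloc(&bindings[0], 1 * 3 * 224 * 224 * sizeof(float));
    cudaMalloc(&bindings[1], 1 * 1000 * sizeof(float));

    // With dynamic shapes, fix the input dimensions for this inference.
    context->setBindingDimensions(0, nvinfer1::Dims4{1, 3, 224, 224});

    // Asynchronous use: enqueue with the input buffer and the buffer in
    // which TensorRT should put the result, then synchronize only when
    // the output is actually needed.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);
}
```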

To optimize your model for inference, TensorRT takes your network definition, performs network-specific and platform-specific optimizations, and generates the inference engine. This process is referred to as the build phase. The build phase can take considerable time, especially when running on embedded platforms. Therefore, a typical application builds an engine once and then serializes it as a plan file for later use.
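A sketch of that build-once-and-serialize pattern with the TensorRT 8.x API: `buildSerializedNetwork` performs the (potentially slow) optimization and returns the plan directly; the output path is hypothetical.

```cpp
#include <NvInfer.h>
#include <fstream>

// Sketch: build the engine once and serialize the plan to disk.
void buildAndSave(nvinfer1::IBuilder* builder,
                  nvinfer1::INetworkDefinition* network,
                  nvinfer1::IBuilderConfig* config)
{
    // Runs the full build phase and returns the serialized plan.
    nvinfer1::IHostMemory* plan = builder->buildSerializedNetwork(*network, *config);

    // Write the plan file for later deserialization at deployment time.
    std::ofstream out("engine.plan", std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());
}
```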

The build phase optimizes the network graph by eliminating dead computations, folding constants, and reordering and combining operations to run more efficiently on the GPU. The builder can also be configured to reduce the precision of computations. It can automatically reduce 32-bit floating-point calculations to 16-bit and supports quantization of floating-point values so that calculations can be performed using 8-bit integers. Quantization requires dynamic range information, which can be provided by the application, or calculated by TensorRT using representative network inputs, a process called calibration. The build phase also runs multiple implementations of operators to find those which, when combined with any intermediate precision conversions and layout transformations, yield the fastest overall implementation of the network.
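For example, reduced precision might be enabled on the builder config like this (a sketch assuming TensorRT 8.x; the calibrator is an application-provided implementation of an interface such as `nvinfer1::IInt8EntropyCalibrator2` that feeds representative network inputs):

```cpp
#include <NvInfer.h>

// Sketch: allow FP16 and INT8 kernels during the build phase.
void enableReducedPrecision(nvinfer1::IBuilderConfig* config,
                            nvinfer1::IInt8Calibrator* calibrator)
{
    config->setFlag(nvinfer1::BuilderFlag::kFP16);   // 32-bit -> 16-bit where profitable
    config->setFlag(nvinfer1::BuilderFlag::kINT8);   // allow 8-bit integer kernels
    config->setInt8Calibrator(calibrator);           // dynamic range via calibration
}
```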

The Network Definition interface provides methods for the application to define a network. Input and output tensors can be specified, and layers can be added and configured. As well as layer types, such as convolutional and recurrent layers, a Plugin layer type allows the application to implement functionality not natively supported by TensorRT. For more information, refer to the Network Definition API (https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_network_definition.html).
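A small sketch of that interface, assuming TensorRT 8.x: the tensor name, shapes, layer choice, and the weight/bias arrays are all hypothetical, chosen only to show how a network is assembled layer by layer.

```cpp
#include <NvInfer.h>

// Sketch: define a tiny network via the Network Definition API.
void defineNetwork(nvinfer1::INetworkDefinition* network,
                   nvinfer1::Weights convWeights, nvinfer1::Weights convBias)
{
    // Input tensor: NCHW layout with a dynamic batch dimension (-1).
    auto* input = network->addInput("input", nvinfer1::DataType::kFLOAT,
                                    nvinfer1::Dims4{-1, 3, 224, 224});

    // A 3x3 convolution with 64 output maps, followed by a ReLU activation.
    auto* conv = network->addConvolutionNd(*input, 64, nvinfer1::DimsHW{3, 3},
                                           convWeights, convBias);
    auto* relu = network->addActivation(*conv->getOutput(0),
                                        nvinfer1::ActivationType::kRELU);

    // Mark the final tensor as a network output.
    network->markOutput(*relu->getOutput(0));
}
```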
