Using The cuDLA API To Run A TensorRT Engine

Description

This sample, sampleCudla, uses the TensorRT API to construct a network consisting of a single ElementWise layer and builds the engine. The engine runs in DLA standalone mode using the cuDLA runtime. To do this, the sample uses cuDLA APIs for engine conversion, cuDLA runtime preparation, and inference.

How does this sample work?

After the network is constructed, a cuDLA module is loaded from the serialized engine data. The input and output tensors are then allocated and registered with cuDLA. Once the input tensors have been copied from the CPU to the GPU, the cuDLA task can be submitted and executed. The sample then waits for the stream operations to finish and copies the output buffer back to the CPU, where it is verified for correctness.

Specifically (a code sketch of this flow follows the list):

  • The single-layer network is built with the TensorRT API.
  • cudlaCreateDevice is called to create the DLA device.
  • cudlaModuleLoadFromMemory is called to load the engine memory for DLA use.
  • cudaMalloc and cudlaMemRegister are called to first allocate memory on the GPU, then register that CUDA pointer with the DLA.
  • cudlaModuleGetAttributes is called to get module attributes from the loaded module.
  • cudlaSubmitTask is called to submit the inference task.
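
The sketch below strings these calls together in cuDLA hybrid (CUDA/DLA) mode. It is illustrative rather than the sample's exact code: loadableData, loadableSize, and bufferSize are placeholder inputs, a single input and output tensor are assumed, and most error checking is elided for brevity.

    // Minimal sketch of the cuDLA runtime flow in hybrid (CUDA/DLA) mode.
    // loadableData/loadableSize and bufferSize are placeholders; one input
    // and one output tensor are assumed; error checking is mostly elided.
    #include <cudla.h>
    #include <cuda_runtime.h>
    #include <cstdint>

    void runLoadable(const uint8_t* loadableData, size_t loadableSize, size_t bufferSize)
    {
        cudlaDevHandle dev = nullptr;
        cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA);       // create the DLA device

        cudlaModule module = nullptr;                     // load the engine memory for DLA use
        cudlaModuleLoadFromMemory(dev, loadableData, loadableSize, &module, 0);

        cudlaModuleAttribute attr;                        // query attributes of the loaded module
        cudlaModuleGetAttributes(module, CUDLA_NUM_INPUT_TENSORS, &attr);

        void* inputGpu = nullptr;
        void* outputGpu = nullptr;
        cudaMalloc(&inputGpu, bufferSize);                // allocate CUDA memory...
        cudaMalloc(&outputGpu, bufferSize);
        uint64_t* inputDla = nullptr;
        uint64_t* outputDla = nullptr;                    // ...and register it with the DLA
        cudlaMemRegister(dev, static_cast<uint64_t*>(inputGpu), bufferSize, &inputDla, 0);
        cudlaMemRegister(dev, static_cast<uint64_t*>(outputGpu), bufferSize, &outputDla, 0);

        cudaStream_t stream;
        cudaStreamCreate(&stream);
        // ... cudaMemcpyAsync host input -> inputGpu on `stream` ...

        cudlaTask task = {};                              // describe and submit the inference task
        task.moduleHandle = module;
        task.numInputTensors = 1;
        task.inputTensor = &inputDla;
        task.numOutputTensors = 1;
        task.outputTensor = &outputDla;
        cudlaSubmitTask(dev, &task, 1, stream, 0);

        // ... cudaMemcpyAsync outputGpu -> host output on `stream` ...
        cudaStreamSynchronize(stream);                    // wait, then verify on the CPU

        cudlaMemUnregister(dev, inputDla);                // teardown
        cudlaMemUnregister(dev, outputDla);
        cudlaModuleUnload(module, 0);
        cudlaDestroyDevice(dev);
        cudaStreamDestroy(stream);
        cudaFree(inputGpu);
        cudaFree(outputGpu);
    }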

TensorRT API layers and ops

In this sample, the ElementWise layer is used. The ElementWise layer, also known as the Eltwise layer, implements per-element operations between two tensors. For more information, see the TensorRT Developer Guide: Layers documentation.
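
For context, here is a hedged sketch of how a single-ElementWise-layer network can be built and compiled into a DLA loadable with the TensorRT C++ API. The tensor names, shape, and I/O format below are illustrative assumptions, not the sample's exact settings.

    // Sketch: build a one-ElementWise-layer network as a DLA standalone
    // loadable (TensorRT 8.x API). Names, shapes, and formats are assumptions.
    #include "NvInfer.h"
    #include <initializer_list>
    using namespace nvinfer1;

    IHostMemory* buildDlaLoadable(ILogger& logger)
    {
        IBuilder* builder = createInferBuilder(logger);
        INetworkDefinition* network = builder->createNetworkV2(
            1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

        // Two inputs feeding a single element-wise SUM.
        ITensor* a = network->addInput("inputA", DataType::kHALF, Dims4{1, 32, 4, 4});
        ITensor* b = network->addInput("inputB", DataType::kHALF, Dims4{1, 32, 4, 4});
        IElementWiseLayer* ew = network->addElementWise(*a, *b, ElementWiseOperation::kSUM);
        ITensor* out = ew->getOutput(0);
        out->setName("output");
        network->markOutput(*out);

        // DLA standalone build: the serialized result is a DLA loadable that
        // cuDLA can consume directly, not a GPU engine.
        IBuilderConfig* config = builder->createBuilderConfig();
        config->setFlag(BuilderFlag::kFP16);
        config->setDefaultDeviceType(DeviceType::kDLA);
        config->setDLACore(0);
        config->setEngineCapability(EngineCapability::kDLA_STANDALONE);
        config->setFlag(BuilderFlag::kDIRECT_IO);  // keep DLA formats at the I/O boundary

        // Standalone loadables need explicit, DLA-supported I/O types and formats.
        for (ITensor* t : {a, b, out})
        {
            t->setType(DataType::kHALF);
            t->setAllowedFormats(1U << static_cast<uint32_t>(TensorFormat::kDLA_LINEAR));
        }

        IHostMemory* loadable = builder->buildSerializedNetwork(*network, *config);
        delete network;
        delete config;
        delete builder;
        return loadable;  // pass data()/size() to cudlaModuleLoadFromMemory
    }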

Prerequisites

This sample must be compiled with the macro ENABLE_DLA=1; otherwise, it prints the following error message and exits:

Unsupported platform, please make sure it is running on aarch64, QNX or android.

Running the sample

  1. Compile this sample by running make in the <TensorRT root directory>/samples/sampleCudla directory. The binary named sample_cudla will be created in the <TensorRT root directory>/bin directory.
    cd <TensorRT root directory>/samples/sampleCudla
    make ENABLE_DLA=1

    Where `<TensorRT root directory>` is where you installed TensorRT.
    
  2. Run the sample to perform inference on DLA.
    ./sample_cudla

  3. Verify that the sample ran successfully. If the sample runs successfully you should see an output similar to the following:
    &&&& RUNNING TensorRT.sample_cudla # ./sample_cudla
    [I] [TRT] --------------- Layers running on DLA:
    [I] [TRT] [DlaLayer] {ForeignNode[(Unnamed Layer* 0) [ElementWise]]},
    [I] [TRT] --------------- Layers running on GPU:
    [I] [TRT] …(omit messages)
    &&&& PASSED TensorRT.sample_cudla

     This output shows that the sample ran successfully; `PASSED`.
    

Sample --help options

To see the full list of available options and their descriptions, use the ./sample_cudla -h command line option.

Additional resources

The following resources provide a deeper understanding of sampleCudla.

Documentation

License

For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.

Changelog

June 2022
This is the first release of the README.md file.

Known issues

There are no known issues with this sample.
