Using The CuDLA API To Run A TensorRT Engine
Table Of Contents
- Description
- How does this sample work?
- Prerequisites
- Running the sample
- Additional resources
- License
- Changelog
- Known issues
Description
This sample, sampleCudla, uses the TensorRT API to construct a network with a single ElementWise layer and builds the engine from it. The engine runs in DLA standalone mode using the cuDLA runtime. To do this, the sample uses the cuDLA API for engine conversion, cuDLA runtime preparation, and inference.
How does this sample work?
After the network is constructed, the cuDLA module is loaded from the network data. The input and output tensors are then allocated and registered with cuDLA. Once the input tensors have been copied from the CPU to the GPU, the cuDLA task can be submitted and executed. The sample then waits for the stream operations to finish and copies the output buffer back to the CPU, where it is verified for correctness.
Specifically (see the sketch after this list):
- The single-layer network is built by TensorRT.
- `cudlaCreateDevice` is called to create the DLA device.
- `cudlaModuleLoadFromMemory` is called to load the engine memory for DLA use.
- `cudaMalloc` and `cudlaMemRegister` are called to first allocate memory on the GPU, then register the CUDA pointer with the DLA.
- `cudlaModuleGetAttributes` is called to get module attributes from the loaded module.
- `cudlaSubmitTask` is called to submit the inference task.
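The following is a minimal C++ sketch of that call sequence, not the sample's actual source. The `runOnDla` helper, the error-check macro, the buffer sizes, and the single-input/single-output task layout are assumptions; hybrid (CUDA + DLA) mode is assumed here because the steps above allocate with `cudaMalloc` and synchronize on a CUDA stream.

```cpp
// Minimal sketch of the cuDLA flow described above (hypothetical helper).
#include <cudla.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK_CUDLA(call)                                              \
    do {                                                               \
        cudlaStatus s_ = (call);                                       \
        if (s_ != cudlaSuccess) {                                      \
            std::fprintf(stderr, "cuDLA error %d at %s:%d\n",          \
                         static_cast<int>(s_), __FILE__, __LINE__);    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

void runOnDla(const void* loadableData, size_t loadableSize,
              size_t inputSize, size_t outputSize, cudaStream_t stream)
{
    // 1. Create a handle for DLA core 0 in hybrid (CUDA + DLA) mode.
    cudlaDevHandle dev;
    CHECK_CUDLA(cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA));

    // 2. Load the DLA loadable (the engine memory produced by TensorRT).
    cudlaModule module;
    CHECK_CUDLA(cudlaModuleLoadFromMemory(
        dev, static_cast<const uint8_t*>(loadableData), loadableSize, &module, 0));

    // 3. Allocate CUDA buffers, then register the CUDA pointers with the DLA.
    void* inputGpu = nullptr;
    void* outputGpu = nullptr;
    cudaMalloc(&inputGpu, inputSize);
    cudaMalloc(&outputGpu, outputSize);

    uint64_t* inputDla = nullptr;
    uint64_t* outputDla = nullptr;
    CHECK_CUDLA(cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(inputGpu),
                                 inputSize, &inputDla, 0));
    CHECK_CUDLA(cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(outputGpu),
                                 outputSize, &outputDla, 0));

    // 4. Query attributes of the loaded module (e.g. the number of input tensors).
    cudlaModuleAttribute attr{};
    CHECK_CUDLA(cudlaModuleGetAttributes(module, CUDLA_NUM_INPUT_TENSORS, &attr));

    // 5. Describe the inference task and submit it on the given CUDA stream.
    cudlaTask task{};
    task.moduleHandle = module;
    task.numInputTensors = 1;
    task.inputTensor = &inputDla;
    task.numOutputTensors = 1;
    task.outputTensor = &outputDla;
    CHECK_CUDLA(cudlaSubmitTask(dev, &task, 1, stream, 0));

    // 6. Wait for the stream to finish, then clean up.
    cudaStreamSynchronize(stream);
    cudlaMemUnregister(dev, inputDla);
    cudlaMemUnregister(dev, outputDla);
    cudaFree(inputGpu);
    cudaFree(outputGpu);
    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
}
```

In the sample itself, the loadable bytes come from the TensorRT engine built for the network above, and the output buffer copied back to the CPU is checked against a reference result.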
TensorRT API layers and ops
In this sample, the ElementWise layer is used. For more information, see the TensorRT Developer Guide: Layers documentation.
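As a rough illustration of how such a one-layer network could be assembled with the TensorRT C++ API (the function name, tensor names, shapes, and the choice of `kSUM` below are assumptions, not taken from the sample's source):

```cpp
// Hypothetical sketch of a single-ElementWise-layer network definition.
#include <NvInfer.h>

nvinfer1::INetworkDefinition* buildElementWiseNetwork(nvinfer1::IBuilder& builder)
{
    // Explicit-batch network definition (TensorRT 8.x style).
    const auto explicitBatch =
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto* network = builder.createNetworkV2(explicitBatch);

    // Two inputs of identical shape feed one ElementWise layer.
    nvinfer1::Dims4 dims{1, 3, 8, 8};
    auto* a = network->addInput("inputA", nvinfer1::DataType::kFLOAT, dims);
    auto* b = network->addInput("inputB", nvinfer1::DataType::kFLOAT, dims);

    auto* ew = network->addElementWise(*a, *b, nvinfer1::ElementWiseOperation::kSUM);
    ew->getOutput(0)->setName("output");
    network->markOutput(*ew->getOutput(0));

    return network;
}
```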
Prerequisites
This sample must be compiled with the macro `ENABLE_DLA=1`; otherwise, it prints the following error message and quits:
```
Unsupported platform, please make sure it is running on aarch64, QNX or android.
```
Running the sample
- Compile this sample by running `make` in the `<TensorRT root directory>/samples/sampleCudla` directory. The binary named `sample_cudla` will be created in the `<TensorRT root directory>/bin` directory.

  ```sh
  cd <TensorRT root directory>/samples/sampleCudla
  make ENABLE_DLA=1
  ```

  Where `<TensorRT root directory>` is where you installed TensorRT.
- Run the sample to perform inference on the DLA.

  ```sh
  ./sample_cudla
  ```
- Verify that the sample ran successfully. If the sample runs successfully, you should see output similar to the following:

  ```
  &&&& RUNNING TensorRT.sample_cudla # ./sample_cudla
  [I] [TRT]
  [I] [TRT] --------------- Layers running on DLA:
  [I] [TRT] [DlaLayer] {ForeignNode[(Unnamed Layer* 0) [ElementWise]]},
  [I] [TRT] --------------- Layers running on GPU:
  [I] [TRT] …(omit messages)
  &&&& PASSED TensorRT.sample_cudla
  ```

  This output shows that the sample ran successfully; `PASSED`.
Sample `--help` options
To see the full list of available options and their descriptions, use the `./sample_cudla -h` command-line option.
Additional resources
The following resources provide a deeper understanding of sampleCudla.
Documentation
- Introduction To NVIDIA’s TensorRT Samples
- Working With TensorRT Using The C++ API
- NVIDIA’s TensorRT Documentation Library
- Developer Guide for cuDLA APIs
License
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
Changelog
June 2022
This is the first release of the `README.md` file.
Known issues
There are no known issues in this sample.