Using The CuDLA API To Run A TensorRT Engine
Table Of Contents
- Description
- How does this sample work?
- Prerequisites
- Running the sample
- Additional resources
- License
- Changelog
- Known issues
Description
This sample, sampleCudla, uses the TensorRT API to construct a network with a single ElementWise layer and builds the engine from it. The engine runs in DLA standalone mode using the cuDLA runtime. To do this, the sample uses the cuDLA API for engine conversion, cuDLA runtime preparation, and inference.
How does this sample work?
After the network is constructed, the cuDLA module is loaded from the network data. The input and output tensors are then allocated and registered with cuDLA. Once the input tensors have been copied from the CPU to the GPU, the cuDLA task can be submitted and executed. The sample then waits for the stream operations to finish and copies the output buffer back to the CPU, where it is verified for correctness.
Specifically (see the sketch after this list):
- The single-layer network is built by TensorRT.
- `cudlaCreateDevice` is called to create the DLA device.
- `cudlaModuleLoadFromMemory` is called to load the engine memory for DLA use.
- `cudaMalloc` and `cudlaMemRegister` are called to first allocate memory on the GPU, then register the CUDA pointer with the DLA.
- `cudlaModuleGetAttributes` is called to get module attributes from the loaded module.
- `cudlaSubmitTask` is called to submit the inference task.
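The following is a minimal C++ sketch of that call sequence, not the sample's actual source. The `runOnDla` helper, the error-check macro, the buffer sizes, and the single-input/single-output task layout are assumptions; hybrid (CUDA + DLA) mode is assumed here because the steps above allocate with `cudaMalloc` and synchronize on a CUDA stream.

```cpp
// Minimal sketch of the cuDLA flow described above (hypothetical helper).
#include <cudla.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK_CUDLA(call)                                              \
    do {                                                               \
        cudlaStatus s_ = (call);                                       \
        if (s_ != cudlaSuccess) {                                      \
            std::fprintf(stderr, "cuDLA error %d at %s:%d\n",          \
                         static_cast<int>(s_), __FILE__, __LINE__);    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

void runOnDla(const void* loadableData, size_t loadableSize,
              size_t inputSize, size_t outputSize, cudaStream_t stream)
{
    // 1. Create a handle for DLA core 0 in hybrid (CUDA + DLA) mode.
    cudlaDevHandle dev;
    CHECK_CUDLA(cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA));

    // 2. Load the DLA loadable (the engine memory produced by TensorRT).
    cudlaModule module;
    CHECK_CUDLA(cudlaModuleLoadFromMemory(
        dev, static_cast<const uint8_t*>(loadableData), loadableSize, &module, 0));

    // 3. Allocate CUDA buffers, then register the CUDA pointers with the DLA.
    void* inputGpu = nullptr;
    void* outputGpu = nullptr;
    cudaMalloc(&inputGpu, inputSize);
    cudaMalloc(&outputGpu, outputSize);

    uint64_t* inputDla = nullptr;
    uint64_t* outputDla = nullptr;
    CHECK_CUDLA(cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(inputGpu),
                                 inputSize, &inputDla, 0));
    CHECK_CUDLA(cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(outputGpu),
                                 outputSize, &outputDla, 0));

    // 4. Query attributes of the loaded module (e.g. the number of input tensors).
    cudlaModuleAttribute attr{};
    CHECK_CUDLA(cudlaModuleGetAttributes(module, CUDLA_NUM_INPUT_TENSORS, &attr));

    // 5. Describe the inference task and submit it on the given CUDA stream.
    cudlaTask task{};
    task.moduleHandle = module;
    task.numInputTensors = 1;
    task.inputTensor = &inputDla;
    task.numOutputTensors = 1;
    task.outputTensor = &outputDla;
    CHECK_CUDLA(cudlaSubmitTask(dev, &task, 1, stream, 0));

    // 6. Wait for the stream to finish, then clean up.
    cudaStreamSynchronize(stream);
    cudlaMemUnregister(dev, inputDla);
    cudlaMemUnregister(dev, outputDla);
    cudaFree(inputGpu);
    cudaFree(outputGpu);
    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
}
```

In the sample itself, the loadable bytes come from the TensorRT engine built for the network above, and the output buffer copied back to the CPU is checked against a reference result.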
TensorRT API layers and ops
In this sample, the ElementWise layer is used. For more information, see the TensorRT Developer Guide: Layers documentation.
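As a rough illustration of how such a one-layer network could be assembled with the TensorRT C++ API (the function name, tensor names, shapes, and the choice of `kSUM` below are assumptions, not taken from the sample's source):

```cpp
// Hypothetical sketch of a single-ElementWise-layer network definition.
#include <NvInfer.h>

nvinfer1::INetworkDefinition* buildElementWiseNetwork(nvinfer1::IBuilder& builder)
{
    // Explicit-batch network definition (TensorRT 8.x style).
    const auto explicitBatch =
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto* network = builder.createNetworkV2(explicitBatch);

    // Two inputs of identical shape feed one ElementWise layer.
    nvinfer1::Dims4 dims{1, 3, 8, 8};
    auto* a = network->addInput("inputA", nvinfer1::DataType::kFLOAT, dims);
    auto* b = network->addInput("inputB", nvinfer1::DataType::kFLOAT, dims);

    auto* ew = network->addElementWise(*a, *b, nvinfer1::ElementWiseOperation::kSUM);
    ew->getOutput(0)->setName("output");
    network->markOutput(*ew->getOutput(0));

    return network;
}
```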
Prerequisites
This sample must be compiled with the macro `ENABLE_DLA=1`; otherwise, it prints the following error message and quits:
```
Unsupported platform, please make sure it is running on aarch64, QNX or android.
```
Running the sample
- Compile this sample by running `make` in the `<TensorRT root directory>/samples/sampleCudla` directory. The binary named `sample_cudla` will be created in the `<TensorRT root directory>/bin` directory.

  ```sh
  cd <TensorRT root directory>/samples/sampleCudla
  make ENABLE_DLA=1
  ```

  Where `<TensorRT root directory>` is where you installed TensorRT.
- Run the sample to perform inference on the DLA.

  ```sh
  ./sample_cudla
  ```
- Verify that the sample ran successfully. If the sample runs successfully, you should see output similar to the following:

  ```
  &&&& RUNNING TensorRT.sample_cudla # ./sample_cudla
  [I] [TRT]
  [I] [TRT] --------------- Layers running on DLA:
  [I] [TRT] [DlaLayer] {ForeignNode[(Unnamed Layer* 0) [ElementWise]]},
  [I] [TRT] --------------- Layers running on GPU:
  [I] [TRT] …(omit messages)
  &&&& PASSED TensorRT.sample_cudla
  ```

  This output shows that the sample ran successfully; `PASSED`.
Sample `--help` options
To see the full list of available options and their descriptions, use the `./sample_cudla -h` command-line option.
Additional resources
The following resources provide a deeper understanding of sampleCudla.
Documentation
- Introduction To NVIDIA’s TensorRT Samples
- Working With TensorRT Using The C++ API
- NVIDIA’s TensorRT Documentation Library
- Developer Guide for cuDLA APIs
License
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
Changelog
June 2022
This is the first release of the `README.md` file.
Known issues
There are no known issues in this sample.