Using TensorRT with Python Multithreading

卑微小岳在线debug

Category: TensorRT | Tags: machine learning, python

Article link: https://blog.youkuaiyun.com/weixin_39739042/article/details/112554503

Accelerating YOLO with TensorRT in Python Multithreading

Problems with using TensorRT from Python's threading module, and how to fix them

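Because I needed to run TensorRT in parallel inside a Python program, I called it from threads created with the threading module, but the program failed with an error.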

1. The problem

Calling the original, unmodified TensorRT code directly from a thread produces the following error:

ERROR: …/rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 (invalid resource handle)
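Why this happens, in brief: under the CUDA driver API every host thread has its own stack of current contexts, and push/pop act only on the calling thread. A context made current in the main thread (for example by pycuda.autoinit) is therefore not current in a worker thread, so CUDA handles created under it (streams, device buffers, the deserialized engine) get used from a thread where that context is not active, which is a plausible source of the "invalid resource handle" error. The sketch below is a toy model in plain Python, not pycuda: ToyDriver, ToyContext and current_in_new_thread are hypothetical names made up for illustration, and threading.local stands in for the driver's per-thread stacks.

```python
import threading


class ToyContext:
    """Stand-in for a CUDA context; it only carries a name."""

    def __init__(self, name):
        self.name = name


class ToyDriver:
    """Toy model of the driver's per-thread context stacks (not pycuda)."""

    def __init__(self):
        # threading.local gives every thread its own, independent attributes.
        self._local = threading.local()

    def _stack(self):
        # Each thread starts with an empty stack the first time it asks.
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def push(self, ctx):
        self._stack().append(ctx)

    def pop(self):
        return self._stack().pop()

    def current(self):
        stack = self._stack()
        return stack[-1] if stack else None


def current_in_new_thread(driver, ctx_to_push=None):
    """Report which context is current inside a freshly started thread."""
    seen = {}

    def worker():
        if ctx_to_push is not None:
            driver.push(ctx_to_push)  # plays the role of self.cfx.push()
        seen["ctx"] = driver.current()
        if ctx_to_push is not None:
            driver.pop()  # plays the role of self.cfx.pop()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen["ctx"]


if __name__ == "__main__":
    driver = ToyDriver()
    gpu0 = ToyContext("device0")
    driver.push(gpu0)  # like make_context() in the main thread
    print(driver.current().name)                     # device0
    print(current_in_new_thread(driver))             # None
    print(current_in_new_thread(driver, gpu0).name)  # device0
```

In the model, the worker thread only sees the context after pushing it itself, which is what the self.cfx.push() call in section 2 does for the real CUDA context.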

2. Solution

First, initialize CUDA through pycuda, to avoid obscure errors during CUDA driver initialization:

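import pycuda.autoinit  # initializes the CUDA driver and creates a default context
import pycuda.driver as cuda  # provides cuda.Device, cuda.Stream, etc. used below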

(1) Add the following statement as the first line of __init__() in the class that calls TensorRT:

self.cfx = cuda.Device(0).make_context()

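(My own program calls TensorRT and does the related work inside a class. If yours does not, put the statement above before every TensorRT-related statement and remove the "self." prefix.)
If this statement is in the wrong place, the program may also fail with the following error:

ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)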

(2) In addition, add the following line as the first line of the inference function:

self.cfx.push()

(3) Finally, after all TensorRT-related operations have finished, add:

self.cfx.pop()
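The push/pop pair in steps (2) and (3) has to stay balanced on each thread, including when inference raises an exception. One way to guarantee that is a small context manager. This is a sketch, not part of the author's code or of pycuda's API: the helper name pushed is hypothetical, and it only assumes that its argument has push() and pop() methods, as the pycuda.driver.Context returned by make_context() does.

```python
from contextlib import contextmanager


@contextmanager
def pushed(ctx):
    """Make `ctx` current on the calling thread for the duration of a block.

    `ctx` is assumed to behave like a pycuda.driver.Context (push()/pop()).
    pop() runs even if the block raises, so the thread's context stack is
    never left unbalanced.
    """
    ctx.push()
    try:
        yield ctx
    finally:
        ctx.pop()
```

Inside the inference function, the TensorRT calls could then be wrapped in "with pushed(self.cfx):" instead of calling self.cfx.push() and self.cfx.pop() separately.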

3. Reference code

Below is the relevant part of my own project's code for reference:

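import numpy as np
import tensorrt as trt
import pycuda.autoinit  # initializes the CUDA driver
import pycuda.driver as cuda

# HostDeviceMem, PostprocessYOLO and _preprocess_yolov3 are helpers from
# elsewhere in the author's project; they are not shown in this post.


class TrtYOLOv3(object):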
    """TrtYOLOv3 class encapsulates things needed to run TRT YOLOv3."""
    def allocate_buffers(self, engine):
        
        """Allocates all host/device in/out buffers required for an engine."""
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()
        for binding in engine:
            size = trt.volume(engine.get_binding_shape(binding)) * \
                engine.max_batch_size
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            # Allocate host and device buffers
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            # Append the device buffer to device bindings.
            bindings.append(int(device_mem))
            # Append to the appropriate list.
            if engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
        return inputs, outputs, bindings, stream


    def do_inference(self, context, bindings, inputs, outputs, stream, batch_size=1):
        """do_inference (for TensorRT 6.x or lower)
        
        This function is generalized for multiple inputs/outputs.
        Inputs and outputs are expected to be lists of HostDeviceMem objects.
        """
        self.cfx.push()
        # Transfer input data to the GPU.
        for inp in inputs:
            cuda.memcpy_htod_async(inp.device, inp.host, stream)
        # Run inference.
        context.execute_async(batch_size=batch_size,
                            bindings=bindings,
                            stream_handle=stream.handle)
        # Transfer predictions back from the GPU.
        for out in outputs:
            cuda.memcpy_dtoh_async(out.host, out.device, stream)
        # Synchronize the stream
        stream.synchronize()
        # Return only the host outputs.
        return [out.host for out in outputs]


    def do_inference_v2(self, context, bindings, inputs, outputs, stream):
        """do_inference_v2 (for TensorRT 7.0+)

        This function is generalized for multiple inputs/outputs for full
        dimension networks.
        Inputs and outputs are expected to be lists of HostDeviceMem objects.
        """
        self.cfx.push()
        # Transfer input data to the GPU.
        for inp in inputs:
            cuda.memcpy_htod_async(inp.device, inp.host, stream)
        # Run inference.
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        # Transfer predictions back from the GPU.
        for out in outputs:
            cuda.memcpy_dtoh_async(out.host, out.device, stream)
        # Synchronize the stream
        stream.synchronize()
        # Return only the host outputs.
        return [out.host for out in outputs]

    def _load_engine(self):
        TRTbin = '%s.trt' % self.model
        with open(TRTbin, 'rb') as f, trt.Runtime(self.trt_logger) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    def _create_context(self):
        return self.engine.create_execution_context()

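    def __init__(self, model, input_shape=(416, 416)):
        """Initialize TensorRT plugins, engine and context."""
        # Create the CUDA context first: make_context() also makes it current
        # on this thread, so the engine and buffers below are created in it.
        self.cfx = cuda.Device(0).make_context()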
        self.model = model
        self.input_shape = input_shape
        h, w = input_shape
        if 'tiny' in model:
            self.output_shapes = [(1, 36, h // 32, w // 32),
                                  (1, 36, h // 16, w // 16)]
        else:
            self.output_shapes = [(1, 36, h // 32, w // 32),
                                  (1, 36, h // 16, w // 16),
                                  (1, 36, h //  8, w //  8)]
        if 'tiny' in model:
            postprocessor_args = {
                # A list of 2 three-dimensional tuples for the Tiny YOLO masks
                'yolo_masks': [(3, 4, 5), (0, 1, 2)],
                # A list of 6 two-dimensional tuples for the Tiny YOLO anchors
                'yolo_anchors': [(10, 14), (23, 27), (37, 58),
                                 (81, 82), (135, 169), (344, 319)],
                # Threshold for non-max suppression algorithm, float
                # value between 0 and 1
                'nms_threshold': 0.5,
                'yolo_input_resolution': input_shape
            }
        else:
            postprocessor_args = {
                # A list of 3 three-dimensional tuples for the YOLO masks
                'yolo_masks': [(6, 7, 8), (3, 4, 5), (0, 1, 2)],
                # A list of 9 two-dimensional tuples for the YOLO anchors
                'yolo_anchors': [(10, 13), (16, 30), (33, 23),
                                 (30, 61), (62, 45), (59, 119),
                                 (116, 90), (156, 198), (373, 326)],
                # Threshold for non-max suppression algorithm, float
                # value between 0 and 1
                'nms_threshold': 0.5,
                'yolo_input_resolution': input_shape
            }
        self.postprocessor = PostprocessYOLO(**postprocessor_args)

        self.trt_logger = trt.Logger(trt.Logger.INFO)
        self.engine = self._load_engine()
        self.context = self._create_context()
        self.inputs, self.outputs, self.bindings, self.stream = \
            self.allocate_buffers(self.engine)
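        # Choose the inference function by TensorRT major version. Comparing
        # only the first character of the version string would wrongly treat
        # "10.x" as older than "7".
        trt_major = int(trt.__version__.split('.')[0])
        self.inference_fn = self.do_inference if trt_major < 7 \
            else self.do_inference_v2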

    def __del__(self):
        """Free CUDA memories."""
        del self.stream
        del self.outputs
        del self.inputs

    def detect(self, img, conf_th=0.3):
        """Detect objects in the input image."""
        shape_orig_WH = (img.shape[1], img.shape[0])
        img_resized = _preprocess_yolov3(img, self.input_shape)
        # Set host input to the image. The do_inference() function
        # will copy the input to the GPU before executing.
        self.inputs[0].host = np.ascontiguousarray(img_resized)
        try:
            # inference_fn() starts with self.cfx.push(), making the context
            # current on the calling thread.
            trt_outputs = self.inference_fn(
                context=self.context,
                bindings=self.bindings,
                inputs=self.inputs,
                outputs=self.outputs,
                stream=self.stream)

            # Before doing post-processing, we need to reshape the outputs
            # as do_inference() will give us flat arrays.
            trt_outputs = [output.reshape(shape) for output, shape
                           in zip(trt_outputs, self.output_shapes)]

            # Run the post-processing algorithms on the TensorRT outputs
            # and get the bounding box details of detected objects
            boxes, classes, scores = self.postprocessor.process(
                trt_outputs, shape_orig_WH, conf_th)
        finally:
            # Pop the context pushed in inference_fn(), even if an error
            # occurred, so this thread's context stack stays balanced.
            self.cfx.pop()
        return boxes, scores, classes
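The class above keeps a single CUDA context, stream and set of host/device buffers per instance, so two threads calling detect() on the same instance at the same time would overwrite each other's inputs and outputs. Below is a minimal sketch of driving one shared detector from a thread pool. It assumes only an object with a detect(img, conf_th) method, such as TrtYOLOv3 above; detect_all and its lock are hypothetical additions, not the author's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def detect_all(detector, images, num_workers=4, conf_th=0.3):
    """Run detector.detect() on every image using a pool of worker threads.

    The detector owns one CUDA context, stream and buffer set, so calls into
    it are serialized with a lock. Results come back in input order.
    """
    lock = threading.Lock()

    def work(img):
        # Only one thread at a time may touch the shared TensorRT buffers.
        with lock:
            return detector.detect(img, conf_th)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in the order of `images`.
        return list(pool.map(work, images))
```

Because the lock serializes every detect() call, this only overlaps the work each thread does outside detect() (for example reading frames or consuming results). One possible alternative for truly concurrent inference is one TrtYOLOv3 instance per thread, each with its own context, stream and buffers, at the cost of extra GPU memory per instance.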

4. Summary

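To sum up, using TensorRT from multiple threads mainly comes down to adding the three statements from section 2: make_context() at the start of the constructor, push() at the start of inference, and pop() once the TensorRT work is done. Where they go matters a great deal, but since everyone's code is structured differently, adapt their placement to your own program.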

5. References

[1] https://forums.developer.nvidia.com/t/how-to-use-tensorrt-by-the-multi-threading-package-of-python/123085
[2] https://github.com/NVIDIA/TensorRT/issues/303