Problems Encountered Using TensorRT with Python's threading Module, and How to Solve Them
I needed to run TensorRT in parallel inside a Python program, so I called it from the threading module, but the program threw errors.
1. The problem
Calling TensorRT from multiple threads, without changing the original single-threaded code in any way, produces the following error:
ERROR: …/rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 (invalid resource handle)
2. The solution
First, initialize CUDA with pycuda to guard against obscure CUDA driver initialization errors (the driver import is also needed for the statements below):
import pycuda.driver as cuda
import pycuda.autoinit
(1) Add the following statement at the top of the __init__() of the class that wraps TensorRT:
self.cfx = cuda.Device(0).make_context()
(In my own project TensorRT is wrapped in a class; if yours is not, put this statement before any TensorRT-related code and drop the self.)
If this statement is placed in the wrong spot, the program may also report the following error:
ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
(2) In addition, add the following line at the top of the inference function:
self.cfx.push()
(3) Finally, after all TensorRT-related operations have finished, add:
self.cfx.pop()
A minimal sketch that puts these three statements together is shown right below.
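The skeleton below only illustrates where the three statements go when the wrapper is driven from worker threads; TrtWorker, run_worker and the elided engine setup/execution are hypothetical placeholders, not code from my project (the real code follows in section 3).

import threading

import pycuda.driver as cuda
import pycuda.autoinit  # initialize the CUDA driver


class TrtWorker(object):
    """Hypothetical skeleton showing where the three statements go."""

    def __init__(self):
        # (1) First thing: create a dedicated CUDA context for this wrapper.
        self.cfx = cuda.Device(0).make_context()
        # ... deserialize the TensorRT engine, allocate buffers, etc. ...

    def inference(self, data):
        # (2) Make the wrapper's CUDA context current on the calling thread.
        self.cfx.push()
        # ... copy inputs, run the TensorRT execution context, copy outputs ...
        result = None  # placeholder for the real TensorRT outputs
        # (3) Pop the context from this thread once all TensorRT work is done.
        self.cfx.pop()
        return result


def run_worker(worker, data):
    print(worker.inference(data))


if __name__ == '__main__':
    worker = TrtWorker()
    threads = [threading.Thread(target=run_worker, args=(worker, i))
               for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()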
3. Reference code
Below is the relevant part of the code from my own project, for reference:
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

# HostDeviceMem, PostprocessYOLO and _preprocess_yolov3 are helpers from my
# own project and are not reproduced here.


class TrtYOLOv3(object):
    """TrtYOLOv3 class encapsulates things needed to run TRT YOLOv3."""

    def allocate_buffers(self, engine):
        """Allocates all host/device in/out buffers required for an engine."""
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()
        for binding in engine:
            size = trt.volume(engine.get_binding_shape(binding)) * \
                engine.max_batch_size
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            # Allocate host and device buffers
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            # Append the device buffer to device bindings.
            bindings.append(int(device_mem))
            # Append to the appropriate list.
            if engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
        return inputs, outputs, bindings, stream
    def do_inference(self, context, bindings, inputs, outputs, stream, batch_size=1):
        """do_inference (for TensorRT 6.x or lower)

        This function is generalized for multiple inputs/outputs.
        Inputs and outputs are expected to be lists of HostDeviceMem objects.
        """
        self.cfx.push()
        # Transfer input data to the GPU.
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
        # Run inference.
        context.execute_async(batch_size=batch_size,
                              bindings=bindings,
                              stream_handle=stream.handle)
        # Transfer predictions back from the GPU.
        [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
        # Synchronize the stream
        stream.synchronize()
        # Return only the host outputs.
        return [out.host for out in outputs]

    def do_inference_v2(self, context, bindings, inputs, outputs, stream):
        """do_inference_v2 (for TensorRT 7.0+)

        This function is generalized for multiple inputs/outputs for full
        dimension networks.
        Inputs and outputs are expected to be lists of HostDeviceMem objects.
        """
        self.cfx.push()
        # Transfer input data to the GPU.
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
        # Run inference.
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        # Transfer predictions back from the GPU.
        [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
        # Synchronize the stream
        stream.synchronize()
        # Return only the host outputs.
        return [out.host for out in outputs]
    def _load_engine(self):
        TRTbin = '%s.trt' % self.model
        with open(TRTbin, 'rb') as f, trt.Runtime(self.trt_logger) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    def _create_context(self):
        return self.engine.create_execution_context()

    def __init__(self, model, input_shape=(416, 416)):
        """Initialize TensorRT plugins, engine and context."""
        self.cfx = cuda.Device(0).make_context()
        self.model = model
        self.input_shape = input_shape
        h, w = input_shape
        if 'tiny' in model:
            self.output_shapes = [(1, 36, h // 32, w // 32),
                                  (1, 36, h // 16, w // 16)]
        else:
            self.output_shapes = [(1, 36, h // 32, w // 32),
                                  (1, 36, h // 16, w // 16),
                                  (1, 36, h // 8, w // 8)]
        if 'tiny' in model:
            postprocessor_args = {
                # A list of 2 three-dimensional tuples for the Tiny YOLO masks
                'yolo_masks': [(3, 4, 5), (0, 1, 2)],
                # A list of 6 two-dimensional tuples for the Tiny YOLO anchors
                'yolo_anchors': [(10, 14), (23, 27), (37, 58),
                                 (81, 82), (135, 169), (344, 319)],
                # Threshold for non-max suppression algorithm, float
                # value between 0 and 1
                'nms_threshold': 0.5,
                'yolo_input_resolution': input_shape
            }
        else:
            postprocessor_args = {
                # A list of 3 three-dimensional tuples for the YOLO masks
                'yolo_masks': [(6, 7, 8), (3, 4, 5), (0, 1, 2)],
                # A list of 9 two-dimensional tuples for the YOLO anchors
                'yolo_anchors': [(10, 13), (16, 30), (33, 23),
                                 (30, 61), (62, 45), (59, 119),
                                 (116, 90), (156, 198), (373, 326)],
                # Threshold for non-max suppression algorithm, float
                # value between 0 and 1
                'nms_threshold': 0.5,
                'yolo_input_resolution': input_shape
            }
        self.postprocessor = PostprocessYOLO(**postprocessor_args)
        self.trt_logger = trt.Logger(trt.Logger.INFO)
        self.engine = self._load_engine()
        self.context = self._create_context()
        self.inputs, self.outputs, self.bindings, self.stream = \
            self.allocate_buffers(self.engine)
        self.inference_fn = self.do_inference if trt.__version__[0] < '7' \
            else self.do_inference_v2
    def __del__(self):
        """Free CUDA memories."""
        del self.stream
        del self.outputs
        del self.inputs

    def detect(self, img, conf_th=0.3):
        """Detect objects in the input image."""
        shape_orig_WH = (img.shape[1], img.shape[0])
        # print(str(img.shape[1])+str(img.shape[0]))
        img_resized = _preprocess_yolov3(img, self.input_shape)
        # print(str(img_resized.shape[1])+str(img_resized.shape[0]))
        # Set host input to the image. The do_inference() function
        # will copy the input to the GPU before executing.
        self.inputs[0].host = np.ascontiguousarray(img_resized)
        trt_outputs = self.inference_fn(
            context=self.context,
            bindings=self.bindings,
            inputs=self.inputs,
            outputs=self.outputs,
            stream=self.stream)
        # Before doing post-processing, we need to reshape the outputs
        # as do_inference() will give us flat arrays.
        trt_outputs = [output.reshape(shape) for output, shape
                       in zip(trt_outputs, self.output_shapes)]
        # Run the post-processing algorithms on the TensorRT outputs
        # and get the bounding box details of detected objects
        boxes, classes, scores = self.postprocessor.process(
            trt_outputs, shape_orig_WH, conf_th)
        self.cfx.pop()
        return boxes, scores, classes
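For completeness, here is a rough sketch, not part of my original project, of how the TrtYOLOv3 class above might be driven from worker threads. The model name 'yolov3-416', the dummy frames and the detect_lock (added only to serialize access to the shared host/device buffers) are my own assumptions.

import threading

import numpy as np

detect_lock = threading.Lock()  # serialize access to the shared buffers


def run_detection(trt_yolov3, frame):
    # detect() pushes the CUDA context (inside do_inference) and pops it
    # before returning, so it can be called from worker threads.
    with detect_lock:
        boxes, scores, classes = trt_yolov3.detect(frame, conf_th=0.3)
    print(threading.current_thread().name, 'finished')


if __name__ == '__main__':
    trt_yolov3 = TrtYOLOv3('yolov3-416', input_shape=(416, 416))
    # Dummy frames stand in for real images from disk or a camera.
    frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(4)]
    threads = [threading.Thread(target=run_detection, args=(trt_yolov3, f))
               for f in frames]
    for t in threads:
        t.start()
    for t in threads:
        t.join()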
4. Summary
In short, using TensorRT from multiple threads mainly comes down to adding the three statements described in section 2. Their placement matters a lot, and since everyone's code is a little different, adapt them to your own project.
5. References
[1] https://forums.developer.nvidia.com/t/how-to-use-tensorrt-by-the-multi-threading-package-of-python/123085
[2] https://github.com/NVIDIA/TensorRT/issues/303