YOLO V8-Pose [Batch Image Inference]: Inference Explained and Deployment Implementation

Latest recommended article published 2025-04-22 16:39:32

牧锦程

Column: YOLOV8 Batch and Single-Image Inference
Tags: YOLO

Article link: https://blog.youkuaiyun.com/qq_48764574/article/details/138359114

Preface

In practice, YOLO V8 inference is usually run on one image at a time. To process multiple images, a loop can run inference on them one by one.

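For single-image inference, note that when a torch tensor is passed to the model directly, its height and width must be multiples of 32 (the model stride); otherwise inference fails with an error. The example below shows single-image inference with PyTorch and the Ultralytics library: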

import torch
from ultralytics import YOLO

# Load a pretrained YOLOv8n-pose model
model = YOLO('yolov8n-pose.pt')

# Create a random torch tensor of BCHW shape (1, 3, 640, 640) with values in range [0, 1] and type float32
source = torch.rand(1, 3, 640, 640, dtype=torch.float32)

# Run inference on the source
results = model(source)  # list of Results objects
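Because a tensor passed straight to the model must have a height and width that are multiples of 32, it can help to compute the nearest valid size before resizing or padding. The helpers below, `round_up_to_stride` and `is_valid_bchw`, are hypothetical names of our own, not part of Ultralytics; they only do the stride arithmetic.

```python
def round_up_to_stride(height: int, width: int, stride: int = 32) -> tuple:
    """Round (height, width) up to the nearest multiple of `stride`.

    Hypothetical helper: Ultralytics does not export a function with this name.
    """
    def up(n: int) -> int:
        # Ceiling division, then scale back up by the stride
        return -(-n // stride) * stride

    return up(height), up(width)


def is_valid_bchw(shape: tuple, stride: int = 32) -> bool:
    """True if a BCHW shape has H and W divisible by `stride`."""
    _, _, h, w = shape
    return h % stride == 0 and w % stride == 0


print(round_up_to_stride(1176, 1956))   # (1184, 1984)
print(is_valid_bchw((1, 3, 640, 640)))  # True
print(is_valid_bchw((1, 3, 385, 640)))  # False
```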

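Batch inference has the same requirement: the image height and width must also be multiples of 32. The example below shows batched inference on multiple images with PyTorch and the Ultralytics library: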

import torch
from ultralytics import YOLO

# Load a pretrained YOLOv8n-pose model
model = YOLO('yolov8n-pose.pt')

# Create a random torch tensor of BCHW shape (4, 3, 640, 640) with values in range [0, 1] and type float32
source = torch.rand(4, 3, 640, 640, dtype=torch.float32)

# Run inference on the source
results = model(source)  # list of Results objects

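Note that in batch inference, although several images are submitted in a single call, parts of the processing are still carried out image by image in a loop. In the rest of this article we show a more efficient way to batch this work, for faster inference and better performance.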
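When there are more images than fit into one forward pass (for example, because of GPU memory), a common pattern is to split the list into fixed-size chunks and run each chunk as one batched call. A minimal sketch, assuming a hypothetical `batched` helper of our own (Python 3.12 ships `itertools.batched` with similar behavior, but this version also runs on older interpreters):

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (the last chunk may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    chunk: List[T] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == batch_size:
            yield chunk
            chunk = []
    if chunk:  # leftover items that did not fill a whole batch
        yield chunk


# Example: 10 image paths processed 4 at a time -> chunks of 4, 4 and 2
paths = [f"img_{i}.jpg" for i in range(10)]
for chunk in batched(paths, 4):
    print(len(chunk), chunk[0])
```

Each chunk can then be passed to the model in one call (Ultralytics accepts a list of image paths as a source) or preprocessed into a single BCHW tensor.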

Below we describe how to convert the [single-image inference] detection code into [batch image inference] code for batched inference.

1. Preprocessing for batch inference

Original code

import cv2
import numpy as np
import torch

@staticmethod
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    # minimum rectangle
    dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

def precess_image(self, img_src, img_size, half, device):
    # Padded resize
    img = self.letterbox(img_src, img_size)[0]
    # Convert
    img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    img = np.ascontiguousarray(img)
    img = torch.from_numpy(img).to(device)

    img = img.half() if half else img.float()  # uint8 to fp16/32
    img = img / 255  # 0 - 255 to 0.0 - 1.0
    if len(img.shape) == 3:
        img = img[None]  # expand for batch dim
    return img
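To see what letterbox computes without running OpenCV, its size and padding arithmetic can be reproduced in plain Python. This sketch (the name `letterbox_params` is ours) mirrors the code above with its default minimum-rectangle padding; for a 1176 x 1956 image it yields the (385, 640) resize and the (416, 640) padded size quoted later in this article.

```python
def letterbox_params(h: int, w: int, new_shape=(640, 640), stride: int = 32, scaleup: bool = True):
    """Mirror the size/padding arithmetic of letterbox() for an image of height h and width w.

    Returns ((new_h, new_w), (top, bottom, left, right), (out_h, out_w)).
    """
    r = min(new_shape[0] / h, new_shape[1] / w)
    if not scaleup:  # only scale down, never up
        r = min(r, 1.0)
    new_w, new_h = int(round(w * r)), int(round(h * r))
    # Pad only up to the next multiple of the stride ("minimum rectangle"), split over two sides
    dw = (new_shape[1] - new_w) % stride / 2
    dh = (new_shape[0] - new_h) % stride / 2
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    out_h, out_w = new_h + top + bottom, new_w + left + right
    return (new_h, new_w), (top, bottom, left, right), (out_h, out_w)


print(letterbox_params(1176, 1956))  # ((385, 640), (15, 16, 0, 0), (416, 640))
```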

Approach

First, let us look at how the original single-image pipeline processes an image.

It consists of the following steps:

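self.pre_transform (self.letterbox in the code above): resize the image and pad it with gray bars
img.transpose((2, 0, 1))[::-1]: HWC to CHW, BGR to RGB
torch.from_numpy: convert to a Tensor
img.float(): uint8 to fp32 (img.half() for fp16)
img / 255: divide by 255 to normalize to 0.0 - 1.0
img[None]: add the batch dimension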
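For a batch stored as one NHWC uint8 array (all images already letterboxed to the same size), the remaining steps map onto array operations directly: the HWC-to-CHW transpose becomes an NHWC-to-NCHW transpose, and img[None] is no longer needed because the batch axis already exists. A minimal NumPy sketch; `batch_to_nchw` is a hypothetical name of our own:

```python
import numpy as np


def batch_to_nchw(batch_bgr: np.ndarray) -> np.ndarray:
    """Convert an (N, H, W, 3) uint8 BGR batch to an (N, 3, H, W) float32 RGB batch in [0, 1]."""
    if batch_bgr.ndim != 4 or batch_bgr.shape[-1] != 3:
        raise ValueError(f"expected (N, H, W, 3), got {batch_bgr.shape}")
    x = batch_bgr.transpose(0, 3, 1, 2)[:, ::-1]  # NHWC -> NCHW, then BGR -> RGB on the channel axis
    x = np.ascontiguousarray(x)  # the reversed view has negative strides, which torch.from_numpy rejects
    return x.astype(np.float32) / 255.0  # uint8 -> float32, 0-255 -> 0.0-1.0


batch = np.random.randint(0, 256, size=(4, 416, 640, 3), dtype=np.uint8)
x = batch_to_nchw(batch)
print(x.shape, x.dtype)  # (4, 3, 416, 640) float32
```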

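Of these steps, the one we mainly need to change is self.pre_transform; the remaining steps carry over to a batch with only minor changes (for example, permuting NHWC to NCHW instead of HWC to CHW).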

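The core of letterbox is the two OpenCV calls below. OpenCV processes one image per call, so these functions cannot be applied to a whole batch at once; batched operations are usually implemented with broadcasting or tensor operations instead.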

im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)

Since the model ultimately takes a tensor as input, we implement these steps with tensor operations.

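Resizing

Original method:
im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

New method (note that F.interpolate takes size as (height, width), whereas cv2.resize takes dsize as (width, height)):
resized_tensor = F.interpolate(image_tensor, size=new_unpad, mode='bilinear', align_corners=False)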

Results of the two methods (shapes shown as H, W, C):

Original method: (1176, 1956, 3) --> (385, 640, 3)

New method: (1176, 1956, 3) --> (385, 640, 3)
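A quick shape check of the tensor resize on a random NCHW batch, assuming PyTorch is installed. The target size is passed as (height, width); cv2.resize would take the same numbers in (width, height) order. Bilinear interpolation with align_corners=False is the closest counterpart to cv2.INTER_LINEAR, but pixel values may differ slightly from OpenCV's output.

```python
import torch
import torch.nn.functional as F

h, w = 1176, 1956
batch = torch.rand(2, 3, h, w)  # NCHW float batch of two images

r = min(640 / h, 640 / w)
new_h, new_w = int(round(h * r)), int(round(w * r))  # (385, 640)

# size is (height, width) here; cv2.resize would take (new_w, new_h)
resized = F.interpolate(batch, size=(new_h, new_w), mode="bilinear", align_corners=False)
print(resized.shape)  # torch.Size([2, 3, 385, 640])
```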

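Adding the border

Original method:
im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)

New method (F.pad lists the padding for the last dimension first, so for an NCHW tensor the order is (left, right, top, bottom)):
padded_tensor = F.pad(resized_tensor, (left, right, top, bottom), mode='constant', value=padding_value)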

Results of the two methods:

Original method: (385, 640, 3) --> (416, 640, 3)

New method: (385, 640, 3) --> (416, 640, 3)
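F.pad reads its padding tuple starting from the last dimension: for an NCHW tensor, (left, right, top, bottom) pads the width first and then the height. Passing cv2.copyMakeBorder's (top, bottom, left, right) order instead pads the wrong axes. A quick check, assuming PyTorch is installed:

```python
import torch
import torch.nn.functional as F

resized = torch.rand(2, 3, 385, 640)     # NCHW batch after resizing
top, bottom, left, right = 15, 16, 0, 0  # letterbox padding for a 1176 x 1956 source

padded = F.pad(resized, (left, right, top, bottom), mode="constant", value=114)
print(padded.shape)  # torch.Size([2, 3, 416, 640])

# cv2 argument order by mistake: the width is padded instead of the height
wrong = F.pad(resized, (top, bottom, left, right), mode="constant", value=114)
print(wrong.shape)  # torch.Size([2, 3, 385, 671])
```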

Modified code

import numpy as np
import torch
import torch.nn.functional as F

def tensor_process(self, image_cv):
    # image_cv: uint8 BGR batch of shape (N, H, W, 3); all images must share the same H and W
    img_shape = image_cv.shape[1:]  # (H, W, 3)
    new_shape = [640, 640]  # target (height, width)
    r = min(new_shape[0] / img_shape[0], new_shape[1] / img_shape[1])
    # Compute padding
    new_unpad = int(round(img_shape[0] * r)), int(round(img_shape[1] * r))  # (height, width) after resizing

    dw, dh = new_shape[1] - new_unpad[1], new_shape[0] - new_unpad[0]  # wh padding
    dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # minimum rectangle: pad only up to a multiple of 32
    dw /= 2  # divide padding into 2 sides
    dh /= 2

    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))

    padding_value = 114

    # NHWC -> NCHW, BGR -> RGB (flip the channel axis), uint8 -> float32
    image_tensor = torch.from_numpy(image_cv).to(self.device).permute(0, 3, 1, 2).flip(1).float()

    # F.interpolate expects size=(height, width)
    resized_tensor = F.interpolate(image_tensor, size=new_unpad, mode='bilinear', align_corners=False)

    # F.pad pads the last dimension (width) first: (left, right, top, bottom)
    padded_tensor = F.pad(resized_tensor, (left, right, top, bottom), mode='constant', value=padding_value)
    infer_tensor = padded_tensor / 255.0  # 0 - 255 to 0.0 - 1.0

    return infer_tensor
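tensor_process reads a single shape from image_cv.shape[1:], so every image in one call must have the same height and width. When frames come from mixed sources, one option is to group them by shape and build one NHWC array per group. A minimal sketch; `group_by_shape` is a hypothetical helper of our own, and the returned indices let results be put back into the original order:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

import numpy as np


def group_by_shape(images: List[np.ndarray]) -> Dict[Tuple[int, ...], Tuple[List[int], np.ndarray]]:
    """Group HWC images by shape; each group is (original indices, stacked (N, H, W, C) array)."""
    indices: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, img in enumerate(images):
        indices[img.shape].append(i)
    # np.stack copies each group into one contiguous array, ready for torch.from_numpy
    return {shape: (idx, np.stack([images[i] for i in idx])) for shape, idx in indices.items()}


imgs = [
    np.zeros((1176, 1956, 3), np.uint8),
    np.zeros((720, 1280, 3), np.uint8),
    np.zeros((1176, 1956, 3), np.uint8),
]
for shape, (idx, batch) in group_by_shape(imgs).items():
    print(shape, idx, batch.shape)
```

The alternative is to letterbox each image separately on the CPU and stack the results, at the cost of the per-image loop that this article sets out to avoid.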

2. Post-processing for batch inference

Original code

def non_max_suppression(
    prediction,
    conf_thres=0.25,
    iou_thres=0.45,
    classes=None,
    agnostic=False,
    multi_label=False,
    labels=(),
    max_det=300,
    nc=0,  # number of classes (optional)
    max_time_img=0.05,
    max_nms=30000,
    max_wh=7680,