Atlas800昇腾服务器（型号：3000）—YOLO全系列NPU推理【实例分割】（六）

你的陈某某

已于 2024-11-27 09:41:34 修改

阅读量1.1k

点赞数 6

分类专栏： Atlas800国产服务器om推理文章标签： YOLOv8 NPU Atlas800 A300I pro ais_bench

于 2024-10-19 23:22:58 首次发布

本文链接：https://blog.youkuaiyun.com/weixin_45679938/article/details/143085868

版权

Atlas800国产服务器om推理专栏收录该内容

10 篇文章

订阅专栏

服务器配置如下：

CPU/NPU：鲲鹏 CPU（ARM64）+A300I pro推理卡
系统：Kylin V10 SP1【下载链接】【安装链接】
驱动与固件版本版本：
Ascend-hdk-310p-npu-driver_23.0.1_linux-aarch64.run【下载链接】
Ascend-hdk-310p-npu-firmware_7.1.0.4.220.run【下载链接】
MCU版本：Ascend-hdk-310p-mcu_23.2.3【下载链接】
CANN开发套件：版本7.0.1【Toolkit下载链接】【Kernels下载链接】

测试om模型环境如下：

Python：版本3.8.11
推理工具：ais_bench
测试YOLO系列：v5/6/7/8/9/10/11

专栏其他文章：
Atlas800昇腾服务器（型号：3000）—驱动与固件安装（一）
Atlas800昇腾服务器（型号：3000）—CANN安装（二）
Atlas800昇腾服务器（型号：3000）—YOLO全系列om模型转换测试（三）
Atlas800昇腾服务器（型号：3000）—AIPP加速前处理（四）
Atlas800昇腾服务器（型号：3000）—YOLO全系列NPU推理【检测】（五）
Atlas800昇腾服务器（型号：3000）—YOLO全系列NPU推理【实例分割】（六）
Atlas800昇腾服务器（型号：3000）—YOLO全系列NPU推理【关键点】（七）
Atlas800昇腾服务器（型号：3000）—YOLO全系列NPU推理【跟踪】（八）

全部代码github：https://github.com/Bigtuo/NPU-ais_bench

1 基础环境安装

详情见第（三）章环境安装：https://blog.youkuaiyun.com/weixin_45679938/article/details/142966255

2 ais_bench编译安装

注意：目前ais_bench工具只支持单个input的带有动态AIPP配置的模型，只支持静态shape、动态batch、动态宽高三种场景，不支持动态shape场景。
参考链接：https://gitee.com/ascend/tools/tree/master/ais-bench_workload/tool/ais_bench

2.1 安装aclruntime包

在安装环境执行如下命令安装aclruntime包：
说明：若为覆盖安装，请增加“–force-reinstall”参数强制安装.

pip3 install -v 'git+https://gitee.com/ascend/tools.git#egg=aclruntime&subdirectory=ais-bench_workload/tool/ais_bench/backend' -i https://pypi.tuna.tsinghua.edu.cn/simple

在这里插入图片描述

2.2 安装ais_bench推理程序包

在安装环境执行如下命令安装ais_bench推理程序包：

 pip3 install -v 'git+https://gitee.com/ascend/tools.git#egg=ais_bench&subdirectory=ais-bench_workload/tool/ais_bench' -i https://pypi.tuna.tsinghua.edu.cn/simple

在这里插入图片描述
卸载和更新【忽略】：

# 卸载aclruntime
pip3 uninstall aclruntime
# 卸载ais_bench推理程序
pip3 uninstall ais_bench

3 裸代码推理测试

# 1.进入运行环境yolo【普通用户】
conda activate yolo
# 2.激活atc【atc --help测试是否可行】
source ~/.bashrc

注意：ais_bench调用和使用方式与onnx-runtime几乎一致，因此可参考进行撰写脚本！

代码逻辑如下：
下面代码整个处理过程主要包括：预处理—>推理（框/mask）—>后处理（框/mask）—>画图。
假设图像resize为640×640，
前处理输出结果维度：(1, 3, 640, 640)；
YOLOv5推理输出结果维度：检测头的输出(1, 8400*3, 85+32), 分割头的输出(1, 32, 160, 160)；
YOLOv8/11推理输出结果维度：检测头的输出(1, 116, 8400), 分割头的输出(1, 32, 160, 160)；
后处理输出结果维度：boxes(-1, 6)，其中第一个表示图bus.jpg检出的目标，第二个维度6表示(x1, y1, x2, y2, conf, cls)；segments是各掩码轮廓点集。

分割头具体处理如下：

首先对于一张 640x640 的图片来说，YOLOv8-Seg 模型存在两个输出，一个是 output0 可以理解为检测头的输出，它的维度是 1x116x8400；另一个是 output1 可以理解为分割头的输出，它的维度是 1x32x160x160，我们一个个来分析。

针对于检测头的输出 1x116x8400 我们应该已经非常熟悉了，它代表预测框的总数量是 8400，每个预测框的维度是 116（针对 COCO 数据集的 80 个类别而言）
8400 × 116 = 80 × 80 × 116 + 40 × 40 × 116 + 20 × 20 × 116 = 80 × 80 × ( 84 + 32 ) + 40 × 40 × ( 84 + 32 ) + 20 × 20 × ( 84 + 32 ) = 80 × 80 × ( 4 + 80 + 32 ) + 40 × 40 × ( 4 + 80 + 32 ) + 20 × 20 × ( 4 + 80 + 32 )

8400×116=80×80×116+40×40×116+20×20×116=80×80×(84+32)+40×40×(84+32)+20×20×(84+32)=80×80×(4+80+32)+40×40×(4+80+32)+20×20×(4+80+32) 8400×116=80×80×116+40×40×116+20×20×116=80×80×(84+32)+40×40×(84+32)+20×20×(84+32)=80×80×(4+80+32)+40×40×(4+80+32)+20×20×(4+80+32)
其中的 4 对应的是 cx, cy, w, h，分别代表的含义是边界框中心点坐标、宽高；80 对应的是 COCO 数据集中的 80 个类别置信度。32 维的向量可以看作是与每个检测框关联的分割 mask 的系数或权重。

针对于分割头的输出 1x32x160x160，一个关键的概念是 prototype masks。它是一个固定数量（32）的基础 mask，每个 mask 的尺寸为 160x160。这些基础 mask 并不直接对应于任何特定的物体或类别，而是被设计为可以线性组合来表示任何可能的物体 mask。

简单来说，模型不直接预测每个物体的完整 mask，而是预测一组基本的 masks（称为 prototype masks）以及每个物体如何组合这些 masks（权重/系数）。这种方法的好处是，模型只需要预测一个较小的 mask 张量，然后可以通过简单的矩阵乘法将这些小 mask 组合成完整的物体 masks。

大家可以把它类比于线性代数中基向量的概念，空间中的任何一个向量是不是都可以表示为一组基向量的线性组合，那么其中的 prototype masks 即 32x160x160 的 mask 张量可以把它理解为一组基向量，而之前在检测框中的 32 维向量可以理解为组合这一组基向量的权重或者说系数。

当我们从检测头得到一个 32 维的向量，分割头得到 32 个基础 masks 时，这个 32 维的向量实际上表示了如何组合这些基础 masks 来得到一个特定物体的 mask。具体来说，我们用这个 32 维向量对 32 个基础 masks 进行线性组合，从而得到与检测框关联的最终 mask。简单来说，这就像你现在有 32 种不同的颜料，检测头给你一个配方（32 维向量），告诉你如何混合这些颜料来得到一个特定的颜色（最终的 mask）。

这样做的优点是我们不需要为每个检测框都预测一个完整的 mask，这个非常消耗内存和计算资源。相反，我们只需要预测一个相对较小的 32 维向量和一个固定数量的基础 masks，然后在后处理中进行组合即可。

完整代码如下：
新建YOLO_ais_bench_seg_aipp.py，内容如下：

import argparse
import time 
import cv2
import numpy as np
import os

from ais_bench.infer.interface import InferSession


class YOLO:
    """YOLO segmentation model class for handling inference"""

    def __init__(self, om_model, imgsz=(640, 640), device_id=0, model_ndtype=np.single, mode="static", postprocess_type="v8", aipp=False):
        """
        Initialization.

        Args:
            om_model (str): Path to the om model.
        """
        
        # 构建ais_bench推理引擎
        self.session = InferSession(device_id=device_id, model_path=om_model)
        
        # Numpy dtype: support both FP32(np.single) and FP16(np.half) om model
        self.ndtype = model_ndtype
        self.mode = mode
        self.postprocess_type = postprocess_type
        self.aipp = aipp 
       
        self.model_height, self.model_width = imgsz[0], imgsz[1]  # 图像resize大小
     

    def __call__(self, im0, conf_threshold=0.4, iou_threshold=0.45):
        """
        The whole pipeline: pre-process -> inference -> post-process.

        Args:
            im0 (Numpy.ndarray): original input image.
            conf_threshold (float): confidence threshold for filtering predictions.
            iou_threshold (float): iou threshold for NMS.

        Returns:
            boxes (List): list of bounding boxes.
        """
        # 前处理Pre-process
        t1 = time.time()
        im, ratio, (pad_w, pad_h) = self.preprocess(im0)
        pre_time = round(time.time() - t1, 3)
        
        # 推理 inference
        t2 = time.time()
        preds = self.session.infer([im], mode=self.mode)  # mode有动态"dymshape"和静态"static"等
        det_time = round(time.time() - t2, 3)
        
        # 后处理Post-process
        t3 = time.time()
        if self.postprocess_type == "v5":
            boxes, segments, masks = self.postprocess_v5(preds,
                                    im0=im0,
                                    ratio=ratio,
                                    pad_w=pad_w,
                                    pad_h=pad_h,
                                    conf_threshold=conf_threshold,
                                    iou_threshold=iou_threshold,
                                    )
            
        elif self.postprocess_type == "v8":
            boxes, segments, masks = self.postprocess_v8(preds,
                                    im0=im0,
                                    ratio=ratio,
                                    pad_w=pad_w,
                                    pad_h=pad_h,
                                    conf_threshold=conf_threshold,
                                    iou_threshold=iou_threshold,
                                    )
        
        else:
            boxes=[], segments=[], masks=[]

        post_time = round(time.time() - t3, 3)

        return boxes, segments, masks, (pre_time, det_time, post_time)
        
    # 前处理，包括：resize, pad, 其中HWC to CHW，BGR to RGB，归一化，增加维度CHW -> BCHW可选择是否开启AIPP加速处理
    def preprocess(self, img):
        """
        Pre-processes the input image.

        Args:
            img (Numpy.ndarray): image about to be processed.

        Returns:
            img_process (Numpy.ndarray): image preprocessed for inference.
            ratio (tuple): width, height ratios in letterbox.
            pad_w (float): width padding in letterbox.
            pad_h (float): height padding in letterbox.
        """
        # Resize and pad input image using letterbox() (Borrowed from Ultralytics)
        shape = img.shape[:2]  # original image shape
        new_shape = (self.model_height, self.model_width)
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
        ratio = r, r
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        pad_w, pad_h = (new_shape[1] - new_unpad[0]) / 2, (new_shape[0] - new_unpad[1]) / 2  # wh padding
        if shape[::-1] != new_unpad:  # resize
            img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
            
        top, bottom = int(round(pad_h - 0.1)), int(round(pad_h + 0.1))
        left, right = int(round(pad_w - 0.1)), int(round(pad_w + 0.1))
        img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))  # 填充

        # 是否开启aipp加速预处理，需atc中完成
        if self.aipp:
            return img, ratio, (pad_w, pad_h)
        
        # Transforms: HWC to CHW -> BGR to RGB -> div(255) -> contiguous -> add axis(optional)
        img = np.ascontiguousarray(np.einsum('HWC->CHW', img)[::-1], dtype=self.ndtype) / 255.0
        img_process = img[None] if len(img.shape) == 3 else img
        return img_process, ratio, (pad_w, pad_h)
    
    # YOLOv5/6/7通用后处理，包括：阈值过滤与NMS+masks处理
    def postprocess_v5(self, preds, im0, ratio, pad_w, pad_h, conf_threshold, iou_threshold, nm=32):
        """
        Post-process the prediction.

        Args:
            preds (Numpy.ndarray): predictions come from ort.session.run().
            im0 (Numpy.ndarray): [h, w, c] original input image.
            ratio (tuple): width, height ratios in letterbox.
            pad_w (float): width padding in letterbox.
            pad_h (float): height padding in letterbox.
            conf_threshold (float): conf threshold.
            iou_threshold (float): iou threshold.
            nm (int): the number of masks.

        Returns:
            boxes (List): list of bounding boxes.
            segments (List): list of segments.
            masks (np.ndarray): [N, H, W], output masks.
        """
        # (Batch_size, Num_anchors, xywh_score_conf_cls), v5和v6_1.0的[..., 4]是置信度分数，v8v9采用类别里面最大的概率作为置信度score
        x, protos = preds[0], preds[1]  # 与bbox区别：Two outputs: 检测头的输出(1, 8400*3, 117), 分割头的输出(1, 32, 160, 160)
   
        # Predictions filtering by conf-threshold
        x = x[x[..., 4] > conf_threshold]

        # Create a new matrix which merge these(box, score, cls, nm) into one
        # For more details about `numpy.c_()`: https://numpy.org/doc/1.26/reference/generated/numpy.c_.html
        x = np.c_[x[..., :4], x[..., 4], np.argmax(x[..., 5:-nm], axis=-1), x[..., -nm:]]

        # NMS filtering
        # 经过NMS后的值, np.array([[x, y, w, h, conf, cls, nm], ...]), shape=(-1, 4 + 1 + 1 + 32)
        x = x[cv2.dnn.NMSBoxes(x[:, :4], x[:, 4], conf_threshold, iou_threshold)]
       
        # 重新缩放边界框，为画图做准备
        if len(x) > 0:
            # Bounding boxes format change: cxcywh -> xyxy
            x[..., [0, 1]] -= x[..., [2, 3]] / 2
            x[..., [2, 3]] += x[..., [0, 1]]

            # Rescales bounding boxes from model shape(model_height, model_width) to the shape of original image
            x[..., :4] -= [pad_w, pad_h, pad_w, pad_h]
            x[..., :4] /= min(ratio)

            # Bounding boxes boundary clamp
            x[..., [0, 2]] = x[:, [0, 2]].clip(0, im0.shape[1])
            x[..., [1, 3]] = x[:, [1, 3]].clip(0, im0.shape[0])

            # 与bbox区别：增加masks处理
            # Process masks
            masks = self.process_mask(protos[0], x[:, 6:], x[:, :4], im0.shape)
            # Masks -> Segments(contours)
            segments = self.masks2segments(masks)
            
            return x[..., :6], segments, masks  # boxes, segments, masks
        else:
            return [], [], []

    # YOLOv8/9/11通用后处理，包括：阈值过滤与NMS+masks处理
    def postprocess_v8(self, preds, im0, ratio, pad_w, pad_h, conf_threshold, iou_threshold, nm=32):
        """
        Post-process the prediction.

        Args:
            preds (Numpy.ndarray): predictions come from ort.session.run().
            im0 (Numpy.ndarray): [h, w, c] original input image.
            ratio (tuple): width, height ratios in letterbox.
            pad_w (float): width padding in letterbox.
            pad_h (float): height padding in letterbox.
            conf_threshold (float): conf threshold.
            iou_threshold (float): iou threshold.
            nm (int): the number of masks.

        Returns:
            boxes (List): list of bounding boxes.
            segments (List): list of segments.
            masks (np.ndarray): [N, H, W], output masks.
        """
        x, protos = preds[0], preds[1]  # 与bbox区别：Two outputs: 检测头的输出(1, 116, 8400), 分割头的输出(1, 32, 160, 160)

        # Transpose the first output: (Batch_size, xywh_conf_cls_nm, Num_anchors) -> (Batch_size, Num_anchors, xywh_conf_cls_nm)
        x = np.einsum('bcn->bnc', x)  # (1, 8400, 116)
   
        # Predictions filtering by conf-threshold，不包括后32维的向量（32维的向量可以看作是与每个检测框关联的分割 mask 的系数或权重）
        x = x[np.amax(x[..., 4:-nm], axis=-1) > conf_threshold]

        # Create a new matrix which merge these(box, score, cls, nm) into one
        # For more details about `numpy.c_()`: https://numpy.org/doc/1.26/reference/generated/numpy.c_.html
        x = np.c_[x[..., :4], np.amax(x[..., 4:-nm], axis=-1), np.argmax(x[..., 4:-nm], axis=-1), x[..., -nm:]]

        # NMS filtering
        # 经过NMS后的值, np.array([[x, y, w, h, conf, cls, nm], ...]), shape=(-1, 4 + 1 + 1 + 32)
        x = x[cv2.dnn.NMSBoxes(x[:, :4], x[:, 4], conf_threshold, iou_threshold)]
        
        # 重新缩放边界框，为画图做准备
        if len(x) > 0:
            # Bounding boxes format change: cxcywh -> xyxy
            x[..., [0, 1]] -= x[..., [2, 3]] / 2
            x[..., [2, 3]] += x[..., [0, 1]]

            # Rescales bounding boxes from model shape(model_height, model_width) to the shape of original image
            x[..., :4] -= [pad_w, pad_h, pad_w, pad_h]
            x[..., :4] /= min(ratio)

            # Bounding boxes boundary clamp
            x[..., [0, 2]] = x[:, [0, 2]].clip(0, im0.shape[1])
            x[..., [1, 3]] = x[:, [1, 3]].clip(0, im0.shape[0])

            # 与bbox区别：增加masks处理
            # Process masks
            masks = self.process_mask(protos[0], x[:, 6:], x[:, :4], im0.shape)
            # Masks -> Segments(contours)
            segments = self.masks2segments(masks)
            
            return x[..., :6], segments, masks  # boxes, segments, masks
        else:
            return [], [], []

    @staticmethod
    def masks2segments(masks):
        """
        It takes a list of masks(n,h,w) and returns a list of segments(n,xy) (Borrowed from
        https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L750)

        Args:
            masks (numpy.ndarray): the output of the model, which is a tensor of shape (batch_size, 160, 160).

        Returns:
            segments (List): list of segment masks.
        """
        segments = []
        for x in masks.astype('uint8'):
            c = cv2.findContours(x, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0]  # CHAIN_APPROX_SIMPLE  该函数用于查找二值图像中的轮廓。
            if c:
                # 这段代码的目的是找到图像x中的最外层轮廓，并从中选择最长的轮廓，然后将其转换为NumPy数组的形式。
                c = np.array(c[np.array([len(x) for x in c]).argmax()]).reshape(-1, 2)
            else:
                c = np.zeros((0, 2))  # no segments found
            segments.append(c.astype('float32'))
        return segments

    def process_mask(self, protos, masks_in, bboxes, im0_shape):
        """
        Takes the output of the mask head, and applies the mask to the bounding boxes. This produces masks of higher quality
        but is slower. (Borrowed from https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L618)

        Args:
            protos (numpy.ndarray): [mask_dim, mask_h, mask_w].
            masks_in (numpy.ndarray): [n, mask_dim], n is number of masks after nms.
            bboxes (numpy.ndarray): bboxes re-scaled to original image shape.
            im0_shape (tuple): the size of the input image (h,w,c).

        Returns:
            (numpy.ndarray): The upsampled masks.
        """
        c, mh, mw = protos.shape
        masks = np.matmul(masks_in, protos.reshape((c, -1))).reshape((-1, mh, mw)).transpose(1, 2, 0)  # HWN
        masks = np.ascontiguousarray(masks)
        masks = self.scale_mask(masks, im0_shape)  # re-scale mask from P3 shape to original input image shape
        masks = np.einsum('HWN -> NHW', masks)  # HWN -> NHW
        masks = self.crop_mask(masks, bboxes)
        return np.greater(masks, 0.5)

    @staticmethod
    def scale_mask(masks, im0_shape, ratio_pad=None):
        """
        Takes a mask, and resizes it to the original image size. (Borrowed from
        https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L305)

        Args:
            masks (np.ndarray): resized and padded masks/images, [h, w, num]/[h, w, 3].
            im0_shape (tuple): the original image shape.
            ratio_pad (tuple): the ratio of the padding to the original image.

        Returns:
            masks (np.ndarray): The masks that are being returned.
        """
        im1_shape = masks.shape[:2]
        if ratio_pad is None:  # calculate from im0_shape
            gain = min(im1_shape[0] / im0_shape[0], im1_shape[1] / im0_shape[1])  # gain  = old / new
            pad = (im1_shape[1] - im0_shape[1] * gain) / 2, (im1_shape[0] - im0_shape[0] * gain) / 2  # wh padding
        else:
            pad = ratio_pad[1]

        # Calculate tlbr of mask
        top, left = int(round(pad[1] - 0.1)), int(round(pad[0] - 0.1))  # y, x
        bottom, right = int(round(im1_shape[0] - pad[1] + 0.1)), int(round(im1_shape[1] - pad[0] + 0.1))
        if len(masks.shape) < 2:
            raise ValueError(f'"len of masks shape" should be 2 or 3, but got {len(masks.shape)}')
        masks = masks[top:bottom, left:right]
        masks = cv2.resize(masks, (im0_shape[1], im0_shape[0]),
                           interpolation=cv2.INTER_LINEAR)  # INTER_CUBIC would be better
        if len(masks.shape) == 2:
            masks = masks[:, :, None]
        return masks
    
    @staticmethod
    def crop_mask(masks, boxes):
        """
        It takes a mask and a bounding box, and returns a mask that is cropped to the bounding box. (Borrowed from
        https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L599)

        Args:
            masks (Numpy.ndarray): [n, h, w] tensor of masks.
            boxes (Numpy.ndarray): [n, 4] tensor of bbox coordinates in relative point form.

        Returns:
            (Numpy.ndarray): The masks are being cropped to the bounding box.
        """
        n, h, w = masks.shape
        x1, y1, x2, y2 = np.split(boxes[:, :, None], 4, 1)
        r = np.arange(w, dtype=x1.dtype)[None, None, :]
        c = np.arange(h, dtype=x1.dtype)[None, :, None]
        return masks * ((r >= x1) * (r < x2) * (c >= y1) * (c < y2))
        

if __name__ == '__main__':
    # Create an argument parser to handle command-line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('--seg_model', type=str, default=r"yolov8s-seg.om", help='Path to OM model')
    parser.add_argument('--source', type=str, default=r'images', help='Path to input image')
    parser.add_argument('--out_path', type=str, default=r'results', help='结果保存文件夹')
    parser.add_argument('--imgsz_seg', type=tuple, default=(640, 640), help='Image input size')
    parser.add_argument('--classes', type=list, default=['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
            'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
              'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
                'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
                  'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
                    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
                      'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
                        'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'], help='类别')

    parser.add_argument('--conf', type=float, default=0.25, help='Confidence threshold')
    parser.add_argument('--iou', type=float, default=0.6, help='NMS IoU threshold')
    parser.add_argument('--device_id', type=int, default=0, help='device id')
    parser.add_argument('--mode', default='static', help='om是动态dymshape或静态static')
    parser.add_argument('--model_ndtype', default=np.single, help='om是fp32或fp16')
    parser.add_argument('--postprocess_type', type=str, default='v8', help='后处理方式, 对应v5/v8两种后处理')
    parser.add_argument('--aipp', default=False, action='store_true', help='是否开启aipp加速YOLO预处理, 需atc中完成om集成')
    args = parser.parse_args()

    # 创建结果保存文件夹
    if not os.path.exists(args.out_path):
        os.mkdir(args.out_path)
    
    print('开始运行：')
    # Build model
    seg_model = YOLO(args.seg_model, args.imgsz_seg, args.device_id, args.model_ndtype, args.mode, args.postprocess_type, args.aipp)
    color_palette = np.random.uniform(0, 255, size=(len(args.classes), 3))  # 为每个类别生成调色板
    
    for i, img_name in enumerate(os.listdir(args.source)):
        try:
            t1 = time.time()
            # Read image by OpenCV
            img = cv2.imread(os.path.join(args.source, img_name))

            # 检测Inference
            boxes, segments, _ , (pre_time, det_time, post_time) = seg_model(img, conf_threshold=args.conf, iou_threshold=args.iou)
            print('{}/{} ==>总耗时间: {:.3f}s, 其中, 预处理: {:.3f}s, 推理: {:.3f}s, 后处理: {:.3f}s, 识别{}个目标'.format(i+1, len(os.listdir(args.source)), time.time() - t1, pre_time, det_time, post_time, len(boxes)))

            # Draw rectangles and polygons
            im_canvas = img.copy()
            for (*box, conf, cls_), segment in zip(boxes, segments):
                # draw contour and fill mask
                cv2.polylines(img, np.int32([segment]), True, (255, 255, 255), 2)  # white borderline
                cv2.fillPoly(im_canvas, np.int32([segment]), (255, 0, 0))

                # draw bbox rectangle
                cv2.rectangle(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])),
                            color_palette[int(cls_)], 1, cv2.LINE_AA)
                cv2.putText(img, f'{args.classes[int(cls_)]}: {conf:.3f}', (int(box[0]), int(box[1] - 9)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, color_palette[int(cls_)], 2, cv2.LINE_AA)

            # Mix image
            img = cv2.addWeighted(im_canvas, 0.3, img, 0.7, 0)

            cv2.imwrite(os.path.join(args.out_path, img_name), img)
            
        except Exception as e:
            print(e)