Person Segmentation with the YOLO11 Instance Segmentation Model


In this tutorial, we will learn how to use the YOLO11 segmentation model to accurately isolate and identify people in images.

[Figure: result of the YOLO11 instance segmentation model on an Unsplash image]

Introduction

Instance segmentation is a key technique for detecting and isolating individual objects in an image, and YOLO11 is one of the best models for this task. In this article you will learn how to use the YOLO11 segmentation model to segment people in an image effectively. We will cover everything from setting up a Python environment and installing the necessary libraries, to downloading a test image and visualizing the segmentation results. By the end of this tutorial, you will have a clear understanding of how to apply YOLO11 for accurate person segmentation.

1. Create a Python Environment

First, we set up a Python virtual environment to manage our dependencies. Open your terminal and run:

python -m venv env

Activate the virtual environment

Next, activate the virtual environment. The command depends on your operating system; both the Windows and Mac/Linux variants are shown below. Windows:

.\env\Scripts\activate

Mac/Linux:

source env/bin/activate

2. Install the Ultralytics Library

With the virtual environment activated, install the ultralytics library, which gives us access to the YOLO11 instance segmentation models. Run the following command to install it in your environment:

pip install ultralytics
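If you want to confirm the installation succeeded before moving on, a minimal check (my own addition; it only assumes the package exposes a version string, which current ultralytics releases do) looks like this:

import ultralytics

print(ultralytics.__version__)  # prints the installed Ultralytics version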

3. Download a Test Image

Now let's download a test image from Unsplash; you can use any image you like. I chose the following image for our test:

[Figure: test image from Unsplash]

In a .py file, add the following code to download the image:

import cv2
import urllib.request


url, filename = ("https://images.unsplash.com/photo-1484353371297-d8cfd2895020?w=600&auto=format&fit=crop&q=60&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxzZWFyY2h8NTUwfHxwZW9wbGV8ZW58MHx8MHx8fDA%3D", "scene.jpg")
urllib.request.urlretrieve(url, filename)    # Download the image


# Load the input image using OpenCV
image = cv2.imread(filename)
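The download can occasionally fail depending on your network, and cv2.imread silently returns None for a missing or corrupt file. An optional defensive check (my own addition, not part of the original tutorial) avoids confusing errors in the later steps:

# Make sure the image was downloaded and decoded correctly
if image is None:
    raise FileNotFoundError(f"Could not load {filename}; check the download URL or your connection")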

4. Load the Model and Run Inference

The next step is to load the segmentation model and run inference on the test image. In this tutorial we use the yolo11n-seg.pt model, but you can use any model listed in the Ultralytics YOLO11 documentation. Once the model is loaded, we run inference on the test image with results = model(filename) and then create an empty segmentation mask.

from ultralytics import YOLO
import numpy as np


# Load the model
model = YOLO("yolo11n-seg.pt")  # load an official YOLO model


# Predict with the model
results = model(filename)  # predict on an image


# Create an empty mask for segmentation
segmentation_mask = np.zeros_like(image, dtype=np.uint8)
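Before filtering by class, it can help to peek at what the model actually returned. This optional sketch (my own addition) prints the detected class IDs and the number of masks; in the ultralytics API, r.masks is None when nothing was segmented:

# Optional: inspect what the model found
for r in results:
    print(r.boxes.cls)  # tensor of class IDs, one per detection
    num_masks = len(r.masks.xy) if r.masks is not None else 0
    print(num_masks, "masks detected")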

5. Visualize the Person Segmentation Mask

The final step is to visualize the segmentation masks produced by the model. YOLO11 can segment multiple classes at once, such as persons, bicycles, and cars. Since we are only interested in the person class, whose class ID is 0, we visualize only the masks belonging to that class. In the code below we iterate over the results and filter for person masks, overlay them on the image for a clear visualization, and then save and display the result with OpenCV.

# Iterate over the results
for i, r in enumerate(results):
    # Skip results with no detected masks (r.masks is None in that case)
    if r.masks is None:
        continue

    # Iterate through the detected masks
    for j, mask in enumerate(r.masks.xy):
        # Convert the class tensor to an integer
        class_id = int(r.boxes.cls[j].item())  # Extract the class ID as an integer

        # Check if the detected class corresponds to 'person' (class ID 0)
        if class_id == 0:
            # Convert mask coordinates to an integer format for drawing
            mask = np.array(mask, dtype=np.int32)

            # Fill the segmentation mask with green color
            cv2.fillPoly(segmentation_mask, [mask], (0, 255, 0))


# Combine the original image with the segmentation mask
segmentation_result = cv2.addWeighted(image, 1, segmentation_mask, 0.7, 0)


# Save the output image with segmentation
cv2.imwrite("output_segmentation.jpg", segmentation_result)


# Optionally display the image (make sure you're running in a GUI environment)
cv2.imshow("Segmentation Result", segmentation_result)
cv2.waitKey(0)
cv2.destroyAllWindows()
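If you are working in a notebook or a headless environment where cv2.imshow is not available, the result can also be displayed with matplotlib (an optional sketch, not part of the original code; note that OpenCV stores images in BGR order, so we convert to RGB first):

import matplotlib.pyplot as plt

# Convert BGR (OpenCV) to RGB (matplotlib) before displaying
plt.imshow(cv2.cvtColor(segmentation_result, cv2.COLOR_BGR2RGB))
plt.axis("off")
plt.title("Segmentation Result")
plt.show()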

If everything runs correctly, you should get an output similar to the one below. Naturally, it will differ if you used a different test image.

[Figure: segmentation result]

Full code: https://github.com/Brianhulela/yolo11_segmentation

·  END  ·


This article is for learning and exchange purposes only. If there is any infringement, please contact the author for removal.
