[paper] Multiple-Human Parsing (MHP) (Zhao et al., 2018) dataset.


Towards Real World Human Parsing: Multiple-Human Parsing in the Wild
Paper: https://arxiv.org/pdf/1705.07206.pdf

Proposes a multiple-human semantic segmentation dataset: 4,980 images (train/val/test: 3,000/1,000/980), each containing 2 to 16 persons, annotated with 18 semantic labels.

The multiple-human parsing model MH-Parser consists of 5 components:
Representation learner (an FCN extracts features)
Global parser (generates a whole-image parsing map from the features)
Candidate nominator (an RPN generates bounding boxes)
Local parser (generates local parsing maps from the features and the bounding boxes)
Global-local aggregator (combines global and local information to obtain the final per-person parsing)

We introduce the Multiple-Human Parsing (MHP) dataset, which contains multiple persons in a real-world scene per single image.

The MHP dataset contains various numbers of persons (from 2 to 16) per image with 18 semantic classes for each parsing annotation. Persons appearing in the MHP images present sufficient variations in pose, occlusion and interaction.

To tackle the multiple-human parsing problem, we also propose a novel Multiple-Human Parser (MH-Parser), which considers both the global context and local cues for each person in the parsing process.

Introduction

All existing human parsing datasets contain only one person per image, while multiple persons usually appear simultaneously in realistic scenes.

Previous work on human parsing mainly focuses on the problem of parsing in controlled and simplified conditions.

In particular, they do not consider the simultaneous presence of multiple persons.

We tackle the problems of person detection and human parsing simultaneously, so that both the global information and the local information are employed.

Contributions:

  • We introduce the multiple-human parsing problem that extends the research scope of human parsing and matches real world scenarios better in various applications.

  • We construct a new large-scale benchmark, named Multiple-Human Parsing (MHP) dataset, to advance the development of relevant techniques.

  • We propose a novel MH-Parser model for multiple-human parsing, which integrates global context as well as local cues for human parsing and significantly outperforms the naive “detect-and-parse” approach.

Human parsing

Instance-aware object segmentation

The MHP dataset

This is the first large-scale dataset focusing on multiple-human parsing.

4,980 images; each image contains 2 to 16 humans, for a total of 14,969 person-level annotations.

Image collection and annotation methodology

We manually specify several underlying relationships (e.g., family, couple, team) and several possible scenes (e.g., sports, conferences, banquets).

The first task is manually counting the number of foreground persons and duplicating each image into several copies according to that number.

The second is to assign the fine-grained pixel-wise labels for each instance.

Dataset statistics

Training/validation/test split: 3,000/1,000/980 (randomly chosen).

The images in the MHP dataset contain diverse human numbers, appearances, viewpoints and relationships (see Figure 1).

Multiple-Human Parsing Methods

MH-Parser

The proposed MH-Parser has five components; a data-flow sketch follows the list:

  • Representation learner

We use a trunk network to learn rich and discriminative representations, and preserve the spatial information of the image by employing fully convolutional networks.

    images and annotations => representations

  • Global parser

Captures the global information of the whole image: it takes the representation from the representation learner and generates a semantic parsing map of the whole image.

    representations => a semantic parsing map of the whole image

  • Candidate nominator

    We use a candidate nominator to generate local regions of interest. The candidate nominator consists of a Region Proposal Network (RPN).

    representations => candidate box

  • Local parser

Gives a fine-grained prediction of the semantic parsing labels for each person in the image.

    representations, candidate box => semantic parsing labels for each person

  • Global-local aggregator

Leverages both the global and the local information when performing the parsing task for each person.

    the hidden representations from both the local parser and the global parser => a set of semantic parsing predictions for each candidate box
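
To make the data flow of these five components concrete, here is a minimal runnable PyTorch sketch of one plausible wiring, assembled from the description above. It is not the authors' implementation: the RPN-based candidate nominator is replaced by externally supplied boxes, and the channel widths, layer shapes, and names (`MHParserSketch`, `global_hidden`, `agg_head`, and the 19-class count of 18 labels plus background) are all assumptions.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import roi_align

NUM_CLASSES = 19  # 18 semantic labels + background (an assumption)

class MHParserSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Representation learner: ResNet-50 trunk up to layer3 (output stride 16)
        resnet = torchvision.models.resnet50(weights=None)
        self.trunk = nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
            resnet.layer1, resnet.layer2, resnet.layer3)  # -> (N, 1024, H/16, W/16)
        # Global parser: one deconvolution layer -> stride-8 whole-image parsing map
        self.global_hidden = nn.ConvTranspose2d(1024, 256, 4, stride=2, padding=1)
        self.global_head = nn.Conv2d(256, NUM_CLASSES, 1)
        # Local parser: per-box head on ROI-pooled trunk features (pooled size 40)
        self.local_hidden = nn.Conv2d(1024, 256, 3, padding=1)
        self.local_head = nn.Conv2d(256, NUM_CLASSES, 1)
        # Global-local aggregator: fuses the two hidden representations per box
        self.agg_head = nn.Conv2d(256 + 256, NUM_CLASSES, 3, padding=1)

    def forward(self, images, boxes):
        # `boxes` stands in for the candidate nominator (RPN): a list with one
        # (K_i, 4) float tensor of xyxy boxes in image coordinates per image.
        feats = self.trunk(images)                        # representations
        g_hidden = self.global_hidden(feats)              # (N, 256, ~H/8, ~W/8)
        global_map = self.global_head(g_hidden)           # whole-image parsing map
        local_roi = roi_align(feats, boxes, output_size=40, spatial_scale=1 / 16)
        l_hidden = self.local_hidden(local_roi)
        local_map = self.local_head(l_hidden)             # per-person parsing
        g_roi = roi_align(g_hidden, boxes, output_size=40, spatial_scale=1 / 8)
        fused = torch.cat([l_hidden, g_roi], dim=1)       # global + local cues
        agg_map = self.agg_head(fused)                    # final per-person parsing
        return global_map, local_map, agg_map

model = MHParserSketch().eval()
image = torch.randn(1, 3, 600, 800)
boxes = [torch.tensor([[50., 40., 300., 560.], [350., 60., 700., 580.]])]
with torch.no_grad():
    g, l, a = model(image, boxes)
print(g.shape, l.shape, a.shape)  # whole-image map plus two (2, 19, 40, 40) per-person maps
```

The aggregator simply concatenates the ROI-pooled global hidden features with the local hidden features before the final prediction; any other fusion (e.g., summation or attention) would fit the same interface.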

Detect-and-parse baseline

In the detection stage, we use the representation learner and the candidate nominator as the detection model.

In the parsing stage, we use the representation learner and the local parser as the parsing model.

Experiments

Performance evaluation

The goal of multiple-human parsing is to accurately detect the persons in one image and generate semantic category predictions for each pixel in the detected regions.

Mean average precision based on pixel (mAP^p)

We adopt the pixel-level IoU of the different semantic categories on a person.
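
As a concrete reading of this metric, the sketch below computes the per-person score that mAP^p builds on: the pixel-level IoU of each semantic category between a predicted and a ground-truth person mask, averaged over the categories present. The NumPy implementation and the class indexing (0 = background) are illustrative assumptions.

```python
import numpy as np

def person_part_iou(pred, gt, num_classes=18):
    """Mean pixel-level IoU over the semantic categories on one person.

    pred, gt: integer label maps of the same shape (0 assumed to be background).
    """
    ious = []
    for c in range(1, num_classes + 1):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # category absent from both prediction and ground truth
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```

In the mAP^p protocol, a predicted person is matched to a ground-truth person when such a score passes a threshold, and average precision is then computed over the ranked predictions, analogous to mAP^r in instance segmentation.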

Percentage of correctly segmented body parts (PCP)

Evaluates how well the different semantic categories on a human are segmented.
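
A minimal sketch of this metric, assuming a part category counts as correctly segmented when its pixel IoU with the ground truth reaches a threshold (0.5 here, an assumption) and that only categories actually annotated on the person are counted:

```python
import numpy as np

def pcp(pred, gt, num_classes=18, thresh=0.5):
    """Fraction of annotated part categories segmented with IoU >= thresh."""
    correct, total = 0, 0
    for c in range(1, num_classes + 1):
        g = gt == c
        if not g.any():
            continue  # this part is not annotated on the person
        p = pred == c
        iou = np.logical_and(p, g).sum() / np.logical_or(p, g).sum()
        total += 1
        correct += iou >= thresh
    return correct / total if total else 0.0
```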

Global mean IoU

Evaluates how well the overall parsing predictions match the global parsing labels.
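
This is the standard semantic-segmentation mean IoU, computed over whole images rather than per person. A compact confusion-matrix implementation (NumPy, for illustration; the 19-class count is again an assumption):

```python
import numpy as np

def global_mean_iou(pred, gt, num_classes=19):
    """Mean IoU over all classes, computed from a confusion matrix.

    pred, gt: integer label maps of the same shape with values in [0, num_classes).
    """
    conf = np.bincount(num_classes * gt.ravel() + pred.ravel(),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0  # ignore classes absent from both prediction and labels
    return float((inter[valid] / union[valid]).mean())
```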

Implementation details

  • Representation learner

    We adopt a 50-layer residual network [19], containing all the layers of a standard residual network except the fully connected layers.

    input: an image with the shorter side resized to 600 pixels and the longer side no larger than 1000 pixels (see the resize sketch after this section)

    output: a feature map with spatial dimensions 1/16 of the input image

  • Global parser

    We add a deconvolution layer after the representation learner.

    output: a feature map with spatial dimensions 1/8 of the input image

  • Candidate nominator

    We use a Region Proposal Network (RPN) to generate region proposals.

    output: region proposals

  • Local parser

    Operates on regions obtained by Region-of-Interest (ROI) pooling of the representation learner's features; the pooled size is 40.

  • Global-local aggregator

    The local part comes from a hidden layer of the local parser, and the global part uses features ROI-pooled from a hidden layer of the global parser with the same pooled size.

The network is optimized with one image per batch, and the optimizer used is Adam [20].
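
The input resize rule above (shorter side to 600 pixels, longer side capped at 1000) follows the convention common in Faster R-CNN-style detectors. A small helper that computes the corresponding scale factor might look like this (the function name and structure are my own):

```python
def input_scale(height, width, shorter=600, longer_cap=1000):
    """Scale factor that brings the shorter side to `shorter` pixels,
    unless that would push the longer side past `longer_cap`."""
    scale = shorter / min(height, width)
    if scale * max(height, width) > longer_cap:
        scale = longer_cap / max(height, width)
    return scale

print(input_scale(480, 640))   # 1.25     -> 600 x 800
print(input_scale(480, 1280))  # 0.78125  -> 375 x 1000 (capped by the longer side)
```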

Experimental analysis

Overall performance evaluation

RL stands for the representation learner, G for the global parser, L for the local parser, and A for the aggregator.

Qualitative comparison

We can see that the MH-Parser captures more fine-grained details compared to the global parser, as some categories with a small number of pixels are accurately predicted.

Conclusion and future work

In this paper, we introduced the multiple-human parsing problem and a new large-scale MHP dataset for developing and evaluating multiple-human parsing models.

We also proposed a novel MH-Parser algorithm to address this new challenging problem and performed detailed evaluations of the proposed method with different baselines on the new benchmark dataset.

Author: lijiancheng0614. Source: CSDN. Original post: https://blog.youkuaiyun.com/lijiancheng0614/article/details/73195221
