Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Article link: https://blog.youkuaiyun.com/JabberPengins/article/details/78034457

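This paper proposes an efficient method for detecting the 2D poses of multiple people in an image. It uses a non-parametric representation, Part Affinity Fields (PAFs), to associate body parts with individuals. The method encodes global context and achieves fast, accurate pose estimation regardless of the number of people. It placed first in the COCO 2016 keypoints challenge.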


CVPR Oral 2017

Zhe Cao

Tomas Simon

Shih-En Wei

Yaser Sheikh

The Robotics Institute, Carnegie Mellon University

We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a non-parametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our method placed first in the inaugural COCO 2016 keypoints challenge, and significantly exceeds the previous state-of-the-art result on the MPII Multi-Person benchmark, both in performance and efficiency.
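- Goal: estimate human pose by correctly linking the body keypoints already detected in the image.
- Method: Part Affinity Fields (PAFs). The architecture encodes global context and uses a greedy bottom-up parsing step, reaching realtime speed with performance that does not depend on the number of people in the image. It is designed to jointly learn keypoint locations and the associations between them via two branches of the same sequential prediction process.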

The approach is divided into two steps.

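Method shortcomings

Shortcoming: global contextual cues from other body parts and other people are not used, so efficient detection does not reduce the time spent on global inference (deciding how to connect the parts).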

**Innovation:** Jointly Learning Parts Detection and Parts Association

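PAFs and keypoint positions are learned simultaneously.
- PAFs: 2D vector fields over the image domain that encode the location and orientation of limbs
- CMP: Part Detection Confidence Maps

Greedy parsing algorithm
1. Bipartite matching (a graph-theory algorithm)
2. Pose parsing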
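
To make the PAF definition above concrete, here is a minimal sketch of how a ground-truth style field for a single limb could be built: every pixel within a small distance of the segment between two joints stores the unit vector pointing from the first joint to the second, and every other pixel stores zero. The function name `limb_paf`, the nested-list layout, and the `sigma` half-width parameter are illustrative assumptions, not taken from the authors' code.

```python
import math

def limb_paf(x1, x2, width, height, sigma=1.0):
    """Sketch of a PAF for one limb running from joint x1 to joint x2.

    Returns a height x width grid of (vx, vy) tuples. `sigma` is an
    assumed limb half-width in pixels (hypothetical parameter name).
    """
    dx, dy = x2[0] - x1[0], x2[1] - x1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("the two joints must be distinct")
    vx, vy = dx / length, dy / length  # unit vector along the limb

    field = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            px, py = x - x1[0], y - x1[1]
            along = vx * px + vy * py          # projection onto the limb direction
            across = abs(-vy * px + vx * py)   # distance from the limb axis
            if 0 <= along <= length and across <= sigma:
                field[y][x] = (vx, vy)
    return field
```

For a horizontal limb from `(1, 2)` to `(5, 2)`, pixels on and next to row 2 between the joints hold `(1.0, 0.0)`, while pixels farther away stay at zero.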

Each branch is an iterative prediction architecture, to refine the predictions over successive stages, t ∈ {1, … , T }, with intermediate supervision at each stage.

*Each stage outputs one confidence map per part and one PAF per limb.*

(… and limbs all have labels; the mask is used to handle this)
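
For reference, the intermediate supervision and the mask can be written as per-stage L2 losses. This follows the paper's notation as best reconstructed here: $\mathbf{S}_j^t$ and $\mathbf{L}_c^t$ are the stage-$t$ predictions for part $j$ and limb $c$, starred symbols are the ground truth, and $\mathbf{W}$ is a binary mask with $\mathbf{W}(\mathbf{p}) = 0$ where the annotation is missing at image location $\mathbf{p}$, so unlabeled regions do not penalize correct predictions.

```latex
f_{\mathbf{S}}^{t} = \sum_{j=1}^{J} \sum_{\mathbf{p}} \mathbf{W}(\mathbf{p}) \,
    \lVert \mathbf{S}_{j}^{t}(\mathbf{p}) - \mathbf{S}_{j}^{*}(\mathbf{p}) \rVert_{2}^{2},
\qquad
f_{\mathbf{L}}^{t} = \sum_{c=1}^{C} \sum_{\mathbf{p}} \mathbf{W}(\mathbf{p}) \,
    \lVert \mathbf{L}_{c}^{t}(\mathbf{p}) - \mathbf{L}_{c}^{*}(\mathbf{p}) \rVert_{2}^{2},
\qquad
f = \sum_{t=1}^{T} \left( f_{\mathbf{S}}^{t} + f_{\mathbf{L}}^{t} \right)
```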

Part affinity fields preserve both location and orientation information across the region of support of the limb.

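The association problem is reduced to a maximum matching problem; the result is then combined with the keypoints obtained from detection to jointly decide whether each connection is correct.
This simplification is possible because of the PAFs proposed in the paper. First, the learned PAFs have a large receptive field. Second, a PAF is a set of vectors, and these vectors carry orientation information, which is important for simplifying the parsing and matching.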
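
The role of the orientation information can be illustrated with a small sketch. A candidate connection between two detected keypoints is scored by sampling the PAF along the segment joining them and averaging its dot product with the segment's unit vector; candidates are then paired greedily by descending score. The names `paf_score` and `greedy_match`, the nested-list PAF layout, the sample count, and the `min_score` threshold are all assumptions made for illustration, and the greedy pairing is a simplification (an optimal assignment method such as the Hungarian algorithm could be substituted).

```python
import math

def paf_score(paf, a, b, num_samples=10):
    """Average alignment of the PAF with the segment a -> b.

    `paf[y][x]` is assumed to hold an (fx, fy) tuple, and both points
    are assumed to lie inside the grid.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)          # uniformly spaced samples in [0, 1]
        x = int(round(a[0] + t * dx))
        y = int(round(a[1] + t * dy))
        fx, fy = paf[y][x]
        total += fx * ux + fy * uy         # dot product with the limb direction
    return total / num_samples

def greedy_match(paf, starts, ends, min_score=0.5):
    """Pair start and end candidates of one limb type, best score first."""
    scored = [(paf_score(paf, a, b), i, j)
              for i, a in enumerate(starts)
              for j, b in enumerate(ends)]
    scored.sort(reverse=True)
    used_starts, used_ends, matches = set(), set(), []
    for score, i, j in scored:
        if score < min_score:
            break                          # remaining candidates score even lower
        if i in used_starts or j in used_ends:
            continue                       # each keypoint joins at most one limb
        used_starts.add(i)
        used_ends.add(j)
        matches.append((i, j, score))
    return matches
```

With two parallel limbs in the field, a connection that crosses from one limb to the other passes through zero-valued pixels and scores low, so each start point is paired with the end point on its own limb.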
Finally, the connections that share the same part detection candidates are assembled into full-body poses of multiple people.
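
One way to picture this assembly step is as merging connections that share an endpoint. The sketch below groups them with a small union-find structure; the function name `assemble_poses`, the `(part_name, candidate_index)` node format, and the part names in the usage example are hypothetical, and this is not the authors' implementation.

```python
def assemble_poses(connections):
    """Group limb connections that share part candidates into people.

    `connections` is an iterable of ((part_a, idx_a), (part_b, idx_b))
    pairs. Returns one dict per person mapping part name to candidate index.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in connections:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a              # merge the two groups

    people = {}
    for part, idx in list(parent):
        people.setdefault(find((part, idx)), {})[part] = idx
    return list(people.values())


connections = [
    (("neck", 0), ("r_shoulder", 1)),
    (("r_shoulder", 1), ("r_elbow", 0)),
    (("neck", 1), ("r_shoulder", 0)),
]
poses = assemble_poses(connections)
```

Here the first two connections share the candidate `("r_shoulder", 1)` and end up in the same person, while the third forms a second person.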