PCK: The Evaluation Metric for MPII Pose Estimation

License: CC 4.0 BY-SA

Tags: #deep-learning #PCK #MPII #pose-estimation


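PCK (Percentage of Correct Keypoints) is a standard measure of human keypoint detection accuracy, widely used in projects such as MPII, DeepFashion, and FashionAI. It is the fraction of predicted keypoints whose normalized distance to the ground-truth keypoint falls below a threshold (for example, 0.5). When the distance is normalized by the person's head size, the metric is called PCKh. Predicted and ground-truth keypoints are first matched one-to-one, and the result is then averaged over all persons. The code below shows how to compute PCK and obtain both per-keypoint and overall accuracy.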

Overview

PCK is the evaluation metric MPII uses for human keypoint estimation. Before COCO, PCK was the mainstream metric; DeepFashion, FashionAI, and other benchmarks use it as well.

PCK

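PCK (Percentage of Correct Keypoints) is the proportion of keypoints that are estimated correctly: the percentage of detections that fall within a normalized distance of the ground truth, for a chosen threshold.
Hence the notation PCK@0.5, which means the threshold is set to 0.5.
The normalized distance is the Euclidean distance between the predicted keypoint and the human-annotated keypoint, divided by a body scale factor. MPII uses the head size of the person in question as the scale factor, namely the Euclidean distance between the top-left and bottom-right corners of the head bounding box (the official MPII evaluation script, as commonly reproduced, further multiplies this diagonal by 0.6). A pose estimation metric that uses this scale factor is called PCKh.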
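As a minimal sketch of the normalization described above, the snippet below derives a head-size scale factor from a head bounding box and uses it to normalize a single keypoint error. The function names `head_size` and `normalized_distance` and the sample coordinates are illustrative assumptions, not part of any MPII toolkit. The raw box diagonal is used as the scale; any extra scaling constant applied by a particular evaluation script is left out.

```python
import math

def head_size(x1, y1, x2, y2):
    """Diagonal of the head box, from top-left (x1, y1) to bottom-right (x2, y2)."""
    return math.hypot(x2 - x1, y2 - y1)

def normalized_distance(pred, gt, scale):
    """Euclidean distance between a predicted and a ground-truth point, divided by scale."""
    return math.hypot(pred[0] - gt[0], pred[1] - gt[1]) / scale

# Hypothetical head box of 30 x 40 pixels, so the diagonal is 50 pixels.
scale = head_size(100, 50, 130, 90)

# A prediction that is 10 pixels away from the annotation.
d = normalized_distance((210, 120), (200, 120), scale)

# The keypoint counts as correct under PCKh@0.5 if d is below 0.5.
is_correct = d < 0.5
```

With these numbers the 10-pixel error becomes a normalized distance of 0.2, so the keypoint is counted as correct at a threshold of 0.5.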
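Note that PCK is computed between the predicted joints and the ground-truth joints of a single person. In other words, there is no problem of matching multiple predictions to multiple ground truths, or rather that matching must already be resolved before PCK is computed. For multi-person pose estimation, PCK is simply averaged over the person dimension.
As the code below also shows, distances are computed one-to-one, and the multi-person PCK is just an average.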
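The averaging over persons can be sketched in plain Python as follows. This is an illustrative loop under stated assumptions, not the mmpose implementation: the function name `pck`, the use of `None` to mark an unannotated keypoint, and the sample data are all hypothetical. Predictions and ground truths are assumed to be already matched person by person.

```python
import math

def pck(preds, gts, scales, thr=0.5):
    """Per-keypoint and mean PCK.

    preds, gts: one list per person, each a list of (x, y) keypoints.
    scales: one normalization factor per person (e.g. head size).
    A ground-truth entry of None marks a missing keypoint and is skipped.
    """
    num_kpts = len(gts[0])
    per_kpt = []
    for k in range(num_kpts):
        hits, valid = 0, 0
        for person_pred, person_gt, s in zip(preds, gts, scales):
            if person_gt[k] is None:
                continue  # unannotated keypoint: not counted at all
            (px, py), (gx, gy) = person_pred[k], person_gt[k]
            valid += 1
            hits += int(math.hypot(px - gx, py - gy) / s < thr)
        # Fraction of persons for whom keypoint k is within the threshold.
        per_kpt.append(hits / valid if valid else None)
    scored = [a for a in per_kpt if a is not None]
    mean = sum(scored) / len(scored) if scored else 0.0
    return per_kpt, mean

# Two persons, two keypoints each; person 1 has no annotation for keypoint 1.
preds = [[(0, 0), (10, 0)], [(5, 5), (0, 0)]]
gts = [[(3, 0), (10, 8)], [(5, 9), None]]
per_kpt, mean = pck(preds, gts, scales=[10, 20])
```

Here keypoint 0 is within the threshold for both persons (normalized distances 0.3 and 0.2), while keypoint 1 misses for the only person that has it annotated (0.8), so the per-keypoint accuracies are 1.0 and 0.0 and the mean is 0.5.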

Code

The following functions are taken from mmpose:

import numpy as np

def keypoint_pck_accuracy(pred, gt, mask, thr, normalize):
    """Calculate the pose accuracy of PCK for each individual keypoint and the
    averaged accuracy across all keypoints for coordinates.

    Note:
        PCK metric measures accuracy of the localization of the body joints.
        The distances between predicted positions and the ground-truth ones
        are typically normalized by the bounding box size.
        The threshold (thr) of the normalized distance is commonly set
        as 0.05, 0.1 or 0.2 etc.

        batch_size: N
        num_keypoints: K

    Args:
        pred (np.ndarray[N, K, 2]): Predicted keypoint location.
        gt (np.ndarray[N, K, 2]): Groundtruth keypoint location.
        mask (np.ndarray[N, K]): Visibility of the target. False for invisible
            joints, and True for visible. Invisible joints will be ignored for
            accuracy calculation.
        thr (float): Threshold of PCK calculation.
        normalize (np.ndarray[N, 2]): Normalization factor for H&W.

    Returns:
        tuple: A tuple containing keypoint accuracy.

        - acc (np.ndarray[K]): Accuracy of each keypoint.
        - avg_acc (float): Averaged accuracy across all keypoints.
        - cnt (int): Number of valid keypoints.
    """
    distances = _calc_distances(pred, gt, mask, normalize)

    acc = np.array([_distance_acc(d, thr) for d in distances])
    valid_acc = acc[acc >= 0]
    cnt = len(valid_acc)
    avg_acc = valid_acc.mean() if cnt > 0 else 0
    return acc, avg_acc, cnt

def _calc_distances(preds, targets, mask, normalize):
    """Calculate the normalized distances between preds and target.

    Note:
        batch_size: N
        num_keypoints: K
        dimension of keypoints: D (normally, D=2 or D=3)

    Args:
        preds (np.ndarray[N, K, D]): Predicted keypoint location.
        targets (np.ndarray[N, K, D]): Groundtruth keypoint location.
        mask (np.ndarray[N, K]): Visibility of the target. False for invisible
            joints, and True for visible. Invisible joints will be ignored for
            accuracy calculation.
        normalize (np.ndarray[N, D]): Typical value is heatmap_size

    Returns:
        np.ndarray[K, N]: The normalized distances.
          If target keypoints are missing, the distance is -1.
    """
    N, K, _ = preds.shape
    distances = np.full((N, K), -1, dtype=np.float32)
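    # handle invalid values: swap non-positive factors for a large constant to
    # avoid division by zero (np.where builds a new array, so the caller's
    # `normalize` is not modified in place)
    normalize = np.where(normalize <= 0, 1e6, normalize)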
    distances[mask] = np.linalg.norm(
        ((preds - targets) / normalize[:, None, :])[mask], axis=-1)
    return distances.T

def _distance_acc(distances, thr=0.5):
    """Return the percentage below the distance threshold, while ignoring
    distances values with -1.

    Note:
        batch_size: N
    Args:
        distances (np.ndarray[N, ]): The normalized distances.
        thr (float): Threshold of the distances.

    Returns:
        float: Percentage of distances below the threshold.
          If all target keypoints are missing, return -1.
    """
    distance_valid = distances != -1
    num_distance_valid = distance_valid.sum()
    if num_distance_valid > 0:
        return (distances[distance_valid] < thr).sum() / num_distance_valid
    return -1
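To make the expected array shapes and the `-1` sentinel concrete, here is a small self-contained sketch. Rather than importing mmpose, it re-implements the same logic in a compact helper hypothetically named `pckh`. It assumes the head size is supplied once per person and repeated across both coordinate columns to form a `normalize` array of shape `[N, 2]`, which is one way a PCKh-style normalization could be passed to the functions above.

```python
import numpy as np

def pckh(pred, gt, mask, head_sizes, thr=0.5):
    # Repeat each person's head size for x and y: shape [N, 2].
    normalize = np.tile(np.asarray(head_sizes, dtype=np.float32)[:, None], (1, 2))
    # Normalized Euclidean distance per person and keypoint: shape [N, K].
    dist = np.linalg.norm((pred - gt) / normalize[:, None, :], axis=-1)
    hit = (dist < thr) & mask          # only visible keypoints can count as hits
    valid = mask.sum(axis=0)           # number of visible persons per keypoint
    # Keypoints that are invisible for every person get the sentinel -1.
    acc = np.where(valid > 0, hit.sum(axis=0) / np.maximum(valid, 1), -1.0)
    scored = acc[acc >= 0]
    avg_acc = scored.mean() if scored.size else 0.0
    return acc, avg_acc, scored.size

# N = 2 persons, K = 3 keypoints; keypoint 2 is invisible for everyone.
pred = np.array([[[0, 0], [10, 0], [0, 0]],
                 [[5, 5], [0, 0], [0, 0]]], dtype=np.float32)
gt = np.array([[[3, 0], [10, 8], [0, 0]],
               [[5, 9], [0, 0], [0, 0]]], dtype=np.float32)
mask = np.array([[True, True, False],
                 [True, False, False]])

acc, avg_acc, cnt = pckh(pred, gt, mask, head_sizes=[10, 20])
```

In this example `acc` is `[1.0, 0.0, -1.0]`: the third keypoint has no visible instance, so it is marked `-1` and excluded from the average, leaving `avg_acc` at 0.5 over `cnt = 2` valid keypoints.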