COCO keypoint evaluation metric

COCO keypoint evaluation uses object keypoint similarity (OKS) as its similarity measure, analogous to the IoU in object detection. OKS is computed from the Euclidean distances between predicted and ground-truth keypoints, accounting for object scale and keypoint visibility, and is used to assess the accuracy of keypoint detection.


http://cocodataset.org/#keypoints-eval

1. Keypoint Evaluation

This page describes the keypoint evaluation metrics used by COCO. The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes multiple metrics described below. To obtain results on the COCO test set, for which ground-truth annotations are hidden, generated results must be uploaded to the evaluation server. The exact same evaluation code, described below, is used to evaluate results on the test set.

1.1. Evaluation Overview

The COCO keypoint task requires simultaneously detecting objects and localizing their keypoints (object locations are not given at test time). As the task of simultaneous detection and keypoint estimation is relatively new, we chose to adopt a novel metric inspired by object detection metrics. For simplicity, we refer to this task as keypoint detection and the prediction algorithm as the keypoint detector. We suggest reviewing the evaluation metrics for object detection before proceeding.

The core idea behind evaluating keypoint detection is to mimic the evaluation metrics used for object detection, namely average precision (AP) and average recall (AR) and their variants. At the heart of these metrics is a similarity measure between ground truth objects and predicted objects. In the case of object detection, the IoU serves as this similarity measure (for both boxes and segments). Thresholding the IoU defines matches between the ground truth and predicted objects and allows computing precision-recall curves. To adopt AP/AR for keypoint detection, we only need to define an analogous similarity measure. We do so by defining an object keypoint similarity (OKS) which plays the same role as the IoU.
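To make the thresholding step concrete, here is a minimal sketch of greedy matching between detections and ground truth under a similarity threshold. It is a simplified illustration, not the actual COCOeval matching code; the function name and inputs are hypothetical.

```python
def match_detections(sims, threshold=0.5):
    """Greedy matching sketch. sims[d][g] is the similarity (IoU or OKS)
    between detection d (rows sorted by descending detector score) and
    ground truth g. Returns, per detection, the matched gt index or -1."""
    matched_gt = set()
    matches = []
    for d_sims in sims:
        best_g, best_sim = -1, threshold  # require sim >= threshold
        for g, sim in enumerate(d_sims):
            if g not in matched_gt and sim >= best_sim:
                best_g, best_sim = g, sim
        if best_g >= 0:
            matched_gt.add(best_g)  # each gt can match at most once
        matches.append(best_g)
    return matches

# Two detections, two ground truths:
sims = [[0.9, 0.2], [0.6, 0.4]]
# Detection 0 takes gt 0 (sim 0.9); detection 1 finds gt 0 taken and
# gt 1 below threshold, so it is a false positive (-1).
```

Matched detections count as true positives and unmatched ones as false positives; sweeping over detection scores then yields the precision-recall curve.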

1.2. Object Keypoint Similarity

For each object, ground truth keypoints have the form [x1,y1,v1,...,xk,yk,vk], where x,y are the keypoint locations and v is a visibility flag defined as v=0: not labeled, v=1: labeled but not visible, and v=2: labeled and visible. Each ground truth object also has a scale s which we define as the square root of the object segment area. For details on the ground truth format please see the download page.
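The flat [x1,y1,v1,...,xk,yk,vk] layout and the scale definition above can be sketched as follows. This is an illustrative helper, not part of the official API; the dict keys `keypoints` and `area` follow the COCO annotation format.

```python
import math

def parse_gt_keypoints(ann):
    """Split a COCO-style ground-truth annotation into (x, y, v) triplets
    and compute the object scale s = sqrt(segment area)."""
    kps = ann["keypoints"]  # flat list [x1, y1, v1, ..., xk, yk, vk]
    triplets = [tuple(kps[i:i + 3]) for i in range(0, len(kps), 3)]
    s = math.sqrt(ann["area"])
    return triplets, s

# Three keypoints: visible (v=2), labeled-but-occluded (v=1), unlabeled (v=0).
ann = {"keypoints": [120, 80, 2, 130, 85, 1, 0, 0, 0], "area": 2500}
triplets, s = parse_gt_keypoints(ann)  # s == 50.0
```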

For each object, the keypoint detector must output keypoint locations and an object-level confidence. Predicted keypoints for an object should have the same form as the ground truth: [x1,y1,v1,...,xk,yk,vk]. However, the detector's predicted vi are not currently used during evaluation; that is, the keypoint detector is not required to predict per-keypoint visibilities or confidences.

We define the object keypoint similarity (OKS) as:

OKS = Σᵢ [exp(−dᵢ² / (2s²κᵢ²)) · δ(vᵢ > 0)] / Σᵢ [δ(vᵢ > 0)]

The di are the Euclidean distances between each corresponding ground truth and detected keypoint and the vi are the visibility flags of the ground truth (the detector's predicted vi are not used). To compute OKS, we pass the di through an unnormalized Gaussian with standard deviation sκi, where s is the object scale and κi is a per-keypoint constant that controls falloff. For each keypoint this yields a keypoint similarity that ranges between 0 and 1. These similarities are averaged over all labeled keypoints (keypoints for which vi>0). Predicted keypoints that are not labeled (vi=0) do not affect the OKS. Perfect predictions will have OKS=1 and predictions for which all keypoints are off by more than a few standard deviations sκi will have OKS~0. The OKS is analogous to the IoU. Given the OKS, we can compute AP and AR just as the IoU allows us to compute these metrics for box/segment detection.
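The formula above can be implemented directly. The sketch below assumes keypoints as (x, y, v) triplets; the κi values passed in are placeholders for illustration, not the official COCO per-keypoint constants.

```python
import math

def compute_oks(gt_kps, dt_kps, s, kappas):
    """Object keypoint similarity.
    gt_kps, dt_kps: lists of (x, y, v) triplets (predicted v is ignored);
    s: object scale (sqrt of segment area);
    kappas: per-keypoint falloff constants (illustrative values only).
    Only keypoints labeled in the ground truth (v > 0) contribute."""
    num, den = 0.0, 0
    for (gx, gy, gv), (dx, dy, _), k in zip(gt_kps, dt_kps, kappas):
        if gv > 0:  # skip unlabeled keypoints (v = 0)
            d2 = (gx - dx) ** 2 + (gy - dy) ** 2  # squared Euclidean distance
            num += math.exp(-d2 / (2 * s ** 2 * k ** 2))  # unnormalized Gaussian
            den += 1
    return num / den if den else 0.0

gt = [(120, 80, 2), (130, 85, 1), (0, 0, 0)]
dt = [(120, 80, 0), (130, 85, 0), (50, 50, 0)]  # exact on labeled keypoints
oks = compute_oks(gt, dt, s=50.0, kappas=[0.1, 0.1, 0.1])  # -> 1.0
```

Note that the unlabeled third keypoint has no effect on the score, and a perfect prediction on the labeled keypoints gives OKS = 1 as stated above.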
