[Paper | ] Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild

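Pixel-in-Pixel Net (PIPNet) is a method for efficient facial landmark detection in the wild that examines the connection between heatmap regression and coordinate regression. Heatmap regression is accurate but computationally expensive and sensitive to outliers, whereas coordinate regression is fast but not accurate enough. The work aims to combine the strengths of both: it proposes PIP regression, a neighbor regression module, and a self-training with curriculum framework, and it exploits an implicit prior observed in CNN-based facial landmark detectors. In this way it improves cross-domain generalization and detection accuracy while keeping inference fast.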
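A minimal sketch of the neighbor-regression merging step. This follows our reading of the paper and is not the official implementation. It assumes three things. First, each landmark's neighbors are its C nearest landmarks in a mean training shape. Second, the cell chosen for landmark i also regresses the positions of those neighbors. Third, each landmark's final position is the average of its own prediction and every prediction made for it by other landmarks. The functions `nearest_neighbors` and `merge_predictions`, the choice C = 1, and the three-point toy shape are all hypothetical.

```python
import math


def nearest_neighbors(mean_shape, num_nb):
    """For each landmark, indices of its num_nb closest landmarks in the mean shape."""
    neighbors = []
    for i, (xi, yi) in enumerate(mean_shape):
        others = [j for j in range(len(mean_shape)) if j != i]
        # Sort the other landmarks by Euclidean distance to landmark i.
        others.sort(key=lambda j: math.hypot(mean_shape[j][0] - xi,
                                             mean_shape[j][1] - yi))
        neighbors.append(others[:num_nb])
    return neighbors


def merge_predictions(own, nb_preds, neighbors):
    """Average each landmark's own prediction with the predictions made for it
    by the landmarks that list it as a neighbor.

    own[i]         -> (x, y) predicted for landmark i at its own cell
    nb_preds[i][k] -> (x, y) predicted at landmark i's cell for neighbors[i][k]
    """
    votes = [[p] for p in own]
    for i, nbs in enumerate(neighbors):
        for k, j in enumerate(nbs):
            votes[j].append(nb_preds[i][k])
    return [(sum(x for x, _ in v) / len(v), sum(y for _, y in v) / len(v))
            for v in votes]


mean_shape = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]   # hypothetical toy mean shape
neighbors = nearest_neighbors(mean_shape, num_nb=1)  # [[1], [0], [1]]

own = [(10.0, 10.0), (20.0, 10.0), (60.0, 10.0)]
nb_preds = [[(22.0, 10.0)],   # landmark 0's cell predicting landmark 1
            [(12.0, 10.0)],   # landmark 1's cell predicting landmark 0
            [(24.0, 10.0)]]   # landmark 2's cell predicting landmark 1
print(merge_predictions(own, nb_preds, neighbors))
# [(11.0, 10.0), (22.0, 10.0), (60.0, 10.0)]
```

Landmark 1 receives three votes (its own and two from neighbors), so one noisy cell has less influence on it. As we understand it, this pooling of nearby predictions is what the module relies on for more consistent shapes.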


Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild

IJCV:

Abstract

Related Work

For deep-learning-based facial landmark detection, there are two widely used detection heads: heatmap regression and coordinate regression. Heatmap regression achieves good results but has two drawbacks: (1) it is computationally expensive; (2) it is sensitive to outliers (see Figure 5(b)). In contrast, coordinate regression is fast and robust but not accurate enough (see Figure 5(a)). Coordinate regression can be applied in a multi-stage manner to improve performance, but this slows down inference.
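A toy contrast between the two heads. The sizes are hypothetical: one landmark, a 256×256 input and a 16×16 heatmap. Heatmap regression decodes by taking the argmax cell, so a single stronger spurious peak anywhere in the map can move the prediction far away. This is one plausible way the outlier sensitivity shows up. Coordinate regression reads (x, y) directly from a small output vector. For rough arithmetic on output size alone, again with hypothetical numbers, 68 landmarks with 64×64 heatmaps give 68 × 64 × 64 = 278,528 output values, versus 2 × 68 = 136 for coordinate regression. The function names are illustrative and not from the paper.

```python
import math

# Hypothetical sizes, for illustration only.
IMAGE_SIZE = 256
H = 16                      # heatmap is H x H
STRIDE = IMAGE_SIZE // H    # 16 input pixels per heatmap cell


def gaussian_heatmap(size, cx, cy, peak, sigma=1.0):
    """A size x size map with a Gaussian bump of height `peak` at column cx, row cy."""
    return [[peak * math.exp(-((c - cx) ** 2 + (r - cy) ** 2) / (2 * sigma ** 2))
             for c in range(size)]
            for r in range(size)]


def decode_heatmap(heatmap, stride):
    """Heatmap regression: take the argmax cell and map it back to input pixels."""
    best_val, best_r, best_c = float("-inf"), 0, 0
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best_val:
                best_val, best_r, best_c = v, r, c
    return (best_c * stride, best_r * stride)  # (x, y)


def decode_coordinates(vector, image_size):
    """Coordinate regression: the head outputs normalized (x, y) directly."""
    x, y = vector
    return (x * image_size, y * image_size)


clean = gaussian_heatmap(H, cx=5, cy=5, peak=0.9)
print(decode_heatmap(clean, STRIDE))   # (80, 80)

# One spurious, slightly stronger response in a corner (e.g. background
# texture) is enough to make the argmax jump across the image.
noisy = [row[:] for row in clean]
noisy[0][15] = 0.95
print(decode_heatmap(noisy, STRIDE))   # (240, 0)

print(decode_coordinates((0.31, 0.32), IMAGE_SIZE))
```

The argmax decode above has no sub-cell refinement. Real heatmap heads therefore usually need a fine output resolution to be accurate, which is part of their cost.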

DL-based facial landmark detection methods ×2: heatmap regression, coordinate regression

Heatmap: √ good results  × computationally expensive, sensitive to outliers

Coordinate: √ fast and robust  × not accurate enough (a multi-stage design can improve performance, but at the cost of speed)

⇒ Goal: combine the strengths of both (the first study in this area that discusses the connection between heatmap and coordinate regression).
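PIPNet's PIP regression is how the paper makes this combination. Below is a minimal decoding sketch, based on our reading and not the official code. It assumes the head outputs, per landmark, a score map on a low-resolution grid (stride 32 here). It also outputs x- and y-offset maps measured from each cell's top-left corner, in units of cell size. The argmax picks a cell (heatmap-style), and the offsets at that cell refine the location (coordinate-style). `pip_decode`, the 8×8 grid and the stride are hypothetical.

```python
def pip_decode(score, off_x, off_y, stride):
    """Decode one landmark from PIP-style outputs.

    score, off_x, off_y: H x W nested lists for a single landmark.
    stride: input pixels per grid cell (assumed, e.g. 32).
    Returns (x, y) in input-image pixels.
    """
    best_val, best_r, best_c = float("-inf"), 0, 0
    for r, row in enumerate(score):
        for c, v in enumerate(row):
            if v > best_val:
                best_val, best_r, best_c = v, r, c
    # Heatmap part: the selected cell gives the coarse location.
    # Coordinate part: offsets regressed at that same cell refine it.
    x = (best_c + off_x[best_r][best_c]) * stride
    y = (best_r + off_y[best_r][best_c]) * stride
    return (x, y)


H = W = 8  # 256x256 input with stride 32 -> 8x8 grid (hypothetical sizes)
score = [[0.0] * W for _ in range(H)]
off_x = [[0.0] * W for _ in range(H)]
off_y = [[0.0] * W for _ in range(H)]
score[2][3] = 1.0   # highest score at row 2, column 3
off_x[2][3] = 0.5   # half a cell to the right of the cell's left edge
off_y[2][3] = 0.25  # a quarter cell below the cell's top edge
print(pip_decode(score, off_x, off_y, 32))  # (112.0, 72.0)
```

The grid is coarse here: 8 × 8 = 64 cells instead of, say, 64 × 64 = 4,096. The score-map part therefore stays cheap, while the offsets recover sub-cell precision.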