[paper reading] CenterNet (Object as Points)
GitHub:Notes of Classic Detection Papers
Update 2020.11.09: added the "Use Yourself" section, i.e. my own understanding and thoughts on this paper; see GitHub: Notes-of-Classic-Detection-Papers for details.
I originally wanted to host this on GitHub, but GitHub does not render the formulas, so it is posted on CSDN instead, where the formatting is somewhat messy.
I strongly recommend downloading the source file from GitHub and reading it there for the best experience!
If you find it useful, a star would be appreciated!
| topic | motivation | technique | key element | math | use yourself | relativity |
| --- | --- | --- | --- | --- | --- | --- |
| CenterNet (Object as Points) | Problem to Solve<br>Idea | CenterNet Architecture | Center Point & Anchor<br>Getting Ground-Truth<br>Model Output<br>Data Augmentation<br>Inference<br>TTA<br>Compared with SOTA<br>Additional Experiments | Loss Function<br>KeyPoint Loss $\text{L}_k$<br>Offset Loss $\text{L}_{off}$<br>Size Loss $\text{L}_{size}$ | …… | Anchor-Based<br>KeyPoint-Based |
Motivation
Problem to Solve
Anchor-based methods have the following drawbacks:

- Wasteful & inefficient: they must exhaustively enumerate the potential locations of objects and classify each one.
- They need post-processing (e.g. NMS).
Idea
- In essence: object detection is converted into standard keypoint estimation.
- In terms of representation: each object is represented by the center point of its bounding box.
- In terms of pipeline: keypoint estimation is used to find the center point, and the other properties are regressed from it (all other properties have a fixed mathematical relationship with the center point).
Technique
CenterNet Architecture
Components
- Backbone
  - Stacked Hourglass Network (see [CornerNet](./[paper reading] RetinaNet.md))
  - Upconvolutional Residual Network
  - Deep Layer Aggregation (DLA)
- Task-Specific Modality
  - one 3×3 convolution
  - ReLU
  - one 1×1 convolution
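As a concrete illustration, a minimal PyTorch sketch of such a task-specific head (the 3×3 conv, ReLU, 1×1 conv pattern above; the hidden channel width is my own assumption, not a value from the paper):

```python
import torch.nn as nn

def make_head(in_channels: int, out_channels: int, hidden: int = 256) -> nn.Sequential:
    # 3x3 conv -> ReLU -> 1x1 conv, producing `out_channels` maps at feature resolution
    return nn.Sequential(
        nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(hidden, out_channels, kernel_size=1),
    )
```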
Advantage
- Simpler, faster, and more accurate.
- End-to-end differentiable: all outputs come directly from the keypoint estimation network, with no need for NMS (or other post-processing). Peak keypoint extraction is implemented by a $3×3 \ \text{Max Pooling}$, which is sufficient to replace NMS (see the sketch below).
- Additional object properties can be estimated in one single forward pass.
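A minimal sketch of this NMS-free peak extraction: a location counts as a peak when it survives a 3×3 max pooling, i.e. its value is ≥ that of its 8 neighbours. The (B, C, H, W) layout and the top-k value are assumptions in line with the inference description later on:

```python
import torch
import torch.nn.functional as F

def extract_peaks(heatmap: torch.Tensor, k: int = 100):
    """heatmap: (B, C, H, W) keypoint scores in [0, 1]."""
    b, c, h, w = heatmap.shape
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (pooled == heatmap).float()    # keep only local maxima
    scores, idx = peaks.view(b, -1).topk(k, dim=1)   # top-k peaks over all classes
    classes = idx // (h * w)
    ys = (idx % (h * w)) // w
    xs = idx % w
    return scores, classes, ys, xs
```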
Key Element
Center Point & Anchor
Connection
The center point can be regarded as a shape-agnostic anchor.
Difference
- The center point depends only on location, not on box overlap, so there is no need to manually set foreground/background overlap thresholds.
- Each object corresponds to exactly one center point: local peaks are extracted directly from the keypoint heatmap, so there is no duplicate detection.
- CenterNet has a larger output resolution: the downsampling stride is 4 (16 is more common).
Getting Ground-Truth
See [Symbol Definition](#Symbol Definition) for notation.
Keypoint Ground-Truth
Ground-Truth: Input Image ==> Output Feature Map

- $p \in \mathcal{R}^2$: ground-truth keypoint
- $\tilde{p} = \lfloor \frac{p}{R} \rfloor$: low-resolution equivalent

The ground-truth keypoint $p$ on the image is mapped to the ground-truth keypoint $\tilde{p} = \lfloor \frac{p}{R} \rfloor$ on the output feature map.
Gaussian Penalty Reduction
$$Y_{xyc}=\exp \left(-\frac{\left(x-\tilde{p}_{x}\right)^{2}+\left(y-\tilde{p}_{y}\right)^{2}}{2 \sigma_{p}^{2}}\right)$$

- $\sigma_{p}$: object size-adaptive standard deviation

If two Gaussians of the same class overlap, the element-wise maximum is taken.

keypoint heatmap: $\hat{Y}\in[0,1]^{\frac{W}{R}×\frac{H}{R}×C}$

- $\hat Y_{x,y,c} = 1$ ==> keypoint
- $\hat Y_{x,y,c} = 0$ ==> background

Note: the center here is the geometric center of the bounding box, i.e. the center is equidistant from the left/right edges and from the top/bottom edges.
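A short sketch of how such a ground-truth heatmap could be rendered: a Gaussian around the low-resolution center, merged by element-wise maximum when Gaussians of the same class overlap. `sigma` is left as a parameter since the size-adaptive rule is not spelled out here:

```python
import numpy as np

def draw_gaussian(heatmap: np.ndarray, center: tuple, sigma: float) -> None:
    """heatmap: (H, W) map for ONE class; center: (cx, cy) on the output feature map."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    gaussian = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, gaussian, out=heatmap)  # element-wise max merges overlapping objects
```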
Size Ground-Truth
The bounding box of the $k$-th object (of class $c_k$) is represented by 4 coordinates: $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$

Its center is: $p_k = \big( \frac{x_1^{(k)} + x_2^{(k)} }{2} , \frac{y_1^{(k)} + y_2^{(k)} }{2} \big)$

The size ground-truth is: $s_k = \big(x_2^{(k)} - x_1^{(k)},\ y_2^{(k)}-y_1^{(k)} \big)$

Note: the scale is not normalized; the raw pixel coordinates are used directly.
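Putting the keypoint, offset and size ground-truths together, a hedged sketch for a single box (R = 4 as in the paper; the function and variable names are my own):

```python
import numpy as np

def box_to_targets(x1, y1, x2, y2, R: int = 4):
    """Center, low-resolution center, offset and size targets for one box (raw pixels)."""
    p = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])  # center p_k
    p_tilde = np.floor(p / R)                         # low-resolution equivalent
    offset = p / R - p_tilde                          # target for the offset head
    size = np.array([x2 - x1, y2 - y1])               # s_k, not normalized
    return p, p_tilde, offset, size
```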
Model Output
Input & Output Resolution:

- input: 512×512
- output: 128×128

All outputs share a single common fully-convolutional backbone network:

- keypoint $\hat Y$ ==> $C$ channels
- offset $\hat O$ ==> 2 channels
- size $\hat S$ ==> 2 channels

i.e. each location has $C+4$ outputs.

For each modality, the backbone features are passed through:

- one 3×3 convolution
- ReLU
- one 1×1 convolution
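A sketch of how the three heads could be attached to the shared fully-convolutional features to produce the $C+4$ outputs per location. The sigmoid on the heatmap follows the common CenterNet implementation; the backbone, channel counts and class names are placeholders:

```python
import torch
import torch.nn as nn

def _head(cin: int, cout: int, hidden: int = 256) -> nn.Sequential:
    # same 3x3 conv -> ReLU -> 1x1 conv pattern as each task-specific modality
    return nn.Sequential(
        nn.Conv2d(cin, hidden, 3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(hidden, cout, 1),
    )

class CenterNetHeads(nn.Module):
    """C + 4 outputs per location: C-channel heatmap, 2-channel offset, 2-channel size."""
    def __init__(self, feat_channels: int, num_classes: int):
        super().__init__()
        self.heatmap = _head(feat_channels, num_classes)
        self.offset = _head(feat_channels, 2)
        self.size = _head(feat_channels, 2)

    def forward(self, feats: torch.Tensor) -> dict:
        return {
            "heatmap": torch.sigmoid(self.heatmap(feats)),  # keypoint confidence in [0, 1]
            "offset": self.offset(feats),
            "size": self.size(feats),
        }
```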
Data Augmentation
- random flip
- random scaling (0.6~1.3)
- cropping
- color jittering
Inference
CenterNet inference is a single network forward pass.

- The image is fed through the backbone (an FCN), which gives 3 outputs:
  - keypoint $\hat Y$ ==> $C$ channels: the peaks of the heatmap correspond to object centers (the top 100 are kept); a peak is a location whose value is $\ge$ that of its 8 neighbours
  - offset $\hat O$ ==> 2 channels
  - size $\hat S$ ==> 2 channels
- The bounding box is computed from keypoint $\hat Y$, offset $\hat O$ and size $\hat S$:
  - $(\delta \hat x_i, \delta \hat y_i) = \hat O_{\hat x_i, \hat y_i}$: offset prediction
  - $(\hat w_i, \hat h_i) = \hat S_{\hat x_i, \hat y_i}$: size prediction
- The keypoint confidence is the value at the keypoint's location: $\hat Y_{\hat x_i, \hat y_i, c}$
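A hedged sketch of the decoding step: a peak's integer location is refined with its offset prediction and expanded by its size prediction into a box. Whether the sizes are regressed at output-map scale (and therefore multiplied by R here) is an assumption of this sketch:

```python
import torch

def decode_boxes(xs, ys, offsets, sizes, R: int = 4):
    """xs, ys: (N,) integer peak locations on the output map;
    offsets, sizes: (N, 2) predictions gathered at those peaks."""
    cx = (xs.float() + offsets[:, 0]) * R   # refined center, back in input-image pixels
    cy = (ys.float() + offsets[:, 1]) * R
    w = sizes[:, 0] * R
    h = sizes[:, 1] * R
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
```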
TTA
There are 3 TTA settings:

- no augmentation
- flip augmentation: the outputs are averaged before decoding
- flip & multi-scale (0.5, 0.75, 1, 1.25, 1.5): the multi-scale results are aggregated with NMS
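A minimal sketch of the flip-TTA averaging, restricted to the center heatmap. It assumes a `model` that returns a dict with a `'heatmap'` tensor of shape (B, C, H, W); the offset and size maps would need the same treatment, with the x-offset negated for the flipped pass:

```python
import torch

def flip_tta_heatmap(model, image: torch.Tensor) -> torch.Tensor:
    """Average the heatmaps of the original and horizontally flipped image before decoding."""
    heat = model(image)['heatmap']
    heat_flip = model(torch.flip(image, dims=[3]))['heatmap']
    heat_flip = torch.flip(heat_flip, dims=[3])   # flip back to original coordinates
    return 0.5 * (heat + heat_flip)
```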
Compared with SOTA
Additional Experiments
Center Point Collision
After downsampling, the center keypoints of multiple objects may coincide.
CenterNet keeps such center-keypoint collisions rare.
NMS
Adding NMS to CenterNet gives only a very small improvement, which shows that CenterNet does not need NMS.
Training & Testing Resolution
- Low resolution is the fastest but the least accurate.
- High resolution improves accuracy but lowers the speed.
- Using the original resolution is slightly more accurate than the fixed high resolution, but slightly slower.

Regression Loss
Smooth L1 loss performs slightly worse than L1 loss.
Bounding Box Size Weight
$\lambda_{size} = 0.1$ works best; AP degrades quickly when it is increased, and is robust when it is decreased.
Training Schedule
A longer training schedule gives better results.
Math
Symbol Definition
- $I \in R^{W×H×3}$: input image
- $R$: output stride (4 in the experiments)
- $C$: number of keypoint classes
Loss Function
$$\text{L}_{det} = \text{L}_k + \lambda_{size} \text{L}_{size} + \lambda_{off} \text{L}_{off}$$

- $\lambda_{size} = 0.1$
- $\lambda_{off} = 1$
KeyPoint Loss $\text{L}_k$
penalty-reduced pixel-wise logistic regression with focal loss
$$\text{L}_{k}=\frac{-1}{N} \sum_{x y c}\begin{cases}\left(1-\hat{Y}_{x y c}\right)^{\alpha} \log \left(\hat{Y}_{x y c}\right) & \text{if } Y_{x y c}=1 \\ \left(1-Y_{x y c}\right)^{\beta}\left(\hat{Y}_{x y c}\right)^{\alpha} \log \left(1-\hat{Y}_{x y c}\right) & \text{otherwise}\end{cases}$$
- $\hat{Y}_{xyc}$: predicted keypoint confidence
- $\alpha = 2, \beta = 4$
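A sketch of this penalty-reduced focal loss, assuming the predicted heatmap has already passed through a sigmoid and the ground truth is the Gaussian-splatted heatmap; the small epsilon for numerical stability is my own addition:

```python
import torch

def keypoint_focal_loss(pred: torch.Tensor, gt: torch.Tensor,
                        alpha: float = 2.0, beta: float = 4.0, eps: float = 1e-6) -> torch.Tensor:
    """pred, gt: (B, C, H, W); gt == 1 exactly at keypoints, Gaussian-valued elsewhere."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = ((1 - pred) ** alpha) * torch.log(pred + eps) * pos
    neg_loss = ((1 - gt) ** beta) * (pred ** alpha) * torch.log(1 - pred + eps) * neg
    num_pos = pos.sum().clamp(min=1)                 # N = number of keypoints
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos
```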
Offset Loss $\text{L}_{off}$
Purpose: to recover the discretization error caused by downsampling.
- $\hat O \in \mathcal R^{\frac{W}{R}×\frac{H}{R}×2}$: predicted local offset
Note:
- it is computed only at keypoint locations (positives)
- all classes share the same offset prediction
Size Loss $\text{L}_{size}$
- $\hat{S} \in \mathcal R^{\frac{W}{R}×\frac{H}{R}×2}$: predicted size map ($\hat{S}_{p_k}$ is the prediction at the center of the $k$-th object)
- $s_k = \big(x_2^{(k)} - x_1^{(k)}, y_2^{(k)}-y_1^{(k)} \big)$: size ground-truth
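Assuming the L1 form of the offset and size losses from the paper (computed only at the N keypoint locations) and reusing the `keypoint_focal_loss` sketch above, the total objective can be sketched as:

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_hm, pred_off, pred_size, gt_hm, gt_off, gt_size,
                   lambda_size: float = 0.1, lambda_off: float = 1.0) -> torch.Tensor:
    """gt_off / gt_size are dense (B, 2, H, W) maps that are only valid at keypoints;
    the mask selects those positive locations."""
    mask = gt_hm.eq(1).any(dim=1, keepdim=True).float()   # (B, 1, H, W) keypoint mask
    num_pos = mask.sum().clamp(min=1)
    l_k = keypoint_focal_loss(pred_hm, gt_hm)             # see the sketch above
    l_off = F.l1_loss(pred_off * mask, gt_off * mask, reduction="sum") / num_pos
    l_size = F.l1_loss(pred_size * mask, gt_size * mask, reduction="sum") / num_pos
    return l_k + lambda_size * l_size + lambda_off * l_off
```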
Use Yourself
……
Related work
Anchor-Based Method
Essence
Detection is reduced to classification.
Two-Stage Method
- Anchors are placed on the image (as in the [One-Stage Method](#One-Stage Method)), i.e. anchors are sampled on a dense, low-resolution grid and classified as foreground/background ==> proposals. Concretely, the labels are (see the sketch after this list):
  - foreground: IoU > 0.7 with any ground-truth box
  - background: IoU < 0.3 with every ground-truth box
  - ignored: IoU $\in [0.3, 0.7]$
- Features are then re-sampled for each anchor, e.g.:
  - RCNN: the crop is taken from the image
  - Fast-RCNN: the crop is taken from the feature map
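For contrast with CenterNet's threshold-free assignment, a rough sketch of the IoU-based anchor labeling rule described above (thresholds 0.7 / 0.3; computing the IoU matrix itself is assumed to be done elsewhere):

```python
import torch

def label_anchors(ious: torch.Tensor, hi: float = 0.7, lo: float = 0.3) -> torch.Tensor:
    """ious: (num_anchors, num_gt) IoU matrix.
    Returns 1 = foreground, 0 = background, -1 = ignored, per anchor."""
    best_iou, _ = ious.max(dim=1)                 # best overlap with any ground-truth box
    labels = torch.full_like(best_iou, -1.0)      # default: ignored
    labels[best_iou > hi] = 1.0                   # foreground
    labels[best_iou < lo] = 0.0                   # background (low overlap with every GT box)
    return labels
```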
One-Stage Method
- Anchors are placed on the image.
- The anchor locations are classified directly.

Some improvements made to one-stage methods:

- anchor shape priors
- different feature resolutions (e.g. Feature Pyramid Network)
- loss re-weighting (e.g. Focal Loss)
Post-Processing(NMS)
- Purpose: suppress detections of the same instance according to IoU.
- Drawback: it is hard to differentiate and train through, so most detectors cannot be trained end-to-end.
KeyPoint-Based Method
Essence
Detection is converted into keypoint estimation.
The backbone is always a keypoint estimation network.
CornerNet
Detects 2 corners as keypoints to represent one bounding box.
ExtremeNet
Detects the top-most, left-most, bottom-most, right-most and center points as keypoints.
Drawback
Detecting multiple keypoints per object requires an extra grouping stage, which slows the algorithm down.