Reproducing "Distilling Object Detectors with Fine-grained Feature Imitation"

License: CC 4.0 BY-SA

This reproduction is based on the paper's open-source code: https://github.com/twangnh/Distilling-Object-Detectors

Code issues and details can be discussed on my GitHub:

https://github.com/HqWei/Distillation-of-Faster-rcnn

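At its core, this paper improves feature-level distillation for object detection, so you first need to implement distillation on the detector's feature maps. That part is fairly simple: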

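import torch

# Teacher and student feature maps, taken from the first feature level.
sup_feature = output_teacher['features'][0]
stu_feature = output['features'][0]
# model_adap is a conv layer followed by a ReLU: it maps the student feature map
# to the same shape (same channel count) as the teacher's, so that the L2
# distance can be computed directly afterwards.
stu_feature_adap = model_adap(stu_feature)

# The 'weigth' spelling is kept because it matches the config keys.
start_weigth = cfg_feature_distillation.get('start_weigth')
end_weigth = cfg_feature_distillation.get('end_weigth')

# Linearly ramp the imitation loss weight from start_weigth to end_weigth over training.
imitation_loss_weigth = start_weigth + (end_weigth - start_weigth) * (float(epoch) / max_epoch)
# imitation_loss_weigth = 0.0001
# L2 distance: sum of squared differences at corresponding feature-map positions.
sup_loss = (torch.pow(sup_feature - stu_feature_adap, 2)).sum()
sup_loss = sup_loss * imitation_loss_weigth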
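
The snippet above relies on objects defined elsewhere in the training loop (`model_adap`, `output`, `output_teacher`, the config). As a minimal self-contained sketch of the same idea, the block below builds a conv + ReLU adaptation layer and computes the ramped L2 loss on random tensors. The class name `FeatureAdapter`, the 3x3 kernel, the channel counts (64 and 256) and the start/end weights are assumptions made for illustration, not values taken from the repository. Detaching the teacher features is also an assumption about how the frozen teacher would typically be handled.

```python
import torch
import torch.nn as nn


class FeatureAdapter(nn.Module):
    """Hypothetical stand-in for model_adap: conv + ReLU that maps the
    student's channel count to the teacher's (kernel size is an assumption)."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.conv = nn.Conv2d(student_channels, teacher_channels,
                              kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


def ramped_weight(start, end, epoch, max_epoch):
    """Linear interpolation from start to end, as in the snippet above."""
    return start + (end - start) * (float(epoch) / max_epoch)


def feature_imitation_loss(teacher_feat, student_feat, adapter, weight):
    """Weighted sum of squared differences over the whole feature map."""
    adapted = adapter(student_feat)
    # Assumption: the teacher is frozen, so no gradient should flow into it.
    return weight * torch.pow(teacher_feat.detach() - adapted, 2).sum()


# Toy usage with random tensors (shapes are illustrative only).
torch.manual_seed(0)
adapter = FeatureAdapter(student_channels=64, teacher_channels=256)
teacher_feat = torch.randn(2, 256, 8, 10)
student_feat = torch.randn(2, 64, 8, 10, requires_grad=True)

weight = ramped_weight(1e-4, 1e-3, epoch=5, max_epoch=10)
loss = feature_imitation_loss(teacher_feat, student_feat, adapter, weight)
loss.backward()
print(float(loss))
```

With `padding=1` the 3x3 convolution keeps the spatial size, so only the channel dimension changes and the subtraction is well defined.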

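Next comes the paper's core innovation: the student does not need to imitate the teacher over the entire feature map, only near the ground-truth (GT) boxes.

The main difficulty is generating the mask. The paper describes it as follows: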

Specifically, as shown in Fig. 2, for each ground truth box, we compute the IOU between it and all anchors, which
forms a W × H × K IOU map m. Here W and H denote width and height of the feature map, and K indicates the
K preset anchor boxes. Then we find the largest IOU value M = max(m), times the thresholding factor ψ to obtain
a filter threshold F = ψ ∗ M . With F , we filter the IOU map to keep those larger then F locations and combine them
with OR operation to get a W × H mask.

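In short: first compute the IOU between a GT box and all anchors, which gives a W × H × K IOU map called m. W and H are the width and height of the feature map, and K is the number of anchors generated at each location (for example, with anchor ratios 0.5, 1, 2 and scales 2, 4, 8, 16, 32, K = 3 × 5 = 15). In other words, each anchor type yields a W × H IOU score map, where the value at each of the W × H points is the IOU between the GT box and the anchor generated at that location. Comparing the K score maps, take the largest of the K values at each location, because a single large IOU is enough to show that the location is close to the GT box. What we finally obtain is a W × H mask, that is, only …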
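
The mask procedure quoted from the paper can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code from either repository: the helpers `iou`, `make_anchors` and `imitation_mask` are hypothetical names, the anchors are squares only (no aspect ratios), and ψ = 0.5 is just an example value. Taking the per-location maximum over the K anchors and comparing it with F gives the same result as thresholding each of the K maps and combining them with OR.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def make_anchors(feat_w, feat_h, stride, sizes):
    """anchors[y][x][k]: square anchor of side sizes[k] centred on cell (x, y).
    Assumption: squares only, to keep the example short."""
    anchors = []
    for y in range(feat_h):
        row = []
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            row.append([(cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2)
                        for s in sizes])
        anchors.append(row)
    return anchors


def imitation_mask(anchors, gt_boxes, psi=0.5):
    """Boolean H x W mask, True where the student should imitate the teacher."""
    feat_h, feat_w = len(anchors), len(anchors[0])
    mask = [[False] * feat_w for _ in range(feat_h)]
    for gt in gt_boxes:
        # IOU map m (H x W x K) between this GT box and every anchor.
        m = [[[iou(a, gt) for a in cell] for cell in row] for row in anchors]
        largest = max(v for row in m for cell in row for v in cell)  # M
        threshold = psi * largest                                    # F = psi * M
        for y in range(feat_h):
            for x in range(feat_w):
                # max over K > F is equivalent to OR over the K thresholded maps.
                if max(m[y][x]) > threshold:
                    mask[y][x] = True  # OR across GT boxes
    return mask


# Toy usage: a 4 x 4 feature map with stride 16 and one 32 x 32 GT box.
anchors = make_anchors(feat_w=4, feat_h=4, stride=16, sizes=(16, 32))
mask = imitation_mask(anchors, gt_boxes=[(16, 16, 48, 48)], psi=0.5)
for row in mask:
    print("".join("1" if v else "0" for v in row))
```

In this toy case only the four cells around the centre of the GT box pass the threshold. The resulting mask would then be used to restrict the squared-difference loss to those locations instead of summing over the whole feature map.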
