bmvc 2019
motivation
more attention is paid to two types of hard samples:
- hard-to-learn samples, predicted by the teacher with low certainty
- hard-to-mimic samples, with a large gap between the teacher's and the student's predictions
ADL
- enlarges the distillation loss for hard-to-learn and hard-to-mimic samples, and reduces it for the dominant easy samples
- single-stage detector
However, when applying knowledge distillation to object detection, the small capacity of the student network makes it hard to mimic all feature maps or logits well.
two-stage detector
- Learning efficient object detection models with knowledge distillation, 2017: weighted cross-entropy loss to underweight matching errors in background regions
- Mimicking very efficient network for object detection, 2017: mimicked feature maps between the student and the teacher
Deep learning: Adaptive Distillation applied to object detection

This article discusses how Adaptive Distillation (ADL) can improve the learning efficiency of the student network in object detection. ADL reweights the distillation loss: it is enlarged for hard-to-learn and hard-to-mimic samples and reduced for easy ones. The article compares applications in single-stage and two-stage detectors, and notes that the sample-imbalance problem is especially challenging in single-stage detectors with dense anchors. ADL combines focal loss with KL divergence, controlling the weights of samples of different difficulty through tunable parameters to improve knowledge distillation.
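The weighting idea above can be sketched in PyTorch. This is a minimal illustration, not the paper's exact formulation: it assumes a focal-style weight that grows with the per-sample teacher-student KL divergence (hard-to-mimic) and with the teacher's entropy (hard-to-learn, i.e. low certainty); `beta`, `gamma`, and the temperature `T` are hypothetical tuning parameters standing in for the paper's hyperparameters.

```python
import torch
import torch.nn.functional as F

def adaptive_distillation_loss(student_logits, teacher_logits,
                               beta=1.5, gamma=1.0, T=1.0):
    """Sketch of a focal-style adaptive weighting of the KL distillation loss.

    Hard-to-mimic samples (large teacher/student KL) and hard-to-learn
    samples (high teacher entropy, i.e. low certainty) are up-weighted;
    easy samples get a weight near zero, analogous to focal loss.
    """
    # Soften both distributions with temperature T
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_t = F.log_softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)

    # Per-sample KL(teacher || student): the hard-to-mimic signal
    kl = (p_t * (log_p_t - log_p_s)).sum(dim=-1)

    # Teacher entropy: the hard-to-learn signal (uncertain teacher)
    entropy_t = -(p_t * log_p_t).sum(dim=-1)

    # Adaptive weight in [0, 1): ~0 for easy samples, -> 1 for hard ones
    weight = (1.0 - torch.exp(-(kl + beta * entropy_t))) ** gamma

    # Treat the weight as a constant factor when backpropagating
    return (weight.detach() * kl).mean()
```

A sample whose teacher prediction the student already matches contributes almost nothing, while a mismatched or uncertain sample dominates the batch loss, which is the behavior the notes describe.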