BMVC 2019
motivation
more attention is paid to two types of hard samples:
- hard-to-learn samples, which the teacher predicts with low certainty
- hard-to-mimic samples, with a large gap between the teacher's and the student's predictions
ADL
- enlarges the distillation loss for hard-to-learn and hard-to-mimic samples and reduces it for the dominant easy samples (see the sketch below)
- applied to a single-stage detector
However, when applying knowledge distillation to object detection, the "small" capacity of the student network makes it hard to mimic all feature maps or logits well.
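As a rough illustration of the idea above, here is a minimal PyTorch-style sketch of an adaptive per-sample weight on the distillation loss: the weight grows with the teacher's entropy (hard-to-learn) and with the teacher-student KL gap (hard-to-mimic), and shrinks toward zero for easy samples. The focal-style factor `(1 - exp(-(KL + beta * entropy)))^gamma` and the hyperparameters `beta`, `gamma` are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def adaptive_distillation_loss(student_logits, teacher_logits,
                               beta=1.5, gamma=2.0, T=1.0):
    """Per-sample distillation loss that upweights hard samples.

    Hard-to-mimic: large KL divergence between teacher and student.
    Hard-to-learn: high teacher entropy (teacher is uncertain).
    beta, gamma, T are illustrative hyperparameters (assumed, not from the paper).
    """
    p_t = F.softmax(teacher_logits / T, dim=-1)          # teacher distribution
    log_p_t = F.log_softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)  # student log-probs

    # KL(teacher || student) per sample -> hard-to-mimic signal
    kl = (p_t * (log_p_t - log_p_s)).sum(dim=-1)

    # Teacher entropy per sample -> hard-to-learn signal
    entropy_t = -(p_t * log_p_t).sum(dim=-1)

    # Focal-style modulating factor: near 0 for easy samples (small KL, low
    # entropy), near 1 for hard samples. Detached here so it acts as a constant
    # weight; that is a design choice of this sketch.
    weight = (1.0 - torch.exp(-(kl + beta * entropy_t))) ** gamma

    return (weight.detach() * kl).mean()
```

For a batch of classification logits of shape (N, C), this returns a scalar loss dominated by samples the teacher is uncertain about or the student has not yet matched, which is the behaviour the bullet above describes.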
two-stage detector
- Learning efficient object detection models with knowledge distillation, 2017: uses a weighted cross-entropy loss to underweight matching errors in background regions
- Mimicking very efficient network for object detection, 2017: mimics feature maps between the student and the teacher (see the sketch below)
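For context on the mimicking line above, a minimal sketch of feature-map mimicking: the student's feature map is passed through a 1x1 adaptation layer to match the teacher's channel dimension, and an L2 loss pulls the two together. The adapter, the plain L2 objective, and the channel sizes are common practice and assumptions on my part, not necessarily the exact setup of the cited paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureMimicLoss(nn.Module):
    """L2 mimic loss between student and teacher feature maps.

    A 1x1 conv adapter maps student channels to teacher channels before
    comparison; both feature maps are assumed to share spatial resolution.
    """
    def __init__(self, student_channels=256, teacher_channels=1024):
        super().__init__()
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # Match channel dimension, then penalize the squared difference;
        # the teacher features are detached so only the student is updated.
        adapted = self.adapter(student_feat)
        return F.mse_loss(adapted, teacher_feat.detach())
```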