Paper: Correlation Congruence for Knowledge Distillation
1, Motivation:
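In standard KD, the intra-class and inter-class distribution of the teacher model's feature space is not taken into account, so the student model also ends up lacking the intra-class and inter-class distribution properties we want.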
Usually, the embedding space of the teacher possesses the characteristic that intra-class instances cohere together while inter-class instances separate from each other. But its counterpart in a student model trained by instance congruence alone would lack such a desired characteristic.
2, Contribution:
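- Proposes correlation congruence knowledge distillation (CCKD), which attends not only to instance congruence but also to correlation congruence. (Instance congruence is realized through PK or clustering over the mini-batch. Correlation congruence is realized through a loss term that constrains the correlation between samples i and j.)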
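- Computes the correlations within a mini-batch directly as one large matrix over the whole mini-batch, which reduces the amount of computation.
- Uses different mini-batch sampler strategies.
- Runs experiments on CIFAR-100, ImageNet-1K, person re-identification, and face recognition.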
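
To make the second bullet concrete, here is a minimal NumPy sketch of a correlation congruence term computed as one matrix product per network, next to the equivalent pairwise loop over samples i and j. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, cosine similarity on L2-normalized embeddings is an assumed choice of correlation measure (other kernels could be plugged in), and the mean squared difference over all n*n pairs is an assumed normalization.

```python
import numpy as np


def correlation_matrix(feats):
    """Return the (n, n) matrix of pairwise correlations for one mini-batch.

    feats: (n, d) array of embeddings.
    Assumption of this sketch: rows are L2-normalized first, so entry (i, j)
    is the cosine similarity between samples i and j.
    """
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    normed = feats / np.maximum(norms, 1e-12)  # guard against zero vectors
    return normed @ normed.T                   # all pairs in one product


def correlation_congruence_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student correlation matrices.

    The two embedding sizes may differ; only the batch size n must match,
    because both correlation matrices are (n, n).
    """
    n = teacher_feats.shape[0]
    diff = correlation_matrix(teacher_feats) - correlation_matrix(student_feats)
    return float(np.sum(diff ** 2) / (n * n))


def correlation_congruence_loss_pairwise(teacher_feats, student_feats):
    """Same quantity computed pair by pair, kept only for comparison."""
    def cosine(a, b):
        denom = max(np.linalg.norm(a), 1e-12) * max(np.linalg.norm(b), 1e-12)
        return float(a @ b) / denom

    n = teacher_feats.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            c_teacher = cosine(teacher_feats[i], teacher_feats[j])
            c_student = cosine(student_feats[i], student_feats[j])
            total += (c_teacher - c_student) ** 2
    return total / (n * n)
```

In a training loop this term would presumably be added, with some weight, to the usual cross-entropy and instance-level distillation losses; the weighting is not specified in these notes. Note that the loss compares relations between samples rather than the embeddings themselves, so a student whose embeddings are a rotated copy of the teacher's would incur no penalty from this term.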