Structured Knowledge Distillation for Dense Prediction
Today I'm reading a paper from Prof. Shen's group, related to knowledge distillation.
20200326: Reading this paper again. I hadn't paid attention to its pair-wise distillation before; now I'm working on something related, so I'm going through it once more and reproducing the relevant code, haha.
Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately, leading to sub-optimal performance.
The abstract says that previous KD approaches learn knowledge at each pixel independently, which yields a sub-optimal solution for dense prediction. Why is that? (Presumably because per-pixel distillation ignores the correlations between pixels that structured outputs depend on.)
Here we propose to distill structured knowledge from large networks to small networks, taking into account the fact that dense prediction is a structured prediction problem.
Since dense prediction is itself a structured prediction problem, they propose to distill "structured knowledge". There are two structured distillation schemes; the conventional per-pixel KD they call pixel-wise distillation.
pair-wise distillation
The pair-wise distillation scheme is motivated by the widely-studied pair-wise Markov random field framework [23].
holistic distillation
The holistic distillation scheme aims to align higher-order consistencies.
Specifically, we study two structured distillation schemes: i) pair-wise distillation that distills the pairwise similarities by building a static graph; and ii) holistic distillation that uses adversarial training to distill holistic knowledge.
Dense Prediction
Dense prediction is a category of fundamental problems in computer vision, which learns a mapping from input objects to complex output structures, including semantic segmentation, depth estimation and object detection, among many others.
So it maps inputs to complex structured outputs; then this kind of structured distillation may not be what we need for image restoration? (The idea itself could probably be carried over, but their design seems to care more about dense structure.)
With that in mind, I took a look at the pipeline.

Apart from the holistic loss, whose computation I don't yet understand (see the sketch below), the rest seems quite reasonable.
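Based only on the "adversarial training" description quoted above, here is my rough guess at how a holistic loss could be set up: a small discriminator, conditioned on the input image, scores segmentation maps; it is trained Wasserstein-style so teacher outputs score high and student outputs score low, and the student then tries to raise its own score. Everything below (names, architecture, loss signs) is my assumption, not the authors' released code:

```python
import torch
import torch.nn as nn

class HolisticDiscriminator(nn.Module):
    """Toy embedding network scoring an (image, segmentation) pair.
    A hypothetical stand-in for the paper's discriminator."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, img, seg):
        # Condition on the input image by channel concatenation,
        # so in_ch = 3 + C and seg must be at image resolution.
        return self.net(torch.cat([img, seg], dim=1))

def d_step_loss(D, img, q_s, q_t):
    # Wasserstein-style critic objective: push the teacher ("real")
    # score up and the student ("fake") score down. A Lipschitz
    # constraint (e.g. gradient penalty) is omitted for brevity.
    return D(img, q_s.detach()).mean() - D(img, q_t).mean()

def holistic_loss(D, img, q_s):
    # The student maximizes its critic score, i.e. tries to make its
    # output indistinguishable from the teacher's.
    return -D(img, q_s).mean()
```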
Papers these days seem awfully keen on stacking all kinds of tricks to chase benchmark numbers...
The paper applies the method to semantic segmentation, depth estimation, and object detection.
- Method
$I$: the $W \times H \times 3$ RGB input
$F$: the $W' \times H' \times N$ feature map computed from $I$
$Q$: the $W' \times H' \times C$ segmentation map computed from $F$ by a classifier, then upsampled to $W \times H$ as the final segmentation result
Segmentation: the transformed feature map $Q$ serves as the segmentation, i.e. classification, result.
? What exactly is this transform? (See the sketch below.)
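My guess is that the "transform" is just the usual segmentation classifier head: a 1×1 convolution producing $C$ per-class logits, followed by softmax. A toy shape walk-through (the backbone below is a stand-in I made up, not the paper's network):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

W = H = 256
C = 19                                    # e.g. number of Cityscapes classes
I = torch.randn(1, 3, H, W)               # input I: W x H x 3
backbone = nn.Conv2d(3, 64, 8, stride=8)  # stand-in for the real backbone
classifier = nn.Conv2d(64, C, 1)          # the "transform": 1x1 conv to C logits

feat = backbone(I)                        # F: W' x H' x N, here 32 x 32 x 64
Q = classifier(feat).softmax(dim=1)       # Q: W' x H' x C segmentation map
seg = F.interpolate(Q, size=(H, W), mode="bilinear", align_corners=False)
print(feat.shape, Q.shape, seg.shape)     # seg is upsampled back to W x H
```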
Pixel-wise distillation
Compute the KL divergence between the class probability maps output by the student S and the teacher T.
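If I remember the paper right, this is an average of per-pixel KL terms, roughly $\ell_{pi} = \frac{1}{W' \times H'} \sum_i \mathrm{KL}(q_i^s \,\|\, q_i^t)$. A minimal PyTorch sketch (function name mine; I use the common KD direction KL(teacher ∥ student), so check the paper for its exact convention):

```python
import torch
import torch.nn.functional as F

def pixel_wise_distillation(s_logits: torch.Tensor,
                            t_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence between per-pixel class distributions.
    s_logits, t_logits: (B, C, H', W') raw logits."""
    log_p_s = F.log_softmax(s_logits, dim=1)   # student log-probabilities
    p_t = F.softmax(t_logits, dim=1)           # teacher probabilities
    # Sum the KL term over the class dim, average over batch and pixels.
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)
    return kl.mean()
```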

Pair-wise distillation
A static affinity graph is used to represent pairwise spatial relations.


$\beta$: aggregated nodes are obtained with average pooling.
Paraphrasing the paper:
- A node represents a spatial location, and a connection between two nodes denotes their similarity.
- The connection range $\alpha$ of each node and the granularity $\beta$ control the size of the static affinity graph.
- From the figure above: $\alpha$ determines which neighbouring nodes a node connects to, and $\beta$ determines how many pixels are aggregated into one node.
In total the graph contains $\frac{W' \times H'}{\beta}$ nodes.
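Putting the above together, a rough PyTorch sketch of the pair-wise loss (my own simplification: a pool×pool average-pooling window plays the role of $\beta$, affinities are cosine similarities, and for brevity the graph is fully connected rather than limited to a connection range $\alpha$):

```python
import torch
import torch.nn.functional as F

def pair_wise_distillation(f_s: torch.Tensor,
                           f_t: torch.Tensor,
                           pool: int = 2) -> torch.Tensor:
    """Match the student's static affinity graph to the teacher's.
    f_s, f_t: feature maps (B, N, H', W'); each pool x pool block of
    pixels is average-pooled into one graph node."""
    f_s = F.avg_pool2d(f_s, pool)             # aggregate pixels into nodes
    f_t = F.avg_pool2d(f_t, pool)
    f_s = F.normalize(f_s.flatten(2), dim=1)  # (B, N, nodes), unit length
    f_t = F.normalize(f_t.flatten(2), dim=1)
    a_s = f_s.transpose(1, 2) @ f_s           # (B, nodes, nodes) cosine affinity
    a_t = f_t.transpose(1, 2) @ f_t
    return ((a_s - a_t) ** 2).mean()          # squared affinity differences
```

Note that the student and teacher may have different channel counts $N$; the affinity matrices are computed within each network, so only the spatial sizes need to match.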
