Knowledge Distillation (10): Born Again Neural Networks

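This post introduces a training method that lets a student model surpass its teacher. The method is not meant for model compression; its goal is a student that performs better than the teacher. The experimental results show that students trained this way do reach that goal.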


This paper is not about model compression; the authors instead want the student to surpass the teacher.
The training procedure is as follows (the figure is not preserved here):
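As a rough illustration of the training procedure, here is a minimal NumPy sketch. It rests on my reading of the paper rather than on anything shown in this post: each student has the same architecture as its teacher, it is trained on the ground-truth labels plus the teacher's softmax outputs, and the process can be repeated over several generations whose predictions are then averaged. The softmax-regression model, the unweighted sum of the two loss terms, and the names `train_generation`, `born_again`, and `ensemble_predict` are assumptions made for illustration; the paper itself works with deep image classifiers.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_generation(X, y_onehot, teacher_probs=None, epochs=200, lr=0.5, seed=0):
    """Train one softmax-regression model (one 'generation').

    teacher_probs=None gives plain supervised training (the first teacher).
    Otherwise the loss is CE(labels, student) + CE(teacher, student),
    an assumed, unweighted combination.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = y_onehot.shape[1]
    W = rng.normal(scale=0.01, size=(d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        p = softmax(X @ W + b)
        # Gradient of cross-entropy w.r.t. the logits is (p - target),
        # for hard labels and for soft teacher targets alike.
        grad_logits = p - y_onehot
        if teacher_probs is not None:
            grad_logits = grad_logits + (p - teacher_probs)
        grad_logits = grad_logits / n
        W -= lr * (X.T @ grad_logits)
        b -= lr * grad_logits.sum(axis=0)
    return W, b

def born_again(X, y_onehot, generations=2):
    """Train a teacher, then `generations` students of identical architecture."""
    models, teacher_probs = [], None
    for g in range(generations + 1):
        W, b = train_generation(X, y_onehot, teacher_probs, seed=g)
        models.append((W, b))
        teacher_probs = softmax(X @ W + b)  # this model teaches the next one
    return models

def ensemble_predict(models, X):
    """Average the class probabilities of all generations."""
    return np.mean([softmax(X @ W + b) for W, b in models], axis=0)

# Toy usage on synthetic, linearly separable data (not the paper's setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
y_onehot = np.eye(2)[labels]
models = born_again(X, y_onehot, generations=2)
probs = ensemble_predict(models, X)
```

Because teacher and student share one architecture here, nothing is being compressed; the only thing that changes between generations is the training target. This toy example is not expected to reproduce the paper's accuracy gains.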
Experimental results: the student does surpass the teacher (the figure is not preserved here).

Training deep models for lane detection is challenging due to the very subtle and sparse supervisory signals inherent in lane annotations. Without learning from much richer context, these models often fail in challenging scenarios, e.g., severe occlusion, ambiguous lanes, and poor lighting conditions. In this paper, we present a novel knowledge distillation approach, i.e., Self Attention Distillation (SAD), which allows a model to learn from itself and gains substantial improvement without any additional supervision or labels. Specifically, we observe that attention maps extracted from a model trained to a reasonable level would encode rich contextual information. The valuable contextual information can be used as a form of ‘free’ supervision for further representation learning through performing top-down and layer-wise attention distillation within the network itself. SAD can be easily incorporated in any feed-forward convolutional neural networks (CNN) and does not increase the inference time. We validate SAD on three popular lane detection benchmarks (TuSimple, CULane and BDD100K) using lightweight models such as ENet, ResNet-18 and ResNet-34. The lightest model, ENet-SAD, performs comparatively or even surpasses existing algorithms. Notably, ENet-SAD has 20× fewer parameters and runs 10× faster compared to the state-of-the-art SCNN [16], while still achieving compelling performance in all benchmarks. Our code is available at https://github.com/cardwing/Codes-for-Lane-Detection.