Model Distillation Notes

hit56笔记

Last modified 2025-03-09 08:47:44

Tags: notes

First published 2024-05-26 20:20:47

Article link: https://blog.youkuaiyun.com/zh515858237/article/details/139219596

Table of Contents

1. What Is Model Distillation
2. How to Distill
3. Common Questions
- 3.1
4. References

1. What Is Model Distillation

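Hinton et al. proposed knowledge distillation at the NIPS 2014 Deep Learning Workshop. The goal is to transfer the knowledge learned by one large model, or by an ensemble of several models, into a single lightweight model that is easier to deploy. Put simply, the small model is trained to match the large model's predictions instead of learning directly from the labels in the training set.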
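In distillation, the original large model is called the teacher and the new small model is called the student. The labels in the training set are the hard labels, and the probability distribution predicted by the teacher is the soft label. The temperature T is a hyperparameter that controls how soft the soft labels are: a higher T flattens the teacher's distribution.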
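As a minimal sketch of how the temperature reshapes soft labels, the snippet below applies a temperature-scaled softmax to a set of teacher logits. The function name `softmax_with_temperature` and the logit values are made up for illustration; they do not come from any particular library or from the paper's code.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for a 3-class problem.
teacher_logits = [5.0, 2.0, 0.5]

sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)

print("T=1:", [round(p, 3) for p in sharp])
print("T=4:", [round(p, 3) for p in soft])
```

At T=1 almost all of the probability mass sits on the top class. At T=4 the ranking of the classes is unchanged, but the smaller probabilities become large enough for the student to learn from.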
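Learning from soft labels works mainly because the goal of a good model is not to fit the training data but to generalize to new data, and the relative probabilities the teacher assigns to the incorrect classes reflect how it generalizes. The original paper puts it this way: "When the soft targets have high entropy, they provide much more information per training case than hard targets and much less variance in the gradient between training cases, so the small model can often be trained on much less data than the original cumbersome model and using a much higher learning rate."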
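To make "high entropy" concrete, here is a small sketch comparing the Shannon entropy of a one-hot hard label with that of soft labels at two temperatures. The logits and the helper names `entropy` and `softmax` are assumptions chosen for illustration only.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

hard_label = [1.0, 0.0, 0.0]      # one-hot training label
teacher_logits = [5.0, 2.0, 0.5]  # hypothetical teacher outputs

soft_t1 = softmax(teacher_logits, temperature=1.0)
soft_t4 = softmax(teacher_logits, temperature=4.0)

print("hard label entropy:", entropy(hard_label))
print("soft label entropy (T=1):", entropy(soft_t1))
print("soft label entropy (T=4):", entropy(soft_t4))
```

A one-hot label has zero entropy, so it only says which class is correct. The soft labels have higher entropy, and raising T increases it further, up to the limit of log(3) for three classes.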