Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

This post is part of the LLM series and presents a translation of the paper "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training".

Abstract

The remarkable success of large language models (LLMs) on a wide range of complex tasks relies heavily on their enormous scale, which poses challenges for model deployment due to substantial memory consumption. Recently, many studies have attempted to compress LLMs with one-shot pruning methods. However, these methods often suffer considerable performance degradation on complex language understanding tasks, calling into question the feasibility of pruning for LLMs. To address this issue, we propose a pruning pipeline with retraining for semi-structured sparse models, termed Adaptive Sparse Trainer (AST). Unlike previous one-shot pruning approaches, AST gradually transforms the dense model into a sparse one by applying decay to masked weights, while allowing the model to adaptively select masks throughout the training process. Furthermore, we observe that distillation with the dense model as the teacher prevents the sparse model from falling into local optima and accelerates convergence. In addition, we incorporate extra well-initialized parameters to further enhance model performance with only a minimal increase in memory footprint. AST significantly improves model performance, approaching the level of the dense model. When applied to the LLaMA2-7B model, AST narrows the zero-shot accuracy gap between the dense model and the semi-structured sparse model to 1.12% across multiple zero-shot tasks, while using less than 0.4% of the pretraining tokens. Our work demonstrates the feasibility of deploying semi-structured sparse large language models and introduces a new way to obtain highly compressed models when combined with existing quantization techniques.
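
Below is a minimal, illustrative sketch (not the paper's actual implementation) of two of the ingredients described above: re-selecting a 2:4 semi-structured mask from the current weights while decaying the masked weights toward zero, and a standard knowledge-distillation loss with the dense model as the teacher. All function names and hyperparameters (`decay`, `temperature`) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def compute_2to4_mask(weight: torch.Tensor) -> torch.Tensor:
    """For every group of 4 consecutive weights along the input dimension,
    keep the 2 with the largest magnitude (2:4 semi-structured sparsity).
    Assumes the input dimension is divisible by 4."""
    out_features, in_features = weight.shape
    groups = weight.abs().reshape(out_features, in_features // 4, 4)
    topk = groups.topk(2, dim=-1).indices        # indices of the 2 largest per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return mask.bool().reshape(out_features, in_features)


def decay_masked_weights(weight: torch.Tensor, decay: float = 1e-4) -> torch.Tensor:
    """One conceptual training-step update: re-select the 2:4 mask from the
    current weights, then shrink (rather than hard-zero) the masked weights,
    so the mask can still change adaptively at later steps."""
    mask = compute_2to4_mask(weight)
    with torch.no_grad():
        weight[~mask] *= (1.0 - decay)           # gradual decay toward zero
    return mask


def distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    """Standard knowledge-distillation loss: KL divergence between the dense
    teacher's and the sparse student's output distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```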

1 Introduction

2 Related Work

3 Method

4 Experiments

5 Conclusion

In this paper, we introduced the Adaptive Sparse Trainer (AST), a novel training pipeline designed for semi-structured sparse models. AST effectively narrows the performance gap between dense and sparse large language models.

### Isomorphic Pruning Technique in Vision Models Explained

Isomorphic pruning is a method that aims to reduce the computational complexity of deep neural networks while preserving their performance, by selectively removing redundant or less important parameters. The process ensures that the pruned network retains an architecture similar (isomorphic) to its original form while operating more efficiently. In vision models specifically, this technique can be applied through several strategies:

#### Criteria for Selecting Parameters to Remove

Selection criteria typically evaluate parameter importance using metrics such as weight magnitude, gradient flow, or contribution to a specific task such as classification accuracy[^1]. By identifying weights whose removal has minimal impact, layers can be trimmed without significantly affecting overall functionality.

#### Maintaining Architectural Integrity Post-Pruning

To maintain architectural integrity after pruning, methods may preserve certain structural properties during the removal process, such as ensuring that connectivity patterns within convolutional filters remain intact after they are thinned[^2].

#### Re-training After Pruning

After applying these techniques, it is common practice to fine-tune or retrain the model, so that any slight loss caused by weight elimination is compensated over time as the optimizer adjusts the remaining connections[^3].

```python
import copy

import torch


def prune_network(model, criterion='magnitude', threshold=0.01):
    """
    Applies isomorphic (magnitude-based) pruning to a given PyTorch model.

    Args:
        model (torch.nn.Module): The target neural network module.
        criterion (str): Criterion used for selecting parameters ('magnitude').
        threshold (float): Weights with absolute value below this are pruned.

    Returns:
        torch.nn.Module: A pruned copy of the input `model` with the same
        architecture (shapes and connectivity unchanged).
    """
    pruned_model = copy.deepcopy(model)
    with torch.no_grad():
        for name, param in pruned_model.named_parameters():
            # Apply the mask according to the selected criterion.
            if 'weight' in name and criterion == 'magnitude':
                mask = param.abs() >= threshold
                param.mul_(mask.to(param.dtype))
    return pruned_model
```
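
A minimal usage sketch follows; the toy model, input size, and threshold are assumptions made purely for illustration.

```python
import torch.nn as nn

# Hypothetical small vision model, used only to exercise prune_network.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

pruned = prune_network(model, criterion="magnitude", threshold=0.01)

# Fraction of weights zeroed in the first convolutional layer.
w = pruned[0].weight
print(f"conv sparsity: {(w == 0).float().mean().item():.2%}")
```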