HuggingFace PEFT Low-Level API Guide: A Deep Dive into the Adapter Injection Mechanism

Article link: https://blog.youkuaiyun.com/gitblog_00798/article/details/148374875

Project: peft (🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning). Repository: https://gitcode.com/gh_mirrors/pe/peft

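Introduction

In fine-tuning large pretrained models, parameter-efficient fine-tuning (PEFT) has become a key way to reduce compute requirements. This article looks at the low-level API in the HuggingFace PEFT library, adapter injection, and explains how it works and when to use it.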

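Overview of Adapter Injection

Adapter injection is a low-level facility in the PEFT library that lets developers inject trainable adapter modules directly into any PyTorch model, without wrapping it in one of PEFT's model classes. It is best suited to scenarios that need a high degree of customization.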

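Currently Supported Adapter Types

PEFT currently supports injecting three adapter types:

LoRA (Low-Rank Adaptation): parameter-efficient fine-tuning by learning a low-rank update to the frozen weights
AdaLoRA (Adaptive LoRA): a LoRA variant that adjusts the rank of each adapter dynamically during training
IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations): fine-tuning by rescaling activations with learned vectors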

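Pros and Cons of Adapter Injection

Advantages

In-place modification: the original model is modified directly and keeps all of its attributes and methods
High flexibility: works with any PyTorch module and any modality
Multiple adapters: several adapters with different configurations can be injected, each under its own adapter name

Limitations

Saving and loading logic must be implemented by hand
The high-level utilities of the PEFT model classes (such as disabling or merging adapters) are not available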
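
To make the last limitation concrete, the following minimal sketch shows in plain PyTorch what merging and disabling a LoRA adapter amount to. It does not call any PEFT API: the tensors `lora_A` and `lora_B`, the dimensions, and the random values are made-up stand-ins for trained adapter weights.

```python
import torch

torch.manual_seed(0)

d_in, d_out, r, alpha = 10, 10, 4, 8
scaling = alpha / r                       # LoRA scales its update by alpha / r

base = torch.nn.Linear(d_in, d_out)       # stands in for a frozen target layer
lora_A = torch.randn(r, d_in) * 0.1       # hypothetical trained factor, shape (r, d_in)
lora_B = torch.randn(d_out, r) * 0.1      # hypothetical trained factor, shape (d_out, r)
x = torch.randn(3, d_in)

with torch.no_grad():
    # Adapter active: base output plus the scaled low-rank path.
    y_adapter = base(x) + scaling * (x @ lora_A.T @ lora_B.T)

    # Merged: fold the low-rank update into a copy of the base weight.
    merged = torch.nn.Linear(d_in, d_out)
    merged.weight.copy_(base.weight + scaling * (lora_B @ lora_A))
    merged.bias.copy_(base.bias)
    y_merged = merged(x)

    # Disabled: skip the low-rank path entirely.
    y_disabled = base(x)

print("merged matches adapter output:", torch.allclose(y_adapter, y_merged, atol=1e-5))
print("disabled differs from adapter output:", not torch.allclose(y_adapter, y_disabled))
```

With an injected model these operations would have to be applied by hand to every adapted layer, which is the bookkeeping the PEFT model classes normally take care of.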

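Hands-On: The Full Adapter Injection Workflow

1. Creating and Injecting an Adapter

The following example injects a LoRA adapter into a custom model: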

import torch
from peft import inject_adapter_in_model, LoraConfig

# Define a custom model
class CustomModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, 10)
        self.transformer = torch.nn.Linear(10, 10)
        self.head = torch.nn.Linear(10, 10)

    def forward(self, inputs):
        x = self.embedding(inputs)
        x = self.transformer(x)
        return self.head(x)

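# Configure LoRA
lora_config = LoraConfig(
    lora_alpha=16,       # scaling factor
    lora_dropout=0.1,    # dropout probability on the adapter input
    r=64,                # rank; illustrative only, since it exceeds the 10x10 layer's width
    bias="none",         # how bias terms are handled
    target_modules=["transformer"],  # names of the modules to adapt
)

# Instantiate the model and inject the adapter
model = CustomModel()
model = inject_adapter_in_model(lora_config, model)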
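
The rank `r` decides how many parameters the adapter adds. As a rough worked example, the sketch below counts the weights of the two LoRA factors for a linear layer. The helper `lora_param_count` is a hypothetical name introduced here, and it counts only the two factor matrices, assuming they carry no bias terms.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Weights in lora_A (r x in_features) plus lora_B (out_features x r)."""
    return r * in_features + out_features * r


in_features, out_features = 10, 10
base_weight_params = in_features * out_features  # weight matrix of the target layer

for r in (2, 4, 64):
    extra = lora_param_count(in_features, out_features, r)
    print(f"r={r:>2}: adapter weights={extra:>4}, base weights={base_weight_params}")

# The same rank on a 4096 x 4096 layer is a small fraction of the base weight.
large_extra = lora_param_count(4096, 4096, 64)
large_base = 4096 * 4096
print(f"4096x4096, r=64: adapter weights={large_extra}, base weights={large_base}")
```

On the 10x10 toy layer, `r=64` yields more adapter weights than the layer itself has, so the setting only pays off on layers much wider than the rank.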

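2. Inspecting the Model Structure

After injection, the target module carries LoRA-specific submodules:

lora_dropout: the dropout applied to the adapter's input
lora_A / lora_B: the two low-rank factor matrices
lora_embedding_A / lora_embedding_B: adapter parameters for embedding layers (empty unless an embedding layer is targeted)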

3. Saving and Loading the Model

Saving the adapter parameters

from peft import get_peft_model_state_dict

# Save only the adapter parameters, not the base model weights
adapter_state = get_peft_model_state_dict(model)
torch.save(adapter_state, "lora_adapter.bin")

Loading the adapter parameters

from peft import set_peft_model_state_dict

# Create a fresh model and inject the adapter with the same config
new_model = CustomModel()
new_model = inject_adapter_in_model(lora_config, new_model)

# Load the adapter parameters
adapter_state = torch.load("lora_adapter.bin")
load_result = set_peft_model_state_dict(new_model, adapter_state)

# Check for keys in the file that did not match any parameter
print("Unexpected keys:", load_result.unexpected_keys)

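4. Memory Optimization Tips

For large models, a low-memory mode speeds up loading by skipping the real initialization of adapter weights that are about to be overwritten:

# Create the adapter weights on the meta device (no real memory is allocated)
model = CustomModel()
model = inject_adapter_in_model(lora_config, model, low_cpu_mem_usage=True)

# Load the saved weights, which replace the meta-device placeholders with real tensors
set_peft_model_state_dict(model, adapter_state, low_cpu_mem_usage=True)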