Choosing a Tool for Fine-Tuning Large Models

每天八杯水D

Last modified: 2024-12-24 17:05:32

Column: Hands-On Fine-Tuning of LLMs for Vertical Domains
Tags: PEFT, LoRA, fine-tuning, fine-tuning tools, LLM

First published: 2024-12-24 11:20:06

Article link: https://blog.youkuaiyun.com/weixin_45947938/article/details/144688823

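Tool 1: LLaMAFactory

Introduction

LLaMAFactory is a full-stack fine-tuning tool that supports a large number of models and the mainstream fine-tuning methods, including LoRA.
It can run fine-tuning from scripts or from a web UI, ships with basic training datasets, and supports both continued pre-training and full-parameter fine-tuning.
When fine-tuning on an Alpaca-style dataset, LLaMAFactory automatically wraps each prompt in a template. This matters when the fine-tuned model is later served with vllm: the same template has to be applied at inference time so that prompts match the format seen during training.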
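
To illustrate why the template matters, here is a minimal sketch of how an Alpaca-style record might be rendered into a prompt. The `TEMPLATE` string and the `render_prompt` helper are hypothetical and are not LLaMAFactory's actual implementation; real templates are model-specific.

```python
# Hypothetical template for illustration only; real templates are model-specific.
TEMPLATE = "<|user|>\n{query}\n<|assistant|>\n"


def render_prompt(record: dict, template: str = TEMPLATE) -> str:
    """Join the Alpaca-style `instruction` and optional `input`, then wrap them in the template."""
    query = record["instruction"]
    if record.get("input"):
        query = f"{query}\n{record['input']}"
    return template.format(query=query)


# An Alpaca-style training record: instruction, optional input, expected output.
record = {
    "instruction": "Extract the relation between the two entities.",
    "input": "Marie Curie was born in Warsaw.",
    "output": "(Marie Curie, born_in, Warsaw)",
}

train_prompt = render_prompt(record)

# At inference time the raw request must go through the same rendering,
# otherwise the model sees a prompt format it was not fine-tuned on.
infer_prompt = render_prompt(
    {"instruction": record["instruction"], "input": record["input"]}
)
print(train_prompt == infer_prompt)
```

If the serving side sends the bare instruction text instead, the fine-tuned model may still answer, but it is being queried outside the format it was trained on.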

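Tool 2: Hugging Face's PEFT package

Introduction

Hugging Face provides a rich set of pretrained models and fine-tuning tools that cover most mainstream NLP tasks.
PEFT (Parameter-Efficient Fine-Tuning) is Hugging Face's open-source library of fine-tuning building blocks. It suits a wide range of fine-tuning tasks and supports LoRA among other methods.
LoRA (Low-Rank Adaptation) represents the weight update as the product of two much smaller matrices (a low-rank decomposition). Only those two matrices are trained, which speeds up fine-tuning of large models and reduces memory consumption.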
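
The saving from the low-rank decomposition can be made concrete with a little arithmetic. The sketch below compares the number of trainable parameters for one weight matrix of shape d x k when it is updated directly versus through LoRA factors B (d x r) and A (r x k). The sizes 4096 and r = 8 are illustrative assumptions, not values taken from this article.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # updating W directly trains every entry
    lora = r * (d + k)  # B is d x r and A is r x k; only these are trained
    return full, lora


# Illustrative sizes (assumed): a 4096 x 4096 projection with rank 8.
full, lora = lora_param_counts(d=4096, k=4096, r=8)
print(f"full update: {full} parameters")
print(f"LoRA update: {lora} parameters ({lora / full:.4%} of full)")
```

Because r is much smaller than d and k, the adapter holds well under one percent of the parameters of the matrix it adapts, which is where the memory saving comes from.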

An example of the fine-tuning code:

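# Install the peft package and choose a fine-tuning method: LoraConfig, AdaptionPromptConfig, PrefixTuningConfig
from peft import LoraConfig, TaskType, get_peft_model
from transformers import DataCollatorForSeq2Seq, Trainer, TrainingArguments

# `model`, `tokenizer`, and `tokenized_ds` (the base model, its tokenizer, and the
# tokenized training set) are assumed to have been created earlier.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],
    modules_to_save=["post_attention_layernorm"],
)

# Wrap the base model with the LoRA adapters, ready for training
model = get_peft_model(model, config)
model.enable_input_require_grads()  # commonly needed when gradient checkpointing is combined with PEFT
model.print_trainable_parameters()  # prints trainable params, all params, and the trainable percentage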

# Configure the training arguments
args = TrainingArguments(
    output_dir="./chatbot",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    logging_steps=100,
    num_train_epochs=10,
    learning_rate=1e-4,
    remove_unused_columns=False,
    save_strategy="epoch"
)

# Create the trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_ds.select(range(10000)),
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)

# Start training
trainer.train()

# Save the fine-tuned LoRA weights and the tokenizer
lora_path = './GLM4_lora'  # same path that the inference script loads from
trainer.model.save_pretrained(lora_path)
tokenizer.save_pretrained(lora_path)
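
To get a feel for what the training arguments above imply, the following sketch works out the effective batch size and the approximate number of optimizer steps. It assumes a single device and mirrors the usual "batches per epoch divided by accumulation steps" calculation; the exact count reported by `Trainer` can differ slightly between library versions. The `optimizer_steps` helper is hypothetical.

```python
import math


def optimizer_steps(num_samples: int, per_device_batch_size: int,
                    grad_accum_steps: int, epochs: int, num_devices: int = 1) -> int:
    """Approximate the number of optimizer updates over the whole run."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch_size * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return steps_per_epoch * epochs


# Values taken from the TrainingArguments and the 10000-sample subset used above.
effective_batch = 2 * 8  # per_device_train_batch_size * gradient_accumulation_steps
total_steps = optimizer_steps(num_samples=10000, per_device_batch_size=2,
                              grad_accum_steps=8, epochs=10)
print(f"effective batch size: {effective_batch}")
print(f"approximate optimizer steps: {total_steps}")
print(f"approximate log entries at logging_steps=100: {total_steps // 100}")
```

With `save_strategy="epoch"`, a run of this shape also writes one checkpoint per epoch, so disk usage is worth checking before starting.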

Code for loading the LoRA weights into the base model for inference:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel

mode_path = '/root/autodl-tmp/glm-4-9b-chat/ZhipuAI/glm-4-9b-chat'
lora_path = './GLM4_lora'

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(mode_path, trust_remote_code=True)

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    mode_path, 
    device_map="auto",
    torch_dtype=torch.bfloat16, 
    trust_remote_code=True).eval()

# Load the LoRA weights
model = PeftModel.from_pretrained(model, model_id=lora_path)

prompt = "Who are you?"
inputs = tokenizer.apply_chat_template([{"role": "system", "content": "You are a relation extraction expert"},
                                        {"role": "user", "content": prompt}],
                                       add_generation_prompt=True,
                                       tokenize=True,
                                       return_tensors="pt",
                                       return_dict=True
                                       ).to('cuda')


gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
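
A note on the generation settings used above: `do_sample=True` together with `top_k=1` leaves only the single most likely token to sample from, so decoding is effectively greedy. The sketch below demonstrates this with a simplified, hypothetical `top_k_sample` helper written in plain Python; it is not the sampling code used by `transformers`.

```python
import math
import random


def top_k_sample(logits: list, k: int, rng: random.Random) -> int:
    """Sample a token id from the k highest logits, using a softmax over those k."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]  # subtract peak for stability
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 2.5, -1.0, 2.4]  # token 1 has the highest score

# With k=1 the candidate set has one element, so every draw returns the argmax.
greedy_like = {top_k_sample(logits, k=1, rng=rng) for _ in range(100)}
print(greedy_like)

# With k=2 the runner-up (token 3) becomes a possible outcome as well.
with_k2 = {top_k_sample(logits, k=2, rng=rng) for _ in range(100)}
print(with_k2 <= {1, 3})
```

If varied outputs are wanted, a larger `top_k` (or `top_p` and `temperature`) is the usual knob; if deterministic output is the goal, `do_sample=False` states that intent more directly.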

Code for merging the model:

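import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholders (assumptions, not from the original post): adjust these for your setup
base_model = "path/to/base_model"            # path or hub ID of the base model
config_kwargs = {"trust_remote_code": True}  # extra kwargs forwarded to from_pretrained

# Load the tokenizer of the pretrained model
tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    use_fast=True,
    padding_side="left", **config_kwargs)
print("Tokenizer Load Success!")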

config = AutoConfig.from_pretrained(base_model, **config_kwargs)

# Load and prepare pretrained models (without valuehead).
model = AutoModelForCausalLM.from_pretrained(
        base_model,
        config=config,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
        revision='main'
)

print('origin config =', model.config)

# Merge the LoRA weights into the base model
lora_path = "./save_lora"
model = PeftModel.from_pretrained(model, lora_path)
model = model.merge_and_unload()
print('merge config =', model.config)

# Save the merged model
save_path = "./save_merge_model"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
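
Conceptually, merging folds the adapter into the base weight so that no extra matrices are needed at inference time. The sketch below shows the idea on tiny matrices in plain Python, assuming the common LoRA scaling of alpha / r; the `matmul` and `merge_lora` helpers and all numbers are hypothetical, and this is not PEFT's implementation.

```python
def matmul(a: list, b: list) -> list:
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W: list, B: list, A: list, alpha: float, r: int) -> list:
    """Return W + (alpha / r) * B @ A, the merged weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # 2 x 2 base weight
B = [[1.0], [2.0]]            # 2 x 1 LoRA factor
A = [[0.5, -0.5]]             # 1 x 2 LoRA factor
alpha, r = 2, 1

merged = merge_lora(W, B, A, alpha, r)

# Check on one input column: base output plus scaled adapter output
# should equal the output of the merged weight.
x = [[3.0], [1.0]]
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + (alpha / r) * a[0]] for b, a in zip(base_out, adapter_out)]
print(merged)
print(matmul(merged, x) == unmerged)
```

After merging, the saved model can be loaded like any ordinary checkpoint without the `peft` package, at the cost of no longer being able to swap or remove the adapter.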