QLoRA Fine-Tuning of the DeepSeek-7B-chat Model

Latest recommended article published on 2025-05-12 11:21:28


License: CC 4.0 BY-SA


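In this post I walk through fine-tuning the DeepSeek-7B-chat model with QLoRA. QLoRA is an efficient fine-tuning method: the frozen base model is quantized to 4 bits and only small low-rank adapter (LoRA) matrices are trained, which makes it possible to customize a large language model with limited compute resources.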

1. Environment Setup

First, install the required Python packages:

pip install transformers datasets pandas peft bitsandbytes accelerate

The specific versions used are:
transformers 4.38.0
peft 0.10.0
accelerate 0.26.0
datasets 2.14.6
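
To compare a local environment against the versions listed above, a small check along these lines may help. It is a minimal sketch using only the standard library; the `PINNED` dictionary and the `check_versions` helper are illustrative names, and a mismatch does not necessarily mean the tutorial will fail.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this post; treat them as the author's environment,
# not as hard requirements.
PINNED = {
    "transformers": "4.38.0",
    "peft": "0.10.0",
    "accelerate": "0.26.0",
    "datasets": "2.14.6",
}


def check_versions(pinned):
    """Return {package: (installed_or_None, expected)} for each pinned package."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, expected)
    return report


if __name__ == "__main__":
    for name, (installed, expected) in check_versions(PINNED).items():
        status = "ok" if installed == expected else "differs"
        print(f"{name}: installed={installed} expected={expected} [{status}]")
```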

2. Data Preparation

We fine-tune on HarmonyOS training data. The data must first be converted to the standard instruction/input/output format:

import pandas as pd
from datasets import Dataset

# Load the training data
df = pd.read_json('train.json')
ds = Dataset.from_pandas(df)
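
The post does not show the contents of train.json, so the records below are hypothetical. The sketch only illustrates the instruction/input/output layout that the later processing step relies on (it reads `example['instruction']`, `example['input']` and `example['output']`), and how one might check for missing fields before training. `validate_records` is an illustrative helper, not part of any library.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")

# Hypothetical sample records; the real train.json is not shown in this post.
sample_records = [
    {
        "instruction": "Answer the following HarmonyOS development question.",
        "input": "How do I declare a state variable in ArkTS?",
        "output": "Decorate the property with @State inside a @Component struct.",
    },
    {
        "instruction": "Explain what the @Entry decorator does in ArkTS.",
        "input": "",  # the input field may be empty, but the key should exist
        "output": "It marks a custom component as the entry point of a page.",
    },
]


def validate_records(records):
    """Return a list of (index, problem) pairs for malformed records."""
    problems = []
    for i, rec in enumerate(records):
        for key in REQUIRED_KEYS:
            if key not in rec:
                problems.append((i, f"missing key: {key}"))
            elif not isinstance(rec[key], str):
                problems.append((i, f"non-string value for: {key}"))
    return problems


# Round-trip through JSON text: a list of objects, one object per training sample.
json_text = json.dumps(sample_records, ensure_ascii=False, indent=2)
loaded = json.loads(json_text)
print(validate_records(loaded))
```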

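3. Model Loading and Quantization Configuration

We load the DeepSeek-7B-chat model with the Hugging Face transformers library and apply a quantization configuration to reduce memory usage. The weights are first downloaded to a local directory.

Install huggingface_hub: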

pip install -U huggingface_hub

Set the mirror endpoint on Windows (PowerShell):

$env:HF_ENDPOINT = "https://hf-mirror.com"

On Linux:

export HF_ENDPOINT=https://hf-mirror.com

Download the model (into the same directory that the loading code reads from):

huggingface-cli download deepseek-ai/deepseek-llm-7b-chat --local-dir ./model_tmp/deepseek-llm-7b-chat --local-dir-use-symlinks False

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('model_tmp/deepseek-llm-7b-chat/', use_fast=False, trust_remote_code=True)
tokenizer.padding_side = 'right'

# Create the quantization config: 4-bit NF4 weights, fp16 compute, double quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.half,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

# Load the model with the quantization config applied
model = AutoModelForCausalLM.from_pretrained(
    'model_tmp/deepseek-llm-7b-chat/',
    trust_remote_code=True,
    torch_dtype=torch.half,
    low_cpu_mem_usage=True,
    quantization_config=quantization_config
)
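
To get a feel for why the 4-bit configuration matters, here is a rough back-of-the-envelope estimate of weight storage. It is a sketch under stated assumptions: the parameter count is rounded to 7e9, and the block sizes (64 weights per quantization block, 256 blocks per double-quantization group) are the figures described in the QLoRA paper, not values read from this model. Activations, LoRA weights, optimizer state, and layers that stay in higher precision are ignored, so real usage will be higher.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate weight storage in GiB for a given bit width."""
    return n_params * bits_per_param / 8 / 1024**3


def nf4_bits_per_param(double_quant, block_size=64, dq_block_size=256):
    """4 data bits plus the amortized cost of the per-block scaling constants."""
    if double_quant:
        # One 8-bit constant per block, plus one fp32 constant
        # shared by every group of dq_block_size blocks.
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # One fp32 constant per block.
        overhead = 32 / block_size
    return 4 + overhead


N_PARAMS = 7e9  # assumption: round figure for a "7B" model

fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, nf4_bits_per_param(double_quant=False))
nf4_dq_gib = weight_memory_gib(N_PARAMS, nf4_bits_per_param(double_quant=True))

print(f"fp16 weights:          {fp16_gib:.2f} GiB")
print(f"NF4 weights:           {nf4_gib:.2f} GiB")
print(f"NF4 + double quant:    {nf4_dq_gib:.2f} GiB")
```

Under these assumptions, `bnb_4bit_use_double_quant=True` saves a little under 0.4 bits per parameter on top of plain NF4.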

4. Data Processing and Formatting

We define a processing function that formats each training sample into the token sequences the model expects:

def process_func(example):
    MAX_LENGTH = 384  # longer samples are truncated to this many tokens
    # Prompt part; special tokens are added manually, so disable automatic ones
    instruction = tokenizer(f"User: {example['instruction']+example['input']}\n\n", add_special_tokens=False)
    # Response part, terminated by the end-of-sentence token
    response = tokenizer(f"Assistant: {example['output']}<｜end▁of▁sentence｜>", add_special_tokens=False)
    input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
    # The trailing token is attended to as well, hence the extra 1
    attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1]
    # -100 masks the prompt tokens so the loss is computed only on the response
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]
    if len(input_ids) > MAX_LENGTH:
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }

tokenized_id = ds.map(process_func, remove_columns=ds.column_names)
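
The label masking used in the processing function can be seen in isolation with a toy example. The sketch below replaces the real tokenizer with a hypothetical character-level stand-in (`toy_tokenize`), so the token ids are meaningless. It only shows how prompt positions receive the label -100, which PyTorch's cross-entropy loss ignores by default, while response positions keep their token ids.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: one fake token id per character."""
    return [ord(ch) for ch in text]


def build_example(prompt, answer, eos_id, max_length):
    """Concatenate prompt and answer, masking the prompt out of the labels."""
    prompt_ids = toy_tokenize(prompt)
    answer_ids = toy_tokenize(answer) + [eos_id]
    input_ids = prompt_ids + answer_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": [1] * min(len(input_ids), max_length),
        "labels": labels[:max_length],
    }


ex = build_example("User: hi\n\n", "Assistant: hello", eos_id=0, max_length=384)
supervised = sum(1 for t in ex["labels"] if t != IGNORE_INDEX)
print(f"sequence length: {len(ex['input_ids'])}, supervised tokens: {supervised}")
```

Note that truncation cuts from the end, so a very long prompt can leave few or no supervised tokens in a sample.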

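5. LoRA Configuration and Application

We use the PEFT library to configure and apply LoRA, which greatly reduces the number of trainable parameters: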

from peft import LoraConfig, TaskType, get_peft_model

# Configure LoRA
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, 
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    inference_mode=False,
    r=4,
    lora_alpha=32,
    lora_dropout=0.1
)

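# Gradient checkpointing is enabled in the training arguments; for a quantized
# base model this call is usually needed so the inputs carry gradients
model.enable_input_require_grads()

# Apply the LoRA configuration to the model
model = get_peft_model(model, config)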
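
The effect of `r=4` on the number of trainable parameters can be estimated by hand. The sketch below assumes a hidden size of 4096, 30 transformer layers, and square 4096 x 4096 attention projections; these architecture figures are assumptions, not values read from the checkpoint. Calling `model.print_trainable_parameters()` on the PEFT model reports the actual count.

```python
def lora_param_count(d_in, d_out, r):
    """LoRA adds two matrices per adapted linear layer: A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


HIDDEN_SIZE = 4096  # assumption about the model architecture
NUM_LAYERS = 30     # assumption about the model architecture
TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]  # from the LoraConfig above
R, ALPHA = 4, 32    # from the LoraConfig above

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)
total = per_module * len(TARGETS) * NUM_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"LoRA parameters per module: {per_module:,}")
print(f"LoRA parameters in total:   {total:,}")
print(f"scaling (lora_alpha / r):   {scaling}")
```

Under these assumptions the adapters come to roughly 4 million parameters, a tiny fraction of the base model's size.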

6. Training Configuration and Execution

We train with the Trainer class from the transformers library:

from transformers import TrainingArguments, Trainer, DataCollatorForSeq2Seq

# Configure the training arguments
args = TrainingArguments(
    output_dir="./output/DeepSeek",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    logging_steps=10,
    num_train_epochs=30,
    save_steps=100,
    learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit"
)

# Create the trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)

# Start training
trainer.train()
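
To judge how long a run with these arguments will take and how many checkpoints it will write, it helps to work out the number of optimizer steps in advance. The sketch below is an approximation for a single-GPU run. The dataset size of 1000 samples is hypothetical, since the post does not state it, and `training_schedule` is an illustrative helper rather than a transformers API.

```python
import math


def training_schedule(num_examples, batch_size, grad_accum, epochs,
                      save_steps, logging_steps):
    """Estimate optimizer steps, checkpoints and log entries for a single-GPU run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_updates = updates_per_epoch * epochs
    return {
        "effective_batch_size": batch_size * grad_accum,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "checkpoints": total_updates // save_steps,
        "log_entries": total_updates // logging_steps,
    }


NUM_EXAMPLES = 1000  # hypothetical dataset size

schedule = training_schedule(
    NUM_EXAMPLES,
    batch_size=16,     # per_device_train_batch_size
    grad_accum=1,      # gradient_accumulation_steps
    epochs=30,         # num_train_epochs
    save_steps=100,
    logging_steps=10,
)
for key, value in schedule.items():
    print(f"{key}: {value}")
```

If a batch size of 16 does not fit in GPU memory, lowering `per_device_train_batch_size` and raising `gradient_accumulation_steps` by the same factor keeps the effective batch size unchanged.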

7. Testing and Saving the Model

Once training is complete, we can test the model and save the result:

def test_model(text):
    # Use the same prompt format as during training
    inputs = tokenizer(f"User: {text}\n\n", return_tensors="pt")
    outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Input: {text}")
    print(f"Output: {result}")
    return result

# Test example
test_model("In ArkTS, how do you create a constructor for a class with the properties imageUrl (string) and isAdd (boolean)?")

# Save the model (for a PEFT model this writes the LoRA adapter weights)
model.save_pretrained("./output/DeepSeek/final_model")
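
For a decoder-only model, the sequence returned by `generate` starts with the prompt tokens, so the decoded output in `test_model` repeats the question before the answer. A minimal sketch of trimming the prompt is shown below with plain Python lists standing in for token tensors. The ids are hypothetical and `strip_prompt` is an illustrative helper.

```python
def strip_prompt(output_ids, prompt_length):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_length:]


# Hypothetical token ids standing in for real model tensors.
prompt_ids = [101, 7, 8, 9]
generated = prompt_ids + [42, 43, 44]  # generate() returns prompt + new tokens

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)
```

In `test_model`, the equivalent would be to decode `outputs[0][inputs["input_ids"].shape[1]:]` instead of `outputs[0]`.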