DeepSeek Distillation

Latest recommended article published on 2025-05-04 12:02:17

xnuscd

Licensed under CC 4.0 BY-SA

Article tags: python

Article link: https://blog.youkuaiyun.com/xnuscd/article/details/145743795

1. Install the required dependencies

%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

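%%capture: an IPython cell magic that captures the cell's output, so the installation log does not flood the notebook's output area.
Install unsloth: unsloth is a library for loading and fine-tuning large models efficiently.
Force-reinstall the latest development version of unsloth from GitHub:
- --force-reinstall: reinstall the package even if it is already installed.
- --no-cache-dir: skip pip's cache so that the latest code is downloaded.
- --no-deps: do not install the package's dependencies, which leaves the existing environment untouched.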
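
Because --no-deps skips dependency resolution, it can help to confirm afterwards which packages are actually present. The following is a minimal sketch using only the standard library; the helper name installed_version and the list of packages checked are illustrative assumptions, not part of the original notebook.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages this walkthrough relies on (an assumed list). With --no-deps,
# pip does not pull in missing dependencies, so check them explicitly.
for name in ("unsloth", "torch"):
    print(name, installed_version(name) or "not installed")
```

If a package shows up as "not installed", it has to be installed separately before the later cells will run.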

2. Import Unsloth and set the parameters

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None
load_in_4bit = True

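FastLanguageModel: the Unsloth API for loading an LLM (large language model) efficiently.
Model parameters:
- max_seq_length = 2048: the maximum sequence length, which caps how many tokens the model can process in one sequence.
- dtype = None: let Unsloth detect the data type automatically (float16 on older GPUs such as the T4 or V100, bfloat16 on Ampere and newer).
- load_in_4bit = True: load the weights with 4-bit quantization, which cuts GPU memory use substantially at a small cost in precision.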
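
To make the memory saving from 4-bit loading concrete, here is a rough back-of-the-envelope sketch. It counts the weights only; the parameter count of 8 billion is an assumption for illustration, and real usage is higher because of quantization metadata, activations, and the KV cache.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


n_params = 8e9  # assumption: a model with roughly 8 billion parameters

for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(n_params, bits):.1f} GiB")
# 16-bit weights: ~14.9 GiB
# 4-bit weights: ~3.7 GiB
```

Under these assumptions the quantized weights need about a quarter of the memory of 16-bit weights, which is what lets a model of this size fit on a single consumer GPU.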

3. Install WandB (training monitoring tool)

!pip install wandb

wandb (Weights & Biases): a tool for monitoring and visualizing the training process.

4. Log in to WandB and initialize the training run

import wandb

wandb.login(key="YOUR_WANDB_API_KEY")  # fill in your own WandB API key; never publish a real key
run = wandb.init(
    project='my fint-tune on deepseek r1 with medical data',
    job_type="training",
    anonymous="allow"
)

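Log in to WandB by passing the API key explicitly.
Initialize the WandB run:
- project='my fint-tune on deepseek r1 with medical data': the project name, indicating a fine-tune of DeepSeek R1 on medical data.
- job_type="training": marks the run as a training job.
- anonymous="allow": permits anonymous logging when no account is logged in.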
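
Hardcoding an API key in a notebook makes it easy to leak when the notebook is shared. One common alternative is to read it from the WANDB_API_KEY environment variable, which wandb itself also consults. The sketch below is an illustration only; the helper name read_wandb_key is hypothetical and not part of the original notebook.

```python
import os
from typing import Mapping


def read_wandb_key(env: Mapping[str, str] = os.environ) -> str:
    """Fetch the WandB API key from the environment instead of the source."""
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set WANDB_API_KEY before starting the notebook.")
    return key


# Hypothetical usage inside the notebook (requires the wandb package):
#   import wandb
#   wandb.login(key=read_wandb_key())
```

The env parameter exists only so the helper can be exercised with a plain dictionary; in normal use it falls back to the process environment.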

5. Load the local model

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/root/autodl-tmp", # change this to your local model directory
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

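from_pretrained(): loads the pretrained model and its tokenizer.
Model path:
- "/root/autodl-tmp": a local directory; this assumes the DeepSeek R1 weights have already been downloaded there.
Parameters:
- max_seq_length=max_seq_length: keeps 2048 as the maximum sequence length.
- load_in_4bit=True: enables 4-bit quantization to reduce GPU memory use.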
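
Since loading depends on the weights already being on disk, a quick sanity check before calling from_pretrained() can save a confusing error. This is a heuristic sketch that assumes a Hugging Face-style checkpoint directory, which contains a config.json; the helper name looks_like_hf_model_dir is made up for illustration.

```python
from pathlib import Path


def looks_like_hf_model_dir(path: str) -> bool:
    """Heuristic: a Hugging Face-style checkpoint directory has a config.json."""
    directory = Path(path)
    return directory.is_dir() and (directory / "config.json").is_file()


model_dir = "/root/autodl-tmp"  # the path used in this article
if not looks_like_hf_model_dir(model_dir):
    print(f"{model_dir} has no config.json; check where the weights were downloaded.")
```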

6. Define the prompt structure

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

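Explanation

This defines a prompt template used to format the model input.
It uses chain-of-thought (CoT) prompting so that the model reasons step by step before answering:
- Instruction sets the task context and has the model act as a medical expert.
- Question is the actual question passed in (the first {} slot).
- Response opens with <think>{} so that the model writes its reasoning first and then the answer.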
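
The two {} slots are filled positionally by str.format. The sketch below uses a shortened stand-in for the template (demo_prompt_style is a name invented here, and the question is a made-up example) to show what the model actually receives when the second slot is left empty.

```python
# Shortened stand-in for prompt_style; the slot layout matches the template above.
demo_prompt_style = """### Question:
{}

### Response:
<think>{}"""

# First slot: the question. Second slot: left empty so that the model
# continues writing right after the opening <think> tag.
prompt = demo_prompt_style.format("What causes stress urinary incontinence?", "")
print(prompt)
```

The formatted prompt ends with the opening <think> tag, so the first tokens the model generates are its reasoning.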

7. Example inference

question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

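Explanation

Sample medical question:
- The question describes a 61-year-old woman with a history of urinary incontinence and asks the model to predict the findings of cystometry.
FastLanguageModel.for_inference(model):
- Switches the model to inference mode; Unsloth advertises roughly 2x faster inference in this mode.
Formatting the input:
- prompt_style.format(question, "") fills the template to produce the full prompt; the empty string leaves the reasoning slot after <think> open.
- tokenizer(...) tokenizes the prompt and returns PyTorch tensors (return_tensors="pt").
- .to("cuda") moves the tensors to the GPU.
Generating the answer:
- max_new_tokens=1200 caps the number of newly generated tokens.
- use_cache=True reuses the cached attention keys and values of earlier tokens, which speeds up generation.
Decoding and printing the answer:
- tokenizer.batch_decode(outputs) converts the token IDs back to text.
- response[0].split("### Response:")[1] keeps only the part after "### Response:", that is, the model's reasoning and answer.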
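
The split on "### Response:" returns the reasoning and the answer together. If they are needed separately, the text can be split again on the closing </think> tag. The following is a small sketch under the assumption that the output follows the template; split_response is a hypothetical helper, and the sample string is hand-written rather than real model output.

```python
from typing import Tuple


def split_response(decoded: str) -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumes the text contains "### Response:" followed by a <think> block.
    """
    response = decoded.split("### Response:", 1)[1]
    reasoning, sep, answer = response.partition("</think>")
    reasoning = reasoning.replace("<think>", "", 1).strip()
    if not sep:  # the tag was never closed, e.g. generation hit max_new_tokens
        return reasoning, ""
    return reasoning, answer.strip()


# Hand-written stand-in for tokenizer.batch_decode(outputs)[0]
sample = (
    "### Question:\n...\n\n### Response:\n"
    "<think>Stress incontinence.</think>\nNormal residual volume."
)
reasoning, answer = split_response(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Real decoded output may also contain special tokens such as the EOS marker, since batch_decode is called here without skipping them.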

8. Fine-tune with LoRA (parameter-efficient fine-tuning)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

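Explanation

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method: the base weights stay frozen and only small low-rank matrices are trained, which reduces GPU memory use:
- r=16: the rank of the low-rank decomposition; a larger value means more trainable parameters.
- target_modules: the Transformer modules that receive LoRA adapters, here the attention projections (q_proj, k_proj, v_proj, o_proj) and the MLP projections (gate_proj, up_proj, down_proj).
- lora_alpha=16: the scaling factor that controls how strongly the LoRA update affects the output.
- lora_dropout=0: disables dropout on the LoRA layers.
- bias="none": bias parameters are not trained.
- use_gradient_checkpointing="unsloth":
  - Gradient checkpointing recomputes activations during the backward pass instead of storing them, which lowers memory use and suits long-sequence training.
- random_state=3407: fixes the random seed for reproducibility.
- use_rslora=False: rank-stabilized LoRA (rsLoRA), a variant that changes how the update is scaled, is not enabled.
- loftq_config=None: LoftQ, a quantization-aware initialization for LoRA, is not used.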
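
To see why a small rank keeps the number of trainable parameters low, the arithmetic can be sketched directly. LoRA replaces the update of a d_out x d_in matrix with the product of two matrices of rank r, so it adds r * (d_in + d_out) parameters. The hidden size of 4096 below is an assumption for illustration, not a value taken from the notebook.

```python
import math


def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns B (d_out x r) and A (r x d_in); the original stays frozen.
    """
    return r * (d_in + d_out)


def lora_scale(alpha: float, r: int, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


d = 4096  # assumption: hidden size of a square projection such as q_proj
full = d * d
extra = lora_extra_params(d, d, r=16)
print(f"full matrix: {full:,} params, LoRA adds: {extra:,} ({extra / full:.2%})")
print("scale with lora_alpha=16, r=16:", lora_scale(16, 16))
print("scale with rsLoRA:", lora_scale(16, 16, use_rslora=True))
```

With lora_alpha equal to r, the standard scaling factor is 1, so the update is applied unscaled; rsLoRA would divide by the square root of the rank instead.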

9. Training prompt template

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

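Explanation

Prompt template for the training data:
- It has the same structure as the inference prompt, but the Response section adds a closing </think> tag and a slot for the final answer:
  - <think>{}</think>: holds the reasoning steps (the CoT chain) from the dataset.
  - the last {}: holds the final answer from the dataset.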

10. Format the dataset (for LoRA training)

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

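Explanation

EOS_TOKEN = tokenizer.eos_token:
- The EOS (end-of-sequence) token must be appended to every sample; otherwise the model does not learn where to stop generating.
formatting_prompts_func():
- Processes a batch from the dataset and produces the text used for training:
  - inputs: the questions (Question).
  - cots: the detailed chains of thought (Complex_CoT).
  - outputs: the final answers (Response).
- Formatting:
  - train_prompt_style.format(input, cot, output) + EOS_TOKEN
  - Every sample follows the prompt structure, with the reasoning wrapped in <think> tags.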
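
The function receives a batch as a dictionary of columns, each a list of equal length, which is the shape a batched mapping such as Hugging Face datasets' Dataset.map(..., batched=True) passes in. The sketch below runs the same logic on a made-up one-row batch; demo_train_style is a shortened stand-in for the template, and "</s>" is only a placeholder for tokenizer.eos_token.

```python
# Shortened stand-in for train_prompt_style; same three slots in the same order.
demo_train_style = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

EOS = "</s>"  # placeholder; in the notebook this comes from tokenizer.eos_token


def format_batch(examples):
    """Turn a batch of columns (lists of equal length) into training texts."""
    texts = []
    for question, cot, answer in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        texts.append(demo_train_style.format(question, cot, answer) + EOS)
    return {"text": texts}


# A made-up one-row batch, only to show the shape of the input and output.
batch = {
    "Question": ["Why add an EOS token?"],
    "Complex_CoT": ["Without it the model never learns where to stop."],
    "Response": ["It marks the end of each training sample."],
}
print(format_batch(batch)["text"][0])
```

Each output string contains the question, the reasoning inside the think tags, the final answer, and the end-of-sequence marker, in that order.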

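Summary

This notebook covers:
✅ Loading a local DeepSeek R1-8B model (loaded efficiently with unsloth).
✅ Inference: a medical-expert prompt that uses chain-of-thought (CoT) to elicit step-by-step reasoning.
✅ LoRA fine-tuning: adapters on the chosen target modules, which reduces GPU memory use and makes training more efficient.
✅ Data formatting: converting a medical question-answering dataset into the text format used for training.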