[LLM Advanced] Lesson 1: Hands-On Model Fine-Tuning - From Environment Setup to Training

1. Environment Setup

1.1 Server Preparation

Look for a server whose kernel version is 5.15 or above.
Servers can be rented at https://www.autodl.com/

root@autodl-container-5054488bf2-a49617c2:~# uname -r
5.15.0-124-generic

The server I tried in the ModelScope community turned out to run kernel 4.19, which is unusable (see the warning and error below). In the end I rented from AutoDL instead.

Detected kernel version 4.19.91, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
TypeError: expected str, bytes or os.PathLike object, not NoneType

1.2 Installing the Environment

root@autodl-container-5054488bf2-a49617c2:~# nvidia-smi
Sat May 31 17:47:28 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04             Driver Version: 570.124.04     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        On  |   00000000:98:00.0 Off |                  N/A |
|  0%   24C    P8             13W /  320W |       1MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

1.2.1 Setting Up the Conda Environment

# Create the Conda environment
!conda create -n deepseek python=3.10

# Activate the Conda environment
# (note: `!conda activate` does not persist across notebook cells; activate in a terminal if needed)
!conda activate deepseek

# Install ipykernel so that Jupyter can recognize the environment
%pip install ipykernel

# Install modelscope, used to download models
%pip install modelscope
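
Since modelscope is installed for downloading models, here is a minimal download sketch (assuming the ModelScope model id deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and a recent modelscope release that supports local_dir; the target directory matches the model_path used in Section 2):

from modelscope import snapshot_download

# Download the model weights from ModelScope into the local directory used later
model_dir = snapshot_download(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    local_dir="model/DeepSeek-R1-Distill-Qwen-1.5B",
)
print(model_dir)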

1.2.2 Installing PyTorch

To install PyTorch, first check the system's CUDA version; the CUDA version PyTorch is built against must not exceed the system's CUDA version.

Look up the install command for past versions on the PyTorch website (https://pytorch.org/get-started/previous-versions/).

CUDA Version: 12.8

1.2.3 Installing torch

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126   # (the instructor used torch 2.4.1)
pip install triton==3.0.0 transformers==4.46.3 safetensors==0.4.5 accelerate==1.4.0
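
A quick sanity check after the install (run it inside the new environment):

import torch

# Verify the installed version, the CUDA version this build targets,
# and whether the GPU is actually visible
print(torch.__version__)           # e.g. 2.6.0
print(torch.version.cuda)          # e.g. 12.6 for the cu126 wheel
print(torch.cuda.is_available())   # should print True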

1.2.4 Making the Conda Environment Visible to Jupyter

Register the Conda environment globally so that Jupyter can detect it:

python -m ipykernel install --user --name deepseek --display-name "deepseek"
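
You can confirm that the kernel was registered:

jupyter kernelspec list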

2. Using the AutoDL Server's Built-in Environment

This is mutually exclusive with the Conda environment from Section 1; choose one of the two.

2.1 Installing Dependencies

pip install transformers
pip install accelerate
pip install jsonlines
pip install openpyxl
pip install datasets

# Import the required classes: AutoModelForCausalLM loads a causal language model,
# AutoTokenizer handles text tokenization
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the model
model_path = "model/DeepSeek-R1-Distill-Qwen-1.5B"

# torch_dtype="auto" automatically selects a suitable data type
# device_map="auto" automatically places the model on the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",  # float32 / float16 / bfloat16
    device_map="auto"
)

# Switch the model to evaluation mode (disables dropout and other train-only behavior)
model.eval()

# Load the tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Check which device the model is on and its precision
model.device, model.dtype

If loading fails at this step, commenting out the device_map="auto" line fixes it.
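
To verify that the model responds, a minimal generation sketch (the prompt text and max_new_tokens are illustrative):

import torch

# Build a chat prompt with the model's chat template and move it to the model's device
messages = [{"role": "user", "content": "Hello, please introduce yourself."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate without tracking gradients
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))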

2.2 Monitoring Training

nvitop: a powerful tool for real-time GPU monitoring.
nvitop is a Python-based command-line tool dedicated to real-time monitoring of NVIDIA GPUs. It provides a top-like interactive interface that clearly shows GPU utilization, memory usage, process information, and other key metrics, making it an essential tool for deep-learning training and GPU management.

pip install nvitop
nvitop
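
As a complement to nvitop, the same numbers can be queried from PyTorch inside a notebook:

import torch

# Report GPU memory usage in GiB
if torch.cuda.is_available():
    gib = 1024 ** 3
    print(f"allocated: {torch.cuda.memory_allocated() / gib:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / gib:.2f} GiB")
    print(f"total:     {torch.cuda.get_device_properties(0).total_memory / gib:.2f} GiB")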


3. Training

3.1 Training with LLaMA-Factory

  • Prepare the code
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

Install the required modules. If training reports a missing module, just install the corresponding one:

pip install peft
pip install trl
pip install omegaconf

The trl version installed by default is too new and causes errors during training. Following the hint in the error log, downgrade it:

pip install trl==0.9.6

  • Start training

python -u src/train.py \
    --stage sft \
    --model_name_or_path /ai/data/Qwen2.5-0.5B-Instruct \
    --do_train \
    --dataset med_dialog,med_norm,med_mrg \
    --template qwen \
    --finetuning_type full \
    --output_dir /ai/data/trained_models/Qwen2.5-0.5B_med_xxzh2 \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_strategy no \
    --learning_rate 1e-4 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --preprocessing_num_workers  16 \
    --bf16 \
    --cutoff_len 4000

--finetuning_type full can also be changed to lora, as sketched below.
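
For a LoRA run, the flag substitution would look roughly like this (a sketch; the rank and target values are common defaults, check the LLaMA-Factory docs for your version):

    --finetuning_type lora \
    --lora_rank 8 \
    --lora_target all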

  • Out-of-memory error

[INFO|2025-05-31 22:47:27] llamafactory.model.adapter:143 >> Fine-tuning method: Full
Traceback (most recent call last):
  File "/root/llama/LLaMA-Factory/src/train.py", line 28, in <module>
    main()
  File "/root/llama/LLaMA-Factory/src/train.py", line 19, in main
    run_exp()
  File "/root/llama/LLaMA-Factory/src/llamafactory/train/tuner.py", line 110, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/root/llama/LLaMA-Factory/src/llamafactory/train/tuner.py", line 72, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/root/llama/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/llama/LLaMA-Factory/src/llamafactory/model/loader.py", line 178, in load_model
    model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/llama/LLaMA-Factory/src/llamafactory/model/adapter.py", line 296, in init_adapter
    _setup_full_tuning(model, finetuning_args, is_trainable, cast_trainable_params_to_fp32)
  File "/root/llama/LLaMA-Factory/src/llamafactory/model/adapter.py", line 52, in _setup_full_tuning
    param.data = param.data.to(torch.float32)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 520.00 MiB. GPU 0 has a total capacity of 19.59 GiB of which 516.56 MiB is free. Process 8303 has 17.90 GiB memory in use. Including non-PyTorch memory, this process has 1.18 GiB memory in use. Of the allocated memory 950.17 MiB is allocated by PyTorch, and 41.83 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
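
The log itself suggests a mitigation for fragmentation; it can be exported before launching training (it only helps when "reserved but unallocated" memory is large, not when the model simply does not fit):

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True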

3.2 Clearing the CUDA Cache

import torch
import gc

# Delete references to the tensors/model first
# del model, inputs, outputs

# Force a garbage-collection pass
gc.collect()

# Release cached CUDA memory back to the driver
torch.cuda.empty_cache()

After cleanup, the freed memory can be confirmed in nvitop.

  • Training process: a 20 GB GPU is not enough for full fine-tuning here. When the trained model was loaded afterwards, the following error appeared:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 2
      1 model_1 = AutoModelForCausalLM.from_pretrained(model_path_1, torch_dtype = torch.bfloat16, device_map = 'auto')
----> 2 model_2 = AutoModelForCausalLM.from_pretrained(model_path_2, torch_dtype = torch.bfloat16, device_map = 'auto')
      3 assert tokenizer.padding_side == 'left'
      4 print('模型加载完成')

File ~/miniconda3/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py:531, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    528 if kwargs.get("quantization_config", None) is not None:
    529     _ = kwargs.pop("quantization_config")
--> 531 config, kwargs = AutoConfig.from_pretrained(
    532     pretrained_model_name_or_path,
    533     return_unused_kwargs=True,
    534     code_revision=code_revision,
    535     _commit_hash=commit_hash,
    536     **hub_kwargs,
    537     **kwargs,
    538 )
    540 # if torch_dtype=auto was passed here, ensure to pass it on
    541 if kwargs_orig.get("torch_dtype", None) == "auto":

File ~/miniconda3/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py:1190, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1187         if pattern in str(pretrained_model_name_or_path):
   1188             return CONFIG_MAPPING[pattern].from_dict(config_dict, **unused_kwargs)
-> 1190 raise ValueError(
   1191     f"Unrecognized model in {pretrained_model_name_or_path}. "
   1192     f"Should have a `model_type` key in its {CONFIG_NAME}, or contain one of the following strings "
   1193     f"in its name: {', '.join(CONFIG_MAPPING.keys())}"
   1194 )

ValueError: Unrecognized model in /ai/data/trained_models/Qwen2.5-0.5B_med_xxzh2. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, aria, aria_text, 
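
This ValueError means the output directory contains no usable config.json; with --save_strategy no, nothing is written until the final save, so a run killed by OOM leaves the directory incomplete. A quick check (a sketch; the path comes from the traceback above):

import json
import os

# Check whether the training output directory contains a usable config
model_dir = "/ai/data/trained_models/Qwen2.5-0.5B_med_xxzh2"
config_path = os.path.join(model_dir, "config.json")

if not os.path.exists(config_path):
    print("config.json is missing: training probably never reached its final save")
else:
    with open(config_path) as f:
        # A Qwen2.5 checkpoint should report model_type "qwen2"
        print("model_type:", json.load(f).get("model_type"))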
 f"Unrecognized model in {pretrained_model_name_or_path}. "
   1192     f"Should have a `model_type` key in its {CONFIG_NAME}, or contain one of the following strings "
   1193     f"in its name: {', '.join(CONFIG_MAPPING.keys())}"