Advice for fine-tuning beginners: don't train with the web-ui

Original post, last modified 2025-04-05 10:24:38

Licensed under CC 4.0 BY-SA

Tags: #frontend #ui #python #numpy #AI #language-model

First published 2024-12-12 16:38:09

A hard-won lesson, folks.

After three days of suffering, I finally got the model to train.

If you have run into any of the following problems (baffling for a beginner who never enabled these features in the first place):

1. CUDA setup fails even though the GPU is available: "CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:"
2. FileExistsError: [WinError 183] Cannot create a file when that file already exists.
3. torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB. GPU 0 has a total capacity of 22.00 GiB of which 17.97 GiB is free. Of the allocated memory 2.78 GiB is allocated by PyTorch, and 61.81 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management ...
4. module 'torch.library' has no attribute 'register_fake'
5. An error saying numpy must be <2.0.0
6. deepspeed not detected (even though, as a beginner, you never enabled it)
7. loss stays at 0
8. Error loading "\lib\site-packages\torch\lib\shm.dll" or one of its dependencies
9. deepspeed installation fails with: No module named 'dskernels'
10. The model behaves as if it was never fine-tuned: it answers at random or emits garbled text
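Several of the errors above are consistent with version mismatches between packages, so it can help to print what is actually installed before digging further. Here is a minimal sketch using only the standard library. The package list and the numpy<2 check simply mirror the errors listed above, and the helper names (`installed_version`, `major_version`, `report`) are my own, not part of any library:

```python
from importlib import metadata

# Packages that show up in the errors above; extend the list as needed.
PACKAGES = ["torch", "numpy", "deepspeed", "bitsandbytes", "ms-swift"]


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Extract the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def report():
    """Build one human-readable line per package in PACKAGES."""
    lines = []
    for name in PACKAGES:
        version = installed_version(name)
        if version is None:
            lines.append(f"{name}: not installed")
        elif name == "numpy" and major_version(version) >= 2:
            lines.append(f"{name}: {version} (the error above asks for <2.0.0)")
        else:
            lines.append(f"{name}: {version}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

This only reads package metadata and never imports torch, so it still runs in an environment where torch itself fails to load.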

If you are a beginner and completely lost with all of this, I recommend training with ms-swift's Python code instead! Don't use the web-ui!

Minimal example:

# Experimental environment: RTX 2080 Ti 22G
# 22GB GPU memory
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch

from swift.llm import (
    DatasetName, InferArguments, ModelType, SftArguments,
    infer_main, sft_main, app_ui_main, merge_lora_main
)

model_type = ModelType.qwen2_5_coder_7b_instruct  # change to your model's type
sft_args = SftArguments(
    model_type=model_type,
    model_id_or_path="D:/LLaMA_Factory/Qwen/Qwen25Coder7B",  # change to your local model path
    train_dataset_sample=2000,
    dataset="D:/LLaMA_Factory/data/zh_INFJ_self_awareness.json",  # change to your dataset path
    output_dir='output')
result = sft_main(sft_args)
best_model_checkpoint = result['best_model_checkpoint']
print(f'best_model_checkpoint: {best_model_checkpoint}')
torch.cuda.empty_cache()

infer_args = InferArguments(
    ckpt_dir=best_model_checkpoint,
    load_dataset_config=True,
    show_dataset_sample=10)
# merge_lora_main(infer_args)
result = infer_main(infer_args)
torch.cuda.empty_cache()

app_ui_main(infer_args)

It's really not as hard as you imagined, is it?

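You only need to change three parameters (the model type, the local model path, and the dataset path), and even a complete beginner can get a model running!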
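Since two of those three parameters are file paths, a typo in either one is an easy way to lose time. Here is a rough pre-flight check you could run before starting training. It is only a sketch under two assumptions of mine: that the model directory is a Hugging Face style checkpoint containing a `config.json`, and that the dataset is a single JSON file holding a list of records. `preflight` is a hypothetical helper, not an ms-swift function:

```python
import json
from pathlib import Path


def preflight(model_dir, dataset_file):
    """Return a list of problems found; an empty list means the paths look usable."""
    problems = []
    model_dir = Path(model_dir)
    dataset_file = Path(dataset_file)

    if not model_dir.is_dir():
        problems.append(f"model directory not found: {model_dir}")
    elif not (model_dir / "config.json").is_file():
        # Assumption: a Hugging Face style checkpoint ships a config.json.
        problems.append(f"no config.json in {model_dir}")

    if not dataset_file.is_file():
        problems.append(f"dataset file not found: {dataset_file}")
    else:
        try:
            with dataset_file.open(encoding="utf-8") as f:
                data = json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems.append(f"dataset is not valid UTF-8 JSON: {exc}")
        else:
            # Assumption: the dataset is a JSON list of records.
            if not isinstance(data, list) or not data:
                problems.append("dataset should be a non-empty JSON list of records")
    return problems


if __name__ == "__main__":
    issues = preflight(
        "D:/LLaMA_Factory/Qwen/Qwen25Coder7B",
        "D:/LLaMA_Factory/data/zh_INFJ_self_awareness.json",
    )
    print("\n".join(issues) if issues else "paths look fine")
```

If your dataset uses a different format (JSONL, for example), the JSON check needs adjusting; the path checks still apply.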

Result of the run: