ModelScope/SWIFT Framework Quick Start Guide

swift: the LLM training and inference framework of the ModelScope community. It supports models such as LLaMA, Qwen, Baichuan, and ChatGLM, and training methods such as LoRA, ResTuning, and NEFTune. Project address: https://gitcode.com/gh_mirrors/swift1/swift

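Framework Overview

ModelScope/SWIFT (SWIFT for short) is an end-to-end training and deployment framework for large language models (LLMs) and multimodal large language models (MLLMs). It is provided by the ModelScope community and covers the full path from model training to production deployment.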

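Core Features

Model Support

Training and deployment for 500+ text-only LLMs and 200+ multimodal LLMs
Coverage of All-to-All multimodal models, sequence classification models, and embedding models
The full training pipeline: continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF)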

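Training Stack

Lightweight training: 10+ parameter-efficient fine-tuning methods, including LoRA, QLoRA, and DoRA
Distributed training: DDP, DeepSpeed ZeRO-2/ZeRO-3, FSDP, and other distributed schemes
Quantized training: integrates mainstream quantization techniques such as BNB, AWQ, and GPTQ
Multimodal training: joint training across images, video, audio, and other modalities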

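Deployment and Evaluation

Inference acceleration: compatible with high-performance inference engines such as vLLM and LMDeploy
Model evaluation: 100+ built-in evaluation datasets for comprehensive performance testing
Quantized deployment: export of models in quantized formats such as AWQ and GPTQ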

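Installation

SWIFT runs on several hardware environments, including:

NVIDIA GPUs (RTX series, T4/V100, A10/A100/H100)
Ascend NPU
Apple M-series chips (MPS)
CPU-only environments

Python 3.8 or later is recommended. Install the latest release with pip; the package is published on PyPI as ms-swift (for example: pip install ms-swift -U).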

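Hands-On Example: Fine-Tuning Qwen2.5-7B-Instruct

Training Configuration

The following complete command runs self-cognition fine-tuning on a single RTX 3090 (24 GB of VRAM):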

CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --system 'You are a helpful assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot
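
As a rough sanity check on the command above, the following sketch parses the three --dataset values and estimates how many optimizer steps one epoch takes. It assumes that the #500 suffix requests a 500-row sample of each dataset, that no rows are held out for validation, and that a single GPU is used. parse_dataset_spec and estimate_steps are illustrative helper names, not part of SWIFT.

```python
import math


def parse_dataset_spec(spec):
    """Split 'dataset_id#N' into (dataset_id, N); N is None without a suffix."""
    dataset_id, sep, count = spec.partition("#")
    return dataset_id, (int(count) if sep else None)


def estimate_steps(num_samples, per_device_batch, grad_accum, num_devices=1):
    """Return (effective batch size, approximate optimizer steps per epoch)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return effective_batch, math.ceil(num_samples / effective_batch)


# Dataset values copied from the training command.
specs = [
    "AI-ModelScope/alpaca-gpt4-data-zh#500",
    "AI-ModelScope/alpaca-gpt4-data-en#500",
    "swift/self-cognition#500",
]
total_samples = sum(parse_dataset_spec(s)[1] for s in specs)

# --per_device_train_batch_size 1 and --gradient_accumulation_steps 16
effective_batch, steps_per_epoch = estimate_steps(total_samples, 1, 16)

# --warmup_ratio 0.05, rounded up to whole steps (an approximation)
warmup_steps = math.ceil(0.05 * steps_per_epoch)

print("samples:", total_samples)
print("effective batch size:", effective_batch)
print("approx. optimizer steps per epoch:", steps_per_epoch)
print("approx. warmup steps:", warmup_steps)
```

Under these assumptions a single epoch is fewer than 100 optimizer steps, so --eval_steps 50 and --save_steps 50 fire roughly once in the middle of the run.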

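Key parameters:

--train_type lora: use LoRA, a parameter-efficient fine-tuning method
--gradient_accumulation_steps 16: accumulate gradients over 16 steps so that, with a per-device batch size of 1, the effective batch size is 16 without the memory cost of a larger batch
--lora_rank 8: set the rank of the LoRA matrices to 8
--model_author and --model_name: define the model's self-cognition information, that is, who the model says created it and what it is called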
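
To make the LoRA settings concrete, the sketch below computes the scaling factor and the number of trainable parameters that LoRA adds to a single linear layer. It relies on the standard LoRA formulation (the update is scaled by alpha / rank, with factor matrices of shape rank x d_in and d_out x rank). The 3584 x 3584 layer size is an illustrative assumption, not a value taken from the command, and lora_stats is a hypothetical helper.

```python
def lora_stats(d_in, d_out, rank, alpha):
    """Return LoRA scaling and parameter counts for one linear layer.

    LoRA freezes the d_out x d_in weight W and learns two small matrices,
    A (rank x d_in) and B (d_out x rank); the effective weight is
    W + (alpha / rank) * B @ A.
    """
    scaling = alpha / rank
    lora_params = rank * (d_in + d_out)  # entries in A plus entries in B
    full_params = d_in * d_out           # entries in the frozen weight W
    return scaling, lora_params, full_params


# rank and alpha come from --lora_rank 8 and --lora_alpha 32;
# the layer shape is an assumed example size.
scaling, lora_params, full_params = lora_stats(3584, 3584, rank=8, alpha=32)

print(f"scaling factor: {scaling}")
print(f"trainable LoRA parameters: {lora_params}")
print(f"frozen parameters: {full_params}")
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

For a layer of this size the trainable fraction is well under one percent, which is why a 7B model can be fine-tuned on a 24 GB card with these settings.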

Inference and Deployment

After training completes, run inference with the following commands:

# Basic inference
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --temperature 0 \
    --max_new_tokens 2048

# Merge the LoRA weights and accelerate with vLLM
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --merge_lora true \
    --infer_backend vllm \
    --max_model_len 8192 \
    --temperature 0 \
    --max_new_tokens 2048
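
The output/vx-xxx/checkpoint-xxx path in the commands above is a placeholder for a real checkpoint directory. One way to locate the most recent checkpoint is sketched below. It assumes the layout output/<run-dir>/checkpoint-<step> suggested by that placeholder, and latest_checkpoint is a hypothetical helper rather than a SWIFT function.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step inside
    the most recently modified run directory, or None if none is found."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    # Newest run directory first (by modification time).
    run_dirs = sorted(
        (p for p in root.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    for run_dir in run_dirs:
        checkpoints = []
        for p in run_dir.iterdir():
            match = re.fullmatch(r"checkpoint-(\d+)", p.name)
            if p.is_dir() and match:
                # Compare steps numerically so that 100 sorts after 50.
                checkpoints.append((int(match.group(1)), p))
        if checkpoints:
            return max(checkpoints, key=lambda item: item[0])[1]
    return None


# Prints None when no checkpoint exists under ./output.
print(latest_checkpoint("output"))
```

The returned path can then be passed as the value of --adapters.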

Publishing the Model

Push the trained model to the ModelScope platform:

CUDA_VISIBLE_DEVICES=0 \
swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>'
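
Passing --hub_token directly on the command line leaves the token in shell history. A possible alternative, sketched below, reads it from an environment variable and assembles the same swift export arguments in Python. The variable name MODELSCOPE_SDK_TOKEN and the helper build_export_command are assumptions made for illustration; only flags that appear in the command above are used.

```python
import os
import subprocess


def build_export_command(checkpoint_dir, hub_model_id, hub_token):
    """Assemble the argument list for `swift export` with hub upload."""
    return [
        "swift", "export",
        "--adapters", checkpoint_dir,
        "--push_to_hub", "true",
        "--hub_model_id", hub_model_id,
        "--hub_token", hub_token,
    ]


if __name__ == "__main__":
    token = os.environ.get("MODELSCOPE_SDK_TOKEN")  # hypothetical variable name
    if token is None:
        print("Set MODELSCOPE_SDK_TOKEN before publishing.")
    else:
        cmd = build_export_command(
            "output/vx-xxx/checkpoint-xxx",  # placeholder: use a real checkpoint path
            "<your-model-id>",               # placeholder: use your own model id
            token,
        )
        subprocess.run(cmd, check=True)
```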