Fine-Tuning the Llama3 Large Model on the BitaHub Platform with swift

A Hands-On Guide to Fine-Tuning Llama3 with Swift


Copyright: CC 4.0 BY-SA


I. Background

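Large language models (LLMs) are now widely used in natural language processing. As a result, efficiently and cheaply fine-tuning general-purpose pretrained models has become a central concern for developers and researchers. Llama3, released by Meta in 2024, is a member of the Llama family. It makes clear gains over its predecessors in conversational coherence, factual accuracy, multilingual ability, and understanding of complex instructions. It is also much more efficient at inference and easier to deploy, which makes it one of the leading open-source LLMs.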

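Real-world applications bring diverse, domain-specific task requirements. Parameter-efficient fine-tuning methods such as LoRA (Low-Rank Adaptation) have become one of the main ways to adapt large models to them. LoRA freezes the pretrained weights and trains only a small number of added low-rank matrices. The model can therefore be adapted to a new task without modifying its original weights, which greatly reduces memory and compute costs.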
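LoRA replaces a frozen weight W (d_out × d_in) with W + (α/r)·BA, where A is r × d_in and B is d_out × r. Each adapted layer therefore adds only r·(d_in + d_out) trainable parameters. The sketch below counts them for the settings used later in this guide (--lora_rank 8, --lora_alpha 32). It rests on two assumptions:

- --lora_target_modules ALL attaches LoRA to the seven linear projections of every decoder layer, not to the embeddings or lm_head.
- The layer shapes match Llama-3-8B's published config.

It is back-of-the-envelope arithmetic, not output from swift.

```python
# Shapes from Llama-3-8B's published config.json (assumed here, not read from disk).
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size (MLP width)
KV_DIM = 8 * 128       # num_key_value_heads * head_dim (grouped-query attention)
NUM_LAYERS = 32        # num_hidden_layers

# (d_in, d_out) of each linear projection in one decoder layer.
# Assumption: these seven are what "--lora_target_modules ALL" adapts.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one layer: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


def total_lora_params(rank: int) -> int:
    """LoRA parameters summed over all adapted projections in all decoder layers."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in LINEAR_SHAPES.values())
    return per_layer * NUM_LAYERS


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the BA update (PEFT's default, non-rsLoRA scaling)."""
    return alpha / rank


if __name__ == "__main__":
    n = total_lora_params(rank=8)
    print(f"LoRA parameters at r=8: {n:,}")                          # 20,971,520
    print(f"scaling alpha/r: {lora_scaling(32, 8)}")                 # 4.0
    print(f"fraction of ~8.03B base parameters: {n / 8.03e9:.2%}")   # 0.26%
```

At rank 8 this comes to roughly 21 million trainable parameters, about a quarter of a percent of the base model.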

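swift (ModelScope's ms-swift, unrelated to Apple's Swift programming language) is a lightweight toolkit for training and fine-tuning large models. It builds on mainstream training frameworks such as PyTorch, Hugging Face transformers and PEFT. It ships with lightweight training techniques including LoRA and QLoRA and supports fine-tuning hundreds of popular open-source models. Together with the development environments and GPU compute provided by the BitaHub platform, it lets developers run the whole training workflow more efficiently.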

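This tutorial walks through a complete fine-tuning run of Llama3 on BitaHub, using the swift library and the OpenO1-SFT dataset. It covers the key steps: creating the development environment, configuring the training command, merging the LoRA weights, and testing inference. The goal is a clear, practical, hands-on guide to LLM fine-tuning.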

II. Step-by-Step Walkthrough

1. Environment Setup

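First, download the Llama3-8B pretrained model from the mirror at https://hf-mirror.com/meta-llama/Meta-Llama-3-8B and mount it into the file storage of the BitaHub workbench. Then create a development environment on the BitaHub workbench, select a single RTX 4090 GPU, and open the environment through JupyterLab.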
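If you prefer to download from a notebook, one option is the huggingface_hub client, pointed at the mirror through the HF_ENDPOINT environment variable. This sketch is not part of the original workflow and has a few assumptions and placeholders:

- It assumes huggingface_hub is installed.
- Meta-Llama-3-8B is a gated repository, so you normally need an access token from an account that has accepted Meta's license.
- The target directory is a placeholder.

```python
import os
from typing import Optional

MIRROR = "https://hf-mirror.com"
REPO_ID = "meta-llama/Meta-Llama-3-8B"


def download_llama3(local_dir: str, token: Optional[str] = None) -> str:
    """Download the full model repository through the mirror and return the local path."""
    # huggingface_hub reads HF_ENDPOINT when it is first imported, so set it first
    # (in a kernel that has not imported huggingface_hub yet). An existing value is kept.
    os.environ.setdefault("HF_ENDPOINT", MIRROR)
    from huggingface_hub import snapshot_download

    # Gated repo: the token must belong to an account that accepted Meta's license.
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir, token=token)


# Example (placeholder path; the token is read from the environment here):
# path = download_llama3("./Meta-Llama-3-8B", token=os.environ.get("HF_TOKEN"))
```

The downloaded folder is what then gets mounted into BitaHub's file storage.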

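Next, install the swift framework from JupyterLab so that fine-tuning can begin. The training and inference commands in this guide use ms-swift 2.x argument names, and ms-swift 3.x renamed many of them (for example, --sft_type became --train_type). The install therefore pins a 2.x release:

!pip install 'ms-swift[llm]<3.0' -U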
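A quick check shows which ms-swift release the notebook actually picked up, and whether it is in the 2.x line whose argument names (--sft_type, --model_id_or_path, ...) the commands here use. This small sketch reads the package metadata with the standard library's importlib.metadata. The helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(ver: str) -> int:
    """Leading integer of a version string such as '2.6.1' -> 2."""
    return int(ver.split(".")[0])


def check_swift_2x() -> bool:
    """True if the installed ms-swift distribution is a 2.x release."""
    try:
        installed = version("ms-swift")  # distribution name on PyPI
    except PackageNotFoundError:
        print("ms-swift is not installed")
        return False
    ok = major_version(installed) == 2
    status = "OK" if ok else "argument names in this guide may not match"
    print(f"ms-swift {installed}: {status}")
    return ok


if __name__ == "__main__":
    check_swift_2x()
```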

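We use the OpenO1-SFT dataset, which was generated by OpenAI's o1 model. It is designed to improve a model's ability to produce coherent, logical reasoning sequences and is particularly suited to mathematical and logical reasoning tasks. Training on it teaches the model to write out detailed, structured reasoning steps, so the model performs better on complex reasoning problems. Example records: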

{"prompt": "下列生活用品所含的主要材料，属于有机合成材料的是（  ）\nA．铝合金门框\nB．瓷碗\nC．塑料盆\nD．玻璃钢", "response": "<Thought>\nAlright, let's tackle this question step by step.\n\nThe question is about identifying which of the listed household items is made from an organic synthetic material.\n\nLet's look at the options:\n\nA. **铝合金门框 (Aluminum alloy door frame)** – Aluminum alloy is a mixture of aluminum with other elements like silicon, copper, magnesium, etc. These are metals and are considered inorganic materials.\n\nB. **瓷碗 (Porcelain bowl)** – Porcelain is a ceramic material made by heating materials, generally including kaolin, in a kiln. It's an inorganic material.\n\nC. **塑料盆 (Plastic basin)** – Plastics are synthetic materials made from organic polymers. They are based on carbon-containing compounds, making them organic synthetic materials.\n\nD. **玻璃钢 (Fiberglass)** – Fiberglass is a composite material made from fine fibers of glass and a resin. The glass fibers themselves are inorganic, while the resin is a synthetic organic polymer. However, the primary material is inorganic.\n\nGiven this analysis, Option C, **塑料盆 (Plastic basin)**, is primarily made from an organic synthetic material – plastics.\n\nTherefore, the correct answer is C.\n</Thought>\n\n<Output>\nC．塑料盆\n</Output>"}

{"prompt": "A line segment begins at \((2, 5)\). It is 13 units long and ends at the point \((10, y)\) where \(y > 0\). What is the value of \(y\)?", "response": "<Thought>\nAlright, let's tackle this geometry problem step by step.\n\nWe are given:\n- A line segment starts at point \( A = (2, 5) \).\n- The length of the line segment is 13 units.\n- The line segment ends at point \( B = (10, y) \), where \( y > 0 \).\n\nOur goal is to find the value of \( y \).\n\nFirst, let's recall the distance formula between two points in a plane:\n\[\n\text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\n\]\n\nApplying this formula to points \( A \) and \( B \):\n\[\n13 = \sqrt{(10 - 2)^2 + (y - 5)^2}\n\]\n\nSimplifying inside the square root:\n\[\n13 = \sqrt{(8)^2 + (y - 5)^2}\n\]\n\[\n13 = \sqrt{64 + (y - 5)^2}\n\]\n\nTo eliminate the square root, we'll square both sides of the equation:\n\[\n13^2 = 64 + (y - 5)^2\n\]\n\[\n169 = 64 + (y - 5)^2\n\]\n\nSubtract 64 from both sides to isolate the squared term:\n\[\n169 - 64 = (y - 5)^2\n\]\n\[\n105 = (y - 5)^2\n\]\n\nNow, to solve for \( y \), we'll take the square root of both sides:\n\[\n\sqrt{105} = y - 5 \quad \text{or} \quad -\sqrt{105} = y - 5\n\]\n\nAdding 5 to both sides for both scenarios:\n\[\ny = 5 + \sqrt{105}\n\]\n\[\ny = 5 - \sqrt{105}\n\]\n\nGiven that \( y > 0 \), we need to determine which of these solutions is valid.\n\nCalculate the approximate value of \( \sqrt{105} \):\n\[\n\sqrt{105} \approx 10.247\n\]\n\nSo,\n\[\ny = 5 + 10.247 \approx 15.247 \quad (\text{Valid since } y > 0)\n\]\n\[\ny = 5 - 10.247 \approx -5.247 \quad (\text{Invalid since } y > 0)\n\]\n\nTherefore, the valid solution is:\n\[\ny \approx 15.247\n\]\n\nAlternatively, expressing the exact value without approximation:\n\[\ny = 5 + \sqrt{105}\n\]\n</Thought>\n\n<Output>\n\( y = 5 + \sqrt{105} \) or approximately \( y \approx 15.247 \)\n</Output>"}

The response field uses <Thought></Thought> and <Output></Output> tags to separate the reasoning process from the final answer.
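Because the response field wraps the reasoning and the final answer in these tags, it is easy to check a record for the expected structure. The same check works on a fine-tuned model's reply. This minimal sketch assumes one JSON object per line with prompt and response keys, as in the examples above. The sample record in the demo is made up for illustration.

```python
import json
import re
from typing import Optional, Tuple

# Non-greedy match across newlines; whitespace just inside the tags is trimmed.
THOUGHT_RE = re.compile(r"<Thought>\s*(.*?)\s*</Thought>", re.DOTALL)
OUTPUT_RE = re.compile(r"<Output>\s*(.*?)\s*</Output>", re.DOTALL)


def split_response(response: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, final answer); either is None if its tags are missing."""
    thought = THOUGHT_RE.search(response)
    output = OUTPUT_RE.search(response)
    return (thought.group(1) if thought else None,
            output.group(1) if output else None)


def parse_record(line: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Parse one JSONL record with 'prompt' and 'response' keys."""
    record = json.loads(line)
    thought, answer = split_response(record["response"])
    return record["prompt"], thought, answer


if __name__ == "__main__":
    sample = json.dumps({
        "prompt": "1 + 1 = ?",
        "response": "<Thought>\nAdd the two numbers.\n</Thought>\n\n<Output>\n2\n</Output>",
    })
    print(parse_record(sample))  # ('1 + 1 = ?', 'Add the two numbers.', '2')
```

To scan a whole file, apply parse_record to each non-empty line and count the records where either part comes back as None.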

2. Model Training

Training is controlled precisely through a set of command-line parameters. The full command used to fine-tune Llama3-8B in this run is:

CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type llama3-8b \
    --model_id_or_path /input/test/llama3-8b \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --dataset llamafactory/OpenO1-SFT \
    --train_dataset_sample 10000 \
    --num_train_epochs 5 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10

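The run samples 10,000 examples for training and sets the number of epochs to 5. Gradient accumulation, gradient checkpointing, and weight decay are all enabled. The log printed when training starts:

[Screenshot: training startup log]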

Training takes about 6 hours. To train on the full dataset, using multiple GPUs is recommended to save time.
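The batch settings in the training command determine how long the run is in optimizer steps. The sketch below works this out under two assumptions:

- All 10,000 sampled examples go to training. swift may hold some back for validation, which would lower the counts slightly.
- Steps are counted roughly the way the Hugging Face Trainer counts them.

The function name is illustrative.

```python
import math


def training_schedule(num_samples: int, batch_size: int, grad_accum: int,
                      epochs: int, warmup_ratio: float, num_gpus: int = 1) -> dict:
    """Rough optimizer-step counts for a data-parallel run (Trainer-style counting)."""
    effective_batch = batch_size * grad_accum * num_gpus
    micro_batches = math.ceil(num_samples / (batch_size * num_gpus))  # per-GPU batches per epoch
    steps_per_epoch = max(micro_batches // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


if __name__ == "__main__":
    # Values from the swift sft command above, on one GPU.
    print(training_schedule(10_000, batch_size=1, grad_accum=16, epochs=5, warmup_ratio=0.03))
    # The same data spread over 4 GPUs needs about a quarter of the optimizer steps.
    print(training_schedule(10_000, batch_size=1, grad_accum=16, epochs=5,
                            warmup_ratio=0.03, num_gpus=4))
```

With --batch_size 1 and --gradient_accumulation_steps 16 on one GPU, each optimizer step sees 16 examples. That gives 625 steps per epoch and 3,125 steps over 5 epochs, of which about 94 are warmup (--warmup_ratio 0.03).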

3. Merging the LoRA Weights

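After fine-tuning, the LoRA weights are saved separately in the checkpoint as additional weights; they are not written into the base model's parameters. To obtain a standalone, deployable model, you therefore need to merge the LoRA weights into the original pretrained model.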
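Merging works because the adapter is linear: (W + (α/r)·BA)x = Wx + (α/r)·B(Ax). The pure-Python sketch below checks this identity on toy 2×3 matrices. It also shows that an all-zero B (LoRA's usual initialization) leaves the base output unchanged. It illustrates the arithmetic only and is not how swift or PEFT implement merging internally. In PEFT, the merge is exposed as merge_and_unload().

```python
def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, rank):
    """Unmerged LoRA: frozen base path W x plus the scaled adapter path B (A x)."""
    scale = alpha / rank
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


def merge_lora(W, A, B, alpha, rank):
    """Merged weight W + (alpha / rank) * B A: one dense matrix, no adapter at inference."""
    scale = alpha / rank
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, BA)]


if __name__ == "__main__":
    W = [[1, 0, 2], [0, 1, 1]]   # frozen base weight, d_out=2, d_in=3
    A = [[1, 1, 0]]              # rank 1: A is 1 x 3
    B = [[0.5], [-1]]            # B is 2 x 1
    x = [1, 2, 3]
    print(lora_forward(W, A, B, x, alpha=2, rank=1))         # [10.0, -1.0]
    print(matvec(merge_lora(W, A, B, alpha=2, rank=1), x))   # [10.0, -1.0]
```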

swift provides a convenient command for this (replace the checkpoint path with the one produced by your own run):

CUDA_VISIBLE_DEVICES=0 swift export \
  --ckpt_dir output/llama3-8b/v0-20241203-111328/checkpoint-1100 \
  --merge_lora true
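The checkpoint path in the command is specific to the author's run. swift writes each run to a timestamped folder under output/ and saves checkpoint-<step> subfolders inside it. The small helper below locates a checkpoint and its matching -merged folder, the naming the inference command below relies on. The helper names are hypothetical. Note that the latest checkpoint is not necessarily the best one; the author's run used checkpoint-1100.

```python
from pathlib import Path
from typing import Optional


def _step(path: Path) -> Optional[int]:
    """Step number of a 'checkpoint-<step>' folder, or None for any other name."""
    prefix, _, suffix = path.name.partition("-")
    return int(suffix) if prefix == "checkpoint" and suffix.isdigit() else None


def latest_checkpoint(run_dir: str) -> Optional[Path]:
    """Highest-step checkpoint folder inside one run directory, or None if there is none."""
    ckpts = [p for p in Path(run_dir).iterdir() if p.is_dir() and _step(p) is not None]
    return max(ckpts, key=_step, default=None)


def merged_dir(ckpt: Path) -> Path:
    """Sibling folder with a '-merged' suffix (naming assumed from the inference command)."""
    return ckpt.with_name(ckpt.name + "-merged")


# Example, using the run directory from this guide:
# ckpt = latest_checkpoint("output/llama3-8b/v0-20241203-111328")
# print(ckpt, merged_dir(ckpt))
```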

4. Model Inference

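Once the LoRA weights are merged, you can run inference directly on the merged model files. Compared with loading the base model plus an unmerged adapter, this is better suited to deployment. The model is a single, simpler set of weights, and inference is faster because the extra adapter computation in each LoRA layer goes away.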

CUDA_VISIBLE_DEVICES=0 \
swift infer \
  --model_type llama3-8b \
  --model_id_or_path 'output/llama3-8b/v0-20241203-111328/checkpoint-1100-merged' \
  --max_new_tokens 2048 \
  --temperature 0.1 \
  --top_p 0.7 \
  --repetition_penalty 1.0 \
  --do_sample true

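max_new_tokens 2048: the maximum number of tokens to generate, which leaves room for long outputs.
temperature 0.1: controls the randomness of generation; lower values make the output more deterministic.
top_p 0.7: nucleus sampling; only the smallest set of most-likely tokens whose cumulative probability reaches 0.7 is kept. This trims the low-probability tail: lower values make the output more focused, higher values more diverse.
repetition_penalty 1.0: the penalty applied to tokens that already appeared; 1.0 means no penalty, and values above 1.0 discourage repetition.
do_sample true: enables sampling instead of greedy decoding, which is useful when some output diversity is wanted; combined with temperature 0.1, the output stays close to deterministic.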
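To make the interplay of these parameters concrete, here is a toy, pure-Python version of one decoding step. It applies the repetition penalty, then temperature, then top-p filtering, then samples. It follows the order and formulas used by Hugging Face transformers' logits processors. It is an illustration on a hand-written list of logits, not swift's actual decoding code.

```python
import math
import random


def apply_repetition_penalty(logits, prev_tokens, penalty):
    """Penalize tokens already generated: positive logits are divided, negative multiplied."""
    out = list(logits)
    for t in set(prev_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize (max-subtracted for stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_indices(probs, top_p):
    """Smallest set of most-probable token ids whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits, prev_tokens, temperature=0.1, top_p=0.7,
                      repetition_penalty=1.0, rng=random):
    """One decoding step with the same values as the swift infer command above."""
    logits = apply_repetition_penalty(logits, prev_tokens, repetition_penalty)
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_indices(probs, top_p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
    print(top_p_indices(softmax_with_temperature(logits, 1.0), 0.7))  # [0, 1]
    print(top_p_indices(softmax_with_temperature(logits, 0.1), 0.7))  # [0]
    print(sample_next_token(logits, prev_tokens=[0]))                  # 0
```

At temperature 0.1 the probability mass collapses onto the top token, so top_p 0.7 keeps only that token and sampling becomes effectively greedy. At temperature 1.0 the same logits keep two candidates.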

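After running the command, you can chat with the fine-tuned Llama3 model and see how it handles real question answering, mathematical reasoning, and similar tasks. As a test case, we use a simple English spelling question to check how the fine-tuned Llama3-8B model handles a basic language task.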

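This test checks whether the model has basic character-recognition and language-understanding ability. The model answers correctly. That suggests the fine-tuned weights are in effect and that basic reasoning ability remains intact after fine-tuning. A single question is a smoke test, though, not a full evaluation.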

III. Summary

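This project showed, step by step, how to fine-tune Meta's open-source Llama3-8B model efficiently with the swift framework on the BitaHub platform. We covered:

- setting up the development environment;
- the key parameters of the fine-tuning command;
- the use of LoRA for parameter-efficient training;
- the complete training, merging, and inference workflow.

With this guide, even beginners can quickly complete an end-to-end LLM fine-tuning run and deploy the result to downstream tasks. It lays a solid foundation for further customized training, injecting domain-specific knowledge, or research applications.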