XTuner Model Fine-Tuning: Building a Personal Assistant

NoemPol

Last modified: 2024-12-31 11:52:02

Tags: artificial intelligence, python, github, open source

First published: 2024-12-18 11:36:08

Article link: https://blog.youkuaiyun.com/NoemPol/article/details/144557090

XTuner Fine-Tuning in Practice

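This document walks through fine-tuning InternLM so that it takes on the self-cognition of a personal assistant.

Before You Start

The fine-tuning in this tutorial requires a 30% A100 allocation to complete.
The fine-tuning work in this camp session consists of two parts:

Obtaining the SFT data
Fine-tuning the InternLM2.5-7B-Chat model

What you will gain from this lesson:

The ability to fine-tune for a business scenario (for example, a bot with a custom self-cognition)
A language chatbot of your own

XTuner documentation: XTuner-doc-cn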

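Environment Setup and Data Preparation

In this section we show how to install XTuner.
We recommend installing XTuner in a conda virtual environment with Python 3.10.

Step 0. Create a Python 3.10 virtual environment with conda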

cd ~
# Clone this repo
git clone https://github.com/InternLM/Tutorial.git -b camp4
mkdir -p /root/finetune && cd /root/finetune
conda create -n xtuner-env python=3.10 -y
conda activate xtuner-env
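
After activating the environment, it can be worth confirming that the interpreter really is Python 3.10 before installing anything. The following is a minimal sketch of such a check; the helper name `is_expected_python` is hypothetical and not part of XTuner or the tutorial repo.

```python
import sys


def is_expected_python(version_info=sys.version_info, expected=(3, 10)):
    """Return True when the major.minor version matches `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_expected_python():
        print(f"OK: running Python {found}")
    else:
        print(f"Warning: expected Python 3.10, found {found}")
```

Run it with the `python` from the activated `xtuner-env` environment; a warning suggests the environment was not activated or was created with a different Python version.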

Step 1. Install XTuner

I recommend using the requirements.txt that I froze. For other installation methods, see the XTuner documentation linked above.

cd /root/Tutorial/docs/L1/XTuner
pip install -r requirements.txt

If the installation fails with an error like the following:

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)'))': /pypi/simple/bitsandbytes/

Could not fetch URL https://mirrors.aliyun.com/pypi/simple/bitsandbytes/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='mirrors.aliyun.com', port=443): Max retries exceeded with url: /pypi/simple/bitsandbytes/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)'))) - skipping

INFO: pip is looking at multiple versions of xtuner to determine which version is compatible with other requirements. This could take a while.

ERROR: Could not find a version that satisfies the requirement bitsandbytes>=0.40.0.post4 (from xtuner) (from versions: none)

press Ctrl + C to exit and run the command below instead. Note that the -e '.[deepspeed]' form installs from the current directory, so it must be run from an XTuner source checkout.

pip install --trusted-host mirrors.aliyun.com -e '.[deepspeed]' -i https://mirrors.aliyun.com/pypi/simple/

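Verifying the Installation

To verify that XTuner is installed correctly, we print its list of built-in configs.

Print the config list: run xtuner list-cfg on the command line and check that it prints the list of config files.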

xtuner list-cfg

If there is no error, the output looks like this:

xtuner list-cfg
CONFIGS=
baichuan2_13b_base_full_custom_pretrain_e1
baichuan2_13b_base_qlora_alpaca_e3
baichuan2_13b_base_qlora_alpaca_enzh_e3
baichuan2_13b_base_qlora_alpaca_enzh_oasst1_e3
…
internlm2_1_8b_full_alpaca_e3
internlm2_1_8b_full_custom_pretrain_e1
internlm2_1_8b_qlora_alpaca_e3
internlm2_20b_full_custom_pretrain_e1
internlm2_20b_full_finetune_custom_dataset_e1
internlm2_20b_qlora_alpaca_e3
internlm2_20b_qlora_arxiv_gentitle_e3
internlm2_20b_qlora_code_alpaca_e3
internlm2_20b_qlora_colorist_e5
internlm2_20b_qlora_lawyer_e3
internlm2_20b_qlora_msagent_react_e3_gpu8
internlm2_20b_qlora_oasst1_512_e3
internlm2_20b_qlora_oasst1_e3
internlm2_20b_qlora_sql_e3
internlm2_5_chat_20b_alpaca_e3
internlm2_5_chat_20b_qlora_alpaca_e3
internlm2_5_chat_7b_full_finetune_custom_dataset_e1
internlm2_5_chat_7b_qlora_alpaca_e3
internlm2_5_chat_7b_qlora_oasst1_e3
internlm2_7b_full_custom_pretrain_e1
internlm2_7b_full_finetune_custom_dataset_e1
internlm2_7b_full_finetune_custom_dataset_e1_sequence_parallel_4
internlm2_7b_qlora_alpaca_e3
internlm2_7b_qlora_arxiv_gentitle_e3
internlm2_7b_qlora_code_alpaca_e3
internlm2_7b_qlora_colorist_e5
internlm2_7b_qlora_json_e3
internlm2_7b_qlora_lawyer_e3
internlm2_7b_qlora_msagent_react_e3_gpu8
internlm2_7b_qlora_oasst1_512_e3
internlm2_7b_qlora_oasst1_e3
internlm2_7b_qlora_sql_e3
…

The output lists the model configs that XTuner supports for fine-tuning.
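
Because the list is long, it can help to narrow it down to the model used in this tutorial. The following is a minimal sketch that filters saved `xtuner list-cfg` output by a keyword. The function name `filter_configs` is hypothetical, and the sample text is a few lines copied from the listing above rather than a complete run.

```python
def filter_configs(listing, keyword):
    """Return the config names in `listing` (one per line) that contain `keyword`."""
    names = []
    for line in listing.splitlines():
        name = line.strip()
        if name and keyword in name:
            names.append(name)
    return names


# A few lines copied from the listing above, used as sample input
sample = """CONFIGS=
internlm2_5_chat_20b_qlora_alpaca_e3
internlm2_5_chat_7b_full_finetune_custom_dataset_e1
internlm2_5_chat_7b_qlora_alpaca_e3
internlm2_5_chat_7b_qlora_oasst1_e3
internlm2_7b_qlora_alpaca_e3
"""

for name in filter_configs(sample, "internlm2_5_chat_7b"):
    print(name)
```

With this sample, the loop prints the three `internlm2_5_chat_7b_*` configs, which are the candidates for fine-tuning InternLM2.5-7B-Chat.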

Modifying the Provided Data

Step 0. Create a new folder to store the fine-tuning data

mkdir -p /root/finetune/data && cd /root/finetune/data
cp -r /root/Tutorial/data/assistant_Tuner.jsonl  /root/finetune/data

At this point the `finetune` folder should have the following structure:

finetune
├── data
│   └── assistant_Tuner.jsonl
└── xtuner
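
To confirm the layout without inspecting it by hand, a small check like the one below can report which expected entries are missing. This is only a sketch: the helper name `missing_entries` is hypothetical, and the expected entries are taken directly from the tree above.

```python
from pathlib import Path


def missing_entries(root, expected=("data/assistant_Tuner.jsonl", "xtuner")):
    """Return the entries from `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("/root/finetune")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout matches the expected structure")
```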

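Step 1. Create the modification script

We will write a script that turns the provided data into the fine-tuning training data we need. First, create a change_script.py file in the current directory:

# Create the `change_script.py` file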
touch /root/finetune/data/change_script.py

Open change_script.py and paste the following content into it.
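
The script body is not reproduced in this part of the post. Purely as an illustration of the kind of edit such a script might make, and not the author's actual script, here is a minimal sketch that replaces one string with another throughout a JSONL file. It assumes only that each line of the file is a JSON value; the function names, the placeholder strings, and the output file name are all hypothetical.

```python
import json


def replace_text(value, old, new):
    """Recursively replace `old` with `new` in every string value inside `value`."""
    if isinstance(value, str):
        return value.replace(old, new)
    if isinstance(value, list):
        return [replace_text(item, old, new) for item in value]
    if isinstance(value, dict):
        # Only values are rewritten; dictionary keys are left unchanged
        return {key: replace_text(item, old, new) for key, item in value.items()}
    return value


def rewrite_jsonl(src_path, dst_path, old, new):
    """Copy a JSONL file record by record, applying `replace_text` to each one."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = replace_text(json.loads(line), old, new)
            # ensure_ascii=False keeps non-ASCII text readable in the output
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A call might look like rewrite_jsonl("assistant_Tuner.jsonl", "output.jsonl", "OLD_NAME", "YOUR_NAME"), where the output file name and both name strings are placeholders to be replaced with your own values.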
