Using LLaMA-Factory


1. Clone the LLaMA-Factory project from GitHub

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

2. Enter the directory and install the dependencies

cd LLaMA-Factory
pip install -e ".[torch,metrics]"
# if there are dependency conflicts: pip install --no-deps -e .
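
To verify the installation succeeded, you can print the CLI version:

llamafactory-cli version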

3. Fine-tune (train) on the datasets bundled with the project

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

The corresponding YAML config file is shown below; download the Qwen model to a local folder yourself.

(llamafactory) root@LAPTOP-QHT8D44R:/plf/llamafactory/LLaMA-Factory/examples/train_lora# cat llama3_lora_sft.yaml
### model
model_name_or_path: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
        # download the model locally yourself and point this at its path
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
       # names of the datasets to train on (see the format note after this config)
template: qwen
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/Qwen3-0.6B/lora/sft
      # where the LoRA adapter is saved
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500
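
The dataset names above are entries registered in data/dataset_info.json. Both identity and alpaca_en_demo follow the alpaca format; a record looks roughly like this (an illustrative sample, not copied from the shipped files):

{
  "instruction": "Who are you?",
  "input": "",
  "output": "I am an assistant fine-tuned with LLaMA-Factory."
}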

4. Run inference with the fine-tuned LoRA model

llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

The inference config file:

cat inference/llama3_lora_sft.yaml

model_name_or_path: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
              # path to the base model
adapter_name_or_path: saves/Qwen3-0.6B/lora/sft
              # path to the LoRA adapter
template: qwen
infer_backend: huggingface  # choices: [huggingface, vllm, sglang]
trust_remote_code: true

5. Merge the fine-tuned LoRA adapter into the base model and export it

llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters

### model
model_name_or_path: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
adapter_name_or_path: saves/Qwen3-0.6B/lora/sft
template: qwen
trust_remote_code: true

### export
export_dir: output/qwen3_lora_sft
export_size: 5
export_device: cpu  # choices: [cpu, auto]
export_legacy_format: false
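
Once exported, the merged directory loads like any ordinary Hugging Face model. A minimal sketch, assuming transformers is installed and using the export_dir above:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "output/qwen3_lora_sft"  # export_dir from the YAML above
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)

# Build a chat-formatted prompt and generate a short reply.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))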

6. Run inference with the merged model

llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

Change the model path in the inference config to point at the merged model:

#model_name_or_path: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
              # path to the base model
model_name_or_path: output/qwen3_lora_sft
             # path to the merged (base + LoRA) model
#adapter_name_or_path: saves/Qwen3-0.6B/lora/sft
              # path to the LoRA adapter
template: qwen
infer_backend: huggingface  # choices: [huggingface, vllm, sglang]
trust_remote_code: true

Tip: to permanently switch pip to the Tsinghua mirror: pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/

7. Merge the LoRA adapter into the base model and export it with 4-bit GPTQ quantization

llamafactory-cli export examples/merge_lora/llama3_gptq.yaml

The config file:

### model
#model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
# Quantization can be applied either to the base model directly or to a fine-tuned model (whether or not it has been merged with the base)
model_name_or_path: /mnt/workspace/LLaMA-Factory/output/llama3_lora_sft
template: llama3
trust_remote_code: true

### export
export_dir: output/llama3_gptq
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.jsonl
export_size: 5
export_device: cpu  # choices: [cpu, auto]
export_legacy_format: false
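
Here export_quantization_dataset supplies the calibration data for GPTQ. The quantized export can then be loaded with transformers; a sketch, assuming a GPTQ backend such as auto-gptq (via optimum) is installed:

from transformers import AutoModelForCausalLM

# transformers picks up the GPTQ quantization config saved in the export dir.
model = AutoModelForCausalLM.from_pretrained(
    "output/llama3_gptq",  # export_dir from the YAML above
    device_map="auto",
    trust_remote_code=True,
)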

8. General-capability evaluation

llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml

The evaluation config file:

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true

### method
finetuning_type: lora

### dataset
task: mmlu_test  # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: fewshot
lang: en
n_shot: 5

### output
save_dir: saves/llama3-8b/lora/eval

### eval
batch_size: 4

9. NLG evaluation

llamafactory-cli train examples/extras/nlg_eval/llama3_lora_predict.yaml

The config file:

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true

### method
stage: sft
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/llama3-8b/lora/predict
overwrite_output_dir: true
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000

10. Single machine, multiple GPUs

Launch distributed (DDP) parallel training; see the GPU-selection example after the command:

FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_full_sft_ds3.yaml
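
To restrict training to particular GPUs, CUDA_VISIBLE_DEVICES can be combined with the same command (a sketch; adjust the device list to your machine):

CUDA_VISIBLE_DEVICES=0,1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_full_sft_ds3.yaml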

11. Experiment tracking

Supported trackers include LlamaBoard, SwanLab, TensorBoard, and Weights & Biases (wandb).

pip install swanlab
pip install "swanlab[dashboard]"

swanlab watch /mnt/workspace/LLaMA-Factory/swanlog   # launch the local dashboard
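
To log a training run to SwanLab, switch report_to in the training YAML to one of the choices already listed in the step 3 config:

report_to: swanlab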

12. Web UI launch command

CUDA_VISIBLE_DEVICES=0 GRADIO_SHARE=1 GRADIO_SERVER_PORT=7860 llamafactory-cli webui

13. Serve an API and run batch inference from a Python script (a sketch follows the command)

API_PORT=8000 CUDA_VISIBLE_DEVICES=0 llamafactory-cli api examples/inference/llama3_lora_sft.yaml
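
The server exposes an OpenAI-compatible API, so batch inference can be scripted with the openai client. A minimal sketch (the prompt list, output file, and model name are placeholders):

import json

from openai import OpenAI

# Point the OpenAI client at the local LLaMA-Factory API server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

prompts = ["Who are you?", "What can you do?"]  # replace with your own batch

results = []
for prompt in prompts:
    resp = client.chat.completions.create(
        model="default",  # placeholder name; the server hosts a single model
        messages=[{"role": "user", "content": prompt}],
    )
    results.append({"prompt": prompt, "answer": resp.choices[0].message.content})

# Save all prompt/answer pairs for later inspection.
with open("batch_results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)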

14. Accelerate inference with vLLM

python scripts/vllm_infer.py --model_name_or_path /mnt/workspace/LLaMA-Factory/output/llama3_lora_sft --dataset alpaca_en_demo
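
Note: vLLM must be installed beforehand, e.g. via the project's extra (an assumption; installing plain vllm also works):

pip install -e ".[vllm]"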
