1. Clone the LLaMA-Factory project from GitHub
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
2. Enter the directory and install the dependencies
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
# If there are dependency conflicts: pip install --no-deps -e .
3. Train: fine-tune on the project's built-in datasets
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
The corresponding YAML config file is shown below; download the Qwen model to a local folder yourself.
(llamafactory) root@LAPTOP-QHT8D44R:/plf/llamafactory/LLaMA-Factory/examples/train_lora# cat llama3_lora_sft.yaml
### model
model_name_or_path: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
# Model path: download the model to this local directory yourself
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
# Dataset names, registered in data/dataset_info.json (see the registration sketch after this config)
template: qwen
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/Qwen3-0.6B/lora/sft
# Where the LoRA adapter is saved
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500
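The dataset entry above refers to entries in data/dataset_info.json. A minimal sketch for registering your own alpaca-format dataset (the file name and sample content here are illustrative, not part of the repo):

import json

# Write a tiny alpaca-format dataset (instruction / input / output keys).
sample = [{"instruction": "Who are you?", "input": "",
           "output": "I am an assistant fine-tuned with LLaMA-Factory."}]
with open("data/my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=2)

# Register it so the training YAML can say `dataset: my_dataset`.
with open("data/dataset_info.json", "r+", encoding="utf-8") as f:
    info = json.load(f)
    info["my_dataset"] = {"file_name": "my_dataset.json"}  # alpaca format is the default
    f.seek(0)
    json.dump(info, f, ensure_ascii=False, indent=2)
    f.truncate()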
4. Run inference with the fine-tuned LoRA adapter
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
The inference config file:
cat examples/inference/llama3_lora_sft.yaml
model_name_or_path: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
# Path to the base model
adapter_name_or_path: saves/Qwen3-0.6B/lora/sft
# Path to the LoRA adapter
template: qwen
infer_backend: huggingface # choices: [huggingface, vllm, sglang]
trust_remote_code: true
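For a quick check outside the CLI, the same base model plus adapter can be loaded in plain Python (a sketch, assuming the transformers and peft packages are installed):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_path, trust_remote_code=True)
# Attach the LoRA adapter produced by the SFT run above.
model = PeftModel.from_pretrained(base, "saves/Qwen3-0.6B/lora/sft")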
5. Merge the fine-tuned LoRA adapter into the base model and export the result
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
### model
model_name_or_path: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
adapter_name_or_path: saves/Qwen3-0.6B/lora/sft
template: qwen
trust_remote_code: true
### export
export_dir: output/qwen3_lora_sft
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false
6. Run inference with the merged model
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
Change the model path in the inference config to the merged model's path:
#model_name_or_path: /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B
# Path to the base model (no longer needed)
model_name_or_path: output/qwen3_lora_sft
# Path to the merged (base + LoRA) model
#adapter_name_or_path: saves/Qwen3-0.6B/lora/sft
# Path to the LoRA adapter (no longer needed after merging)
template: qwen
infer_backend: huggingface # choices: [huggingface, vllm, sglang]
trust_remote_code: true
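A sanity check that the merged export is now a standalone model (a sketch; no peft needed, since the directory contains full weights):

from transformers import AutoModelForCausalLM, AutoTokenizer

# The merged directory loads like any regular Hugging Face model.
model = AutoModelForCausalLM.from_pretrained("output/qwen3_lora_sft", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("output/qwen3_lora_sft", trust_remote_code=True)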
Tip: permanently switch pip to the Tsinghua mirror: pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/
7. Merge the LoRA adapter into the base model and export it with 4-bit GPTQ quantization
llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
The config file:
### model
#model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
# Quantization can run either directly on the base model or on a fine-tuned model (whether or not it has been merged with the base model)
model_name_or_path: /mnt/workspace/LLaMA-Factory/output/llama3_lora_sft
template: llama3
trust_remote_code: true
### export
export_dir: output/llama3_gptq
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.jsonl
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false
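Loading the quantized export (a sketch; transformers picks up the quantization_config saved in the export directory, which requires the optimum plus gptqmodel or auto-gptq packages to be installed):

from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" places the 4-bit weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    "output/llama3_gptq", device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("output/llama3_gptq", trust_remote_code=True)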
8. General-capability evaluation
llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
The evaluation config file:
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true
### method
finetuning_type: lora
### dataset
task: mmlu_test # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: fewshot
lang: en
n_shot: 5
### output
save_dir: saves/llama3-8b/lora/eval
### eval
batch_size: 4
9. NLG evaluation
llamafactory-cli train examples/extras/nlg_eval/llama3_lora_predict.yaml
The config file:
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true
### method
stage: sft
do_predict: true
finetuning_type: lora
### dataset
eval_dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/predict
overwrite_output_dir: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000
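With predict_with_generate enabled, the run writes the generated outputs under output_dir. A sketch for skimming the prediction file (the generated_predictions.jsonl name and its prompt/predict fields are what recent LLaMA-Factory versions emit; verify against your own run):

import json

with open("saves/llama3-8b/lora/predict/generated_predictions.jsonl", encoding="utf-8") as f:
    for line in list(f)[:3]:  # peek at the first few records
        rec = json.loads(line)
        print(rec["prompt"][:80], "=>", rec["predict"][:80])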
10. Single machine, multiple GPUs
Launch the DDP engine for parallel training:
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_full_sft_ds3.yaml
11. Experiment monitoring
Supported trackers: LlamaBoard, SwanLab, TensorBoard, wandb; pick one via report_to in the training YAML (see the choices comment in the config above).
pip install swanlab
pip install "swanlab[dashboard]"
Launch command: swanlab watch /mnt/workspace/LLaMA-Factory/swanlog
12. Web UI launch command
CUDA_VISIBLE_DEVICES=0 GRADIO_SHARE=1 GRADIO_SERVER_PORT=7860 llamafactory-cli webui
13. Start the API server and write a Python script for batch inference
API_PORT=8000 CUDA_VISIBLE_DEVICES=0 llamafactory-cli api examples/inference/llama3_lora_sft.yaml
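The api command exposes an OpenAI-compatible endpoint, so a batch-inference script can reuse the openai client. A minimal sketch (the base URL matches API_PORT=8000 above; the model name is a placeholder, since the server serves whatever model the YAML specifies):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="0")  # any non-empty key

prompts = ["Who are you?", "Give me three tips for learning Python."]
for p in prompts:
    resp = client.chat.completions.create(
        model="default",  # placeholder model name
        messages=[{"role": "user", "content": p}],
    )
    print(resp.choices[0].message.content)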
14. Accelerate inference with vLLM
python scripts/vllm_infer.py --model_name_or_path /mnt/workspace/LLaMA-Factory/output/llama3_lora_sft --dataset alpaca_en_demo
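The same merged model can also be queried through vLLM's Python API directly (a sketch, assuming pip install vllm; prompts are placeholders):

from vllm import LLM, SamplingParams

llm = LLM(model="/mnt/workspace/LLaMA-Factory/output/llama3_lora_sft")
params = SamplingParams(temperature=0.7, max_tokens=256)
# vLLM batches the prompts internally, which is where the speedup comes from.
outputs = llm.generate(["Who are you?", "Summarize LoRA in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)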