Dataset Format and Example
The dataset contains multimodal information and is structured as follows:
{
  "messages": [
    {
      "content": "<video><audio>What is the video describing?",
      "role": "user"
    },
    {
      "content": "A girl who is drawing a picture of a guitar and feel nervous.",
      "role": "assistant"
    }
  ],
  "videos": [
    "mllm_demo_data/4.mp4"
  ],
  "audios": [
    "mllm_demo_data/4.mp3"
  ]
}
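For LLaMA-Factory to find a dataset of this shape, it must also be registered in data/dataset_info.json. The demo datasets used in the training command below (mllm_demo, mllm_video_demo, mllm_audio_demo) ship pre-registered; for custom data, a minimal sketch of an entry, assuming the sharegpt-style messages format shown above (the dataset name and file name here are illustrative):
"my_omni_data": {
  "file_name": "my_omni_data.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages",
    "videos": "videos",
    "audios": "audios"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant"
  }
}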
SFT Fine-Tuning Procedure
SFT fine-tuning is run with LLaMA-Factory on a four-GPU setup; the main parameters are as follows:
llamafactory-cli train \
--stage sft \
--do_train True \
--model_name_or_path /workspace/Qwen2___5-Omni-7B \
--preprocessing_num_workers 16 \
--finetuning_type lora \
--template qwen2_omni \
--flash_attn auto \
--dataset_dir data \
--dataset mllm_audio_demo,mllm_video_demo,mllm_demo \
--cutoff_len 2048 \
--learning_rate 5e-05 \
--num_train_epochs 10.0 \
--max_samples 100000 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 5 \
--save_steps 100 \
--warmup_steps 0 \
--packing False \
--enable_thinking True \
--report_to none \
--output_dir saves/Qwen2.5-Omni-7B/lora/train_2025-09-22-08-20-53 \
--fp16 True \
--plot_loss True \
--trust_remote_code True \
--ddp_timeout 180000000 \
--include_num_input_tokens_seen True \
--optim adamw_torch \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0 \
--lora_target all \
--freeze_vision_tower True \
--freeze_multi_modal_projector True \
--image_max_pixels 589824 \
--image_min_pixels 1024 \
--video_max_pixels 65536 \
--video_min_pixels 256
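The command above does not pin GPUs by itself. For the four-GPU run mentioned earlier, a common pattern is to select the devices explicitly and launch LLaMA-Factory's torchrun-based distributed mode via FORCE_TORCHRUN; the GPU indices here are an assumption, and the remaining arguments are unchanged:
CUDA_VISIBLE_DEVICES=0,1,2,3 FORCE_TORCHRUN=1 llamafactory-cli train \
  --stage sft \
  --finetuning_type lora \
  --template qwen2_omni
# ...plus the remaining arguments from the command above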
Key Parameter Descriptions
model_name_or_path: path to the base model
dataset_dir: directory of the custom dataset
output_dir: output directory for the fine-tuned model
num_train_epochs: number of training epochs
model_max_length: model sequence length (set according to the data)
per_device_train_batch_size: per-device batch size
save_steps: checkpoint saving interval in steps
Merging Model Parameters
Command to merge the model parameters:
llamafactory-cli export examples/merge_lora/qwen2_5omni_lora_sft.yaml
Sample merge configuration file:
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
### model
model_name_or_path: /workspace/Qwen2___5-Omni-7B
adapter_name_or_path: saves/Qwen2.5-Omni-7B/lora/train_2025-09-19-09-42-22
template: qwen2_omni
trust_remote_code: true
### export
export_dir: output/qwen2_5omni_lora_sft
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false
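Once the export finishes, the merged weights are written to export_dir. A quick sanity check (the path follows the config above; the exact file list depends on the transformers version):
ls output/qwen2_5omni_lora_sft
# expect config.json, tokenizer/processor files, and sharded *.safetensors weights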
Common Issues and Solutions
Problem:
Qwen2.5-Omni inference error after full SFT: KeyError: 'qwen2_5_omni_thinker'
Cause:
What gets saved after fine-tuning is the omni.thinker component; it needs to be merged back with the original model: [thinker + talker] -> [omni]
Solution:
See LLaMA-Factory Pull Request #7537.
Merge with the script:
python3 ./scripts/qwen_omni_merge.py merge_lora \
--base_model_path="/workspace/Qwen2___5-Omni-7B" \
--lora_checkpoint_path="/app/saves/Qwen2.5-Omni-7B/lora/train_2025-10-13-03-14-01" \
--save_path="output/qwen2_5omni_lora_sft"
vLLM Inference Compatibility Adjustment
For vLLM inference, change Qwen2_5OmniForConditionalGeneration to Qwen2_5OmniModel in the config.json of the merged model.
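This can be done with a one-line in-place edit; the path assumes the merged model from the previous steps:
sed -i 's/Qwen2_5OmniForConditionalGeneration/Qwen2_5OmniModel/' output/qwen2_5omni_lora_sft/config.json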
Inference Test
CUDA_VISIBLE_DEVICES=4,5,6,7 llamafactory-cli webchat \
--model_name_or_path /app/output/qwen2_5omni_lora_sft_100 \
--template qwen2_omni \
--finetuning_type lora
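If a browser UI is not needed, the same model can also be smoke-tested from the terminal with the chat entry point (a minimal sketch mirroring the command above):
CUDA_VISIBLE_DEVICES=4,5,6,7 llamafactory-cli chat \
  --model_name_or_path /app/output/qwen2_5omni_lora_sft_100 \
  --template qwen2_omni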