(vlm) face8@jamesdeMac-Studio vlm % python train_vlm\ copy.py
✅ MPS device available
🛠️ System configuration:
- Device: mps
- Memory: 256.00GB total
- Training arguments: ScriptArguments(train_path='train.jsonl', valid_path='valid.jsonl', model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', output_dir='./output_lora_qwen25vl_instruct', per_device_train_batch_size=1, gradient_accumulation_steps=4, num_train_epochs=3, logging_steps=5, save_steps=100, eval_steps=100, image_size=672, learning_rate=2e-05, warmup_steps=50, weight_decay=0.01, lora_rank=16, lora_alpha=32, lora_dropout=0.05, fp16=False, bf16=False, max_steps=-1, gradient_checkpointing=True, seed=42, report_to='none', enable_mps_fallback=True, debug_mode=True, max_retries=3, patch_image_size=32, ignore_video_inputs=True)
Loading checkpoint shards: 100%|███| 5/5 [00:05<00:00, 1.02s/it]
🔧 Model moved to MPS device
✅ Gradient checkpointing enabled
✅ Model patch applied: image_grid_thw computed automatically
trainable params: 35,090,432 || all params: 8,324,397,056 || trainable%: 0.4215
✅ LoRA configuration loaded
✅ Dataset loaded: path=train.jsonl, total lines=934, valid samples=934
✅ Dataset loaded: path=valid.jsonl, total lines=104, valid samples=104
✅ Dataset preparation complete
/Users/face8/works/vlm/train_vlm copy.py:482: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(
✅ Trainer created
🔍 Pre-training environment check:
🔍 Received 2 samples
Sample 0: image=images/00316.jpg, instruction length=141, answer length=4
Sample 1: image=images/00653.jpg, instruction length=141, answer length=9
📊 Batching complete: images=2, input ID shape=torch.Size([2, 1024]), image grid=(1, 21, 21)
✅ Environment check passed
🚀 Starting training...
Memory usage before training: total=256.00GB, used=109.31GB, available=145.77GB
Currently training with a batch size of: 1
***** Running training *****
Num examples = 934
Num Epochs = 3
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 4
Gradient Accumulation steps = 4
Total optimization steps = 702
Number of trainable parameters = 35,090,432
🚀 Training started
Memory usage at training start: total=256.00GB, used=109.25GB, available=145.83GB
🔄 Starting epoch 0
/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/torch/utils/data/dataloader.py:683: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, then device pinned memory won't be used.
warnings.warn(warn_msg)
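The warning above fires because pinned host memory is a CUDA feature the MPS backend does not support. One possible way to silence it (a sketch, assuming the script builds its `Trainer` from `transformers.TrainingArguments`, whose real `dataloader_pin_memory` field defaults to `True`):

```python
from transformers import TrainingArguments

# Disable DataLoader memory pinning, which is unsupported on Apple MPS.
# output_dir matches the one shown in the ScriptArguments dump above.
training_args = TrainingArguments(
    output_dir="./output_lora_qwen25vl_instruct",
    dataloader_pin_memory=False,
)
```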
🔍 Received 1 sample
Sample 0: image=images/00722.jpg, instruction length=141, answer length=6
📊 Batching complete: images=1, input ID shape=torch.Size([1, 1024]), image grid=(1, 21, 21)
🔍 Received 1 sample
Sample 0: image=images/00689.jpg, instruction length=141, answer length=3
📊 Batching complete: images=1, input ID shape=torch.Size([1, 1024]), image grid=(1, 21, 21)
🔍 Received 1 sample
Sample 0: image=images/00458.jpg, instruction length=141, answer length=18
📊 Batching complete: images=1, input ID shape=torch.Size([1, 1024]), image grid=(1, 21, 21)
🔍 Received 1 sample
Sample 0: image=images/00915.jpg, instruction length=141, answer length=12
📊 Batching complete: images=1, input ID shape=torch.Size([1, 1024]), image grid=(1, 21, 21)
🔍 Received 1 sample
Sample 0: image=images/00161.jpg, instruction length=141, answer length=12
📊 Batching complete: images=1, input ID shape=torch.Size([1, 1024]), image grid=(1, 21, 21)
❌ Error during training: forward() got multiple values for argument 'input_ids'
❌ Main program terminated abnormally: forward() got multiple values for argument 'input_ids'
Traceback (most recent call last):
  File "/Users/face8/works/vlm/train_vlm copy.py", line 521, in <module>
    main()
  File "/Users/face8/works/vlm/train_vlm copy.py", line 508, in main
    trainer.train()
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/transformers/trainer.py", line 2207, in train
    return inner_training_loop(
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/transformers/trainer.py", line 2549, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/transformers/trainer.py", line 3750, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/transformers/trainer.py", line 3837, in compute_loss
    outputs = model(**inputs)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/peft/peft_model.py", line 1757, in forward
    return self.base_model(
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 193, in forward
    return self.model.forward(*args, **kwargs)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/transformers/utils/generic.py", line 943, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1487, in forward
    outputs = self.model(
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/face8/miniconda3/envs/vlm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/face8/works/vlm/train_vlm copy.py", line 127, in patched_forward
    return original_forward(
TypeError: forward() got multiple values for argument 'input_ids'
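This TypeError usually means the monkey patch at `train_vlm copy.py:127` receives `input_ids` positionally in `*args` and then also names it explicitly when calling `original_forward`, so Python sees the argument twice. A minimal, library-free sketch of the failure mode and one possible normalization (`buggy_patched` / `fixed_patched` are illustrative names, not taken from the script):

```python
def original_forward(input_ids=None, **kwargs):
    # Stand-in for the wrapped model forward.
    return input_ids

def buggy_patched(*args, **kwargs):
    # Bug: if the caller passed input_ids positionally, args[0] already
    # holds it, so naming it again raises
    # "TypeError: forward() got multiple values for argument 'input_ids'".
    return original_forward(*args, input_ids=args[0] if args else None, **kwargs)

def fixed_patched(*args, **kwargs):
    # Fix: move a positional input_ids into kwargs exactly once,
    # then forward everything through without duplication.
    if args:
        kwargs.setdefault("input_ids", args[0])
        args = args[1:]
    return original_forward(*args, **kwargs)
```

The same normalization applied inside `patched_forward` would let it accept either calling convention used by `Trainer` and PEFT wrappers.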