The code above fails with the output below. Tell me which step it stopped at, and analyze what needs to change:

(style_tune) C:\Users\28996\Desktop\AI\persona_contrastive_finetuning>python Contrastive_Training_LM.py
INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
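(The accelerate note above just means device_map reserved 10% of the card as headroom. If the model needs more of the GPU, `max_memory` can be raised at load time. A minimal sketch; the checkpoint name is inferred from the 1.84B parameter count below and the limits are illustrative:

from transformers import AutoModelForCausalLM

# Raise the per-device cap that accelerate warns about above.
# Checkpoint name and memory limits are assumptions, not taken from the script.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-1.8B-Chat",
    device_map="auto",
    max_memory={0: "7GiB", "cpu": "24GiB"},
)
)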
trainable params: 1,572,864 || all params: 1,838,401,536 || trainable%: 0.0856
Training set sample: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]}
Validation set sample: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]}
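(Each sample carries three token-id lists: anchor, positive, negative. A collator for this layout has to pad the three fields independently; a minimal sketch of what that looks like, assuming field names as printed above — the helper itself is illustrative, not the script's actual code:

import torch
from torch.nn.utils.rnn import pad_sequence

def contrastive_collator(features, pad_token_id):
    # Pad anchor/positive/negative id lists into three separate tensors,
    # with a matching attention mask per field. Note: this treats every
    # occurrence of pad_token_id as padding, which is fine for a sketch.
    batch = {}
    for key in ("anchor_input_ids", "positive_input_ids", "negative_input_ids"):
        seqs = [torch.tensor(f[key]) for f in features]
        padded = pad_sequence(seqs, batch_first=True, padding_value=pad_token_id)
        batch[key] = padded
        batch[key.replace("input_ids", "attention_mask")] = (padded != pad_token_id).long()
    return batch
)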
Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead.
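(Side note on this deprecation warning, which repeats throughout the run: recent transformers releases take the tokenizer as `processing_class` instead of `tokenizer`. A one-line change, assuming the script builds its trainer roughly like this — the subclass name is a placeholder:

trainer = ContrastiveTrainer(          # subclass name assumed; match the script's
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,        # replaces the deprecated tokenizer=tokenizer
)
)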
INFO:__main__:GPU memory usage: 2.93GB allocated, 4.13GB reserved
Trainable parameters:
- base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight
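(For context, the dump above shows LoRA adapters on q_proj and v_proj in all 24 layers. A configuration consistent with it is sketched below: r=8 matches the printed 1,572,864 trainable parameters, since 24 layers × 2 modules × 8 × (2048 + 2048) = 1,572,864 assuming hidden_size 2048, while alpha and dropout are not recoverable from the log:

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # consistent with the printed parameter count
    lora_alpha=16,                        # assumed; not visible in the log
    lora_dropout=0.05,                    # assumed
    target_modules=["q_proj", "v_proj"],  # matches the parameter names above
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()        # prints the "trainable params: ..." line
)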
0%|          | 0/3 [00:00<?, ?it/s]
You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU memory usage: 4.00GB allocated, 4.21GB reserved
Could not estimate the number of tokens of the input, floating-point operations will not be computed
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU memory usage: 4.02GB allocated, 4.22GB reserved
33%|████████████████████████████ | 1/3 [00:03<00:06, 3.25s/it]
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU memory usage: 4.01GB allocated, 4.25GB reserved
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU memory usage: 4.02GB allocated, 4.26GB reserved
67%|████████████████████████████████████████████████████████ | 2/3 [00:06<00:02, 2.98s/it]
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU memory usage: 4.01GB allocated, 4.25GB reserved
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
INFO:__main__:GPU memory usage: 4.02GB allocated, 4.26GB reserved
{'train_runtime': 9.034, 'train_samples_per_second': 0.664, 'train_steps_per_second': 0.332, 'train_loss': 1.0772175788879395, 'epoch': 3.0}
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:09<00:00, 3.01s/it]
Traceback (most recent call last):
  File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 356, in <module>
    eval_results = trainer.evaluate()
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4076, in evaluate
    output = eval_loop(
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4270, in evaluation_loop
    losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4496, in prediction_step
    outputs = model(**inputs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 818, in forward
    return model_forward(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 806, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\peft\peft_model.py", line 1719, in forward
    return self.base_model(
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\peft\tuners\tuners_utils.py", line 197, in forward
    return self.model.forward(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 816, in forward
    outputs = self.model(
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 521, in forward
    raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
ValueError: You must specify exactly one of input_ids or inputs_embeds
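Where it stopped: training itself finished (all 3 steps, train_loss ≈ 1.077). The crash happens afterwards, at trainer.evaluate() on line 356 of Contrastive_Training_LM.py. The custom compute_loss evidently consumes the anchor_*/positive_*/negative_* keys during training, but the default Trainer.prediction_step used during evaluation calls model(**inputs) with the raw batch, so Qwen2's forward receives neither input_ids nor inputs_embeds and raises the ValueError. One fix is to override prediction_step in the trainer subclass so evaluation reuses the contrastive loss instead of forwarding the batch; a minimal sketch, assuming the subclass is called ContrastiveTrainer and already defines a compute_loss that handles this batch layout:

import torch
from transformers import Trainer

class ContrastiveTrainer(Trainer):       # name assumed; match the script's subclass
    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
        # The default implementation calls model(**inputs), which fails here
        # because the eval batch holds contrastive keys rather than input_ids.
        # Reuse the training-time loss computation under no_grad instead.
        inputs = self._prepare_inputs(inputs)
        with torch.no_grad():
            loss = self.compute_loss(model, inputs)
        return (loss.detach(), None, None)   # no logits/labels for this objective

Alternatively, drop the trainer.evaluate() call, or give the eval set a collator that emits plain input_ids/labels; the override above is the smallest change that keeps a contrastive eval loss.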