(style_tune) C:\Users\28996\Desktop\AI\persona_contrastive_finetuning>python Contrastive_Training_LM.py
INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
trainable params: 1,572,864 || all params: 1,838,401,536 || trainable%: 0.0856
Training set sample: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]}
Validation set sample: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]}
Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead.
INFO:__main__:GPU memory usage: allocated 2.93GB, reserved 4.13GB
Trainable parameters:
- base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight
- base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight
- base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight
- base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight
- base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight
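The parameter list and the trainable-parameter count are mutually consistent: LoRA adapters on `q_proj` and `v_proj` in each of the 24 layers, with an assumed rank of 8 and hidden size of 2048 (neither is printed in the log; both are inferred from the totals), reproduce exactly 1,572,864 trainable parameters. A quick sketch of the arithmetic:

```python
# Hypothetical values inferred from the log: rank and hidden size are not
# printed, so these are assumptions chosen to reproduce the reported totals.
num_layers = 24          # layers.0 .. layers.23 in the list above
modules_per_layer = 2    # q_proj and v_proj
hidden_size = 2048       # assumed; consistent with a Qwen-1.8B-class model
rank = 8                 # assumed LoRA rank

# Each adapted module carries lora_A (rank x hidden) and lora_B (hidden x rank).
params_per_module = rank * hidden_size + hidden_size * rank
trainable = num_layers * modules_per_layer * params_per_module
total = 1_838_401_536    # "all params" from the log

print(trainable)                          # 1572864
print(round(100 * trainable / total, 4))  # 0.0856
```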
  0%|                                                                | 0/3 [00:00<?, ?it/s]You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
INFO:__main__:GPU memory usage: allocated 4.00GB, reserved 4.21GB
Could not estimate the number of tokens of the input, floating-point operations will not be computed
INFO:__main__:GPU memory usage: allocated 4.02GB, reserved 4.22GB
 33%|████████████████████████████                                                        | 1/3 [00:03<00:07, 3.66s/it]
INFO:__main__:GPU memory usage: allocated 4.01GB, reserved 4.25GB
INFO:__main__:GPU memory usage: allocated 4.02GB, reserved 4.26GB
 67%|████████████████████████████████████████████████████████                            | 2/3 [00:06<00:03, 3.36s/it]
INFO:__main__:GPU memory usage: allocated 4.01GB, reserved 4.25GB
INFO:__main__:GPU memory usage: allocated 4.02GB, reserved 4.26GB
{'train_runtime': 10.2272, 'train_samples_per_second': 0.587, 'train_steps_per_second': 0.293, 'train_loss': 1.0806043942769368, 'epoch': 3.0}
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:10<00:00, 3.41s/it]
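The throughput figures in the summary line are internally consistent: 3 optimizer steps over the 10.2272 s runtime gives the reported steps per second, and the samples-per-second rate implies 6 samples processed in total (an assumption consistent with 2 training samples over 3 epochs, not stated in the log). A quick check:

```python
# Reproduce the reported throughput from the runtime and step count.
runtime = 10.2272
steps = 3
samples = 6  # assumed: 2 samples per epoch x 3 epochs, inferred from the rate

print(round(steps / runtime, 3))    # 0.293  (train_steps_per_second)
print(round(samples / runtime, 3))  # 0.587  (train_samples_per_second)
```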
Error during evaluation: 'anchor_input_ids'
Traceback (most recent call last):
  File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 437, in <module>
    eval_results = trainer.evaluate()
  File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 299, in evaluate
    return super().evaluate(
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4076, in evaluate
    output = eval_loop(
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4270, in evaluation_loop
    losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4486, in prediction_step
    loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
  File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 224, in compute_loss
    anchor_ids = inputs["anchor_input_ids"]
KeyError: 'anchor_input_ids'
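The KeyError means the batches reaching `compute_loss` during evaluation no longer contain `anchor_input_ids`: training worked, so the likely cause is that the evaluation dataloader is not using the same collator as training, and a default collator drops or renames the custom triplet keys. A minimal sketch of a collator that pads and preserves all three keys (plain Python lists and `pad_id=0` are assumptions for illustration; a real implementation would use `tokenizer.pad_token_id` and return tensors):

```python
# Hypothetical contrastive collator: pads each triplet field to the batch max
# length and keeps exactly the keys compute_loss expects.
TRIPLET_KEYS = ("anchor_input_ids", "positive_input_ids", "negative_input_ids")

def contrastive_collator(batch, pad_id=0):
    out = {}
    for key in TRIPLET_KEYS:
        seqs = [example[key] for example in batch]
        max_len = max(len(s) for s in seqs)
        out[key] = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    return out

# Passing one collator as data_collator for both training and evaluation keeps
# 'anchor_input_ids' present in the inputs seen by compute_loss.
batch = contrastive_collator([
    {"anchor_input_ids": [1, 2], "positive_input_ids": [3], "negative_input_ids": [4, 5, 6]},
    {"anchor_input_ids": [7], "positive_input_ids": [8, 9], "negative_input_ids": [10]},
])
print(sorted(batch))              # ['anchor_input_ids', 'negative_input_ids', 'positive_input_ids']
print(batch["anchor_input_ids"])  # [[1, 2], [7, 0]]
```

If the evaluation dataset itself lacks the triplet keys, `compute_loss` should instead guard with `inputs.get("anchor_input_ids")` and fail with a clear message.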