Fixing "Unable to create tensor, you should probably activate truncation and/or padding with 'padding...'"

1. Full error message

"Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected)."
2. Background

I was running the fine-tuning script provided in the official Qwen1.5 repository (GitHub), using 72B-Chat as the base model with LoRA for parameter-efficient fine-tuning. Shortly after the script started, it raised the error above.
3. Fix

The official Dockerfile pins transformers to 4.37.0, while my environment had 4.40.0. Rolling back to 4.37.0 resolved the error.
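To catch this kind of version drift before training starts, a small startup guard can compare the installed transformers version against the pin. This is a minimal sketch, not part of the official script; the `4.37.0` pin comes from the Dockerfile mentioned above, and the helper names are my own:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version pinned in the official Qwen1.5 Dockerfile (4.40.0 triggered the error here).
PINNED = "4.37.0"


def matches_pin(installed: str, pinned: str = PINNED) -> bool:
    """True when the installed transformers version equals the pinned one."""
    return installed == pinned


def installed_transformers() -> Optional[str]:
    """Return the installed transformers version, or None when it is absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None
```

Calling `matches_pin(installed_transformers() or "")` at the top of the training script and raising on a mismatch turns a cryptic collator error into an explicit "please run `pip install transformers==4.37.0`" failure.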

I haven't dug into the root cause of this bug yet; I'll update this post once I understand it, and I'd welcome a detailed analysis from anyone who knows the internals.

Done!!!

Appendix: another occurrence of the same error, this time with a `tokens` feature. The code that triggered it:

```python
data_collator = DataCollatorForTokenClassification(tokenizer)
train_loader = DataLoader(train_encodings, shuffle=True, collate_fn=data_collator, batch_size=BATCH_SIZE)

for batch in train_loader:
    print("Batch sample:")
    # The batch is assumed to be a dict; keys may differ depending on the data_collator implementation
    for key, value in batch.items():
        print(f"{key}: {value.shape}")  # print each field and its shape
    break  # only print the first batch to avoid overly long output
```

The resulting traceback:

```
Traceback (most recent call last):
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\transformers\tokenization_utils_base.py", line 759, in convert_to_tensors
    tensor = as_tensor(value)
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\transformers\tokenization_utils_base.py", line 721, in as_tensor
    return torch.tensor(value)
ValueError: too many dimensions 'str'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\HTL的笔记本\Desktop\Graduation project\Question\src\data_loader.py", line 88, in <module>
    for batch in train_loader:
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\torch\utils\data\dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\transformers\data\data_collator.py", line 45, in __call__
    return self.torch_call(features)
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\transformers\data\data_collator.py", line 333, in torch_call
    batch = pad_without_fast_tokenizer_warning(
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\transformers\data\data_collator.py", line 66, in pad_without_fast_tokenizer_warning
    padded = tokenizer.pad(*pad_args, **pad_kwargs)
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\transformers\tokenization_utils_base.py", line 3380, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\transformers\tokenization_utils_base.py", line 224, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "D:\Anaconda\envs\pytorch\Lib\site-packages\transformers\tokenization_utils_base.py", line 775, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`tokens` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
```
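Here the underlying `ValueError: too many dimensions 'str'` points at the `tokens` feature: each example still carries a list of raw strings, which `torch.tensor()` cannot convert. One common fix is to strip every non-integer field before handing examples to the collator. A minimal sketch with hypothetical feature values:

```python
# Hypothetical examples mimicking the failing dataset: each one still carries
# a raw-string "tokens" field that torch.tensor() cannot convert.
features = [
    {"input_ids": [101, 2009, 102], "labels": [0, 1, 0], "tokens": ["it", "works"]},
    {"input_ids": [101, 2129, 2515, 102], "labels": [0, 1, 1, 0], "tokens": ["how", "does", "it"]},
]

# Keep only the integer fields the collator knows how to pad and tensorize.
TENSOR_KEYS = {"input_ids", "attention_mask", "token_type_ids", "labels"}
clean = [{k: v for k, v in ex.items() if k in TENSOR_KEYS} for ex in features]

print(sorted(clean[0]))  # → ['input_ids', 'labels']
```

If the data lives in a Hugging Face `datasets.Dataset`, the same cleanup can be done with `dataset.remove_columns(["tokens"])` before building the `DataLoader`.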