1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend
MindSpore version: 2.2
Execution mode (PyNative/Graph): any
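2 Error Message
2.1 Problem Description
With MindSpore 2.2 and CANN 7.0, training runs successfully with the parallel strategy dp:mp:pp = 1:4:2, but changing the parallel strategy to dp:mp:pp = 1:8:1 produces the following error: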
2023-10-28-10:52:46.336.155 -> 2023-10-28-10:52:46.336.477
For more details, please refer to the FAQ at https://www.mindspore.cn/docs/en/master/faq/data_processing.html.
Traceback (most recent call last):
  File "wizardcoder/run_wizardcoder.py", line 170, in <module>
    merge_file=args.merge_file)
  File "wizardcoder/run_wizardcoder.py", line 104, in main
    task.finetune(finetune_checkpoint=config.load_checkpoint, auto_trans_ckpt=config.auto_trans_ckpt, resume=resume)
  File "/home/wizardcoder/2_wizardcoder-mindformers -1019/mindformers/trainer/trainer.py", line 462, in finetune