Solution for "RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS"

This post describes a CUDA/cuDNN compatibility problem hit while using PyTorch: even with the environment installed exactly as the official instructions describe, the error only went away after setting torch.backends.cudnn.benchmark = True.

This error is usually blamed on a CUDA/cuDNN version problem. The answers I found earlier pointed to a mismatched CUDA version, but for the pysot project I had set up the environment exactly as its instructions describe:

conda install pytorch=0.4.1 torchvision cuda90 -c pytorch
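To rule out an actual version mismatch before changing anything, it can help to print the CUDA and cuDNN builds that PyTorch itself reports. A minimal check (nothing here is specific to pysot):

import torch

# Versions PyTorch was built against, and whether a GPU is visible at runtime.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("CUDA available:", torch.cuda.is_available())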

 

So the fix is simply to enable PyTorch's cuDNN benchmark mode by adding the following line to the script you run:

torch.backends.cudnn.benchmark = True

With benchmark mode enabled, cuDNN benchmarks several convolution algorithms and picks one that works for the observed input sizes, and in my case the error no longer appeared.

The original post shows the placement in a screenshot. In practice the line only needs to run once, near the top of the script, before the model executes on the GPU, as in the sketch below.
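Since the screenshot is not reproduced here, this is a minimal, hypothetical placement sketch; the Conv2d stands in for the project's real model, and the only requirement is that the flag is set before any cuDNN-backed operation runs:

import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn

# Set once, at startup, before any convolution runs on the GPU.
cudnn.benchmark = True

def main():
    # Stand-in model; a real script would build its own network here.
    model = nn.Conv2d(3, 16, kernel_size=3).cuda()
    x = torch.randn(1, 3, 255, 255, device="cuda")
    with torch.no_grad():
        out = model(x)  # the first call triggers cuDNN algorithm selection
    print(out.shape)

if __name__ == "__main__":
    main()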

 

