RuntimeError: External Comm Manager: Create the hccl communication group failed. export ASDOPS_LOG_LEVEL=ERROR, export ASDOPS_LOG_TO_STDOUT=1 to see more details. Default log path is $HOME/atb/log.
2025-05-11 13:26:43,621 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
2025-05-11 13:26:43,618 [ERROR] model.py:39 - [Model] >>> Exception:External Comm Manager: Create the hccl communication group failed. export ASDOPS_LOG_LEVEL=ERROR, export ASDOPS_LOG_TO_STDOUT=1 to see more details. Default log path is $HOME/atb/log.
Traceback (most recent call last):
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
    return self.python_model.initialize(config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 146, in initialize
    self.generator = Generator(
                     ^^^^^^^^^^
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 119, in __init__
    self.warm_up(max_prefill_tokens, max_seq_len, max_input_len, max_iter_times, inference_mode)
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 303, in warm_up
    raise e
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 296, in warm_up
    self._generate_inputs_warm_up_backend(input_metadata, inference_mode, dummy=True)
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 378, in _generate_inputs_warm_up_backend
    self.generator_backend.warm_up(model_inputs, inference_mode=inference_mode)
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 198, in warm_up
    super().warm_up(model_inputs)
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 170, in warm_up
    _ = self.forward(model_inputs, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/utils/decorators/time_decorator.py", line 38, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 153, in forward
    logits = self.model_wrapper.forward(model_inputs, self.cache_pool.npu_cache, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 89, in forward
    logits = self.forward_tensor(
             ^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/python311/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 116, in forward_tensor
    logits = self.model_runner.forward(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Ascend/atb-models/atb_llm/runner/model_runner.py", line 297, in forward
    res = self.model.forward(**kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Ascend/atb-models/atb_llm/models/base/flash_causal_lm.py", line 491, in forward
    self.init_ascend_weight()
  File "/usr/local/Ascend/atb-models/atb_llm/models/qwen2/flash_causal_qwen2.py", line 287, in init_ascend_weight
    self.acl_encoder_operation.set_param(json.dumps({**encoder_param}))
RuntimeError: External Comm Manager: Create the hccl communication group failed. export ASDOPS_LOG_LEVEL=ERROR, export ASDOPS_LOG_TO_STDOUT=1 to see more details. Default log path is $HOME/atb/log.
2025-05-11 13:26:43,623 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
/root/anaconda3/envs/python311/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/root/anaconda3/envs/python311/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/root/anaconda3/envs/python311/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/root/anaconda3/envs/python311/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Daemon is killing...
Killed
(python311) root@zhangzhouzhixiao:/usr/local/Ascend/mindie/latest/mindie-service/bin#
I have confirmed that the corresponding hccl is not present in the current container. How do I install it?
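
Before installing anything, it may be worth double-checking the premise. HCCL normally ships inside the CANN toolkit rather than as a separate package, and this error can also appear when the library is present but the communication group cannot be set up. The sketch below is only an illustration and is not part of MindIE: it searches for `libhccl.so*` under `/usr/local/Ascend` (the only install root the log above confirms; any other root is an assumption you would need to add) and builds an environment containing the two `ASDOPS_*` variables that the error message itself asks for. The helper names `find_hccl_libraries` and `atb_debug_env` are hypothetical.

```python
import os

# Assumed search root: the log only confirms that /usr/local/Ascend exists
# in this container. Add other roots if CANN is installed elsewhere.
DEFAULT_ROOTS = ("/usr/local/Ascend",)


def find_hccl_libraries(roots=DEFAULT_ROOTS):
    """Return the resolved paths of every libhccl.so* found below the roots."""
    found = set()
    seen_dirs = set()
    for root in roots:
        # followlinks=True because the install tree uses symlinked
        # directories such as `latest`; seen_dirs guards against link loops.
        for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
            real_dir = os.path.realpath(dirpath)
            if real_dir in seen_dirs:
                dirnames[:] = []  # already visited, do not descend again
                continue
            seen_dirs.add(real_dir)
            for name in filenames:
                if name.startswith("libhccl.so"):
                    found.add(os.path.realpath(os.path.join(dirpath, name)))
    return sorted(found)


def atb_debug_env(base=None):
    """Copy an environment and add the variables named in the error message."""
    env = dict(os.environ if base is None else base)
    env["ASDOPS_LOG_LEVEL"] = "ERROR"
    env["ASDOPS_LOG_TO_STDOUT"] = "1"
    return env


if __name__ == "__main__":
    libs = find_hccl_libraries()
    if libs:
        print("Found HCCL libraries:")
        for path in libs:
            print("  " + path)
    else:
        print("No libhccl.so* found under", ", ".join(DEFAULT_ROOTS))
    env = atb_debug_env()
    print("Rerun the service with ASDOPS_LOG_LEVEL=%s ASDOPS_LOG_TO_STDOUT=%s"
          % (env["ASDOPS_LOG_LEVEL"], env["ASDOPS_LOG_TO_STDOUT"]))
```

If the search prints nothing, the CANN toolkit is probably missing or installed somewhere else in the container. If it does find the library, rerunning the service with the two variables exported and reading the detailed log under `$HOME/atb/log` should show why the communication group could not be created.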