Errors related to autoconf

The errors are as follows:

./configure: line 2140: AX_CHECK_ENABLE_DEBUG: command not found

checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
./configure: line 2262: AX_COMPILER_VENDOR: command not found
./configure: line 2263: AX_COMPILER_VERSION: command not found
checking for a sed that does not truncate output... /bin/sed
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking how to run the C++ preprocessor... g++ -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking cstdint usability... no
checking cstdint presence... no
checking for cstdint... no
./configure: line 3246: AX_PTHREAD: command not found
./configure: line 3252: syntax error near unexpected token `${cxxflags_test}'

./configure: line 3252: `       AX_APPEND_COMPILE_FLAGS(${cxxflags_test} -Wall -Wno-redeclared-class-member, CXXFLAGS)'


Reason:

Macros such as AX_PTHREAD are defined in autoconf-archive, so we must install autoconf-archive first. The installation provides a directory containing the *.m4 files, such as ax_pthread.m4; suppose this path is /home/tom/04.lib/autoconf-archive/share/aclocal. Because aclocal cannot find these macro definitions, the macro names are copied into configure unexpanded, and the shell then reports them as "command not found" or as syntax errors.
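For example, on a Debian-style system the package can be installed with the system package manager, or built from a source tarball with a custom prefix (the prefix below matches the path assumed above):

sudo apt-get install autoconf-archive

or

./configure --prefix=/home/tom/04.lib/autoconf-archive && make && make install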

To solve this, add a line to the file Makefile.am:

ACLOCAL_AMFLAGS = -I /home/tom/04.lib/autoconf-archive/share/aclocal

then regenerate the build scripts:

autoreconf -vif
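Alternatively (assuming automake 1.13 or newer, which honors the ACLOCAL_PATH environment variable), you can point aclocal at the directory without editing Makefile.am:

export ACLOCAL_PATH=/home/tom/04.lib/autoconf-archive/share/aclocal
autoreconf -vif

Either way, aclocal copies the required ax_*.m4 definitions into aclocal.m4, so the generated configure no longer contains the unexpanded macro names.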



```
bash: docker: command not found
[root@ac6b15bb1f77 mas]# python -m vllm.entrypoints.openai.api_server \
>     --model /models/z50051264/summary/Qwen2.5-7B-awq/ \
>     --max-num-seqs=256 \
>     --max-model-len=4096 \
>     --max-num-batched-tokens=4096 \
>     --tensor-parallel-size=1 \
>     --block-size=128 \
>     --host=0.0.0.0 \
>     --port=8080 \
>     --gpu-memory-utilization=0.9 \
>     --trust-remote-code \
>     --served-model-name=zzz \
>     --quantization awq
INFO 07-25 03:09:14 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 07-25 03:09:14 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 07-25 03:09:14 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 07-25 03:09:14 [__init__.py:235] Platform plugin ascend is activated
WARNING 07-25 03:09:15 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 07-25 03:09:18 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 07-25 03:09:19 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 07-25 03:09:19 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 07-25 03:09:19 [registry.py:413] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 07-25 03:09:19 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 07-25 03:09:19 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 07-25 03:09:19 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
INFO 07-25 03:09:20 [api_server.py:1395] vLLM API server version 0.9.2
INFO 07-25 03:09:20 [cli_args.py:325] non-default args: {'host': '0.0.0.0', 'port': 8080, 'model': '/models/z50051264/summary/Qwen2.5-7B-awq/', 'trust_remote_code': True, 'max_model_len': 4096, 'quantization': 'awq', 'served_model_name': ['zzz'], 'block_size': 128, 'max_num_batched_tokens': 4096, 'max_num_seqs': 256}
INFO 07-25 03:09:34 [config.py:841] This model supports multiple tasks: {'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'.
INFO 07-25 03:09:34 [config.py:1472] Using max model len 4096
WARNING 07-25 03:09:35 [config.py:960] ascend quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 07-25 03:09:35 [config.py:2285] Chunked prefill is enabled with max_num_batched_tokens=4096.
INFO 07-25 03:09:35 [platform.py:174] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode
INFO 07-25 03:09:35 [utils.py:321] Calculated maximum supported batch sizes for ACL graph: 66
INFO 07-25 03:09:35 [utils.py:336] Adjusted ACL graph batch sizes for Qwen2ForCausalLM model (layers: 28): 67 → 66 sizes
INFO 07-25 03:09:45 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 07-25 03:09:45 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 07-25 03:09:45 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 07-25 03:09:45 [__init__.py:235] Platform plugin ascend is activated
WARNING 07-25 03:09:46 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 07-25 03:09:50 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 07-25 03:09:50 [core.py:526] Waiting for init message from front-end.
WARNING 07-25 03:09:50 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 07-25 03:09:50 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 07-25 03:09:50 [registry.py:413] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 07-25 03:09:50 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 07-25 03:09:50 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 07-25 03:09:50 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
INFO 07-25 03:09:50 [core.py:69] Initializing a V1 LLM engine (v0.9.2) with config: model='/models/z50051264/summary/Qwen2.5-7B-awq/', speculative_config=None, tokenizer='/models/z50051264/summary/Qwen2.5-7B-awq/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=ascend, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=zzz, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
ERROR 07-25 03:09:53 [core.py:586] EngineCore failed to start.
ERROR 07-25 03:09:53 [core.py:586] Traceback (most recent call last):
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core
ERROR 07-25 03:09:53 [core.py:586]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__
ERROR 07-25 03:09:53 [core.py:586]     super().__init__(vllm_config, executor_class, log_stats,
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 75, in __init__
ERROR 07-25 03:09:53 [core.py:586]     self.model_executor = executor_class(vllm_config)
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 53, in __init__
ERROR 07-25 03:09:53 [core.py:586]     self._init_executor()
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 07-25 03:09:53 [core.py:586]     self.collective_rpc("init_device")
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 07-25 03:09:53 [core.py:586]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method
ERROR 07-25 03:09:53 [core.py:586]     return func(*args, **kwargs)
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 606, in init_device
ERROR 07-25 03:09:53 [core.py:586]     self.worker.init_device()  # type: ignore
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 132, in init_device
ERROR 07-25 03:09:53 [core.py:586]     NPUPlatform.set_device(device)
ERROR 07-25 03:09:53 [core.py:586]   File "/vllm-workspace/vllm-ascend/vllm_ascend/platform.py", line 98, in set_device
ERROR 07-25 03:09:53 [core.py:586]     torch.npu.set_device(device)
ERROR 07-25 03:09:53 [core.py:586]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/utils.py", line 80, in set_device
ERROR 07-25 03:09:53 [core.py:586]     torch_npu._C._npu_setDevice(device_id)
ERROR 07-25 03:09:53 [core.py:586] RuntimeError: SetPrecisionMode:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:156 NPU function error: at_npu::native::AclSetCompileopt(aclCompileOpt::ACL_PRECISION_MODE, precision_mode), error code is 500001
ERROR 07-25 03:09:53 [core.py:586] [ERROR] 2025-07-25-03:09:53 (PID:977, Device:0, RankID:-1) ERR00100 PTA call acl api failed
ERROR 07-25 03:09:53 [core.py:586] [Error]: The internal ACL of the system is incorrect.
ERROR 07-25 03:09:53 [core.py:586]         Rectify the fault based on the error information in the ascend log.
ERROR 07-25 03:09:53 [core.py:586] EC0010: [PID: 977] 2025-07-25-03:09:53.177.260 Failed to import Python module [AttributeError: `np.float_` was removed in the NumPy 2.0 release. Use `np.float64` instead..].
ERROR 07-25 03:09:53 [core.py:586]         Solution: Check that all required components are properly installed and the specified Python path matches the Python installation directory. (If the path does not match the directory, run set_env.sh in the installation package.)
ERROR 07-25 03:09:53 [core.py:586]         TraceBack (most recent call last):
ERROR 07-25 03:09:53 [core.py:586]         AOE Failed to call InitCannKB[FUNC:Initialize][FILE:python_adapter_manager.cc][LINE:47]
ERROR 07-25 03:09:53 [core.py:586]         Failed to initialize TeConfigInfo.
ERROR 07-25 03:09:53 [core.py:586]         [GraphOpt][InitializeInner][InitTbeFunc] Failed to init tbe.[FUNC:InitializeTeFusion][FILE:tbe_op_store_adapter.cc][LINE:1889]
ERROR 07-25 03:09:53 [core.py:586]         [GraphOpt][InitializeInner][InitTeFusion]: Failed to initialize TeFusion.[FUNC:InitializeInner][FILE:tbe_op_store_adapter.cc][LINE:1856]
ERROR 07-25 03:09:53 [core.py:586]         [SubGraphOpt][PreCompileOp][InitAdapter] InitializeAdapter adapter [tbe_op_adapter] failed! Ret [4294967295][FUNC:InitializeAdapter][FILE:op_store_adapter_manager.cc][LINE:79]
ERROR 07-25 03:09:53 [core.py:586]         [SubGraphOpt][PreCompileOp][Init] Initialize op store adapter failed, OpsStoreName[tbe-custom].[FUNC:Initialize][FILE:op_store_adapter_manager.cc][LINE:120]
ERROR 07-25 03:09:53 [core.py:586]         [FusionMngr][Init] Op store adapter manager init failed.[FUNC:Initialize][FILE:fusion_manager.cc][LINE:115]
ERROR 07-25 03:09:53 [core.py:586]         PluginManager InvokeAll failed.[FUNC:Initialize][FILE:ops_kernel_manager.cc][LINE:83]
ERROR 07-25 03:09:53 [core.py:586]         OpsManager initialize failed.[FUNC:InnerInitialize][FILE:gelib.cc][LINE:259]
ERROR 07-25 03:09:53 [core.py:586]         GELib::InnerInitialize failed.[FUNC:Initialize][FILE:gelib.cc][LINE:184]
ERROR 07-25 03:09:53 [core.py:586]         GEInitialize failed.[FUNC:GEInitialize][FILE:ge_api.cc][LINE:371]
ERROR 07-25 03:09:53 [core.py:586]         [Initialize][Ge]GEInitialize failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
ERROR 07-25 03:09:53 [core.py:586]         [Init][Compiler]Init compiler failed[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
ERROR 07-25 03:09:53 [core.py:586]         [Set][Options]OpCompileProcessor init failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
Traceback (most recent call last):
  File "/usr/local/python3.10.17/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/python3.10.17/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1495, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1431, in run_server
    await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
  File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1451, in run_server_worker
    async with build_async_engine_client(args, client_config) as engine_client:
  File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
    async_llm = AsyncLLM.from_vllm_config(
  File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config
    return cls(
  File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 124, in __init__
    self.engine_core = EngineCoreClient.make_async_mp_client(
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 96, in make_async_mp_client
    return AsyncMPClient(*client_args)
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 666, in __init__
    super().__init__(
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 403, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 434, in launch_core_engines
    wait_for_engine_startup(
  File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 484, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[ERROR] 2025-07-25-03:09:59 (PID:707, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
```

Please analyze this error.
### Resolving the vLLM startup failure during NPU device initialization (error code 500001)

#### Problem analysis

The message `NPU function error: at_npu::native::AclSetCompileopt(aclCompileOpt::ACL_PRECISION_MODE, precision_mode), error code is 500001` indicates that setting the compute precision mode failed while initializing the Ascend NPU device. This error is typically related to one of the following causes:

1. **Incompatible precision mode**: the configured precision mode (e.g. `fp16`/`bf16`) is not compatible with the hardware or driver.
2. **Driver or firmware version mismatch**: the NPU driver, CANN toolkit, or firmware version does not match what vLLM requires.
3. **Environment misconfiguration**: required environment variables or permissions are missing.
4. **Resource conflict**: the NPU device is held by another process or was not released cleanly.

In this particular log there is also a concrete trigger buried in the EC0010 line: CANN's own Python components failed to import because `np.float_` was removed in the NumPy 2.0 release, which made the TBE/GE initialization chain fail.

---

### Solution

#### Step 1: Verify precision-mode compatibility

Check whether the configured precision mode is supported by the hardware:

```bash
# List the precision modes supported by the NPU
npu-smi info -t board -i 0 -c 0
```

- If the output contains a `Precision Support List`, confirm that the value of `ACL_PRECISION_MODE` (e.g. `allow_fp32_to_fp16`) appears in it.
- **Temporary workaround**: specify a compatible mode explicitly when starting the server:

```bash
export ACL_PRECISION_MODE="must_keep_origin_dtype"  # force the original precision
python -m vllm.entrypoints.api_server --model your_model --tensor-parallel-size 1
```
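Given the EC0010 line quoted above, it is also worth ruling out the NumPy incompatibility before touching drivers. A minimal sketch of the check and workaround, assuming the CANN stack in this container has not yet been updated for NumPy 2.x:

```bash
# Show the installed NumPy version; 2.x removed np.float_
python -c "import numpy; print(numpy.__version__)"

# Pin NumPy below 2.0 so that CANN's Python modules can import again
pip install "numpy<2"
```

Restart the server afterwards and check whether EngineCore still fails in `init_device`.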
#### Step 2: Update the driver and toolkit

Make sure the environment meets vLLM's version requirements:

1. **Upgrade the NPU driver**:
```bash
# Download the latest driver (example)
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Ascend-hdk-910-npu-driver_6.0.0_linux-x86_64.run
./Ascend-hdk-910-npu-driver_6.0.0_linux-x86_64.run --full
```
2. **Update the CANN toolkit** (>= 7.0.RC1 recommended):
```bash
pip install -U cann-toolkit
```

#### Step 3: Check the device resource status

Release NPU resources held by other processes:

```bash
# Check NPU device status
npu-smi info

# Force-release a device (requires root)
npu-smi reset -d 0 -c 0  # -d device ID, -c chip ID
```

#### Step 4: Configure environment variables

Add the required configuration to `~/.bashrc`:

```bash
# Make sure the ACL initialization paths are correct
export ASCEND_HOME=/usr/local/Ascend
export LD_LIBRARY_PATH=$ASCEND_HOME/fwkacllib/lib64:$LD_LIBRARY_PATH

# Enable verbose logging
export ASCEND_GLOBAL_LOG_LEVEL=1  # 0=DEBUG, 1=INFO
export ASCEND_SLOG_PRINT_TO_STDOUT=1
```

Apply the configuration with `source ~/.bashrc`.

#### Step 5: Verify that ACL works

Run a basic test script to check whether ACL initializes correctly:

```python
import acl

ret = acl.init()
print("ACL Init Status:", ret)  # expected output: 0
acl.finalize()
```

If it returns a non-zero value, reinstall the CANN toolkit.

---

### Troubleshooting flowchart

```mermaid
graph TD
    A[Error 500001] --> B{Check precision mode}
    B -->|Mode invalid| C[Change ACL_PRECISION_MODE]
    B -->|Mode valid| D{Check driver version}
    D -->|Outdated| E[Update NPU driver / CANN]
    D -->|Up to date| F{Check device usage}
    F -->|Busy| G[Release with npu-smi reset]
    F -->|Idle| H[Check environment variables]
    H -->|Misconfigured| I[Fix .bashrc]
    H -->|Correct| J[Run ACL test]
    J -->|Fails| K[Reinstall CANN toolkit]
```
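If initialization succeeds after these steps, a quick request against the OpenAI-compatible endpoint confirms the server is up (a sketch reusing the port `8080` and served model name `zzz` from the launch command above):

```bash
curl http://127.0.0.1:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "zzz", "prompt": "Hello, ", "max_tokens": 16}'
```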