tensorflow-gpu error: CUDNN_STATUS_ALLOC_FAILED or self._traceback = tf_stack.extract_stack()


In some cases, a deep learning framework version update changes implementation details, and our code needs a corresponding fix:

Error message (one of the two will appear):

1. Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2. self._traceback = tf_stack.extract_stack()

Both usually point to the same cause: cuDNN failed to allocate GPU memory, because TensorFlow by default reserves nearly all of the GPU's memory up front. Letting the allocation grow on demand resolves it.

If your script imports TensorFlow, add the following three lines right after the import statement:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
session = tf.Session(config=config)
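
What allow_growth does: instead of claiming the whole GPU at startup, TensorFlow starts with a small allocation and grows it as needed. If the GPU is shared with other processes, TF 1.x can instead cap the fraction of memory this process may claim; a minimal sketch (the 0.7 fraction is an arbitrary example, not from the original post):

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.7  # claim at most ~70% of GPU memory
session = tf.Session(config=config)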

If your script imports Keras, additionally hand the session to the Keras backend (one extra line after the import; session is the object created in the TensorFlow snippet above):
from keras import backend as K
K.set_session(session)  # make Keras use the growth-enabled session

If both TensorFlow and Keras are imported, you can simply do the following:

import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)
K.set_session(sess)  # make Keras use the same session
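
Note that tf.ConfigProto and tf.Session are TensorFlow 1.x APIs; on TensorFlow 2.x they are only available under tf.compat.v1. The 2.x-native equivalent of allow_growth is per-GPU memory growth; a minimal sketch, assuming TensorFlow 2.1 or newer:

import tensorflow as tf

# Must run before any GPU is initialized (i.e. before building models or ops)
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

On recent versions, setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true before launching the script enables the same behavior without code changes.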

