
This article shows how to implement a multilayer perceptron with the PyTorch library: defining the model (adding a hidden layer with a ReLU activation), loading the Fashion-MNIST dataset, and training. It walks through the steps from data loading to the training loop and reports the final accuracy.

3.10 Concise Implementation of Multilayer Perceptrons

Below we use PyTorch to implement the multilayer perceptron from the previous section. First, import the required packages and modules.

import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
sys.path.append("..")  # make the book's d2lzh_pytorch package (one directory up) importable
import d2lzh_pytorch as d2l

3.10.1 Defining the Model

The only difference from softmax regression is that we add one more fully connected layer as a hidden layer. It has 256 hidden units and uses the ReLU function as its activation function.

num_inputs, num_outputs, num_hiddens = 784, 10, 256

net = nn.Sequential(
    d2l.FlattenLayer(),                   # flatten each 1 x 28 x 28 image into a length-784 vector
    nn.Linear(num_inputs, num_hiddens),   # hidden layer
    nn.ReLU(),                            # activation
    nn.Linear(num_hiddens, num_outputs),  # output layer
)

for param in net.parameters():
    init.normal_(param, mean=0, std=0.01)  # initialize all weights and biases from N(0, 0.01^2)
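
Here d2l.FlattenLayer is the small helper module defined in the book's earlier sections. If d2lzh_pytorch is not available, a minimal equivalent (a sketch assuming the same flattening behavior) is:

class FlattenLayer(nn.Module):
    # reshape a (batch, 1, 28, 28) image batch into a (batch, 784) matrix
    def forward(self, x):
        return x.view(x.shape[0], -1)

Recent PyTorch versions also ship nn.Flatten(), which can replace this helper directly in the Sequential above.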

3.10.2 Reading the Data and Training the Model

We read the data and train the model using almost the same steps as for training softmax regression in Section 3.7.

Note: Because we use PyTorch's SGD here rather than the sgd function in d2lzh_pytorch, the learning rate no longer looks unusually large as it did in Section 3.9.
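
To see why: PyTorch's CrossEntropyLoss already averages the loss over the batch, while the from-scratch sgd used in earlier sections divides param.grad by the batch size itself, so its nominal learning rate had to be roughly batch_size times larger to take the same step. A sketch of that earlier-style update, assuming parameters with populated .grad fields:

def sgd(params, lr, batch_size):
    # mini-batch SGD as implemented from scratch in earlier sections;
    # note the extra division by batch_size inside the update
    for param in params:
        param.data -= lr * param.grad / batch_size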

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
loss = torch.nn.CrossEntropyLoss()  # averages over the batch by default (reduction='mean')

optimizer = torch.optim.SGD(net.parameters(), lr=0.5)

num_epochs = 5
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)

Output:

epoch 1, loss 0.0030, train acc 0.712, test acc 0.744
epoch 2, loss 0.0019, train acc 0.823, test acc 0.821
epoch 3, loss 0.0017, train acc 0.844, test acc 0.842
epoch 4, loss 0.0015, train acc 0.856, test acc 0.842
epoch 5, loss 0.0014, train acc 0.864, test acc 0.818
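
As a sanity check on the reported test accuracy, the trained model can be re-evaluated with a short loop of our own; a minimal sketch assuming the net and test_iter defined above (CPU tensors):

def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for X, y in data_iter:
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n

print(evaluate_accuracy(test_iter, net))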

Summary

  • PyTorch lets us implement a multilayer perceptron much more concisely.

Note: Apart from the code, this section is essentially the same as the original book; see the original book for the source.
