driver-model/platform


Chinese translated version of Documentation/driver-model/platform


If you have any comment or update to the content, please contact the
original document maintainer directly.  However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help.  Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.


Chinese maintainer: jinxin <575068677@qq.com>
---------------------------------------------------------------------
Chinese translation of Documentation/driver-model/platform


Chinese maintainer: Jin Xin <575068677@qq.com>
Chinese translator: Jin Xin <575068677@qq.com>
Chinese proofreader: Jin Xin <575068677@qq.com>


Platform Devices and Drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See <linux/platform_device.h> for the driver model interface to the
platform bus:  platform_device, and platform_driver.  This pseudo-bus
is used to connect devices on busses with minimal infrastructure,
like those used to integrate peripherals on many system-on-chip
processors, or some "legacy" PC interconnects; as opposed to large
formally specified ones like PCI or USB.




Platform devices
~~~~~~~~~~~~~~~~
Platform devices are devices that typically appear as autonomous
entities in the system. This includes legacy port-based devices and
host bridges to peripheral buses, and most controllers integrated
into system-on-chip platforms.  What they usually have in common
is direct addressing from a CPU bus.  Rarely, a platform_device will
be connected through a segment of some other kind of bus; but its
registers will still be directly addressable.


Platform devices are given a name, used in driver binding, and a
list of resources such as addresses and IRQs.




struct platform_device {
    const char    *name;
    u32        id;
    struct device    dev;
    u32        num_resources;
    struct resource    *resource;
};
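
For illustration, board code might describe a memory-mapped serial
controller roughly like this (a sketch: the "serial" name, register
addresses, and IRQ number below are invented for the example, not taken
from any real board):

    static struct resource serial_resources[] = {
        {
            .start = 0x10000000,        /* register window */
            .end   = 0x10000fff,
            .flags = IORESOURCE_MEM,
        },
        {
            .start = 42,                /* interrupt line */
            .end   = 42,
            .flags = IORESOURCE_IRQ,
        },
    };

    static struct platform_device serial_device = {
        .name          = "serial",
        .id            = 0,
        .num_resources = ARRAY_SIZE(serial_resources),
        .resource      = serial_resources,
    };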




Platform drivers
~~~~~~~~~~~~~~~~
Platform drivers follow the standard driver model convention, where
discovery/enumeration is handled outside the drivers, and drivers
provide probe() and remove() methods.  They support power management
and shutdown notifications using the standard conventions.


struct platform_driver {
    int (*probe)(struct platform_device *);
    int (*remove)(struct platform_device *);
    void (*shutdown)(struct platform_device *);
    int (*suspend)(struct platform_device *, pm_message_t state);
    int (*suspend_late)(struct platform_device *, pm_message_t state);
    int (*resume_early)(struct platform_device *);
    int (*resume)(struct platform_device *);
    struct device_driver driver;
};


Note that probe() should in general verify that the specified device hardware
actually exists; sometimes platform setup code can't be sure.  The probing
can use device resources, including clocks, and device platform_data.
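
As a sketch (reusing the invented "serial" device from above), a probe()
routine might verify its resources before touching the hardware:

    static int serial_probe(struct platform_device *pdev)
    {
        struct resource *mem;
        int irq;

        /* Fail cleanly if board setup did not supply the resources. */
        mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
        if (!mem)
            return -ENODEV;

        irq = platform_get_irq(pdev, 0);
        if (irq < 0)
            return irq;

        /* ... ioremap() the registers, read an ID register to confirm
         * the hardware really exists, request the IRQ, and so on ...
         */
        return 0;
    }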


Platform drivers register themselves the normal way:
    int platform_driver_register(struct platform_driver *drv);
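
A minimal sketch, assuming the serial_probe() routine above and an
invented serial_remove():

    static int serial_remove(struct platform_device *pdev)
    {
        /* undo whatever probe() set up */
        return 0;
    }

    static struct platform_driver serial_driver = {
        .probe  = serial_probe,
        .remove = serial_remove,
        .driver = {
            .name  = "serial",
            .owner = THIS_MODULE,
        },
    };

    static int __init serial_init(void)
    {
        return platform_driver_register(&serial_driver);
    }
    module_init(serial_init);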


Or, in common situations where the device is known not to be hot-pluggable,
the probe() routine can live in an init section to reduce the driver's
runtime memory footprint:
    int platform_driver_probe(struct platform_driver *drv,
              int (*probe)(struct platform_device *))
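
A sketch of that pattern (serial_driver_nohotplug and serial_probe_init
are invented names; note that the driver structure passed to
platform_driver_probe() leaves its .probe member unset):

    static struct platform_driver serial_driver_nohotplug = {
        .remove = serial_remove,
        .driver = {
            .name = "serial",
        },
    };

    static int __init serial_probe_init(struct platform_device *pdev)
    {
        /* lives in an init section and is discarded after boot */
        return 0;
    }

    static int __init serial_init(void)
    {
        return platform_driver_probe(&serial_driver_nohotplug,
                                     serial_probe_init);
    }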




Device Enumeration
~~~~~~~~~~~~~~~~~~
As a rule, platform specific (and often board-specific) setup code will
register platform devices:


    int platform_device_register(struct platform_device *pdev);


    int platform_add_devices(struct platform_device **pdevs, int ndev);
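
For example, a board file might collect its devices into a table and
register them all at once (serial_device is the invented example from
above):

    static struct platform_device *board_devices[] __initdata = {
        &serial_device,
        /* ... other devices that actually exist on this board ... */
    };

    static int __init board_init(void)
    {
        return platform_add_devices(board_devices,
                                    ARRAY_SIZE(board_devices));
    }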


The general rule is to register only those devices that actually exist,
but in some cases extra devices might be registered.  For example, a kernel
might be configured to work with an external network adapter that might not
be populated on all boards, or likewise to work with an integrated controller
that some boards might not hook up to any peripherals.


In some cases, boot firmware will export tables describing the devices
that are populated on a given board.   Without such tables, often the
only way for system setup code to set up the correct devices is to build
a kernel for a specific target board.  Such board-specific kernels are
common with embedded and custom systems development.


In many cases, the memory and IRQ resources associated with the platform
device are not enough to let the device's driver work.  Board setup code
will often provide additional information using the device's
platform_data field.
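
As a sketch, a board might attach an invented serial_platform_data
structure to the device, and the driver read it back in probe() (all
names and values here are assumptions for illustration):

    struct serial_platform_data {
        unsigned int uartclk;       /* input clock, in Hz */
        int          have_rtscts;   /* flow control lines wired up? */
    };

    static struct serial_platform_data serial_pdata = {
        .uartclk     = 1843200,
        .have_rtscts = 1,
    };

    /* board code: serial_device.dev.platform_data = &serial_pdata; */

    /* in the driver's probe(): */
    struct serial_platform_data *pdata = pdev->dev.platform_data;

    if (!pdata)
        return -ENODEV;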


Embedded systems frequently need one or more clocks for platform devices,
which are normally kept off until they're actively needed (to save power).
System setup also associates those clocks with the device, so that
calls to clk_get(&pdev->dev, clock_name) return them as needed.
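
A sketch of how a driver might use such a clock (the "uart" clock name
is an assumption; it must match what board setup code associated with
the device):

    struct clk *clk;

    clk = clk_get(&pdev->dev, "uart");
    if (IS_ERR(clk))
        return PTR_ERR(clk);

    clk_enable(clk);        /* power the block up only while needed */
    /* ... use the device ... */
    clk_disable(clk);
    clk_put(clk);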


Legacy Drivers:  Device Probing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some drivers are not fully converted to the driver model, because they take
on a non-driver role:  the driver registers its platform device, rather than
leaving that for system infrastructure.  Such drivers can't be hotplugged
or coldplugged, since those mechanisms require device creation to be in a
different system component than the driver.


The only "good" reason for this is to handle older system designs which, like
original IBM PCs, rely on error-prone "probe-the-hardware" models for hardware
configuration.  Newer systems have largely abandoned that model, in favor of
bus-level support for dynamic configuration (PCI, USB), or device tables
provided by the boot firmware (e.g. PNPACPI on x86).  There are too many
conflicting options about what might be where, and even educated guesses by
an operating system will be wrong often enough to make trouble.


This style of driver is discouraged.  If you're updating such a driver,
please try to move the device enumeration to a more appropriate location,
outside the driver.  This will usually be cleanup, since such drivers
tend to already have "normal" modes, such as ones using device nodes that
were created by PNP or by platform device setup.


None the less, there are some APIs to support such legacy drivers.  Avoid
using these calls except with such hotplug-deficient drivers.


    struct platform_device *platform_device_alloc(
            const char *name, int id);


You can use platform_device_alloc() to dynamically allocate a device, which
you will then initialize with resources and platform_device_register().
A better solution is usually:
    struct platform_device *platform_device_register_simple(
            const char *name, int id,
            struct resource *res, unsigned int nres);


You can use platform_device_register_simple() as a one-step call to allocate
and register a device.
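
For example, registering the "my_rtc" device mentioned in the next
section could look like this (a sketch; the call returns an ERR_PTR()
value on failure, hence the IS_ERR() check):

    struct platform_device *pdev;

    pdev = platform_device_register_simple("my_rtc", -1, NULL, 0);
    if (IS_ERR(pdev))
        return PTR_ERR(pdev);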


Device Naming and Driver Binding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The platform_device.dev.bus_id is the canonical name for the devices.
It's built from two components:
    * platform_device.name ... which is also used for driver matching.
    * platform_device.id ... the device instance number, or else "-1"
      to indicate there's only one.

These are concatenated, so name/id "serial"/0 indicates bus_id "serial.0", and
"serial/3" indicates bus_id "serial.3"; both would use the platform_driver
named "serial".  While "my_rtc"/-1 would be bus_id "my_rtc" (no instance id)
and use the platform_driver called "my_rtc".


Driver binding is performed automatically by the driver core, invoking
driver probe() after finding a match between device and driver.  If the
probe() succeeds, the driver and device are bound as usual.  There are
three different ways to find such a match:
    - Whenever a device is registered, the drivers for that bus are
      checked for matches.  Platform devices should be registered very
      early during system boot.
    - When a driver is registered using platform_driver_register(), all
      unbound devices on that bus are checked for matches.  Drivers
      usually register later during booting, or by module loading.
    - Registering a driver using platform_driver_probe() works just like
      using platform_driver_register(), except that the driver won't
      be probed later if another device registers.  (Which is OK, since
      this interface is only for use with non-hotpluggable devices.)


Early Platform Devices and Drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The early platform interfaces provide platform data to platform device
drivers early on during the system boot. The code is built on top of the
early_param() command line parsing and can be executed very early on.


Example: "earlyprintk" class early serial console in 6 steps


1. Registering early platform device data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The architecture code registers platform device data using the function
early_platform_add_devices(). In the case of early serial console this
should be hardware configuration for the serial port. Devices registered
at this point will later on be matched against early platform drivers.
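
A sketch of such architecture code, reusing the invented early serial
device (the hardware details are assumptions):

    static struct platform_device early_serial_device = {
        .name = "serial",
        .id   = 0,
        /* resource table with the port's address and IRQ, as usual */
    };

    static struct platform_device *early_devices[] __initdata = {
        &early_serial_device,
    };

    early_platform_add_devices(early_devices,
                               ARRAY_SIZE(early_devices));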


2. Parsing kernel command line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The architecture code calls parse_early_param() to parse the kernel
command line. This will execute all matching early_param() callbacks.
User specified early platform devices will be registered at this point.
For the early serial console case the user can specify port on the
kernel command line as "earlyprintk=serial.0" where "earlyprintk" is
the class string, "serial" is the name of the platform driver and
0 is the platform device id. If the id is -1 then the dot and the
id can be omitted.


3. Installing early platform drivers belonging to a certain class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The architecture code may optionally force registration of all early
platform drivers belonging to a certain class using the function
early_platform_driver_register_all(). User specified devices from
step 2 have priority over these. This step is omitted by the serial
driver example since the early serial driver code should be disabled
unless the user has specified port on the kernel command line.


4. Early platform driver registration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compiled-in platform drivers making use of early_platform_init() are
automatically registered during step 2 or 3. The serial driver example
should use early_platform_init("earlyprintk", &platform_driver).
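
A sketch of what that looks like in the driver, with "earlyprintk" as
the class string (early_serial_driver and early_serial_probe are
invented names):

    static struct platform_driver early_serial_driver = {
        .probe  = early_serial_probe,
        .driver = {
            .name = "serial",
        },
    };
    early_platform_init("earlyprintk", &early_serial_driver);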


5. Probing of early platform drivers belonging to a certain class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The architecture code calls early_platform_driver_probe() to match
registered early platform devices associated with a certain class with
registered early platform drivers. Matched devices will get probed().
This step can be executed at any point during the early boot. As soon
as possible may be good for the serial port case.


6. Inside the early platform driver probe()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The driver code needs to take special care during early boot, especially
when it comes to memory allocation and interrupt registration. The code
in the probe() function can use is_early_platform_device() to check if
it is called at early platform device or at the regular platform device
time. The early serial driver performs register_console() at this point.
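
A sketch of that check inside probe() (early_serial_console is an
assumed console structure for the example, not a real kernel symbol):

    static int serial_probe(struct platform_device *pdev)
    {
        if (is_early_platform_device(pdev)) {
            /* early boot: be careful with memory allocation and
             * interrupt registration; do minimal setup, then
             * register_console(&early_serial_console);
             */
            return 0;
        }

        /* regular platform device time: full probe path */
        return 0;
    }
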
For further information, see <linux/platform_device.h>.
See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) ERROR 07-06 22:09:39 engine.py:400] Traceback (most recent call last): ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine ERROR 07-06 22:09:39 engine.py:400] engine = MQLLMEngine.from_engine_args(engine_args=engine_args, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 124, in from_engine_args ERROR 07-06 22:09:39 engine.py:400] return cls(ipc_path=ipc_path, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 76, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.engine = LLMEngine(*args, **kwargs) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 273, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.model_executor = executor_class(vllm_config=vllm_config, ) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 271, in __init__ ERROR 07-06 22:09:39 engine.py:400] super().__init__(*args, **kwargs) ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__ ERROR 07-06 22:09:39 engine.py:400] self._init_executor() ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor ERROR 07-06 22:09:39 engine.py:400] self._run_workers("load_model", ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers ERROR 07-06 22:09:39 engine.py:400] driver_worker_output = run_method(self.driver_worker, sent_method, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/utils.py", line 2196, in run_method ERROR 07-06 22:09:39 engine.py:400] return func(*args, **kwargs) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model ERROR 07-06 22:09:39 engine.py:400] self.model_runner.load_model() ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1112, in load_model ERROR 07-06 22:09:39 engine.py:400] self.model = get_model(vllm_config=self.vllm_config) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model ERROR 07-06 
22:09:39 engine.py:400] return loader.load_model(vllm_config=vllm_config) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model ERROR 07-06 22:09:39 engine.py:400] model = _initialize_model(vllm_config=vllm_config) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model ERROR 07-06 22:09:39 engine.py:400] return model_class(vllm_config=vllm_config, prefix=prefix) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 453, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.model = Qwen2Model(vllm_config=vllm_config, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__ ERROR 07-06 22:09:39 engine.py:400] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs) ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 307, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.start_layer, self.end_layer, self.layers = make_layers( ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 557, in make_layers ERROR 07-06 22:09:39 engine.py:400] [PPMissingLayer() for _ in range(start_layer)] + [ ERROR 07-06 22:09:39 engine.py:400] ^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 558, in <listcomp> ERROR 07-06 22:09:39 engine.py:400] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}")) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 309, in <lambda> ERROR 07-06 22:09:39 engine.py:400] lambda prefix: Qwen2DecoderLayer(config=config, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 220, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.mlp = Qwen2MLP( ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 82, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.down_proj = RowParallelLinear( ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 1062, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.quant_method.create_weights( ERROR 07-06 22:09:39 engine.py:400] File 
"/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 129, in create_weights ERROR 07-06 22:09:39 engine.py:400] weight = Parameter(torch.empty(sum(output_partition_sizes), ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/torch/utils/_device.py", line 106, in __torch_function__ ERROR 07-06 22:09:39 engine.py:400] return func(*args, **kwargs) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 21.49 GiB of which 43.50 MiB is free. Including non-PyTorch memory, this process has 21.44 GiB memory in use. Of the allocated memory 21.19 GiB is allocated by PyTorch, and 20.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) Process SpawnProcess-1: INFO 07-06 22:09:39 multiproc_worker_utils.py:128] Killing local vLLM worker processes Traceback (most recent call last): File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine raise e File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine engine = MQLLMEngine.from_engine_args(engine_args=engine_args, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 124, in from_engine_args return cls(ipc_path=ipc_path, ^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 76, in __init__ self.engine = LLMEngine(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 273, in __init__ self.model_executor = executor_class(vllm_config=vllm_config, ) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 271, in __init__ super().__init__(*args, **kwargs) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__ self._init_executor() File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor self._run_workers("load_model", File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers driver_worker_output = run_method(self.driver_worker, sent_method, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/utils.py", line 2196, in run_method return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File 
"/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model self.model_runner.load_model() File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1112, in load_model self.model = get_model(vllm_config=self.vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model return loader.load_model(vllm_config=vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model model = _initialize_model(vllm_config=vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model return model_class(vllm_config=vllm_config, prefix=prefix) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 453, in __init__ self.model = Qwen2Model(vllm_config=vllm_config, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__ old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 307, in __init__ self.start_layer, self.end_layer, self.layers = make_layers( ^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 557, in make_layers [PPMissingLayer() for _ in range(start_layer)] + [ ^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 558, in <listcomp> maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}")) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 309, in <lambda> lambda prefix: Qwen2DecoderLayer(config=config, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 220, in __init__ self.mlp = Qwen2MLP( ^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 82, in __init__ self.down_proj = RowParallelLinear( ^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 1062, in __init__ self.quant_method.create_weights( File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 129, in create_weights weight = Parameter(torch.empty(sum(output_partition_sizes), ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/torch/utils/_device.py", line 106, in __torch_function__ return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 21.49 GiB of which 43.50 MiB is free. Including non-PyTorch memory, this process has 21.44 GiB memory in use. 
Of the allocated memory 21.19 GiB is allocated by PyTorch, and 20.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) [rank0]:[W706 22:09:40.345098611 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) Traceback (most recent call last): File "/root/miniconda3/envs/deepseek_vllm/bin/vllm", line 8, in <module> sys.exit(main()) ^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 73, in main args.dispatch_function(args) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 34, in cmd uvloop.run(run_server(args)) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run return runner.run(wrapper()) ^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper return await main ^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 947, in run_server async with build_async_engine_client(args) as engine_client: File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 139, in build_async_engine_client async with build_async_engine_client_from_engine_args( File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args raise RuntimeError( RuntimeError: Engine process failed to start. See stack trace for the root cause. (deepseek_vllm) root@user-X99:/home/user/Desktop# /root/miniconda3/envs/deepseek_vllm/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' (deepseek_vllm) root@user-X99:/home/user/Desktop#
07-07