Using JupyterHub to access multiple conda envs on your remote server

This post shows how to install JupyterHub and nb_conda_kernels with conda. Together, these two tools tighten up a data-science workflow by letting you manage and switch between different Python environments directly from Jupyter.

conda install jupyterhub          # the multi-user hub that spawns per-user Jupyter servers
conda install nb_conda_kernels    # makes every conda env with a kernel package show up in Jupyter

and you are good to go!
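In practice, nb_conda_kernels scans your conda installation and lists every environment that has a kernel package (ipykernel for Python) installed, so there is no per-environment registration step. Here is a minimal sketch of adding a new environment; the name ds-py310 and the extra packages are placeholders, not anything this setup requires:

conda create -n ds-py310 python=3.10 ipykernel   # ipykernel is what makes the env discoverable
conda install -n ds-py310 pandas scikit-learn    # plus whatever the project actually needs

After restarting (or just refreshing) your Jupyter session, the new environment should appear in the kernel picker as something like "Python [conda env:ds-py310]".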

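To reach those environments from your own machine, start JupyterHub on the remote server and point a browser at it. A minimal sketch, assuming the default PAM authenticator (users log in with their system accounts) and that port 8000 on the server is reachable from your network; for anything beyond a trusted LAN you would add SSL and a proper jupyterhub_config.py first:

jupyterhub --generate-config         # writes jupyterhub_config.py for later customisation
jupyterhub --ip 0.0.0.0 --port 8000  # listen on all interfaces on the default port

Then browse to http://<your-server>:8000, log in, and every conda env that has ipykernel installed shows up as a selectable kernel in your spawned notebook server.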
