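Xinference: Large-Model Deployment and Distributed Inference Framework (Part 3), Command-Line Tools: Launching Models, Engine Parameters, and Other Operations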

Licensed under CC 4.0 BY-SA

Tags:

#Distributed #ArtificialIntelligence #LargeModels #AILargeModels #AI #LargeModelDeployment #Xinference

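III. Command-Line Tools

1. Overview

Xinference manages the entire lifecycle of a model. You can drive it from the command line, with cURL, or from Python code.

Run the following command to install the xinference command-line tool: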

pip install xinference

View the help output:

(xinference) root@master:~# xinference --help
Usage: xinference [OPTIONS] COMMAND [ARGS]...

  Xinference command-line interface for serving and deploying models.

Options:
  -v, --version       Show the current version of the Xinference tool.
  --log-level TEXT    Set the logger level. Options listed from most log to
                      least log are: DEBUG > INFO > WARNING > ERROR > CRITICAL
                      (Default level is INFO)
  -H, --host TEXT     Specify the host address for the Xinference server.
  -p, --port INTEGER  Specify the port number for the Xinference server.
  --help              Show this message and exit.

Commands:
  cached         List all cached models in Xinference.
  cal-model-mem  calculate gpu mem usage with specified model size and...
  chat           Chat with a running LLM.
  engine         Query the applicable inference engine by model name.
  generate       Generate text using a running LLM.
  launch         Launch a model with the Xinference framework with the...
  list           List all running models in Xinference.
  login          Login when the cluster is authenticated.
  register       Register a new model with Xinference for deployment.
  registrations  List all registered models in Xinference.
  remove-cache   Remove selected cached models in Xinference.
  stop-cluster   Stop a cluster using the Xinference framework with the...
  terminate      Terminate a deployed model through unique identifier...
  unregister     Unregister a model from Xinference, removing it from...
  vllm-models    Query and display models compatible with vLLM.

2. Launching a Model

Use xinference launch to start a model with the Xinference framework. Its --help output lists the available parameters.

(xinference) root@master:~# xinference launch --help
Usage: xinference launch [OPTIONS]

  Launch a model with the Xinference framework with the given parameters.

Options:
  -e, --endpoint TEXT             Xinference endpoint.
  -n, --model-name TEXT           Provide the name of the model to be
                                  launched.  [required]
  -t, --model-type TEXT           Specify type of model, LLM as default.
  -en, --model-engine TEXT        Specify the inference engine of the model
                                  when launching LLM.
  -u, --model-uid TEXT            Specify UID of model, default is None.
  -s, --size-in-billions TEXT     Specify the model size in billions of
                                  parameters.
  -f, --model-format TEXT         Specify the format of the model, e.g.
                                  pytorch, ggmlv3, etc.
  -q, --quantization TEXT         Define the quantization settings for the
                                  model.
  -r, --replica INTEGER           The replica count of the model, default is
                                  1.
  --n-gpu TEXT                    The number of GPUs used by the model,
                                  default is "auto".
  -lm, --lora-modules <TEXT TEXT>...
                                  LoRA module configurations in the format
                                  name=path. Multiple modules can be
                                  specified.
  -ld, --image-lora-load-kwargs <TEXT TEXT>...
  -fd, --image-lora-fuse-kwargs <TEXT TEXT>...
  --worker-ip TEXT                Specify which worker this model runs on by
                                  ip, for distributed situation.
  --gpu-idx TEXT                  Specify which GPUs of a worker this model
                                  can run on, separated with commas.
  --trust-remote-code BOOLEAN     Whether or not to allow for custom models
                                  defined on the Hub in their own modeling
                                  files.
  -ak, --api-key TEXT             Api-Key for access xinference api with
                                  authorization.
  --help                          Show this message and exit.

Launch a model:

xinference launch --model-engine transformers --model-uid my-llm --model-name chatglm3 --quantization 4-bit --size-in-billions 6 --model-format pytorch

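Parameters:

--model-engine: the inference engine for the model (transformers in this example)
--model-uid: the UID of the model; if omitted, Xinference assigns one automatically
--model-name: the name of the model
--quantization: the quantization precision of the model
--size-in-billions: the model size, in billions of parameters
--model-format: the format of the model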
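The flags above can also be assembled programmatically. The following is a minimal sketch, not part of Xinference: `LaunchSpec` and `build_launch_argv` are hypothetical helper names, and the only flags used are the ones shown in the `xinference launch --help` output above.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaunchSpec:
    """Hypothetical container for the launch options described above."""
    model_name: str
    model_engine: Optional[str] = None
    model_uid: Optional[str] = None
    size_in_billions: Optional[str] = None
    model_format: Optional[str] = None
    quantization: Optional[str] = None


def build_launch_argv(spec: LaunchSpec) -> List[str]:
    """Translate a LaunchSpec into an argv list for `xinference launch`."""
    argv = ["xinference", "launch", "--model-name", spec.model_name]
    optional_flags = [
        ("--model-engine", spec.model_engine),
        ("--model-uid", spec.model_uid),
        ("--size-in-billions", spec.size_in_billions),
        ("--model-format", spec.model_format),
        ("--quantization", spec.quantization),
    ]
    for flag, value in optional_flags:
        if value is not None:  # leave out flags the caller did not set
            argv += [flag, str(value)]
    return argv


spec = LaunchSpec(
    model_name="chatglm3",
    model_engine="transformers",
    model_uid="my-llm",
    size_in_billions="6",
    model_format="pytorch",
    quantization="4-bit",
)
argv = build_launch_argv(spec)
print(shlex.join(argv))  # shell-ready form of the command
# To actually launch: subprocess.run(argv, check=True)
```

Building an argv list rather than a single string avoids shell-quoting problems if a model UID or path ever contains spaces.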

A successful launch logs the following:

(xinference) root@master:~# xinference launch --model-engine transformers --model-uid my-llm --model-name chatglm3 --quantization 4-bit --size-in-billions 6 --model-format pytorch
Launch model name: chatglm3 with kwargs: {}
Model uid: my-llm
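When the launch is scripted, the UID usually has to be captured for later calls such as terminate. A small sketch of one way to do that, assuming the output keeps the "Model uid: ..." line shown in the log above; `extract_model_uid` is a hypothetical helper, and the sample text only mirrors the shape of that log.

```python
import re
from typing import Optional

# Matches a line of the form "Model uid: <uid>", as in the launch log.
_UID_LINE = re.compile(r"^Model uid:\s*(\S+)\s*$", re.MULTILINE)


def extract_model_uid(launch_output: str) -> Optional[str]:
    """Return the UID reported by `xinference launch`, or None if absent."""
    match = _UID_LINE.search(launch_output)
    return match.group(1) if match else None


sample_output = "Launch model name: chatglm3 with kwargs: {}\nModel uid: my-llm\n"
uid = extract_model_uid(sample_output)
print(uid)
```

Returning None instead of raising lets the caller decide whether a missing UID line is an error.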

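Open http://localhost:9777 to view the running models (replace 9777 with the port your Xinference server listens on; the default is 9997).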

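3. Engine Parameters

When loading an LLM, the valid model parameters depend closely on the inference engine. The xinference engine command queries the supported parameter combinations.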

(xinference) root@master:~# xinference engine --help
Usage: xinference engine [OPTIONS]

  Query the applicable inference engine by model name.

Options:
  -n, --model-name TEXT           The model name you want to query.
                                  [required]
  -en, --model-engine TEXT        Specify the `model_engine` to query the
                                  corresponding combination of other
                                  parameters.
  -f, --model-format TEXT         Specify the `model_format` to query the
                                  corresponding combination of other
                                  parameters.
  -s, --model-size-in-billions TEXT
                                  Specify the `model_size_in_billions` to
                                  query the corresponding combination of other
                                  parameters.
  -q, --quantization TEXT         Specify the `quantization` to query the
                                  corresponding combination of other
                                  parameters.
  -e, --endpoint TEXT             Xinference endpoint.
  -ak, --api-key TEXT             Api-Key for access xinference api with
                                  authorization.
  --help                          Show this message and exit.

1. Query the parameter combinations for the chatglm3 model to see how it can run on each inference engine.

(xinference) root@master:~# xinference engine --model-name chatglm3
Name      Engine        Format      Size (in billions)  Quantization
--------  ------------  --------  --------------------  --------------
chatglm3  Transformers  pytorch                      6  4-bit
chatglm3  Transformers  pytorch                      6  8-bit
chatglm3  Transformers  pytorch                      6  none
chatglm3  vLLM          pytorch                      6  none
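To use this table from a script, it can be parsed into dictionaries. The sketch below assumes the layout shown above (a header row, a row of dashes, and columns separated by two or more spaces); `parse_engine_table` is a hypothetical helper, not an Xinference API.

```python
import re
from typing import Dict, List

SAMPLE = """\
Name      Engine        Format      Size (in billions)  Quantization
--------  ------------  --------  --------------------  --------------
chatglm3  Transformers  pytorch                      6  4-bit
chatglm3  Transformers  pytorch                      6  8-bit
chatglm3  Transformers  pytorch                      6  none
chatglm3  vLLM          pytorch                      6  none
"""


def parse_engine_table(text: str) -> List[Dict[str, str]]:
    """Parse `xinference engine` output into one dict per table row."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # The separator row consists only of dashes and spaces.
    sep = next((i for i, ln in enumerate(lines) if set(ln) <= {"-", " "}), None)
    if sep is None or sep == 0:
        return []  # no recognizable table in the text
    # Column titles may contain single spaces, so split on runs of 2+ spaces.
    header = re.split(r"\s{2,}", lines[sep - 1].strip())
    rows = []
    for ln in lines[sep + 1:]:
        cells = re.split(r"\s{2,}", ln.strip())
        if len(cells) == len(header):  # skip anything that is not a data row
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_engine_table(SAMPLE)
for row in rows:
    print(row["Engine"], row["Quantization"])
```

All values stay strings, including the size column, because sizes are passed back to the CLI as text.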

2. You want to run chatglm3 on the vllm or transformers inference engine but do not know which other parameters are compatible.

(xinference) root@master:~# xinference engine --model-name chatglm3 --model-engine vllm
Name      Engine    Format      Size (in billions)  Quantization
--------  --------  --------  --------------------  --------------
chatglm3  vLLM      pytorch                      6  none

(xinference) root@master:~# xinference engine --model-name chatglm3 --model-engine transformers
Name      Engine        Format      Size (in billions)  Quantization
--------  ------------  --------  --------------------  --------------
chatglm3  Transformers  pytorch                      6  4-bit
chatglm3  Transformers  pytorch                      6  8-bit
chatglm3  Transformers  pytorch                      6  none
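The effect of the --model-engine filter can be imitated locally on rows that have already been collected. This is only an illustration of the narrowing behaviour, not how Xinference implements the query. `filter_combinations` is a hypothetical helper, and the case-insensitive engine match is an assumption based on `vllm` matching `vLLM` in the output above.

```python
from typing import Dict, List, Optional

Combination = Dict[str, str]

# The four chatglm3 combinations listed in the unfiltered query.
COMBINATIONS: List[Combination] = [
    {"Name": "chatglm3", "Engine": "Transformers", "Format": "pytorch", "Size": "6", "Quantization": "4-bit"},
    {"Name": "chatglm3", "Engine": "Transformers", "Format": "pytorch", "Size": "6", "Quantization": "8-bit"},
    {"Name": "chatglm3", "Engine": "Transformers", "Format": "pytorch", "Size": "6", "Quantization": "none"},
    {"Name": "chatglm3", "Engine": "vLLM", "Format": "pytorch", "Size": "6", "Quantization": "none"},
]


def filter_combinations(
    rows: List[Combination],
    engine: Optional[str] = None,
    model_format: Optional[str] = None,
    quantization: Optional[str] = None,
) -> List[Combination]:
    """Keep only rows that match every criterion that was supplied."""
    def matches(row: Combination) -> bool:
        return (
            (engine is None or row["Engine"].lower() == engine.lower())
            and (model_format is None or row["Format"] == model_format)
            and (quantization is None or row["Quantization"] == quantization)
        )
    return [row for row in rows if matches(row)]


vllm_rows = filter_combinations(COMBINATIONS, engine="vllm")
transformers_rows = filter_combinations(COMBINATIONS, engine="transformers")
print(len(vllm_rows), len(transformers_rows))
```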

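3. You want to load the qwen-chat model in GGUF format and need to know the remaining parameter combinations.

The chatglm3 model does not support --model-format ggufv2, so this example uses qwen-chat.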

(xinference) root@master:~# xinference engine --model-name qwen-chat -f ggufv2
Name       Engine     Format      Size (in billions)  Quantization
---------  ---------  --------  --------------------  --------------
qwen-chat  llama.cpp  ggufv2                       7  Q4_K_M
qwen-chat  llama.cpp  ggufv2                      14  Q4_K_M

4. Other Operations

List all models of a given type that Xinference supports:

xinference registrations -t LLM

List all running models:

xinference list

When a running model is no longer needed, stop it and release its resources as follows:

xinference terminate --model-uid "my-llm"
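These three commands can be wrapped for scripting. The following is a minimal sketch: the function names and the `dry_run` switch are hypothetical, and only the subcommands and flags shown above are used. By default it prints the commands instead of running them, so it is safe to try without a server.

```python
import shlex
import subprocess
from typing import List, Optional


def registrations_argv(model_type: str = "LLM") -> List[str]:
    """Command that lists supported models of the given type."""
    return ["xinference", "registrations", "-t", model_type]


def list_argv() -> List[str]:
    """Command that lists all running models."""
    return ["xinference", "list"]


def terminate_argv(model_uid: str) -> List[str]:
    """Command that stops a running model and releases its resources."""
    return ["xinference", "terminate", "--model-uid", model_uid]


def run(argv: List[str], dry_run: bool = True) -> Optional[str]:
    """Print the command when dry_run is set; otherwise run it and return stdout."""
    if dry_run:
        print("would run:", shlex.join(argv))
        return None
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return completed.stdout


for command in (registrations_argv(), list_argv(), terminate_argv("my-llm")):
    run(command)  # dry run: nothing is sent to a server
```

Passing `dry_run=False` executes the command and raises `subprocess.CalledProcessError` if it exits with a non-zero status.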
