xinference Error code: 400 - {'detail': '[address=0.0.0.0:23015, pid=8661] Model not found, uid

Latest recommended article published on 2025-08-31 03:33:55


License: CC 4.0 BY-SA

Tags: #linux #python

Today I ran into an error while using xinference. The model being served was qwen2-vl-instruct, 7B, in the GPTQ Int4 quantized version, with xinference 0.16.1 on Ubuntu 22.04.

The message I sent was as follows:

# client_local is an OpenAI client pointed at the local xinference server;
# step_0, step_1 and step_2 are base64-encoded JPEG images.
completion = client_local.chat.completions.create(
    model="qwen2-vl-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{step_0}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{step_1}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{step_2}"},
                },
                {"type": "text", "text": "What are the differences and connections between these three steps?"},
            ],
        }
    ],
)

As the messages show, the request contains three images and one text item.
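For reference, a content list of this shape can be built from raw image bytes with a small helper. This is only a sketch: `image_part` and `build_content` are hypothetical names introduced here, and the byte strings are placeholders, not real JPEG data.

```python
import base64


def image_part(jpeg_bytes: bytes) -> dict:
    """Wrap raw JPEG bytes as an OpenAI-style image_url content item."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
    }


def build_content(images: list[bytes], question: str) -> list[dict]:
    """Put the images first and a single text item last, as in the request above."""
    parts = [image_part(img) for img in images]
    parts.append({"type": "text", "text": question})
    return parts


# Placeholder bytes stand in for the three screenshots.
content = build_content(
    [b"step0", b"step1", b"step2"],
    "What are the differences and connections between these three steps?",
)
n_images = sum(1 for part in content if part["type"] == "image_url")
print(n_images, len(content))
```

Counting the `image_url` items before sending makes the number of images per prompt explicit, which turns out to matter later in this post.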

openai.BadRequestError: Error code: 400 - {'detail': '[address=0.0.0.0:23015, pid=8661] Model not found, uid: qwen2-vl-instruct-1-0'}
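The model argument I passed was qwen2-vl-instruct, and I confirmed that model="qwen2-vl-instruct".

Also, for the model launched in xinference I had not set a Model UID explicitly, so it used the default, that is, model uid = model name.

At first I assumed I was passing the wrong argument, so I checked and printed it repeatedly, all the way down to openai/_base_client.py. In the function _request there is the statement options = self._prepare_options(options), which yields options, and the model can be read from it as options.dict()["json_data"]["model"]. Right after that, httpx issues the POST directly, so this is the best place to print the argument. I printed it on every call, and the result confirmed that the model argument was always qwen2-vl-instruct and never changed.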
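The same information can probably be captured without editing openai/_base_client.py, by giving the OpenAI client an httpx client with event hooks. The sketch below logs the model name of each outgoing request and the status and body of each response, retries included. `make_logging_client` and the two hook functions are hypothetical names, and the `base_url` and `api_key` values are assumptions that depend on how your xinference server is exposed.

```python
import json


def log_request_model(request):
    """httpx 'request' hook: print the model field of the outgoing JSON body."""
    body = request.read()  # request body as bytes
    try:
        model = json.loads(body).get("model")
    except (ValueError, AttributeError):
        model = None  # body was empty, not JSON, or not a JSON object
    print(f"request {request.url} model={model}")
    return model  # httpx ignores the return value; it is handy for testing


def log_response(response):
    """httpx 'response' hook: print status and body of every response."""
    response.read()  # hooks run before the body is read, so read it first
    print(f"response {response.status_code} {response.content!r}")


def make_logging_client(base_url, api_key="not-needed"):
    """Build an OpenAI client whose HTTP traffic passes through the hooks."""
    import httpx  # imported lazily so the hooks above stay stdlib-only
    from openai import OpenAI

    http_client = httpx.Client(
        event_hooks={"request": [log_request_model], "response": [log_response]}
    )
    return OpenAI(base_url=base_url, api_key=api_key, http_client=http_client)
```

Because every attempt goes through the same httpx client, the hooks fire once per attempt, so intermediate errors are printed even when only the last one is raised.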

So the model argument was correct on every call. Where, then, was the problem?

A few lines further down in the same function there is this:

response = self._client.send(
    request,
    stream=stream or self._should_stream_response_body(request=request),
    **kwargs,
)

This is the statement where httpx actually calls the endpoint. I decided to print the result of every call as well, and saw the following:

in openai/_base_client.py,model:qwen2-vl-instruct
in openai/_base_client.py,response:<Response [500 Internal Server Error]>,response.content:b'{"detail":"[address=0.0.0.0:35301, pid=515686] You set image=2 (or defaulted to 1) in `--limit-mm-per-prompt`, but found 3 items in the same prompt."}'
in openai/_base_client.py,model:qwen2-vl-instruct
in openai/_base_client.py,response:<Response [500 Internal Server Error]>,response.content:b'{"detail":"Remote server 0.0.0.0:35301 closed"}'
in openai/_base_client.py,model:qwen2-vl-instruct
in openai/_base_client.py,response:<Response [400 Bad Request]>,response.content:b'{"detail":"[address=0.0.0.0:23015, pid=8661] Model not found, uid: qwen2-vl-instruct-1-0"}'
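This shows that the error did not come from a single request. The first request failed because the prompt contained 3 images: the xinference model worker (listening on port 35301) can handle 2 images per prompt but not 3, and a launch parameter has to be set to allow more.

The second request shows that this model worker (listening on port 35301) had already shut down because of the error above.

The third request returned the error I ultimately saw: Model not found.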
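To make it easier to see which process produced each error, the `detail` strings can be split into address, pid and message. This is a rough sketch: the `[address=..., pid=...]` prefix format is inferred only from the log lines above, not from any xinference documentation, and `parse_detail` is a hypothetical helper.

```python
import json
import re

# Prefix format inferred from the logged messages: "[address=HOST:PORT, pid=N] text"
DETAIL_RE = re.compile(
    r"^\[address=(?P<address>[^,\]]+), pid=(?P<pid>\d+)\]\s*(?P<message>.*)$",
    re.S,
)


def parse_detail(body: bytes) -> dict:
    """Split a JSON error body into address, pid and message (None if absent)."""
    detail = json.loads(body)["detail"]
    match = DETAIL_RE.match(detail)
    if match is None:
        return {"address": None, "pid": None, "message": detail}
    return {
        "address": match["address"],
        "pid": int(match["pid"]),
        "message": match["message"],
    }


# The three response bodies from the log above.
bodies = [
    b'{"detail":"[address=0.0.0.0:35301, pid=515686] You set image=2 (or defaulted to 1) in `--limit-mm-per-prompt`, but found 3 items in the same prompt."}',
    b'{"detail":"Remote server 0.0.0.0:35301 closed"}',
    b'{"detail":"[address=0.0.0.0:23015, pid=8661] Model not found, uid: qwen2-vl-instruct-1-0"}',
]
parsed = [parse_detail(b) for b in bodies]
for item in parsed:
    print(item)
```

Laid out this way, the first error comes from the process at 0.0.0.0:35301, while the final "Model not found" comes from a different process at 0.0.0.0:23015.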

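But that raises a question: why is the error from the third request the one that finally gets reported?

There is a retry mechanism here: when a request fails, it is retried. The relevant statement in the code is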

remaining_retries = options.get_max_retries(self.max_retries) - retries_taken

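I printed the remaining retry count as well and ran the request once more.

Before running it, I recorded the current state. nvidia-smi shows the PID of the process currently acting as the model worker: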

0 N/A N/A 516171 C ...naconda3/envs/xinference/bin/python 16008MiB

516171 is the PID of the xinference model worker. Next, check which ports it is currently listening on:

tcp 0 0 0.0.0.0:39257 0.0.0.0:* LISTEN 516171/python
tcp 0 0 10.18.120.185:46623 0.0.0.0:* LISTEN 516171/python
tcp 0 0 10.18.120.185:43229 0.0.0.0:* LISTEN 516171/python
tcp 0 0 10.18.120.185:43973 0.0.0.0:* LISTEN 516171/python
tcp6 0 0 :::44781 :::* LISTEN 516171/python
As shown, it is listening on port 39257 on 0.0.0.0.
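If you want to compare the listening ports before and after a failure programmatically, the output can be parsed with a few lines of Python. This is a minimal sketch that assumes the column layout shown above (it resembles `netstat -tlnp` output, though the post does not name the command); `listening_sockets` is a hypothetical helper.

```python
def listening_sockets(netstat_output: str, pid: int) -> list[tuple[str, int]]:
    """Return (local_ip, port) for every LISTEN socket owned by the given PID."""
    result = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        owner_pid = fields[6].split("/", 1)[0]
        if owner_pid != str(pid):
            continue
        # rpartition keeps IPv6 forms such as ":::44781" intact ("::", 44781)
        ip, _, port = fields[3].rpartition(":")
        result.append((ip, int(port)))
    return result


# The lines recorded above for PID 516171.
sample = """\
tcp 0 0 0.0.0.0:39257 0.0.0.0:* LISTEN 516171/python
tcp 0 0 10.18.120.185:46623 0.0.0.0:* LISTEN 516171/python
tcp 0 0 10.18.120.185:43229 0.0.0.0:* LISTEN 516171/python
tcp 0 0 10.18.120.185:43973 0.0.0.0:* LISTEN 516171/python
tcp6 0 0 :::44781 :::* LISTEN 516171/python
"""
sockets = listening_sockets(sample, 516171)
wildcard_ports = [port for ip, port in sockets if ip == "0.0.0.0"]
print(wildcard_ports)
```

The port bound to 0.0.0.0 is the one that appears in the `address=` field of the worker's error messages.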

in openai/_base_client.py,model:qwen2-vl-instruct,remain_retries:2
in openai/_base_client.py,response:<Response [500 Internal Server Error]>,response.content:b'{"detail":"[address=0.0.0.0:39257, pid=516171] You set image=2 (or defaulted to 1) in `--limit-mm-per-prompt`, but found 3 items in the same prompt."}'
in openai/_base_client.py,model:qwen2-vl-instruct,remain_retries:1
in openai/_base_client.py,response:<Response [500 Internal Server Error]>,response.content:b'{"detail":"Remote server 0.0.0.0:39257 closed"}'
in openai/_base_client.py,model:qwen2-vl-instruct,remain_retries:0
in openai/_base_client.py,response:<Response [400 Bad Request]>,response.content:b'{"detail":"[address=0.0.0.0:23015, pid=8661] Model not found, uid: qwen2-vl-instruct-1-0"}'

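As shown, 3 attempts were made. The model worker process was listening on 39257, and that worker has now died. Let's look at which process is the model worker now and which ports it is listening on: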

0 N/A N/A 516875 C ...naconda3/envs/xinference/bin/python 16008MiB

tcp 0 0 0.0.0.0:43461 0.0.0.0:* LISTEN 516875/python
tcp 0 0 10.18.120.185:37921 0.0.0.0:* LISTEN 516875/python
tcp 0 0 10.18.120.185:40091 0.0.0.0:* LISTEN 516875/python
tcp 0 0 10.18.120.185:36551 0.0.0.0:* LISTEN 516875/python
tcp6 0 0 :::55813 :::* LISTEN 516875/python

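The PID and the listening ports have all changed, so the picture is now fairly clear.

There were 3 requests in total.

Request 1: the xinference model worker could not handle my prompt, so it returned an error. At this point the worker process was shutting down because of that error.

Request 2: the model worker was already closed, so a "Remote server ... closed" error was returned.

Request 3: there was no model worker at this point. xinference, at port 23015, looked for the model worker and could not find it, so it returned this error. Meanwhile the model worker was restarting, but because loading the model is slow, the restart had not finished yet.

Because the request is attempted 3 times, the error that gets reported is the third one, and every attempt fails with a different error. The first error is the real one, which is what makes this hard to diagnose.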
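The effect can be reproduced with a small simulation. This is a deliberately simplified sketch, not the openai library's actual retry code: `Attempt` and `call_with_retries` are hypothetical, and the rule "retry on status 500 and above" is a simplification of the real policy. The scripted responses mirror the second run logged above.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    status: int
    detail: str


def call_with_retries(send, max_retries: int = 2):
    """Call send() up to max_retries + 1 times; return (last_attempt, history)."""
    history = []
    for _ in range(max_retries + 1):
        attempt = send()
        history.append(attempt)
        if attempt.status < 500:  # success or non-retryable client error: stop
            break
    return history[-1], history


# Scripted responses standing in for the server.
scripted = iter([
    Attempt(500, "You set image=2 (or defaulted to 1) in `--limit-mm-per-prompt`, but found 3 items in the same prompt."),
    Attempt(500, "Remote server 0.0.0.0:39257 closed"),
    Attempt(400, "Model not found, uid: qwen2-vl-instruct-1-0"),
])
last, history = call_with_retries(lambda: next(scripted))
print("surfaced to the caller:", last.detail)
print("actual root cause:     ", history[0].detail)
```

Only the last attempt reaches the caller, so the root cause in the first attempt is hidden. When debugging, constructing the openai client with `max_retries=0` should make the first error surface directly.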

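xinference probably inspects and analyzes the prompt, which is why it reported the first error. It also analyzes other kinds of prompts, something I have run into before.

Solution: following the hint in the first error and reference 3, set the parameter --limit-mm-per-prompt to 10 when launching the model, then try again.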
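As an extra safeguard on the client side, the number of images can be checked before the request is sent, so that an oversized prompt never reaches the worker. This is a sketch under the assumption that the limit applies to all images in one prompt; `count_images` and `check_image_limit` are hypothetical helpers, and the limit passed in must match whatever the model was launched with.

```python
def count_images(messages: list[dict]) -> int:
    """Count image_url items across all messages of one request."""
    total = 0
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):  # plain-string content carries no images
            total += sum(1 for part in content if part.get("type") == "image_url")
    return total


def check_image_limit(messages: list[dict], limit: int) -> int:
    """Raise ValueError before sending if the prompt exceeds the image limit."""
    n = count_images(messages)
    if n > limit:
        raise ValueError(f"prompt has {n} images but the configured limit is {limit}")
    return n


# Same shape as the request in this post; the base64 payloads are placeholders.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,AAAA"}},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,BBBB"}},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,CCCC"}},
            {"type": "text", "text": "What are the differences and connections between these three steps?"},
        ],
    }
]
print(check_image_limit(messages, limit=10))
```

With the limit from the first error message (2), the same call would raise locally instead of taking down the model worker.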