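Xinference: A Framework for Large Model Deployment and Distributed Inference (Part 2) API Interface: Chat Endpoint, Model List, Embedding Models, Rerank Models, Using the Xinference SDK...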


Tags: distributed, artificial intelligence, large models, AI large models, LLM, AI, Rerank

Article link: https://blog.youkuaiyun.com/Code1994/article/details/142592945


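II. API Interface

1. Overview

Besides the web UI, Xinference also exposes an HTTP API through which LLMs can be used.

The API documentation lists a large number of endpoints. They cover not only LLMs but also other model types (such as embedding models), and they are designed to be compatible with the OpenAI API.

With the server running locally, the API documentation is available at http://localhost:9997/docs.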

2. Chat Endpoint

Call the chat endpoint with curl (the user message "你好啊" means "Hello there"):

curl -X 'POST' \
  'http://localhost:9997/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "chatglm3",
    "messages": [
      {
        "role": "user",
        "content": "你好啊"
      }
    ]
  }'
  
The response (the assistant greets the user in Chinese and introduces itself as ChatGLM3-6B):

{"id":"chat73f8c754-4898-11ef-89f6-000c2981d002","object":"chat.completion","created":1721700508,"model":"chatglm3","choices":[{"index":0,"message":{"role":"assistant","content":"你好👋！我是人工智能助手 ChatGLM3-6B，很高兴见到你，欢迎问我任何问题。"},"finish_reason":"stop"}],"usage":{"prompt_tokens":-1,"completion_tokens":-1,"total_tokens":-1}}
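
For readers who prefer Python over curl, the same call can be sketched with the standard library alone. This is a minimal sketch rather than an official client: the helper names `build_chat_request`, `extract_reply`, and `chat` are made up for illustration, and the code assumes the server from the curl example is listening on `localhost:9997`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9997"  # same address as the curl example


def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from the first choice of a chat completion."""
    return response_body["choices"][0]["message"]["content"]


def chat(model: str, user_message: str) -> str:
    """Send the request to a running server and return the assistant reply."""
    with urllib.request.urlopen(build_chat_request(model, user_message)) as resp:
        return extract_reply(json.load(resp))
```

With a server running and the `chatglm3` model launched, `chat("chatglm3", "你好啊")` should return the `content` string from a response like the one shown above.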

3. Model List

Fetch the list of running models with curl:

curl -X 'GET' \
  'http://localhost:9997/v1/models' \
  -H 'accept: application/json'

The response:

{"object":"list","data":[{"id":"chatglm3","object":"model","created":0,"owned_by":"xinference","model_type":"LLM","address":"0.0.0.0:38145","accelerators":["0"],"model_name":"chatglm3","model_lang":["en","zh"],"model_ability":["chat","tools"],"model_description":"ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.","model_format":"pytorch","model_size_in_billions":6,"model_family":"chatglm3","quantization":"4-bit","model_hub":"modelscope","revision":"v1.0.2","context_length":8192,"replica":1}]}
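
The `model_type` and `model_ability` fields in this response make it easy to pick a suitable model programmatically. The following is a small sketch under that assumption; the helper `models_with_ability` is hypothetical, and `SAMPLE` is a trimmed copy of the response above rather than live output.

```python
# Trimmed copy of the /v1/models response shown above (only the fields used here).
SAMPLE = {
    "object": "list",
    "data": [
        {
            "id": "chatglm3",
            "model_type": "LLM",
            "model_ability": ["chat", "tools"],
            "context_length": 8192,
        }
    ],
}


def models_with_ability(model_list: dict, ability: str) -> list[str]:
    """Return the IDs of all listed models that advertise the given ability."""
    return [
        model["id"]
        for model in model_list.get("data", [])
        if ability in model.get("model_ability", [])
    ]


chat_models = models_with_ability(SAMPLE, "chat")        # ["chatglm3"]
embed_models = models_with_ability(SAMPLE, "embed")      # [] for this sample
```

The returned ID is the value to pass as `"model"` in the other endpoints.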

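4. Embedding Models

Call the embeddings endpoint with curl, replacing the placeholder with the name or UID of a running embedding model:

curl -X 'POST' \
  'http://localhost:9997/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<embedding model name or UID>",
    "input": "Hello there"
  }'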
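
The article does not show the embeddings response, so the sketch below assumes the OpenAI-style shape (a `data` list whose items carry `index` and `embedding`). Under that assumption, it shows how the returned vectors might be extracted and compared with cosine similarity, a common use of embeddings. All helper names are hypothetical.

```python
import math


def build_embedding_payload(model: str, texts: list[str]) -> dict:
    """Request body for /v1/embeddings; `input` may hold several texts."""
    return {"model": model, "input": texts}


def extract_vectors(response_body: dict) -> list[list[float]]:
    """Return the embedding vectors ordered by their `index` field.

    Assumes an OpenAI-style response: {"data": [{"index": 0, "embedding": [...]}, ...]}.
    """
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Texts with similar meaning are expected to score closer to 1.0, though the actual values depend on the embedding model in use.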

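5. Rerank Models

Call the rerank endpoint with curl. The request passes a query and a list of candidate documents to be scored against it:

curl -X 'POST' \
  'http://localhost:9997/v1/rerank' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "bge-reranker-base",
    "query": "Who are you?",
    "documents": [
      "You are a helpful AI assistant.",
      "Your name is rerank."
    ]
  }'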
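
The rerank response is not shown in the article. The sketch below assumes it contains a `results` list whose items carry the `index` of a submitted document and a `relevance_score`; verify this against the API documentation of your server before relying on it. The helper `rank_documents` and the scores in the usage note are made up for illustration.

```python
def rank_documents(documents: list[str], response_body: dict) -> list[tuple[str, float]]:
    """Pair each document with its score, most relevant first.

    Assumes a response shaped like:
    {"results": [{"index": 0, "relevance_score": 0.12}, ...]}
    where `index` refers to the position in the submitted `documents` list.
    """
    results = sorted(
        response_body["results"],
        key=lambda result: result["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in results]
```

For example, given the two documents from the curl request and a hypothetical response in which index 1 scores higher than index 0, the function returns "Your name is rerank." first.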

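6. Using the Xinference SDK

Install the Xinference Python SDK. The following command installs the client with minimal dependencies. Note: the client version must match the version of the Xinference server.

pip install xinference-client==${SERVER_VERSION}

Then connect to the server and chat with a running model:

from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")
# Note: "my-llm" is the value passed to `--model-uid` when the model was launched
model = client.get_model("my-llm")
# The keyword arguments below match the client version used in this article.
# Newer client releases are understood to take an OpenAI-style `messages` list
# instead, so check the documentation for your installed version.
print(model.chat(
    prompt="Hello there",
    system_prompt="You are a helpful AI assistant.",
    chat_history=[]
))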

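7. Using the OpenAI SDK

Xinference provides an OpenAI-compatible API, so models served by Xinference can be used as a local drop-in replacement for OpenAI.

from openai import OpenAI
# A local server without authentication does not check the key, but the value
# should be non-empty: an empty string can yield an invalid Authorization header.
client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not-needed")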

response = client.chat.completions.create(
    model="my-llm",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the largest animal?"}
    ]
)
print(response)
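
The SDK call in section 6 and the OpenAI call above describe the same kind of conversation in two shapes: separate `prompt`, `system_prompt`, and `chat_history` arguments versus a single `messages` list. The following sketch shows one way to convert the former into the latter. The helper `to_messages` is hypothetical, and it assumes that `chat_history` already holds OpenAI-style `{"role": ..., "content": ...}` dictionaries.

```python
from typing import Optional


def to_messages(
    prompt: str,
    system_prompt: Optional[str] = None,
    chat_history: Optional[list[dict]] = None,
) -> list[dict]:
    """Combine prompt-style arguments into an OpenAI-style `messages` list.

    Order: optional system message, earlier turns, then the new user turn.
    """
    messages: list[dict] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(chat_history or [])
    messages.append({"role": "user", "content": prompt})
    return messages


messages = to_messages(
    prompt="What is the largest animal?",
    system_prompt="You are a helpful assistant.",
)
```

The resulting list has the same structure as the `messages` argument passed to `client.chat.completions.create` above.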