qwen vllm function_call

Latest recommended article published on 2025-03-25 15:40:02

xnuscd


Article link: https://blog.youkuaiyun.com/xnuscd/article/details/143624418



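Tutorial: Deploying Qwen with vLLM and Implementing Tool Calling

Introduction

vLLM is a fast, easy-to-use library for large language model inference and serving. Since v0.6.0 it has supported tool calling: when the server is started with the appropriate options, it parses the tool calls generated by the model and returns them as structured tool_calls in the response. Using Qwen2.5 as the example model, this tutorial shows how to implement tool calling with vLLM and how to interact with the server through its OpenAI-compatible API.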

Environment Setup

Python: 3.8 or later
vLLM version: v0.6.1.post2
CUDA: if you are using a GPU, make sure the CUDA version is compatible

Dependencies:

pip install torch transformers vllm openai
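Because tool calling depends on the vLLM version, it can help to check what is installed before starting the server. The following is a minimal sketch, not an official check: the helper names (release_tuple, supports_tool_calling, installed_vllm_version) are made up for illustration, and the v0.6.0 threshold is taken from the introduction above.

```python
import re
from importlib import metadata

# Threshold taken from the introduction (tool calling available since v0.6.0).
MIN_TOOL_CALLING_RELEASE = (0, 6, 0)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release parts, e.g. '0.6.1.post2' -> (0, 6, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_tool_calling(version: str) -> bool:
    """Compare the release tuple against the assumed minimum release."""
    return release_tuple(version) >= MIN_TOOL_CALLING_RELEASE


def installed_vllm_version():
    """Return the installed vLLM version string, or None if vLLM is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


version = installed_vllm_version()
if version is None:
    print("vLLM is not installed")
else:
    print(version, "supports tool calling:", supports_tool_calling(version))
```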

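Step 1: Start the OpenAI-Compatible API Server

First, start the vLLM OpenAI-compatible API server so that the Qwen2.5 model can be reached through the OpenAI client library. Run the following command: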

vllm serve Qwen/Qwen2.5-7B-Instruct --enable-auto-tool-choice --tool-call-parser hermes

Here we enable automatic tool choice and select the Hermes-style tool call parser. By default, the API is served locally at http://localhost:8000/v1.
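To give a feel for what a Hermes-style parser has to do: in this format the model emits each tool call as a JSON object wrapped in <tool_call> tags, and the server turns those blocks into structured tool_calls. The sketch below is a simplified illustration under that assumption, not vLLM's actual parser; extract_tool_calls and the sample text are invented for the example.

```python
import json
import re

# Simplified illustration only: this is NOT vLLM's parser implementation.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(generated_text: str) -> list:
    """Return the decoded JSON payload of every <tool_call>...</tool_call> block."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(generated_text):
        calls.append(json.loads(payload))
    return calls


# Hand-written sample of raw model output in the Hermes style.
sample_output = (
    "<tool_call>\n"
    '{"name": "get_current_temperature", '
    '"arguments": {"location": "San Francisco, CA, USA"}}\n'
    "</tool_call>"
)
parsed = extract_tool_calls(sample_output)
print(parsed[0]["name"])
```

With --tool-call-parser hermes the server does this work for you, so client code only ever sees the structured result.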

Step 2: Initialize the API Client

Next, initialize the API client with the OpenAI Python library, setting the API base URL to the address of the local server.

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

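Step 3: Prepare the Messages and Tool Definitions

To generate a response, the model needs both the user messages and the tool definitions. Define the tools you need (for example, getting the current temperature, or the temperature on a future date).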

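# Each tool is wrapped as {"type": "function", "function": {...}},
# which is the shape the OpenAI-compatible chat API expects.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get the current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The location, e.g. 'San Francisco, CA, USA'"},
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get the temperature at a location on a future date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The location"},
                    "date": {"type": "string", "description": "The date, e.g. '2024-10-01'"},
                },
                "required": ["location", "date"],
            },
        },
    },
]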

messages = [
    {"role": "system", "content": "You are Qwen, a helpful assistant."},
    {"role": "user", "content": "What's the temperature in San Francisco now? How about tomorrow?"}
]
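The OpenAI chat format expects every entry in tools to be an object with "type": "function" and the schema nested under "function". If your schemas are kept as bare name/description/parameters dicts, a small helper can normalize them before the request is sent. This is a sketch; as_openai_tool is a hypothetical helper name, not part of any library.

```python
def as_openai_tool(schema: dict) -> dict:
    """Wrap a bare function schema in the {"type": "function", ...} envelope.

    Entries that already carry the envelope are returned unchanged.
    """
    if schema.get("type") == "function" and "function" in schema:
        return schema
    return {"type": "function", "function": schema}


# Example schema in the bare form (illustrative).
bare_schema = {
    "name": "get_current_temperature",
    "description": "Get the current temperature at a location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

wrapped = as_openai_tool(bare_schema)
print(wrapped["type"], wrapped["function"]["name"])
```

Applying it as tools = [as_openai_tool(t) for t in tools] is safe even when the entries are already wrapped.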

Step 4: Generate the Tool Call Response

Use client.chat.completions.create() to send the messages to the model and generate a response based on the configured tools.

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=messages,
    tools=tools,
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
    extra_body={
        "repetition_penalty": 1.05,
    },
)

The model returns the tool calls in response.choices[0].message, in a tool_calls field like the following:

{
  "tool_calls": [
    {
      "id": "tool-924d705adb044ff88e0ef3afdd155f15",
      "function": {"name": "get_current_temperature", "arguments": "{\"location\": \"San Francisco, CA, USA\"}"}
    },
    {
      "id": "tool-7e30313081944b11b6e5ebfd02e8e501",
      "function": {"name": "get_temperature_date", "arguments": "{\"location\": \"San Francisco, CA, USA\", \"date\": \"2024-10-01\"}"}
    }
  ]
}
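Note that "arguments" is a JSON-encoded string produced by the model, so it may occasionally be malformed. The sketch below shows one defensive way to decode it; parse_arguments is a hypothetical helper, and what to do on failure (retry, report the error back to the model, and so on) is left open.

```python
import json


def parse_arguments(raw_arguments: str):
    """Decode a tool call's "arguments" string.

    Returns (arguments, None) on success or (None, error_message) on failure.
    """
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON in arguments: {exc.msg}"
    if not isinstance(arguments, dict):
        return None, "arguments must decode to a JSON object"
    return arguments, None


args, error = parse_arguments("{\"location\": \"San Francisco, CA, USA\"}")
print(args, error)
```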

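Step 5: Parse the Tool Calls and Collect the Results

Extract the function information from each tool call and invoke the corresponding function with the arguments the model supplied.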

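import json

assistant_message = response.choices[0].message
# Record the assistant turn (including its tool calls) before adding tool results.
messages.append(assistant_message.model_dump(exclude_none=True))

# The client returns objects, so use attribute access rather than dict lookups.
if tool_calls := assistant_message.tool_calls:
    for tool_call in tool_calls:
        fn_name = tool_call.function.name
        fn_args = json.loads(tool_call.function.arguments)

        # Assumes a get_function_by_name helper that maps a tool name to a callable
        fn_result = json.dumps(get_function_by_name(fn_name)(**fn_args))

        messages.append({
            "role": "tool",
            "content": fn_result,
            "tool_call_id": tool_call.id,
        })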
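The loop above assumes that get_function_by_name exists. One minimal way to provide it is a dictionary that maps tool names to Python functions. Everything in this sketch is a placeholder: the two tool functions return made-up fixed readings rather than querying a weather service, and TOOL_REGISTRY is a name chosen for the example.

```python
def get_current_temperature(location: str) -> dict:
    """Placeholder: returns a fixed, made-up reading for any location."""
    return {"temperature": 26.1, "location": location, "unit": "celsius"}


def get_temperature_date(location: str, date: str) -> dict:
    """Placeholder: returns a fixed, made-up reading for any location and date."""
    return {"temperature": 25.9, "location": location, "date": date, "unit": "celsius"}


# Maps the tool names declared in `tools` to the functions that implement them.
TOOL_REGISTRY = {
    "get_current_temperature": get_current_temperature,
    "get_temperature_date": get_temperature_date,
}


def get_function_by_name(name: str):
    """Look up a tool implementation, failing loudly on unknown names."""
    try:
        return TOOL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown tool: {name}") from None


print(get_function_by_name("get_current_temperature")(location="San Francisco, CA, USA"))
```

Looking tools up in an explicit registry, rather than with eval or globals(), keeps the model from invoking anything you did not deliberately expose.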

Step 6: Get the Final Reply

Send the tool results back to the model to generate the final reply for the user.

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=messages,
    tools=tools,
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
    extra_body={
        "repetition_penalty": 1.05,
    },
)
print(response.choices[0].message.content)
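Steps 4 to 6 can also be folded into one loop that keeps calling the model until it answers without requesting a tool. The following is a sketch under stated assumptions rather than a definitive implementation: run_tool_loop and fake_create_completion are hypothetical names, and the fake stands in for the real client so the example runs without a server.

```python
import json
from types import SimpleNamespace


def run_tool_loop(create_completion, messages, registry, max_rounds=5):
    """Call the model until it replies without requesting tools.

    create_completion: callable taking the message list and returning a
        chat-completion-like object (response.choices[0].message).
    registry: dict mapping tool names to Python callables.
    """
    for _ in range(max_rounds):
        message = create_completion(messages).choices[0].message
        if not message.tool_calls:
            return message.content
        # Echo the assistant turn back so each tool result has a matching call.
        messages.append({
            "role": "assistant",
            "content": message.content or "",
            "tool_calls": [
                {
                    "id": call.id,
                    "type": "function",
                    "function": {
                        "name": call.function.name,
                        "arguments": call.function.arguments,
                    },
                }
                for call in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            fn_args = json.loads(call.function.arguments)
            result = registry[call.function.name](**fn_args)
            messages.append({
                "role": "tool",
                "content": json.dumps(result),
                "tool_call_id": call.id,
            })
    raise RuntimeError("model kept requesting tools after max_rounds")


def fake_create_completion(messages):
    """Stand-in for the real client: one tool call, then a final answer."""
    if messages[-1]["role"] == "tool":
        msg = SimpleNamespace(content="It is 26.1 degrees Celsius.", tool_calls=None)
    else:
        call = SimpleNamespace(
            id="call-1",
            function=SimpleNamespace(
                name="get_current_temperature",
                arguments='{"location": "San Francisco, CA, USA"}',
            ),
        )
        msg = SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(choices=[SimpleNamespace(message=msg)])


history = [{"role": "user", "content": "What's the temperature in San Francisco now?"}]
answer = run_tool_loop(
    fake_create_completion,
    history,
    {"get_current_temperature": lambda location: {"temperature": 26.1, "location": location}},
)
print(answer)
```

Against the real server, create_completion could be a lambda that wraps client.chat.completions.create with the model name and tools from the earlier steps. The max_rounds cap guards against a model that never stops requesting tools.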