vllm chat templates

Latest recommended article published 2025-04-21 18:07:03

wildland



Article link: https://blog.youkuaiyun.com/wildland/article/details/140447921


Background

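I recently started running large models with vLLM, using the example code from its documentation shown below. The model only continued my text, behaving like a base model, even though the model I was using had been instruction-tuned for chat. Going back to the Hugging Face sample code for using large models, I noticed a method called apply_chat_template, and saw that conversations usually carry roles such as "user" or "system". This made me realize that using a model's chat ability is not simply a matter of passing the raw input to the model, so I needed to learn the details behind it. Code that converts a prompt into conversation messages is available at: https://github.com/JinFish/EasyChatTemplating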

from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="../../pretrained_models/llama3-chat")
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

>>> Prompt: 'Hello, my name is', Generated text: ' Helen and I am a 35 year old mother of two. I am a'
>>> Prompt: 'The president of the United States is', Generated text: ' the head of the executive branch of the federal government, and is the highest-ranking'
>>> Prompt: 'The capital of France is', Generated text: ' Paris, and it is also the largest city in the country. It is situated'
>>> Prompt: 'The future of AI is', Generated text: ' full of endless possibilities, but it also poses significant challenges and risks. As AI'
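
The fix hinted at above is to turn each raw prompt into a conversation before it reaches the model. The following is a minimal sketch of that conversion; `to_conversations` is a hypothetical helper written for illustration, not the API of the linked EasyChatTemplating repository, and the optional system message is an assumption.

```python
from typing import Dict, List, Optional

# A chat message is a dict with "role" and "content" keys.
Message = Dict[str, str]


def to_conversations(prompts: List[str], system: Optional[str] = None) -> List[List[Message]]:
    """Wrap each raw prompt as a single-turn conversation (hypothetical helper)."""
    conversations: List[List[Message]] = []
    for prompt in prompts:
        messages: List[Message] = []
        if system is not None:
            # Optional system message placed before the user turn.
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})
        conversations.append(messages)
    return conversations


prompts = [
    "Hello, my name is",
    "The capital of France is",
]
conversations = to_conversations(prompts, system="You are a helpful assistant.")
print(conversations[1])
```

Recent vLLM releases provide `LLM.chat`, which accepts message lists like these and applies the model's chat template itself. Another option is to render each conversation with the tokenizer's `apply_chat_template(..., tokenize=False, add_generation_prompt=True)` and pass the resulting strings to `llm.generate`. Which path is available depends on the installed vLLM version.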

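Today's large models are usually decoder-only, so whether a conversation is single-turn or multi-turn, all of it is concatenated and fed into the model at once, and special markers are needed to separate the roles and turns. For example, suppose the user says "I had fried rice noodles this morning." and the assistant replies "Fried rice noodles are a fairly common breakfast in Guangdong, but they are quite oily, so have them only occasionally." The string actually fed to the model might look like this (an illustrative format): <s><intp>I had fried rice noodles this morning.</intp> [ASST] Fried rice noodles are a fairly common breakfast in Guangdong, but they are quite oily, so have them only occasionally.[/ASST] eos_token. Here <intp> and </intp> mark the user's input, [ASST] and [/ASST] mark the model's reply, and eos_token marks the end of the conversation.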
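
To make the flattening concrete, here is a small sketch that serializes a user/assistant conversation into the illustrative format from the example. The markers `<intp>`, `[ASST]` and the literal `eos_token` placeholder are copied from that example and are not the special tokens of any real model; `flatten` is a hypothetical name.

```python
from typing import Dict, List

# Illustrative markers from the example above; NOT a real model's special tokens.
BOS = "<s>"
EOS = "eos_token"  # placeholder standing in for the tokenizer's real EOS token
USER_OPEN, USER_CLOSE = "<intp>", "</intp>"
ASST_OPEN, ASST_CLOSE = "[ASST]", "[/ASST]"


def flatten(messages: List[Dict[str, str]]) -> str:
    """Serialize a user/assistant conversation into one string for a decoder-only model."""
    parts = [BOS]
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"{USER_OPEN}{msg['content']}{USER_CLOSE}")
        elif msg["role"] == "assistant":
            parts.append(f" {ASST_OPEN} {msg['content']}{ASST_CLOSE}")
        else:
            # This toy format only defines markers for two roles.
            raise ValueError(f"unsupported role: {msg['role']!r}")
    parts.append(f" {EOS}")  # mark the end of the conversation
    return "".join(parts)


example = [
    {"role": "user", "content": "I had fried rice noodles this morning."},
    {"role": "assistant", "content": "Fried rice noodles are a common breakfast in Guangdong."},
]
print(flatten(example))
```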

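Moreover, the most common application of large models today is "chat". In a chat setting, the language model does not continue a single text string as usual; instead it continues **a conversation made up of one or more "messages"**, and each message has a **"role"**, such as "user" or "assistant", together with the corresponding message content.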
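
A conversation in this sense is just an ordered list of role/content dicts. The sketch below builds a multi-turn history with a hypothetical `add_message` helper; the set of allowed roles is an assumption, since some chat templates also accept other roles (for example tool messages).

```python
from typing import Dict, List

# Assumed role set; real templates may accept more (or fewer) roles.
ALLOWED_ROLES = {"system", "user", "assistant"}


def add_message(history: List[Dict[str, str]], role: str, content: str) -> List[Dict[str, str]]:
    """Return a new history with one message appended (hypothetical helper)."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    # Build a new list instead of mutating the caller's history.
    return history + [{"role": role, "content": content}]


chat: List[Dict[str, str]] = []
chat = add_message(chat, "user", "I had fried rice noodles this morning.")
chat = add_message(chat, "assistant", "That is a common breakfast in Guangdong.")
chat = add_message(chat, "user", "Is it healthy?")
print([m["role"] for m in chat])
```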

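Just as different models have different tokenization schemes, special tokens, and input formats, different chat models also have different chat templates. The chat template is part of the tokenizer: it specifies how a conversation, given as a list of messages, is converted into the single tokenizable string format the model expects. Taking mistralai/Mistral-7B-Instruct-v0.1 as an example, it uses <s> to mark the start of the conversation, </s> to mark the end of a turn (that is, the turn boundary), and [INST] and [/INST] to wrap the user's input: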

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)
# Returns:
# "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
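
To see what the template is doing, the pure-Python sketch below reproduces the string shown above for this particular conversation. It is an approximation written for illustration: the real template is a Jinja template stored with the tokenizer and may handle details this sketch ignores (system messages, role-alternation checks). `render_mistral_style` is a hypothetical name.

```python
from typing import Dict, List


def render_mistral_style(messages: List[Dict[str, str]], bos: str = "<s>", eos: str = "</s>") -> str:
    """Approximate the Mistral-7B-Instruct-v0.1 layout for user/assistant turns."""
    out = bos  # start of the conversation
    for msg in messages:
        if msg["role"] == "user":
            # User input is wrapped in [INST] ... [/INST].
            out += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            # Each assistant reply is closed with EOS, then a space before the next [INST].
            out += f"{msg['content']}{eos} "
        else:
            raise ValueError(f"unsupported role: {msg['role']!r}")
    return out


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(render_mistral_style(chat))
```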

How to use a chat template

As the example above shows, using a chat template is fairly simple: first define a list of messages, each with "role" and "content" keys, then pass that list to the tokenizer's apply_chat_template method.

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
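
One detail that matters for the problem described in the Background section is the generation prompt: after the last user message, the rendered string should open an empty assistant turn so the model writes a reply instead of continuing the user's text. The sketch below shows the idea with a simplified `<|role|>` layout loosely modeled on zephyr-style templates; the exact headers and whitespace are assumptions, and `render_zephyr_like` is a hypothetical name. In transformers, the real template produces this effect when `apply_chat_template` is called with `add_generation_prompt=True`.

```python
from typing import Dict, List


def render_zephyr_like(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = False,
    eos: str = "</s>",
) -> str:
    """Render messages in a simplified <|role|> layout (approximation, not the real template)."""
    out = ""
    for msg in messages:
        # Each turn: a role header line, then the content terminated by EOS.
        out += f"<|{msg['role']}|>\n{msg['content']}{eos}\n"
    if add_generation_prompt:
        # Open an empty assistant turn so the model answers instead of continuing.
        out += "<|assistant|>\n"
    return out


messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "How are you?"},
]
print(render_zephyr_like(messages, add_generation_prompt=True))
```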
