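How to Use the Llama3-ChatQA-1.5-70B Model for Conversational Question Answering

Llama3-ChatQA-1.5-70B project page: https://gitcode.com/hf_mirrors/ai-gitcode/Llama3-ChatQA-1.5-70B

Introduction

Conversational question answering (conversational QA) systems are increasingly important in customer-service bots, intelligent assistants, and educational tools, where a system that answers accurately across several turns noticeably improves the user experience. Llama3-ChatQA-1.5-70B is a model built specifically for conversational QA and retrieval-augmented generation (RAG). This article walks through how to use Llama3-ChatQA-1.5-70B for a conversational QA task: setting up the environment, formatting the input, running generation, and interpreting the result.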

Main Content

Preparation

Environment Requirements

Before using Llama3-ChatQA-1.5-70B, make sure your environment meets the following requirements:

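Hardware: a 70B-parameter model loaded in float16 needs roughly 140 GB of memory for the weights alone (70 billion parameters × 2 bytes each), so a single 16 GB or 32 GB GPU is not enough. Plan for several high-memory GPUs, or reduce the footprint with quantization or CPU offloading.
Software: Python 3.8 or later and PyTorch 1.10 or later, plus a release of the Transformers library recent enough to support Llama 3. The loading code in this article passes device_map="auto", which also requires the Accelerate library. Both can be installed with:
```
pip install transformers accelerate
```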
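
The memory requirement can be checked with a quick back-of-the-envelope calculation. The sketch below is only an estimate under simple assumptions: it counts the weights alone (no activations or KV cache), treats the model as exactly 70 billion parameters, and uses a hypothetical helper name, `weight_memory_gb`, that is not part of any library.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 70e9  # nominal parameter count of a 70B model (assumption)

# Bytes per parameter for a few common precisions.
for name, nbytes in [("float16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(NUM_PARAMS, nbytes):.0f} GB")
```

Real usage is higher than these figures, because generation also needs memory for activations and the key-value cache, which grows with the prompt length.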

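Required Data and Tools

To make full use of Llama3-ChatQA-1.5-70B, prepare the following data and tools:

Conversation data: a conversational QA dataset for training and evaluating the model.
Retriever: for long documents that do not fit in the prompt, the Dragon-multiturn retriever is recommended for selecting the relevant passages.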
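
To make the role of the retriever concrete, here is a minimal sketch of the retrieve-then-answer idea: split a long document into chunks, score each chunk against the question, and pass only the best chunks to the model as context. It does not use Dragon-multiturn. The `toy_embed` function is a hypothetical bag-of-words stand-in for a real neural encoder, and all function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List


def split_into_chunks(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Revenue for the quarter was $22.1 billion.",
    "The company opened a new office in Austin.",
    "Net income for the quarter was $12,285 million.",
]
top = retrieve_top_k("what was the net income", chunks, k=1)
print(top[0])
```

In a real pipeline the scoring function would be replaced by the retriever's query and context encoders, and the selected chunks would be joined into the context string given to the model.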

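Using the Model

Data Preprocessing

The input must be preprocessed before it is passed to the model:

Tokenization: tokenize the input text with the model's own tokenizer.
Formatting: arrange the input in the layout the model expects: the system prompt, then the context, then the conversation turns ending with the user's question.

Loading and Configuring the Model

Load and configure Llama3-ChatQA-1.5-70B as follows: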
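
The formatting step can be sketched as a small, model-free function. The version below follows the same layout as the full example later in this article (system prompt, context, then "User:" / "Assistant:" turns), but builds the string without modifying the caller's message list, which makes it safe to call repeatedly on the same conversation history. The name `build_prompt` and the sample history are illustrative assumptions.

```python
from typing import Dict, List

SYSTEM = (
    "System: This is a chat between a user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions "
    "based on the context. The assistant should also indicate when the answer cannot be "
    "found in the context."
)
INSTRUCTION = "Please give a full and complete answer for the question."


def build_prompt(messages: List[Dict[str, str]], context: str) -> str:
    """Build the prompt text without mutating the input messages."""
    turns = []
    instruction_added = False
    for item in messages:
        content = item["content"]
        if item["role"] == "user":
            # Only the first user turn gets the extra instruction.
            if not instruction_added:
                content = INSTRUCTION + " " + content
                instruction_added = True
            turns.append("User: " + content)
        else:
            turns.append("Assistant: " + content)
    conversation = "\n\n".join(turns) + "\n\nAssistant:"
    return SYSTEM + "\n\n" + context + "\n\n" + conversation


history = [
    {"role": "user", "content": "What was the revenue?"},
    {"role": "assistant", "content": "Revenue was $22.1 billion."},
    {"role": "user", "content": "And the net income?"},
]
prompt = build_prompt(history, "| Net income | $12,285 |")
print(prompt)
```

The prompt ends with "Assistant:" so that the model continues the conversation from the assistant's side.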

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nvidia/Llama3-ChatQA-1.5-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

Running the Task

The following example shows how to use the model for conversational QA over a document:

messages = [
    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
]

document = """NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"""

def get_formatted_input(messages, context):
    system = "System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
    instruction = "Please give a full and complete answer for the question."

    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    conversation = '\n\n'.join(["User: " + item["content"] if item["role"] == "user" else "Assistant: " + item["content"] for item in messages]) + "\n\nAssistant:"
    formatted_input = system + "\n\n" + context + "\n\n" + conversation
    
    return formatted_input

formatted_input = get_formatted_input(messages, document)
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

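Analyzing the Results

Interpreting the Output

The model generates an answer grounded in the supplied context and the user's question. For the example above, the output may look similar to the following: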

Assistant: The net income for Q4 FY24 was $12,285 million, compared to $1,414 million in Q4 FY23. This represents a percentage change of 769%.
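
Because the model can make arithmetic mistakes, it is worth checking a numeric answer like this against the source table. A minimal check, using the two net income figures from the document above (in millions of dollars); the helper name `percentage_change` is an illustrative assumption:

```python
def percentage_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


q4_fy23_net_income = 1414   # $ millions, from the table
q4_fy24_net_income = 12285  # $ millions, from the table

change = percentage_change(q4_fy23_net_income, q4_fy24_net_income)
print(f"{change:.1f}%")  # 768.8%, which rounds to the 769% reported in the table
```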

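Evaluation Metrics

To evaluate the model, you can test it on the ChatRAG Bench dataset. The benchmark contains several types of conversational QA tasks, ranging from simple single-turn questions to complex multi-turn conversations, so results on it show how the model performs across different scenarios.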
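
Free-form answers on QA benchmarks are commonly scored with token-level F1 between the model's answer and a reference answer. The sketch below shows that metric under simple assumptions (lowercasing and whitespace tokenization only). Whether a given ChatRAG Bench subset is scored this way, and with what text normalization, should be checked against the benchmark's own evaluation scripts.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("net income rose 769%", "net income increased 769%"))  # 0.75
```

Averaging this score over all questions in a dataset gives a single number that can be compared across models.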

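Conclusion

Llama3-ChatQA-1.5-70B is well suited to conversational QA: it handles multi-turn conversations and generates answers grounded in the supplied context. With the steps in this article you can load the model, format a conversation and document into a prompt, and generate an answer, then apply the same pattern in a real conversational QA system. Performance can be improved further by refining the training data and the retrieval step.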

Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.