GPT in Practice Series: Building an LLM Agent with a Local Knowledge Base (RAG)

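Chat Models: The chatbot interface is built around messages rather than raw text, so the LangChain interface best suited to it is a chat model rather than a raw-text LLM. Chat models have a conversational tone and natively support a message interface.
Prompt Templates: These simplify assembling prompts that combine default messages, user input, chat history, and (optionally) additional retrieved context.
Chat History: This lets the chatbot "remember" past interactions and take them into account when responding to follow-up questions.
Retrievers (optional): These are useful when the chatbot should draw on domain-specific, up-to-date knowledge as context to improve its responses.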

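The sections below show how to combine these components into a capable conversational chatbot.

Initial setup

Initialize the chat model, which will serve as the chatbot's brain: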

from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0.2)

Invoking the chat model returns an AIMessage:

from langchain_core.messages import HumanMessage

chat.invoke(
    [
        HumanMessage(
            content="Translate this sentence from English to French: I love programming."
        )
    ]
)

AIMessage(content="J'adore programmer.")

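On its own, the model holds no state: each call is independent, so it does not remember earlier turns of the conversation.

To work around this, pass the entire conversation history to the model on every call: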

from langchain_core.messages import AIMessage

chat.invoke(
    [
        HumanMessage(
            content="Translate this sentence from English to French: I love programming."
        ),
        AIMessage(content="J'adore la programmation."),
        HumanMessage(content="What did you just say?"),
    ]
)

AIMessage(content='I said "J\'adore la programmation," which means "I love programming" in French.')
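To make the statelessness concrete without calling an API, here is a minimal sketch. `Message` and `fake_model` are hypothetical stand-ins invented for illustration, not LangChain classes; the only point is that the "model" can answer solely from the messages it receives in that one call.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "human" or "ai"
    content: str


def fake_model(messages: list[Message]) -> Message:
    """Hypothetical stand-in for a chat model: it sees only `messages`."""
    # Look for the most recent AI turn that precedes the final message.
    previous_ai = [m for m in messages[:-1] if m.role == "ai"]
    if previous_ai:
        return Message("ai", f'I said "{previous_ai[-1].content}"')
    return Message("ai", "I have no record of saying anything.")


question = Message("human", "What did you just say?")

# Call 1: only the follow-up question is sent, so nothing can be recalled.
no_history = fake_model([question])

# Call 2: the earlier turns are resent along with the follow-up question.
with_history = fake_model(
    [
        Message("human", "Translate to French: I love programming."),
        Message("ai", "J'adore la programmation."),
        question,
    ]
)

print(no_history.content)
print(with_history.content)
```

The second call can answer only because the caller resent the earlier turns; nothing was stored between calls.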

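As you can see, the model now gives a good response that takes the earlier turns into account.

This is exactly what we want: a chatbot that can carry on an interactive conversation.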

Prompt templates

Define a prompt template to make formatting simpler. You can create a chain by piping the prompt into the model:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | chat

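The MessagesPlaceholder above takes the chat messages passed in the chain's input under the "messages" key and inserts them directly into the prompt. Now invoke the chain: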

chain.invoke(
    {
        "messages": [
            HumanMessage(
                content="Translate this sentence from English to French: I love programming."
            ),
            AIMessage(content="J'adore la programmation."),
            HumanMessage(content="What did you just say?"),
        ],
    }
)

AIMessage(content='I said "J\'adore la programmation," which means "I love programming" in French.')
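As a rough mental model of what happens before the model is called, the sketch below mimics the template expansion in plain Python. `format_prompt` is a hypothetical helper written for this illustration, not LangChain's implementation: it prepends the fixed system message and splices in whatever was passed under the "messages" key.

```python
SYSTEM_TEXT = (
    "You are a helpful assistant. "
    "Answer all questions to the best of your ability."
)


def format_prompt(inputs: dict) -> list[tuple[str, str]]:
    """Hypothetical helper: system message first, then the caller's messages."""
    return [("system", SYSTEM_TEXT), *inputs["messages"]]


formatted = format_prompt(
    {
        "messages": [
            ("human", "Translate this sentence from English to French: I love programming."),
            ("ai", "J'adore la programmation."),
            ("human", "What did you just say?"),
        ]
    }
)

# Print the final message list that would be handed to the model.
for role, content in formatted:
    print(f"{role}: {content}")
```

The model therefore receives four messages: the system instruction plus the three conversation turns, in their original order.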

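Message history

A chatbot with memory has to manage its chat history. A convenient shortcut is a MessageHistory class, which is responsible for saving and loading chat messages. Many built-in message history integrations exist that persist messages to a variety of databases; this demo uses only the in-memory ChatMessageHistory.

Here is an example of using it directly: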

from langchain.memory import ChatMessageHistory

demo_ephemeral_chat_history = ChatMessageHistory()
demo_ephemeral_chat_history.add_user_message("hi!")
demo_ephemeral_chat_history.add_ai_message("whats up?")

demo_ephemeral_chat_history.messages

[HumanMessage(content='hi!'), AIMessage(content='whats up?')]
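Conceptually, an in-memory history is little more than an ordered list with a couple of convenience methods. The class below is a simplified, hypothetical stand-in (the name `InMemoryHistory` and the tuple representation are assumptions made for this sketch), not the actual ChatMessageHistory source.

```python
class InMemoryHistory:
    """Hypothetical, simplified stand-in for an in-memory message history."""

    def __init__(self) -> None:
        # Each entry is a (role, content) pair kept in arrival order.
        self.messages: list[tuple[str, str]] = []

    def add_user_message(self, content: str) -> None:
        self.messages.append(("human", content))

    def add_ai_message(self, content: str) -> None:
        self.messages.append(("ai", content))

    def clear(self) -> None:
        self.messages = []


history = InMemoryHistory()
history.add_user_message("hi!")
history.add_ai_message("whats up?")
print(history.messages)
```

Because the list lives only in process memory, it is lost when the program exits, which is why database-backed histories exist for real deployments.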

With this in place, the stored messages can be passed directly to the chain as a parameter:

demo_ephemeral_chat_history.add_user_message(
    "Translate this sentence from English to French: I love programming."
)

response = chain.invoke({"messages": demo_ephemeral_chat_history.messages})

response

AIMessage(content='The translation of "I love programming" in French is "J\'adore la programmation."')

demo_ephemeral_chat_history.add_ai_message(response)
demo_ephemeral_chat_history.add_user_message("What did you just say?")

chain.invoke({"messages": demo_ephemeral_chat_history.messages})

AIMessage(content='I said "J\'adore la programmation," which is the French translation for "I love programming."')
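The add-user-message, invoke, add-ai-message sequence above repeats on every turn, so it is natural to wrap it in one function. The sketch below shows that pattern in plain Python; `chat_turn` and `stub_invoke` are hypothetical names, and `stub_invoke` is only a placeholder for a real call such as chain.invoke.

```python
from typing import Callable

Messages = list[tuple[str, str]]


def chat_turn(history: Messages, user_text: str,
              invoke: Callable[[Messages], str]) -> str:
    """Record the user turn, call the model with the full history, record the reply."""
    history.append(("human", user_text))
    reply = invoke(list(history))  # pass a copy so the callee cannot mutate history
    history.append(("ai", reply))
    return reply


def stub_invoke(messages: Messages) -> str:
    """Hypothetical stand-in for chain.invoke: reports how many turns it was given."""
    return f"I can see {len(messages)} message(s)."


history: Messages = []
first = chat_turn(history, "hi!", stub_invoke)
second = chat_turn(history, "What did you just say?", stub_invoke)
print(first)
print(second)
```

On the second turn the stub is handed three messages (the first exchange plus the new question), which is the behavior that gives the chatbot its memory.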

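Combined with message history, this gives us a basic chatbot!

While this chain already works as a useful chatbot, we want to make it more capable. To do that, we connect it to outside knowledge through some form of retrieval-augmented generation, or RAG for short, so that it can draw on useful domain-specific information.

The next section covers retrieval-augmented generation.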

Retrievers

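You can set up and use a Retriever to give the chatbot domain-specific knowledge. For example, let's make the chatbot able to answer questions about the LangSmith documentation. The same approach works for other sources.

We will use the LangSmith documentation as source material and store it in a vectorstore for later retrieval. Note that this example skips over some of the details of parsing and storing a data source.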
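To give a feel for what "store it and retrieve it later" means, here is a toy retriever in plain Python. It is only a sketch under loud assumptions: word counts stand in for a learned embedding model, a Python list stands in for the vectorstore, and the three sample sentences are invented for the demo rather than taken from the LangSmith docs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Invented sample chunks, for illustration only.
docs = [
    "Tracing records each step of a chain run so it can be inspected later.",
    "Datasets store example inputs and outputs used for evaluation.",
    "API keys are created on the settings page.",
]
top = retrieve("How do I inspect a chain run with tracing?", docs, k=1)
print(top[0])
```

The retrieved chunk would then be added to the prompt as context, which is the "augmented" part of retrieval-augmented generation.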

Use a document loader to pull the data from a web page:

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://docs.smith.langchain.com/overview")
data = loader.load()

Next, split it into smaller chunks that fit within the LLM's context window, and store them in a vector database:

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
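To illustrate what chunk_size and chunk_overlap control, the sketch below is a deliberately simplified fixed-width splitter. It is not how RecursiveCharacterTextSplitter works internally (the real splitter prefers to break on separators such as paragraph and line breaks); `split_text` is a hypothetical function that only shows the size and overlap arithmetic.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Fixed-width character splitter (simplified; no separator awareness)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


sample = "x" * 1200
chunks = split_text(sample, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])
```

With an overlap of 0, the chunks partition the text exactly; a positive overlap repeats the tail of each chunk at the start of the next, which can help keep sentences that straddle a boundary retrievable.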