Innovation Training Log, 2024-05-28: Memory Mechanism and a Hybrid LLM Dialogue Mechanism Based on MTPE and CoT

Article link: https://blog.youkuaiyun.com/lyh20021209/article/details/139251773

1. Conversations with Memory

1.1. Querying the Conversation History

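When the LLM answers from its own capabilities, it is best to restore the history of the user's current conversation first, so that the model can use the preceding context in its answer.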

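In the chat function of Langchain-Chatchat, implementing the ChatMemory abstraction that the langchain framework provides sets up a buffer for the conversation record. The history of the session is then read into that buffer and passed to the chain as the memory parameter.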

memory = ConversationBufferDBMemory(conversation_id=conversation_id,
                                    llm=model,
                                    message_limit=history_len)

chain = LLMChain(prompt=chat_prompt, llm=model, memory=memory)

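The history_len value, passed as message_limit, determines how many of the most recent history records of the conversation identified by conversation_id the buffer retrieves: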

For example, if history_len = 4, the four most recent records are retrieved.
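The windowing behaviour described above can be illustrated with a minimal sketch. It does not use the real ConversationBufferDBMemory or a database: `Message`, `load_recent_history`, and `render_history` are hypothetical names, and the in-memory list stands in for the message table, which is assumed to be in chronological order.

```python
from dataclasses import dataclass


@dataclass
class Message:
    conversation_id: str
    query: str
    response: str


def load_recent_history(store, conversation_id, history_len):
    """Return the last `history_len` exchanges of one conversation, oldest first."""
    # Keep only this conversation's records; `store` is assumed to be chronological.
    records = [m for m in store if m.conversation_id == conversation_id]
    # Slice from the end so only the most recent records survive.
    # The guard matters because records[-0:] would return the whole list.
    return records[-history_len:] if history_len > 0 else []


def render_history(records):
    """Flatten the records into a Human/AI transcript suitable for a prompt."""
    lines = []
    for m in records:
        lines.append(f"Human: {m.query}")
        lines.append(f"AI: {m.response}")
    return "\n".join(lines)


# Six exchanges in conversation "c1" plus one unrelated record.
store = [Message("c1", f"question {i}", f"answer {i}") for i in range(1, 7)]
store.append(Message("c2", "unrelated", "ignored"))

recent = load_recent_history(store, "c1", history_len=4)
print(render_history(recent))
```

With history_len = 4, only exchanges 3 through 6 of conversation "c1" reach the prompt, and the record belonging to "c2" is never included.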

1.2. Chatting with the LLM

I wrapped the process of chatting with the LLM into an interface that can be accessed over HTTP.

async def request_llm_chat(ca: LLMChat) -> dict:
    """
    Build and send an LLM chat request.
    The request body contains:
    1. query
    2. conv_id
    3. history_len
    4. model_name
    5. temperature
    6. prompt_name
    """
    request_body = {
        "query": ca.query,
        "conversation_id": ca.conv_id,
        "history_len": CHAT_ARGS["history_len"],
        "model_name": CHAT_ARGS["llm_models"][0],
        "temperature": CHAT_ARGS["temperature"],
        "prompt_name": ca.prompt_name
    }

    return await request(url=CHAT_ARGS["url"], request_body=request_body, prefix="data: ")
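The `request` helper is not shown in this post. Its `prefix="data: "` argument suggests that the backend streams its answer as lines that each begin with `data: `, in the style of server-sent events. As a rough sketch under that assumption, the prefix handling could look like the following; `parse_stream_lines`, `join_text`, and the `text` and `message_id` payload fields are hypothetical and only serve the illustration.

```python
import json


def parse_stream_lines(lines, prefix="data: "):
    """Strip `prefix` from each streamed line and decode the JSON payload."""
    chunks = []
    for line in lines:
        line = line.strip()
        # Skip blank separator lines and anything that lacks the prefix.
        if not line.startswith(prefix):
            continue
        chunks.append(json.loads(line[len(prefix):]))
    return chunks


def join_text(chunks):
    """Concatenate the `text` field of every chunk into the full answer."""
    return "".join(chunk.get("text", "") for chunk in chunks)


# Hypothetical payloads; the real field names depend on the backend.
sample = [
    'data: {"text": "Hello", "message_id": "m1"}',
    "",
    'data: {"text": ", world", "message_id": "m1"}',
]
chunks = parse_stream_lines(sample)
print(join_text(chunks))
```

Keeping the parsing separate from the HTTP call makes it possible to test the prefix handling without a running model server.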

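The parameters of the request body are:

query: the content the user asks the LLM
conv_id: the id of the conversation
history_len: the number of previous history records to retrieve
model_name: the name of the LLM to call (several models can be deployed for chat at the same time)
temperature: the LLM sampling temperature, which controls the randomness of the generated text. It should not be too high, or the model improvises freely and its answers become inaccurate; nor too low, or its answers become overly conservative.

prompt_name: the prompt template to use. For example, the with_history template is: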

"with_history":
    'The following is a friendly conversation between a human and an AI. '
    'The AI is talkative and provides lots of specific details from its context. '
    'If the AI does not know the answer to a question, it truthfully says it does not know.\\n\\n'
    'Current conversation:\\n'
    '{history}\\n'
    'Human: {input}\\n'
    'AI:',

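This template tells the AI that what follows is the record of its earlier conversation with the human, that the human has now given a new input, and that the AI should answer this input based on the history.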
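To make the substitution concrete, here is a small sketch that fills the two placeholders with plain `str.format`. It assumes that the escaped `\\n` sequences in the configuration end up as real newlines, and the history and input strings are made up for the example. The actual project renders the template through its prompt classes, not this way.

```python
# The with_history template, with the escaped newlines written as real newlines.
WITH_HISTORY = (
    "The following is a friendly conversation between a human and an AI. "
    "The AI is talkative and provides lots of specific details from its context. "
    "If the AI does not know the answer to a question, it truthfully says it does not know.\n\n"
    "Current conversation:\n"
    "{history}\n"
    "Human: {input}\n"
    "AI:"
)

# Made-up history, in the Human/AI transcript form the template expects.
history = "Human: What is LangChain?\nAI: A framework for building LLM applications."

prompt = WITH_HISTORY.format(history=history, input="Does it support memory?")
print(prompt)
```

The rendered prompt ends with an open `AI:` line, which is what cues the model to continue the conversation with its next reply.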