Migrating from MapReduceDocumentsChain to LangGraph: Improving the Efficiency and Flexibility of Text Processing

Article link: https://blog.youkuaiyun.com/tt_jishu/article/details/143613542

Introduction

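When processing long texts, MapReduceDocumentsChain offers a Map-Reduce strategy that is widely used for tasks such as text summarization. As requirements grow, however, some scenarios call for a more flexible and controllable solution. LangGraph is a newer tool that supports more advanced Map-Reduce workflows, with advantages in execution control, error recovery, and extending workflows with human-in-the-loop steps. This article looks at how to migrate to LangGraph and walks through a practical example.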

Main Content

Basic Implementation with MapReduceDocumentsChain

MapReduceDocumentsChain uses a Map-Reduce strategy to process long texts. The basic steps are:

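Text splitting: split the long text into smaller documents.
Map step: process these smaller documents in parallel.
Reduce step: merge the intermediate results into the final output.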
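
The three steps above can be sketched in plain Python. This is only an illustration of the pattern, not how MapReduceDocumentsChain is implemented: `split_text`, `fake_summarize`, and `map_reduce_summarize` are hypothetical names, and `fake_summarize` simply truncates its input in place of a real LLM call.

```python
from concurrent.futures import ThreadPoolExecutor


def split_text(text: str, chunk_size: int) -> list[str]:
    """Split a long text into chunks of at most `chunk_size` characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def fake_summarize(text: str, max_chars: int = 20) -> str:
    """Hypothetical stand-in for an LLM call: keep the first `max_chars` characters."""
    return text[:max_chars]


def map_reduce_summarize(text: str, chunk_size: int = 100) -> str:
    # 1. Text splitting
    chunks = split_text(text, chunk_size)
    # 2. Map step: summarize each chunk in parallel (results keep input order)
    with ThreadPoolExecutor() as pool:
        partial_summaries = list(pool.map(fake_summarize, chunks))
    # 3. Reduce step: merge the partial summaries and summarize once more
    combined = "\n".join(partial_summaries)
    return fake_summarize(combined, max_chars=60)


long_text = "LangGraph makes map-reduce workflows explicit. " * 10
summary = map_reduce_summarize(long_text)
print(summary)
```

In a real pipeline the stand-in would be replaced by a model call, and the chunk size would be chosen from the model's context window rather than a fixed character count.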

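In the Reduce step, the summaries can be "collapsed" recursively: if the combined summaries are still too long, they are grouped and summarized again until they fit. This is especially useful for models with a small context window. The process is typically implemented with a while loop.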
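
As a rough illustration of that while loop, the sketch below collapses a list of summaries until their combined length fits a budget. All names here (`collapse`, `group_by_budget`, `total_length`, `fake_summarize`) are hypothetical, character counts stand in for token counts, and truncation stands in for an LLM call. The `max_rounds` guard is an added assumption so the loop always terminates.

```python
def fake_summarize(texts: list[str], max_chars: int = 40) -> str:
    """Hypothetical stand-in for an LLM call: join the inputs and truncate."""
    return " ".join(texts)[:max_chars]


def total_length(texts: list[str]) -> int:
    """Character count, used here as a crude proxy for token count."""
    return sum(len(t) for t in texts)


def group_by_budget(texts: list[str], budget: int) -> list[list[str]]:
    """Greedily pack texts into groups of combined length <= budget.

    A single text longer than the budget still gets a group of its own.
    """
    groups, current, current_len = [], [], 0
    for t in texts:
        if current and current_len + len(t) > budget:
            groups.append(current)
            current, current_len = [], 0
        current.append(t)
        current_len += len(t)
    if current:
        groups.append(current)
    return groups


def collapse(summaries: list[str], budget: int, max_rounds: int = 10) -> list[str]:
    """Summarize groups of summaries repeatedly until they fit the budget."""
    rounds = 0
    while total_length(summaries) > budget and rounds < max_rounds:
        groups = group_by_budget(summaries, budget)
        summaries = [fake_summarize(group) for group in groups]
        rounds += 1
    return summaries


summaries = [f"summary {i}: " + "x" * 50 for i in range(8)]
collapsed = collapse(summaries, budget=120)
final_summary = fake_summarize(collapsed)
print(len(summaries), "->", len(collapsed))
```

Once the loop exits, the remaining summaries are short enough to be passed to a single final reduce call.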

from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.chains.llm import LLMChain
from langchain_core.prompts import ChatPromptTemplate

# Example code
map_template = "Write a concise summary of the following: {docs}."
reduce_template = """
The following is a set of summaries:
{docs}
Take these and distill it into a final, consolidated summary
of the main themes.
"""

map_prompt = ChatPromptTemplate([("human", map_template)])
reduce_prompt = ChatPromptTemplate([("human", reduce_template)])

# Detailed implementation omitted here; `documents` is a list of Document objects defined elsewhere
map_reduce_chain = MapReduceDocumentsChain(...)
result = map_reduce_chain.invoke(documents)
print(result["output_text"])

Implementation with LangGraph

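LangGraph models the workflow as a graph and supports streaming execution, so you can observe and intervene while the workflow is running. Its advantages include easier extension and better fault tolerance. A basic LangGraph implementation looks like this: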

from langgraph.graph import StateGraph
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

map_template = "Write a concise summary of the following: {context}."
reduce_template = """
The following is a set of summaries:
{docs}
Take these and distill it into a final, consolidated summary
of the main themes.
"""
map_prompt = ChatPromptTemplate([("human", map_template)])
reduce_prompt = ChatPromptTemplate([("human", reduce_template)])
# `llm` is assumed to be a chat model instance defined elsewhere
map_chain = map_prompt | llm | StrOutputParser()
reduce_chain = reduce_prompt | llm | StrOutputParser()

# Graph implementation (the state schema, node functions, and edges are omitted here)
graph = StateGraph(...)
graph.add_node("generate_summary", generate_summary)
graph.add_node("generate_final_summary", generate_final_summary)
app = graph.compile()

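# Invoke the graph. A top-level `async for` works in a notebook;
# in a script, wrap this loop in an async function and run it with asyncio.run().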
async for step in app.astream({"contents": [doc.page_content for doc in documents]}):
    print(step)
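
To show how a streaming loop like the one above can be consumed from an ordinary script, here is a minimal stdlib-only sketch. `fake_astream` is a hypothetical async generator standing in for `app.astream`, and `fake_summarize` replaces the LLM call. The shape of each yielded update (a dict keyed by node name) and the keys `summaries` and `final_summary` are assumptions made for illustration.

```python
import asyncio


async def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for an async LLM call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return text[:10]


async def fake_astream(contents: list[str]):
    """Hypothetical stand-in for `app.astream`: yields one update per step."""
    # Map step: summarize all chunks concurrently.
    summaries = await asyncio.gather(*(fake_summarize(c) for c in contents))
    for s in summaries:
        yield {"generate_summary": {"summaries": [s]}}
    # Reduce step: combine the partial summaries into a final one.
    final = await fake_summarize(" | ".join(summaries))
    yield {"generate_final_summary": {"final_summary": final}}


async def main() -> list[dict]:
    steps = []
    # Each intermediate update can be inspected as soon as it arrives.
    async for step in fake_astream(["first chunk of text", "second chunk of text"]):
        print(step)
        steps.append(step)
    return steps


steps = asyncio.run(main())
```

Because every update arrives as a separate item, the caller can log progress, stop early, or hand control to a human reviewer between steps.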