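Imagine you run a bakery and have sent out a confectionery intelligence team to gather data on your competitors. They report back on the competition, and they have plenty of great ideas that you would like to apply to your own business. The data, however, is unstructured. How do you analyze it to understand what is most in demand and to choose the best strategy for your business's next steps? In Part 1, we use “PydanticOutputParser” to parse the data and add the structure we need. In Part 2, we will create a LangChain Agent to analyze the data.
To explore this use case, a toy dataset [1] was created. Here is one sample from the dataset: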
At Velvet Frosting Cupcakes, our team learned about the unveiling of a seasonal pastry menu that changes monthly. Introducing a rotating seasonal menu at our bakery using the “SeasonalJoy” subscription platform and adding a special touch to our cookies with the “FloralStamp” cookie stamper could keep our offerings fresh and exciting for customers.
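Part 1: Extracting structured information from unstructured data
Method 1: create_extraction_chain
Define the structure of the data to extract, then use LangChain to create an extraction chain.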
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatOpenAI
# Schema
schema = {
    "properties": {
        "company": {"type": "string"},
        "offering": {"type": "string"},
        "advantage": {"type": "string"},
        "products_and_services": {"type": "string"},
        "additional_details": {"type": "string"},
    }
}
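To make the role of the schema concrete, here is a minimal sketch of checking that an extracted record matches it. The `conforms` helper and the `example` record are hypothetical illustrations written for this post; they are not part of LangChain, and the record is hand-written rather than real model output.

```python
# Same schema as above, repeated so this sketch runs on its own.
schema = {
    "properties": {
        "company": {"type": "string"},
        "offering": {"type": "string"},
        "advantage": {"type": "string"},
        "products_and_services": {"type": "string"},
        "additional_details": {"type": "string"},
    }
}


def conforms(record: dict, schema: dict) -> bool:
    """Hypothetical helper: True if every key in `record` is declared in the
    schema and every value declared as "string" is a Python str."""
    properties = schema["properties"]
    for key, value in record.items():
        if key not in properties:
            return False  # field not declared in the schema
        if properties[key]["type"] == "string" and not isinstance(value, str):
            return False  # declared as string but holds another type
    return True


# Hand-written record shaped like the schema (an assumption, not model output).
example = {
    "company": "Whisked Away Cupcakes",
    "offering": "dessert subscription service",
}

print(conforms(example, schema))
```

Because the schema marks no field as required, a record that fills in only some of the fields still passes this check; only undeclared fields or non-string values are rejected.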
Define the test samples:
# Inputs
in1 = """Sweet Delights Bakery introduced lavender-infused vanilla cupcakes with a honey buttercream frosting, using the "Frosting-Spreader-3000". This innovation could inspire our next cupcake creation"""
in2 = """Whisked Away Cupcakes introduced a dessert subscription service, ensuring regular customers receive fresh batches of various sweets. Exploring a similar subscription model using the "SweetSubs" program could boost customer loyalty."""
in3 = """At Velvet Frosting Cupcakes, our team learned about the unveiling of a seasonal pastry menu that changes monthly. Introducing a rotating seasonal menu at our bakery using the "SeasonalJoy" subscription platform and adding a special touch to our cookies with the "FloralStamp" cookie stamper could keep our offerings fresh and exciting for customers."""
inputs = [in1, in2