前端到AI，LangChain.Js(四)

最新推荐文章于 2025-04-23 20:48:33 发布

coderlin_

最新推荐文章于 2025-04-23 20:48:33 发布

阅读量680

点赞数 27

文章标签：前端人工智能 langchain

本文链接：https://blog.youkuaiyun.com/lin_fightin/article/details/145708093

版权

学习地址：
学习小册

Function calling

Function calling本质上就是给LLM了解和调用外界函数的能力，LLM会根据他的理解，在合适的时间返回对函数的调用和参数，然后根据函数调用结果进行回答。比如一个旅游的chatbot，你问他规划一个2.11日北京的旅游行程，帮我选择最合适天气的衣服，LLM就会判断需要调用获取2.11实时天气的API来获取北京在2.11的天气，根据返回的结果回答问题。

async function main() {
  const completion = await openai.chat.completions.create({
    model: "deepseek-chat",
    messages: [
      {
        role: "user",
        content: "What's the weather like in Beijing",
      },
    ],
    // 调用tool时机，none不调用，auto由LLM判断调不调用
    tool_choice: "auto",
    tools: [
      {
        type: "function",
        function: {
          name: "getCurrentWeather",
          description: "Get the current weather in a given location",
          // 参数信息
          parameters: {
            type: "object", //对象
            properties: {
              // 第一个location
              location: {
                type: "string",
                description: "The city and state, e.g. San Francisco, CA",
              },
              // 第二个unit，可选值为celsius和fahrenheit
              unit: { type: "string", enum: ["celsius", "fahrenheit"] },
            },
            required: ["location"],
          },
        },
      },
    ],
  });

  const functions = {
    getCurrentWeather: getCurrentWeather,
  };
  // 获取LLM调用函数的信息
  const functionInfo = completion.choices[0].message.tool_calls[0].function;
  const functionName = functionInfo.name;
  const functionParams = functionInfo.arguments;

  const functionResult = functions[functionName](functionParams);

  console.log(functionResult);
}

main();

function getCurrentWeather({ location, unit = "fahrenheit" }) {
  const weather_info = {
    location: location,
    temperature: "72",
    unit: unit,
    forecast: ["sunny", "windy"],
  };
  return JSON.stringify(weather_info);
}

tool_choice是指定llm调用函数的行为

none禁止LLm调用函数
auto 让LLM i集决定是否使用
{type: 'function', function: {name: "test_function"}}，指定一定得调用这个函数

根据函数结果进行回答

messages.push(completion.choices[0].message);

  const functions = {
    getCurrentWeather: getCurrentWeather,
    getCurrentTime: getCurrentTime,
  };

  // 获取LLM调用函数的信息
  const cell = completion.choices[0].message.tool_calls[0];
  console.log('cell==', cell);
  
  const functionInfo = cell.function;
  const functionName = functionInfo.name;
  const functionParams = functionInfo.arguments;
  const functionResult = functions[functionName](functionParams);

  messages.push({
    tool_call_id: cell.id as any,
    role: "tool",
    name: functionName,
    content: functionResult,
  });

  console.log('messages==', messages);

  const response = await openai.chat.completions.create({
    model: 'deepseek-chat',
    messages,
  });
  console.log(response.choices[0].message);

结果
在这里插入图片描述
至此，我们完成了提供外部函数，调用外部函数，传递给LLM，让LLM根据结果进行回答的完整闭环。

在langchain中使用tools

zod

zod 是 js 生态中常见的类型定义和验证的工具库

import { z } from "zod";

const stringSchema = z.string();
stringSchema.parse("Hello, Zod!");

stringSchema.parse(2323);

//报错
ZodError: [
  {
    "code": "invalid_type",
    "expected": "string",
    "received": "number",
    "path": [],
    "message": "Expected string, received number"
  }
]

如上，报错信息可读性很高，适合给LLM，让其自己纠正错误。

用 zod 去定义我们函数参数的 schem，以上述获取天气的函数为例子。

const getCurrentWeatherSchema = z.object({
  location: z.string().describe("The city and state, e.g. San Francisco, CA"),
  unit: z.enum(["celsius", "fahrenheit"]).describe("The unit of temperature"),
});

location是string类型，且添加描述。
unit是枚举类型。

可以用zod-to-json-schema将zod定义的shcema转换成json shcema

import { zodToJsonSchema } from "zod-to-json-schema";

const paramSchema = zodToJsonSchema(getCurrentWeatherSchema)

结果就是

{
  type: "object",
  properties: {
    location: {
      type: "string",
      description: "The city and state, e.g. San Francisco, CA"
    },
    unit: {
      type: "string",
      enum: [ "celsius", "fahrenheit" ],
      description: "The unit of temperature"
    }
  },
  required: [ "location", "unit" ],
  additionalProperties: false,
  "$schema": "http://json-schema.org/draft-07/schema#"
}

在model使用该定义。

const model = new ChatOpenAI({
    temperature: 0 
})

const modelWithTools = model.bind({
    tools: [
        {
            type: "function",
            function: {
                name: "getCurrentWeather",
                description: "Get the current weather in a given location",
                parameters: zodToJsonSchema(getCurrentWeatherSchema),
            }
        }
    ]
})

const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a helpful assistant"],
    ["human", "{input}"]
])

const chain = prompt.pipe(modelWithTools)

await chain.invoke("北京的天气怎么样");

Agent

Agents 是一个自主的决策和执行过程，其核心是将 llm 作为推理引擎，根据 llm 对任务和环境的理解，并根据提供的各种工具，自主决策一系列的行动
Agent可以理解为在某种能自主理解、规划决策、执行复杂任务的智能体
Agent并非ChatGPT升级版，它不仅告诉你“如何做”，更会帮你去做。如果CoPilot是副驾驶，那么Agent就是主驾驶

LLM像是大脑，而提供的各种工具像是Agent的身体，LLM可以根据自己的理解，自主决策这副身体要做什么。

Agent的执行过程：

根据用户提出的问题进行思考，列出解决该问题需要执行的第一个任务/一系列任务
根据现有的工具集找到合适的工具，传递合适的参数，执行工具。
观察工具输出结果
根据工具输出的结果和现有环境信息，决策下一个任务的工具和参数
如果LLM认为问题已解决，输出答案。

上述是常见单体Agent的执行流程，在运行和调用工具的时候，会有memory模块记录当前的信息，作为下一步决策的依据

另一个流行的理念，多代理系统（MAS, Multi-Agent Systems），一个通过多个有固定角色Agent构成的系统，并且每个Agent有独立的memory系统，对应自己角色的标准任务处理流程，独立进行决策和执行。

在整体协作上，一般会定义完整的结构化的操作规程（SOP）来优化各代理间的协作和通信，各个 Agents 遵守定义好的标准流程执行自己的任务，并根据需要跟其他的 Agents 进行沟通。

一个Agent像是一个人，多代理系统像是模拟人类真实的协作模式。

无论是多Agents还是单Agents，都是以LLM为推理引擎，通过各种tool来提供感知环境和改变环境的能力。

RunnableBranch

把任务进行分类，路由到擅长不同任务的chain中，对chain的结果机械能处理，格式化操作，这就是RunnableBranch的应用。

比如有个场景，我们需要设计一个对话机器人，目标是来自多个领域的用户，同时我们拥有多个领域的高质量向量数据库，那我们应该是构建一个从多个数据库检索知识的 RAG 还是构建多个 RAG？

答案一定是构建多个RAG，针对每个领域去设计prompt和RAG策略，能有效避免数据源之间的互相污染。

从架构设计上看，需要一个入口LLM，能将用户的提问进行分类，然后路由到不同的chain。


const classifySchema = z.object({
  type: z.enum(["科普", "编程", "一般问题"]).describe("用户提问的分类")
})


const modelWithTools = chatModel.bind({
  tools: [
      {
          type: "function",
          function: {
              name: "classifyQuestion",
              description: "对用户的提问进行分类",
              parameters: zodToJsonSchema(classifySchema),
          }
      }
  ],
  // 一定要调用这个函数
  tool_choice: {
      type: "function",
      function: {
         name: "classifyQuestion"
      }
  }
})

const prompt = ChatPromptTemplate.fromMessages([
  ["system", `仔细思考，你有充足的时间进行严谨的思考，然后对用户的问题进行分类，
  当你无法分类到特定分类时，可以分类到 "一般问题"`],
  ["human", "{input}"]
])

const classifyChain = RunnableSequence.from([
  prompt,
  modelWithTools,
  new JsonOutputToolsParser(),
  (input) => {
      const type = input[0]?.args?.type
      return type ? type : "一般问题"
  }
])

 classifyChain.invoke({
  "input": "鲸鱼是哺乳动物么？"
}).then(res=>{
  console.log('res==', res);
  
})

在这里插入图片描述

这样我们就实现了将提问进行分类的chain。

import { StringOutputParser } from "@langchain/core/output_parsers";

const answeringModel = new ChatOpenAI({
    temperature: 0.7,
})

const sciencePrompt = PromptTemplate.fromTemplate(
  `作为一位科普专家，你需要解答以下问题，尽可能提供详细、准确和易于理解的答案：

问题：{input}
答案：`
)
    
const programmingPrompt = PromptTemplate.fromTemplate(
  `作为一位编程专家，你需要解答以下编程相关的问题，尽可能提供详细、准确和实用的答案：

问题：{input}
答案：`
)

const generalPrompt = PromptTemplate.fromTemplate(
  `请回答以下一般性问题，尽可能提供全面和有深度的答案：

问题：{input}
答案：`
)


const scienceChain = RunnableSequence.from([
    sciencePrompt,
    answeringModel,
    new StringOutputParser(),
    {
        output: input => input,
        role: () => "科普专家"
    }
    
])

const programmingChain = RunnableSequence.from([
    programmingPrompt,
    answeringModel,
    new StringOutputParser(),
    {
        output: input => input,
        role: () => "编程大师"
    }
    
])

const generalChain = RunnableSequence.from([
    generalPrompt,
    answeringModel,
    new StringOutputParser(),
    {
        output: input => input,
        role: () => "通识专家"
    }
    
])


const branch = RunnableBranch.from([
  [
    (input => input.type.includes("科普")),
    scienceChain,
  ],
  [
    (input => input.type.includes("编程")),
    programmingChain,
  ],
  generalChain
]);

const outputTemplate = PromptTemplate.fromTemplate(
`感谢您的提问，这是来自 {role} 的专业回答：

{output}
`)


const finalChain = RunnableSequence.from([
    {
    // 分类
        type: classifyChain,
        input: input => input.input
    },
    // 给对应的chain
    branch,
    // 输出
    (input) => outputTemplate.format(input),
])

const res = await finalChain.invoke({
    "input": "鲸鱼是哺乳动物么？"
})

console.log(res)

RunnableBranch 用起来非常方便，传入的是二维数组，每个数组的第一个参数就是该分支条件，成立即返回对应的Runnable对象，如果没有传入条件，就默认执行这个RUnnable。

在这里插入图片描述

通过借鉴agents对任务进行分配，组合，输出的处理流程，来达成质量，稳定性，效果都很好的chat bot。

这套方案具有更大的想象力，比如

我们可以使用 RunnableMap 去执行多个查询，从网络、数据库、其他 agents 查询
通过一个总结 llm 进行汇总，然后根据汇总后的信息去判断该问题的难度/类型，甚至是是否缓存中是否有类似的问题
使用 route 去决定后续的处理方式，问题复杂就用更好的LLM处理，简单的就用开元模型，缓存问题，从缓存库中使用向量相似度对比找到类型问题。

相对agents通过prompt固定处理某类任务的标准化流程，我们的方案是使用代码来确定处理流程，优点是灵活，稳定，但缺失了anents自主解决问题的通用性和创造性。

基础Agent实现和解析

将LLM作为推理引擎，通过RAG，web等方式去获取解决问题的充足信息，然后将LLM作为自然语言的理解和推理引擎据此给出答案。 Agents就是自动去做这个过程。

ReAct

Agents内部可能有复杂和多伦的各种LLM和chain的调用。

lang smith。可视化的追踪和分析 agents/llm-app 的内部处理流程是 langchain 官方和社区都看好的路线

ReAct 框架是非常流行的 agent 框架，其结合了推理（reasoning）和行动（acting），其流程大概是让 llm 推理完成任务需要的步骤，然后根据环境和提供的工具进行调用，观察工具的结果，推理下一步任务。就这样推理-调用-观察交错调用，直到模型认为完成推理，输出结果。

这个框架将LLM的推理能力，调用工具能力，观察能力结合在一起，让LLM能适应更多的任务和动态的环境，强化了推理和行动的协同作用。

demo: 提供给Agents的工具集，这里提供的tool分别是SerpAPI(网路搜索能力), Calculator计算能力。

 const tools = [new SerpAPI(process.env.SERP_KEY), new Calculator()];

  const prompt = await pull<PromptTemplate>("hwchase17/react");

  console.log('prompt==', prompt);

看一下reAct提供的prompt

// 定义任务，因为事业通用的agents，尽可能的回答用户的问题。
Answer the following questions as best you can. 

// 确定了模型有哪些工具能用。
You have access to the following tools:

{tools}

// 定义固定的格式和思考的路线
Use the following format:

//定义用户的问题，也是整个推理的最终目标，也是模型推理的起点 
Question: the input question you must answer

// 引导模型进行思考，考虑下一步采取的行动，构建解决问题的策略和步骤，也就是推理阶段。这部分也会记录在 prompt 中，方便我们去理解 llm 推理和思考的过程
Thought: you should always think about what to do

//定义模型需要采取的行动，这里需要是 tools 中提供的 tool，这就是模型需要采取的行动
Action: the action to take, should be one of [{tool_names}]

// 调用工具的参数，参数是连接用户的问题、模型的思考和实际行动的关键环节
Action Input: the input to the action

//  Action 的调用结果，给模型提供反馈，帮助模型根据前一 Action 的行动结果，决定后续的推理和行动步骤
Observation: the result of the action

// 反复操作
... (this Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer
// 上面的步骤会重复多次，直到模型认为现有的推理、思考和观察已经能够得出答案，就根据信息总结出最终答案
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}

提问

async function main() {
  const tools = [new SerpAPI(process.env.SERP_KEY), new Calculator()];

  const prompt = await pull<PromptTemplate>("hwchase17/react");
  const llm = new ChatOpenAI({
    temperature: 0,
  });

  const agent = await createReactAgent({
    llm,
    tools,
    prompt,
  });

  const agentExecutor = new AgentExecutor({
    agent,
    tools,
  });

  const result = await agentExecutor.invoke({
    input: "我有 17 美元，现在相当于多少人民币？",
  });

  console.log(result);
}

main();

我有 17 美元，现在相当于多少人民币？需要网络搜索+计算的问题，目的是测试agents多次推理的效果。

看下流程，首先第一次llm调用

nswer the following questions as best you can. You have access to the following tools:

search: a search engine. useful for when you need to answer questions about current events. input should be a search query.
calculator: Useful for getting the result of a math expression. The input to this tool should be a valid mathematical expression that could be executed by a simple calculator.

Use the following format:
...
Final Answer: the final answer to the original input question

Begin!

Question: 我有 17 美元，现在相当于多少人民币？
Thought:

传入了大模型的两个工具，search和cacluator。

我需要知道当前的美元对人民币的汇率。
Action: search
Action Input: current USD to CNY exchange rate

型的第一个思考就是我需要知道当前的美元对人民币的汇率，这是模型认为解决问题他所欠缺的信息，然后调用 search 进行查询

Answer the following questions as best you can.
...

Begin!

Question: 我有 17 美元，现在相当于多少人民币？
Thought:我需要知道当前的美元对人民币的汇率。
Action: search
Action Input: current USD to CNY exchange rate

Observation: 7.24 Chinese Yuan
Thought:

这里将SerpAPI的结果作为ACtion行动的Observation传入，引导模型进行下一步思考。

现在我知道了当前的美元对人民币的汇率，我可以通过计算得到17美元相当于多少人民币。
Action: calculator
Action Input: 17 * 7.24

模型根据最新的观察，推理出目前的已知信息，并且决策下一步的行动，调用calculator

Answer the following questions as best you can. 

...

Begin!

Question: 我有 17 美元，现在相当于多少人民币？
Thought:我需要知道当前的美元对人民币的汇率。
Action: search
Action Input: current USD to CNY exchange rate

Observation: 7.24 Chinese Yuan
Thought: 现在我知道了当前的美元对人民币的汇率，我可以通过计算得到17美元相当于多少人民币。
Action: calculator
Action Input: 17 * 7.24


Observation: 123.08
Thought: 

我现在知道17美元相当于123.08人民币。
Final Answer: 123.08人民币

最终，agentExecutor 读取到 Final Answer，获取到这是 agent 执行的最终结果，然后 agent 执行完成，输出结果。

如上，我们可以通过非常简洁的prompt去引导模型自主的进行思考，推理，调用人类现有的工具，然后循环这些步骤最终得出结论。

但就目前 reAct 框架还是有一些问题：

复杂性和开销
最终结果的正确性依赖于每一步的精准操作，而每一步的操作是我们很难控制的。从 prompt 来看，其实只定义了基本的思考方式，后续都是 llm 根据自己理解进行推理。
对外部数据源准确性的依赖,reAct 假设外部信息源都是真实而确定的，并不会引导模型对外部数据源进行辩证的思考，所以如果数据源出现问题，推理结果就会出问题。
错误传播和幻觉问题，推理初期的错误会在后续推理中放大，特别是外部数据源有噪声或者数据不完整时。在缺乏足够量数据支持时，模型在推理和调用阶段可能会出现幻觉。
速度问题，reAct 会让模型 “慢下来” 一步步的去思考来得出结论，所以即使是简单的问题也会涉及到多次 llm 调用，很难应用在实时的 chat 场景中。

深入定制Agents

OpeanAI tools Agents

openAi的tools功能，可以满足稳定和使用的agents。Agetns所做的事情就是自我规划任务，调用外部函数和输出答案。而openAi的tools功能恰好如此，提供了tools接口，并且由llm去决定何时以及如何调用tools，并根据tools的运行结果生成改用户的输出。

 const tools = [new SerpAPI(process.env.SERP_KEY), new Calculator()];
  const prompt = await pull<ChatPromptTemplate>("hwchase17/openai-tools-agent");

拉openai-tools-agent的prompt

[
    ["system", "You are a helpful assistant"],
    {chat_history},
    ["HUMAN", "{input}"],
    {agent_scratchpad}
]

自定义Tool

无论是reAct还是openAI tools，提供给agents的tool都影响着agents的应用范围和效果。
自定义tool: 需要的参数是工具的名称，描述和真实调用的函数，这里名称和描述将影响llm何时调用，具有一定的语义。
在函数的实现上，不要抛出错误，而是返回错误信息的字符串。llm可以据此决定下一步行动。

DynamicTool，只支持单一的字符串作为函数输入。因为向 reAct 框架，并不支持多输入的 tool
DynamicStructuredTool，支持使用 zod schema 定义复杂的输入格式，适合在 openAI tools 中使用

单一输入

const stringReverseTool = new DynamicTool({
    name: "string-reverser",
    description: "reverses a string. input should be the string you want to reverse.",
    func: async (input: string) => input.split("").reverse().join(""),
  });

更常见的是，可以将之前的rag chain作为工具提供给agents

 const retrieverTool = new DynamicTool({
    name: "get-qiu-answer",
    func: async (input: string) => {
      const res = await retrieverChain.invoke({ input });
      return res.answer;
    },
    description: "获取小说 《骆驼祥子》相关问题的答案",
  });

复杂输入的tool

onst dateDiffTool = new DynamicStructuredTool({
    name: "date-difference-calculator",
    description: "计算两个日期之间的天数差",
    schema: z.object({
      date1: z.string().describe("第一个日期，以YYYY-MM-DD格式表示"),
      date2: z.string().describe("第二个日期，以YYYY-MM-DD格式表示"),
    }),
    func: async ({ date1, date2 }) => {
      const d1 = new Date(date1);
      const d2 = new Date(date2);
      const difference = Math.abs(d2.getTime() - d1.getTime());
      const days = Math.ceil(difference / (1000 * 60 * 60 * 24));
      return days.toString();
    },
  });

调用

 const tools = [retrieverTool, dateDiffTool, new Calculator()];
  // 创建 agents 的代码省略
  const result = await agentExecutor.invoke({
      input: "祥子是谁",
    });
    
  const res = await agents.invoke({
      input: "今年是 2024 年，今年 5.1 和 10.1 之间有多少天？",
  });