LangChainJS in Practice: Evaluating and Optimizing LLM Applications with LangSmith - CSDN Blog

langchainjs project repository: https://gitcode.com/gh_mirrors/lan/langchainjs

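Introduction

When building applications on large language models (LLMs), developers often face a key challenge: how do you make sure the application behaves as expected in production? LangChainJS, a popular framework for LLM application development, simplifies prototyping, but taking an application to production still requires debugging, evaluation, and continuous optimization. This is where the LangSmith platform comes in.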

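LangSmith Core Features Overview

LangSmith is a lifecycle management platform built for LLM applications. Its core capabilities are:

Real-time debugging: visually trace the complete execution flow of an LLM application
Dataset management: create and manage datasets used for evaluation and optimization
Regression testing: make sure performance does not degrade as the application evolves
Production analytics: collect runtime data for product insights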

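Environment Setup and Configuration

1. Obtain LangSmith access

First, obtain access to the LangSmith platform and an API key. At the time of writing the platform was in closed beta, so you may need to request access.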

2. Install the required dependencies

npm install langchain @langchain/core @langchain/openai @langchain/community langsmith uuid

3. Configure environment variables

import { v4 as uuidv4 } from "uuid";
const uniqueId = uuidv4().slice(0, 8);

// Enable LangSmith tracing
process.env.LANGCHAIN_TRACING_V2 = "true";
process.env.LANGCHAIN_PROJECT = `JS Tracing Walkthrough - ${uniqueId}`;
process.env.LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com";
process.env.LANGCHAIN_API_KEY = "<YOUR-API-KEY>";

// Configure the API keys for the LLM and the search tool
process.env.OPENAI_API_KEY = "<YOUR-OPENAI-API-KEY>";
process.env.TAVILY_API_KEY = "<YOUR-TAVILY-API-KEY>";
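
Because a missing or placeholder key tends to surface only later as an authentication error, it can help to check the configuration up front. The following is a minimal sketch, not part of LangChainJS or LangSmith: the helper `missingEnvVars` and the `REQUIRED_VARS` list are hypothetical names, and the rule that a value wrapped in angle brackets counts as an unfilled placeholder is an assumption based on the placeholders used above.

```typescript
// Variable names are taken from the configuration step above.
// The helper and the list name are hypothetical, for illustration only.
const REQUIRED_VARS = [
  "LANGCHAIN_TRACING_V2",
  "LANGCHAIN_API_KEY",
  "OPENAI_API_KEY",
  "TAVILY_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of variables that are unset, empty,
// or still look like a "<...>" placeholder (an assumed convention).
function missingEnvVars(
  env: Env,
  required: readonly string[] = REQUIRED_VARS
): string[] {
  return required.filter((name) => {
    const value = env[name]?.trim();
    return !value || /^<.*>$/.test(value);
  });
}

// Example with an in-memory object; a real script would pass process.env.
const sampleEnv: Env = {
  LANGCHAIN_TRACING_V2: "true",
  LANGCHAIN_API_KEY: "<YOUR-API-KEY>",
  OPENAI_API_KEY: "sk-example",
};
const missing = missingEnvVars(sampleEnv);
console.log(missing); // [ 'LANGCHAIN_API_KEY', 'TAVILY_API_KEY' ]
```

Calling the helper before constructing the client lets the script stop early with a readable message instead of failing in the middle of a traced run.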

Building and Tracing the LLM Application

1. Create a LangSmith client

import { Client } from "langsmith";
const client = new Client();

2. Build a function-calling agent

The following example creates an agent based on OpenAI function calling and integrates the Tavily search tool:

import { AgentExecutor, createOpenAIFunctionsAgent } from "langchain/agents";
import { pull } from "langchain/hub";
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";
import { ChatOpenAI } from "@langchain/openai";
import type { ChatPromptTemplate } from "@langchain/core/prompts";

const tools = [new TavilySearchResults()];

// Pull a predefined prompt template from the Hub
const prompt = await pull<ChatPromptTemplate>("hwchase17/openai-functions-agent");

const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-1106",
  temperature: 0,
});

const agent = await createOpenAIFunctionsAgent({
  llm,
  tools,
  prompt,
});

const agentExecutor = new AgentExecutor({
  agent,
  tools,
});

3. Run in batch and trace

const inputs = [
  { input: "What is LangChain?" },
  { input: "What's LangSmith?" },
  // More inputs...
];

const results = await agentExecutor.batch(inputs);

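After execution, all run traces are recorded to LangSmith automatically, and you can inspect the detailed execution flow on the project page.

Evaluating and Optimizing the LLM Application

1. Create an evaluation dataset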

const referenceOutputs = [
  { output: "LangChain is an open-source framework..." },
  // More reference outputs...
];

const datasetName = `lcjs-qa-${uniqueId}`;
const dataset = await client.createDataset(datasetName);

await Promise.all(
  inputs.map(async (input, i) => {
    await client.createExample(input, referenceOutputs[i], {
      datasetId: dataset.id,
    });
  })
);
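
The upload loop above pairs each input with a reference output by index, so a shorter `referenceOutputs` array would silently pass `undefined` for the remaining examples. A small guard can catch that before any API call. This is a sketch under assumptions: `pairExamples` is a hypothetical helper, and the `{ input }` and `{ output }` shapes simply mirror the objects used in this walkthrough.

```typescript
// Shapes mirror the objects used in this walkthrough.
type ExampleInput = { input: string };
type ExampleOutput = { output: string };
type ExamplePair = { inputs: ExampleInput; outputs: ExampleOutput };

// Hypothetical helper: pairs inputs with reference outputs by index
// and fails fast when the two arrays differ in length.
function pairExamples(
  inputs: ExampleInput[],
  outputs: ExampleOutput[]
): ExamplePair[] {
  if (inputs.length !== outputs.length) {
    throw new Error(
      `Expected ${inputs.length} reference outputs, got ${outputs.length}`
    );
  }
  return inputs.map((input, i) => ({ inputs: input, outputs: outputs[i] }));
}

// Illustrative data only.
const examples = pairExamples(
  [{ input: "What is LangChain?" }],
  [{ output: "LangChain is an open-source framework..." }]
);
console.log(examples.length); // 1
```

Each resulting pair could then be passed to `client.createExample(pair.inputs, pair.outputs, { datasetId: dataset.id })`, the same call used above.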

2. Configure evaluation metrics

LangSmith supports several evaluation approaches, including built-in evaluators and custom evaluation logic:

import { Criteria, LabeledCriteria } from "langchain/smith";
import type { RunEvalType, DynamicRunEvaluatorParams } from "langchain/smith";

// Custom evaluator example: check that the output contains no expression of uncertainty
const notUnsure = async (params: DynamicRunEvaluatorParams) => {
  if (typeof params.prediction?.output !== "string") {
    throw new Error("Invalid prediction format");
  }
  return {
    key: "not_unsure",
    score: !params.prediction.output.includes("not sure"),
  };
};

const evaluators: RunEvalType[] = [
  // Built-in evaluator: correctness (judged against the reference output)
  LabeledCriteria("correctness"),

  // Built-in evaluator: conciseness
  Criteria("conciseness", {
    // Keys match the dataset created above: { input } and { output }
    formatEvaluatorInputs: (run) => ({
      input: run.rawInput.input,
      prediction: run.rawPrediction.output,
      reference: run.rawReferenceOutput?.output,
    }),
  }),

  // Custom evaluator
  notUnsure,
];
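
A custom evaluator is ordinary code, so its scoring rule can be exercised locally before it is attached to a dataset run. The sketch below extracts the rule into a synchronous function. `scoreNotUnsure` is a hypothetical name, and the lowercase comparison is a deliberate difference from the evaluator above, added on the assumption that "Not sure" should also count as uncertain.

```typescript
// Synchronous core of the check, so it can be tried without a LangSmith run.
// Hypothetical helper; it differs from the evaluator above only in being
// case-insensitive.
function scoreNotUnsure(output: unknown): { key: string; score: boolean } {
  if (typeof output !== "string") {
    throw new Error("Invalid prediction format");
  }
  return {
    key: "not_unsure",
    score: !output.toLowerCase().includes("not sure"),
  };
}

// Illustrative predictions only.
const confident = scoreNotUnsure("LangSmith is a platform for tracing LLM apps.");
const hedged = scoreNotUnsure("I'm Not Sure what LangSmith is.");
console.log(confident.score, hedged.score); // true false
```

An async evaluator can wrap this function by returning `scoreNotUnsure(params.prediction?.output)`.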

3. Run the benchmark

import { runOnDataset } from "langchain/smith";

await runOnDataset(agentExecutor, datasetName, {
  evaluators,
  projectName: "Name of the evaluation run",
});

When the run finishes, you can review the detailed evaluation results in the LangSmith UI, including the score for each metric.
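
If per-example scores are also collected in code, a quick per-metric average gives a rough summary to compare between runs. This is only a sketch: the `Feedback` record shape and the helper `meanScoreByKey` are hypothetical and are not the return type of `runOnDataset`, and the sample values are made up for illustration.

```typescript
// Hypothetical record shape: one entry per (example, evaluator) pair.
type Feedback = { key: string; score: number | boolean };

// Averages scores per evaluator key; booleans count as 1 (true) or 0 (false).
function meanScoreByKey(feedback: Feedback[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const { key, score } of feedback) {
    const entry = totals[key] ?? { sum: 0, count: 0 };
    entry.sum += Number(score);
    entry.count += 1;
    totals[key] = entry;
  }
  const means: Record<string, number> = {};
  for (const key of Object.keys(totals)) {
    means[key] = totals[key].sum / totals[key].count;
  }
  return means;
}

// Made-up sample values, for illustration only.
const summary = meanScoreByKey([
  { key: "not_unsure", score: true },
  { key: "not_unsure", score: false },
  { key: "conciseness", score: 1 },
]);
console.log(summary); // { not_unsure: 0.5, conciseness: 1 }
```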