CVHub | LCEL in LangChain

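This article is reposted from the WeChat official account "CVHub" for academic sharing only, and will be removed on request in case of infringement.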

Before reading this article, I assume you already know a bit about LangChain. If not, see the previous article: CVHub | Getting to Know LangChain: A 3-Minute Quick Start!

1 An Example

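LCEL, the LangChain Expression Language, is a declarative way to build chains: with a simple syntax similar to Linux pipes, you can assemble fairly complex chains. That may sound abstract, so let me illustrate with an example: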

from typing import List

import openai

prompt_template = "Tell me a short joke about {topic}"
client = openai.OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def call_chat_model(messages: List[dict]) -> str:
    # Send the chat messages to the model and return the text of the first choice
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    return response.choices[0].message.content

def invoke_chain(topic: str) -> str:
    # Fill in the prompt, wrap it as a user message, then call the model
    prompt_value = prompt_template.format(topic=topic)
    messages = [{"role": "user", "content": prompt_value}]
    return call_chat_model(messages)

invoke_chain("ice cream")

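From this example we can see that the code has three main parts:

Define the LLM, using the OpenAI API
Define a prompt that asks the model to tell a joke about a given topic
Call the LLM and extract the returned content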

This is reasonably clear, but once you build more complex chains it can easily get messy. With LCEL syntax, the following code achieves the same result:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    "Tell me a short joke about {topic}"
)
output_parser = StrOutputParser()
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = (
    {"topic": RunnablePassthrough()}  # wrap the raw input string as {"topic": ...}
    | prompt                          # -> chat prompt value
    | model                           # -> AIMessage
    | output_parser                   # -> plain string
)

chain.invoke("ice cream")

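Doesn't that look much clearer? So the simplest way to understand LCEL is this: use | to compose the parts of a chain, and the composed result is still a chain (more precisely, it is still a Runnable).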

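At this point we can already see that this syntax handles sequential execution very well. Add conditional branches, parallel branches, or even loops, and we would practically have a whole new programming language (just kidding). Next, let's skim the source code to see how LCEL works under the hood and what more interesting things it can build.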

[Figure: UML diagram of LangChain's core classes]

2 Runnable: The Class That Ties Everything Together

class Runnable(Generic[Input, Output], ABC):
    """A unit of work that can be invoked, batched, streamed, transformed and composed.

    Key Methods
    ===========
    * invoke/ainvoke: Transforms a single input into an output.
    * batch/abatch: Efficiently transforms multiple inputs into outputs.
    * stream/astream: Streams output from a single input as it's produced.
    * astream_log: Streams output and selected intermediate results from an input.
    """

    # Simplified excerpt: the real signatures type `other` more precisely
    def __or__(self, other) -> RunnableSerializable[Input, Other]:
        """Compose this runnable with another object to create a RunnableSequence."""
        return RunnableSequence(self, coerce_to_runnable(other))

    def __ror__(self, other) -> RunnableSerializable[Other, Output]:
        """Compose this runnable with another object to create a RunnableSequence."""
        return RunnableSequence(coerce_to_runnable(other), self)

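As we can see, Runnable is a generic abstract class, and every component in the example above is a concrete subclass of it. The class docstring tells us that Runnable has these key methods:

invoke/ainvoke: call the chain on a single input; ainvoke is the async version
batch/abatch: call the chain on a batch of inputs
stream/astream: call the chain and stream the output as it is produced
astream_log: during async streaming, also return selected intermediate results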

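Besides the methods listed in the docstring, I also pulled out two more: __or__ and __ror__. Readers familiar with Python will recognize this as operator overloading: these methods change the behavior of |, so that the output of one Runnable is chained into the input of the next. __or__ handles `runnable | other`, while __ror__ handles `other | runnable` when the left operand (for example a dict) does not know how to combine with a Runnable itself.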
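To see why overloading | is enough, here is a toy re-implementation in plain Python. Step and coerce are hypothetical names, and this is a simplified sketch of the idea rather than LangChain's actual code: __or__ covers `step | other`, and __ror__ kicks in when the left operand (a dict or a plain function) cannot handle | itself.

```python
from typing import Any, Callable


class Step:
    """Hypothetical toy Runnable: wraps a function and supports | composition."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, x: Any) -> Any:
        return self.func(x)

    def __or__(self, other: Any) -> "Step":
        # Handles `self | other`: run self first, then feed its output to other
        nxt = coerce(other)
        return Step(lambda x: nxt.invoke(self.invoke(x)))

    def __ror__(self, other: Any) -> "Step":
        # Handles `other | self` when other's own __or__ gave up (e.g. a dict or a function)
        prev = coerce(other)
        return Step(lambda x: self.invoke(prev.invoke(x)))


def coerce(obj: Any) -> Step:
    """Toy counterpart of coerce_to_runnable: accept a Step, a callable, or a dict of them."""
    if isinstance(obj, Step):
        return obj
    if callable(obj):
        return Step(obj)
    if isinstance(obj, dict):
        # A dict becomes a "parallel" step: every value receives the same input
        parts = {key: coerce(value) for key, value in obj.items()}
        return Step(lambda x: {key: part.invoke(x) for key, part in parts.items()})
    raise TypeError(f"cannot compose object of type {type(obj).__name__}")


# dict | Step triggers Step.__ror__; Step | function triggers Step.__or__
pipeline = {"n": lambda x: x} | Step(lambda d: d["n"] + 1) | (lambda y: y * 10)
print(pipeline.invoke(1))  # 20
```

The real Runnable does the same thing, except that it returns a RunnableSequence object instead of a nested lambda, which is what makes batching, streaming and async support possible.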

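Careful readers may have noticed that __or__ and __ror__ do not just return a bare Runnable: they wrap the two operands in a RunnableSequence (first converting the non-Runnable operand with coerce_to_runnable). RunnableSequence is the most critical class in LCEL; it is the object that actually runs the composed steps, and virtually every chain built with | is a RunnableSequence.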

3 RunnableSequence

class RunnableSequence(RunnableSerializable[Input, Output]):
    """A sequence of runnables, where the output of each is the input of the next.

    RunnableSequence is the most important composition operator in LangChain as it is
    used in virtually every chain.

    A RunnableSequence can be instantiated directly or more commonly by using the `|`
    operator where either the left or right operands (or both) must be a Runnable.

    Any RunnableSequence automatically supports sync, async, batch."""

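The docstring makes the class's status official: it is the most important composition operator in LangChain. A RunnableSequence can be instantiated directly, passes each runnable's output on as the input of the next, and is used in virtually every chain.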

The docstring also includes an official example of turning a function into a Runnable:

import asyncio

from langchain_core.runnables import RunnableLambda, RunnableSequence

def add_one(x: int) -> int:
    return x + 1

def mul_two(x: int) -> int:
    return x * 2

runnable_1 = RunnableLambda(add_one)
runnable_2 = RunnableLambda(mul_two)
sequence = runnable_1 | runnable_2
# Or equivalently:
# sequence = RunnableSequence(first=runnable_1, last=runnable_2)
sequence.invoke(1)         # 4
sequence.batch([1, 2, 3])  # [4, 6, 8]

async def main():
    # Top-level await only works in a notebook; in a script, run the async calls in a coroutine
    await sequence.ainvoke(1)
    await sequence.abatch([1, 2, 3])

asyncio.run(main())

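RunnableLambda converts any callable into a Runnable (in Python, every callable object counts as a Callable), which lets you plug your own functions into a chain, for example to fetch the current time or call an external API. This greatly extends what a chain node can do. So far, however, we have only covered sequential composition; now it's time to introduce LCEL's parallel capability: RunnableParallel.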

4 RunnableParallel

class RunnableParallel(RunnableSerializable[Input, Dict[str, Any]]):
    """A runnable that runs a mapping of runnables in parallel, and returns a mapping
    of their outputs.

    RunnableParallel is one of the two main composition primitives for the LCEL,
    alongside RunnableSequence. It invokes runnables concurrently, providing the same
    input to each.

    A RunnableParallel can be instantiated directly or by using a dict literal within a
    sequence."""

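As the name suggests, RunnableParallel runs its steps in parallel. Its class declaration shows that it returns a mapping (a dict) of results. It is typically used to combine branches that do not depend on each other. For example, suppose I want to build a travel-guide chain that plans the sightseeing spots and also lines up food stops along the way. It could be structured like this:

Chain A: find and summarize the local food of a given city, with specific addresses
Chain B: find and summarize the tourist attractions of a given city, with specific addresses
Chain C: combine the food and attraction information into a concrete travel itinerary

Chains A and B have no dependency on each other, so they can run in parallel, and their outputs can then be sent to chain C for the final planning. The following example shows the syntax: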

from langchain_core.runnables import RunnableLambda

def add_one(x: int) -> int:
    return x + 1

def mul_two(x: int) -> int:
    return x * 2

def mul_three(x: int) -> int:
    return x * 3

runnable_1 = RunnableLambda(add_one)
runnable_2 = RunnableLambda(mul_two)
runnable_3 = RunnableLambda(mul_three)

sequence = runnable_1 | {  # this dict is coerced to a RunnableParallel
    "mul_two": runnable_2,
    "mul_three": runnable_3,
}

sequence.invoke(1)

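The code above defines three Runnable objects and connects them with LCEL syntax. The effect is:

Pass the input number x to runnable_1, which computes x + 1
Feed runnable_1's output to both runnable_2 and runnable_3
Run runnable_2 and store its result under the key mul_two; run runnable_3 and store its result under mul_three
So for an input of 1, the final output is the dict {'mul_two': 4, 'mul_three': 6}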

5 RunnableBranch

class RunnableBranch(RunnableSerializable[Input, Output]):
    """A Runnable that selects which branch to run based on a condition.

    The runnable is initialized with a list of (condition, runnable) pairs and
    a default branch.

    When operating on an input, the first condition that evaluates to True is
    selected, and the corresponding runnable is run on the input.

    If no condition evaluates to True, the default branch is run on the input."""

RunnableBranch provides conditional branching; you can think of it as LangChain's if keyword. The example below makes the syntax easy to understand.

from langchain_core.runnables import RunnableBranch

branch = RunnableBranch(
    (lambda x: isinstance(x, str), lambda x: x.upper()),
    (lambda x: isinstance(x, int), lambda x: x + 1),
    (lambda x: isinstance(x, float), lambda x: x * 2),
    lambda x: "goodbye",
)

branch.invoke("hello") # "HELLO"
branch.invoke(None) # "goodbye"

This RunnableBranch has four branches (three conditional ones plus a default). To make it easier to follow, here is a translation into plain Python:

def branch(x):
    if isinstance(x, str):
        return x.upper()
    elif isinstance(x, int):
        return x + 1
    elif isinstance(x, float):
        return x * 2
    else:
        return "goodbye"

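Simple, isn't it? In each pair, the first element is the condition and the second is the branch logic to run; if none of the conditions holds, the last argument acts as the fallback (default) branch.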

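The classes introduced above are among LangChain's core building blocks. There are of course many more, such as Router, Passthrough, Fallback, and History. If you're interested in them, I'll write some more. I find LangChain quite fun: to a certain extent it makes LLMs more controllable and lets you build more complex chains that actually boost productivity, instead of just chatting. If you're interested, feel free to join our LLM discussion group (WeChat: cv_huber, with the note "大模型", i.e. "LLM").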

THE END!
