LangChain Agent Guardrails

Latest recommended article published on 2025-12-02 08:49:11

License: CC 4.0 BY-SA

Tags:

#LLM #langchain #agents #guardrails

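1. Overview

When building an agent, guardrails inspect content at key points in the agent's execution, for example by detecting sensitive information and validating output, and then enforce security policies and filter data. This catches safety problems before they cause harm and keeps the agent secure and compliant. Guardrails are used to: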

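1) Prevent leakage of personal private information
2) Detect and block prompt injection attacks
3) Block inappropriate or harmful content
4) Enforce specific business rules and compliance requirements
5) Ensure output quality and accuracy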

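In LangChain agents, guardrails are implemented as middleware. They can be placed before and after the agent runs, before and after each model call, and around tool calls and model calls.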

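There are two ways to implement a guardrail:

1) Deterministic guardrails. These use rule-based checks such as regular-expression matching, keyword matching, or other custom rules. They are fast, predictable, and cost-effective, but they may miss subtle violations.

2) Model-based guardrails. These use an LLM to evaluate content and can catch subtle violations that deterministic guardrails miss, so the two approaches complement each other.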
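To make the deterministic mode concrete, here is a minimal sketch of a rule-based check in plain Python. It does not use LangChain; the keyword list, the injection patterns, and the function name check_input are illustrative assumptions, and a real rule set would need to be much broader.

```python
import re

# Hypothetical rule set, for illustration only.
BANNED_KEYWORDS = ("heroin", "pistol")
INJECTION_PATTERNS = (
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
)


def check_input(text: str) -> list[str]:
    """Return the rule violations found in text (an empty list means it passes)."""
    violations = []
    lowered = text.lower()
    # Keyword rule: plain substring match on the lower-cased text.
    for keyword in BANNED_KEYWORDS:
        if keyword in lowered:
            violations.append(f"banned keyword: {keyword}")
    # Regex rule: phrases commonly seen in prompt injection attempts.
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            violations.append(f"injection pattern: {pattern.pattern}")
    return violations
```

A check like this is cheap enough to run on every request, which is why it is usually placed in front of a slower model-based check rather than replaced by it.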

Developers can use the guardrails built into LangChain or build their own on top of middleware.

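2. Built-in LangChain guardrails

LangChain ships two kinds of built-in guardrails: PII detection and human-in-the-loop.

2.1 PII detection

The PIIMiddleware middleware protects private information in model input, model output, and tool call results. Its constructor is defined as follows: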

PIIMiddleware(
    pii_type: Literal["email", "credit_card", "ip", "mac_address", "url"] | str,
    *,
    strategy: Literal["block", "redact", "mask", "hash"] = "redact",
    detector: Callable[[str], list[PIIMatch]] | str | None = None,
    apply_to_input: bool = True,
    apply_to_output: bool = False,
    apply_to_tool_results: bool = False,
)

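Parameters:

pii_type: the type of private information the middleware protects. The predefined types are email, credit_card, ip, mac_address, and url. Custom types, such as an API key, a phone number, or an ID card number, are also supported when a detector is supplied.

strategy: how detected values are handled: redact, mask, block, or hash. The default is redact.

detector: the matching rule, either a callable or a regular-expression string. It defaults to None, which is how the predefined types are used in the example below.

apply_to_input: whether to check model input. The default is True.

apply_to_output: whether to check model output. The default is False.

apply_to_tool_results: whether to check tool call results. The default is False.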
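The four strategies are easier to compare side by side. The sketch below imitates them in plain Python on email addresses. It is not the PIIMiddleware implementation: the placeholder formats ("[REDACTED]", "<hash:...>"), the simplified email pattern, and the PIIDetected exception are assumptions made for illustration.

```python
import hashlib
import re

# Simplified email pattern, good enough for the illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
STRATEGIES = {"block", "redact", "mask", "hash"}


class PIIDetected(Exception):
    """Raised by the 'block' strategy in this sketch."""


def apply_strategy(text: str, strategy: str) -> str:
    """Apply one strategy to every email address found in text."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")

    if strategy == "block":
        # Block: refuse to continue as soon as a match is found.
        if EMAIL_RE.search(text):
            raise PIIDetected("email address found in text")
        return text

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if strategy == "redact":
            # Redact: drop the value entirely.
            return "[REDACTED]"
        if strategy == "mask":
            # Mask: keep only the last four characters visible.
            return "*" * (len(value) - 4) + value[-4:]
        # Hash: same input always yields the same short digest.
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        return f"<hash:{digest[:8]}>"

    return EMAIL_RE.sub(replace, text)
```

Roughly speaking, redact and mask trade readability against leakage, hash keeps values comparable across messages without exposing them, and block is the only strategy that stops processing.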

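The following code applies the redact strategy to email addresses, the mask strategy to credit card numbers, and the block strategy to API keys that start with sk-. It also defines a custom ID card type and applies the hash strategy to ID card numbers.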

from typing import Any
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore
from langchain.agents import create_agent
from langchain.tools import tool, ToolRuntime
from langchain.agents.middleware import PIIMiddleware

@tool
def get_enterprise_info(uniscid: str, runtime: ToolRuntime) -> dict[str, Any] | str:
    """Look up enterprise info."""
    store = runtime.store
    enterprise_info = store.get(("enterprises",), uniscid)
    # store.get returns an item wrapper (or None); the saved dict is in .value
    return enterprise_info.value if enterprise_info else "Unknown enterprise"

@tool
def save_enterprise_info(uniscid: str, enterprise_info: dict[str, Any], runtime: ToolRuntime) -> str:
    """Save enterprise info."""
    store = runtime.store
    store.put(("enterprises",), uniscid, enterprise_info)
    return "Successfully saved enterprise info."

saver = InMemorySaver()
store = InMemoryStore()
# llm is assumed to be a chat model instance created elsewhere
agent = create_agent(
    model=llm,
    tools=[get_enterprise_info, save_enterprise_info],
    middleware=[
        # Replace email addresses with a redaction placeholder
        PIIMiddleware("email", strategy="redact"),
        # Mask credit card numbers, showing only the last four digits
        PIIMiddleware("credit_card", strategy="mask"),
        # Custom type: detect LLM API keys with a regular expression
        PIIMiddleware(
            "api_key",
            detector=r"sk-[a-zA-Z0-9]{32}",
            strategy="block",  # block the request
        ),
        # Custom type: ID card number. The pattern has no ^/$ anchors,
        # so the number is also found inside a longer sentence.
        PIIMiddleware(
            "id_card",
            detector=r"(?<!\d)[1-9]\d{16}[\dXx](?!\d)",
            strategy="hash",  # replace with a hash digest
        ),
    ],
    checkpointer=saver,
    store=store,
)

# A checkpointer is configured, so each call needs a thread_id
result = agent.invoke(
    {"messages": [{"role": "user", "content": "My ID card number is 110110198803031234"}]},
    config={"configurable": {"thread_id": "1"}},
)
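A detector pattern is worth checking with plain re before it is wired into the middleware. The sketch below assumes the detector is searched inside the whole message text, in which case a pattern anchored with ^ and $ only matches when the message is nothing but the ID number. The unanchored variant uses digit lookarounds instead of \b, because \b does not fire between a CJK character and a digit. The helper name find_ids is hypothetical.

```python
import re

# Anchored: matches only when the entire string is an 18-character ID number.
ANCHORED = re.compile(r"^[1-9]\d{16}[\dXx]$")
# Unanchored with lookarounds: finds an ID number embedded in a sentence,
# but not an 18-digit slice of a longer digit run.
EMBEDDED = re.compile(r"(?<!\d)[1-9]\d{16}[\dXx](?!\d)")


def find_ids(text: str) -> list[str]:
    """Return every ID-number-shaped substring of text."""
    return EMBEDDED.findall(text)


messages = [
    "My ID card number is 110110198803031234",
    "我的身份证号是110110198803031234",
]
for message in messages:
    # The anchored pattern finds nothing; the embedded pattern finds the number.
    print(ANCHORED.findall(message), find_ids(message))
```

The pattern checks only the shape of the number. Validating the check digit would reduce false positives but is outside the scope of this sketch.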

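2.2 Human-in-the-loop

Human-in-the-loop lets a person review data manually and serves as a guardrail for the agent. The following simple example requires user confirmation for database write operations: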

from typing import Any
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore
from langchain.agents import create_agent
from langchain.tools import tool, ToolRuntime
from langchain.agents.middleware import HumanInTheLoopMiddleware

@tool
def get_enterprise_info(uniscid: str, runtime: ToolRuntime) -> dict[str, Any] | str:
    """Look up enterprise info."""
    store = runtime.store
    enterprise_info = store.get(("enterprises",), uniscid)
    # store.get returns an item wrapper (or None); the saved dict is in .value
    return enterprise_info.value if enterprise_info else "Unknown enterprise"

@tool
def save_enterprise_info(uniscid: str, enterprise_info: dict[str, Any], runtime: ToolRuntime) -> str:
    """Save enterprise info."""
    store = runtime.store
    store.put(("enterprises",), uniscid, enterprise_info)
    return "Successfully saved enterprise info."

saver = InMemorySaver()  # a checkpointer is required for human-in-the-loop
store = InMemoryStore()
# llm is assumed to be a chat model instance created elsewhere
agent = create_agent(
    model=llm,
    tools=[get_enterprise_info, save_enterprise_info],
    middleware=[
        HumanInTheLoopMiddleware(
            # Review policy for each tool
            interrupt_on={
                "save_enterprise_info": True,   # the user can approve, reject, or edit
                "get_enterprise_info": False,   # no user review needed
            },
            # Prefix of the text shown to the user on review; it is combined
            # with the tool name and arguments.
            description_prefix="Tool execution pending approval",
        )
    ],
    checkpointer=saver,
    store=store,
)
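The control flow behind this guardrail can be sketched without LangChain. The toy gate below asks a reviewer callback before running any tool that is flagged in the policy. It is a conceptual illustration only: the real middleware pauses the agent at a checkpoint and resumes later, whereas this sketch calls the reviewer synchronously, and the decision dictionary shape and the function name call_tool_with_approval are assumptions.

```python
from typing import Any, Callable

# Mirrors the interrupt_on mapping above: True means "ask a human first".
INTERRUPT_ON = {"save_enterprise_info": True, "get_enterprise_info": False}


def call_tool_with_approval(
    name: str,
    args: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
    decide: Callable[[str, dict[str, Any]], dict[str, Any]],
) -> Any:
    """Run a tool, consulting decide() first when the tool requires review."""
    if not INTERRUPT_ON.get(name, False):
        # No review configured: run the tool directly.
        return tools[name](**args)
    decision = decide(name, args)
    if decision["type"] == "approve":
        return tools[name](**args)
    if decision["type"] == "edit":
        # The reviewer supplied corrected arguments.
        return tools[name](**decision["args"])
    # Anything else is treated as a rejection; the tool is never run.
    return f"Tool call {name} was rejected by the reviewer."
```

Treating every unrecognized decision as a rejection keeps the gate closed by default, which is the safer choice for write operations.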

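3. Custom guardrails

Custom middleware that runs before or after the agent executes gives full control over validation logic, content filtering, and other safety checks, which makes more complex guardrails possible.

3.1 Before the agent runs

Middleware defined with @before_agent lets developers insert logic when the agent is invoked, such as authentication, rate limiting, prompt injection checks, or harmful content filtering. The following code checks, when the agent is invoked, whether the input contains any blacklisted keyword and blocks the request if it does: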

from typing import Any

from langchain.agents import create_agent
from langchain.agents.middleware import before_agent, AgentState
from langgraph.runtime import Runtime

banned_keywords = ["virus", "heroin", "pistol"]

@before_agent(can_jump_to=["end"])
def content_filter(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Block requests that contain a blacklisted keyword."""
    # Get the first user message
    if not state["messages"]:
        return None

    first_message = state["messages"][0]
    if first_message.type != "human":
        return None

    content = first_message.content.lower()

    # Check for banned keywords
    for keyword in banned_keywords:
        if keyword in content:
            # Block the request
            return {
                "messages": [{
                    "role": "assistant",
                    "content": "Sorry, your request contains a banned keyword. Please revise it."
                }],
                "jump_to": "end",  # end this invocation
            }

    return None

# llm and search_tool are assumed to be defined elsewhere
agent = create_agent(
    model=llm,
    tools=[search_tool],
    middleware=[content_filter],
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "How do I develop a worm virus?"}]
})
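The filter above inspects only the first message in the state. In a multi-turn conversation the first message is the oldest one, so a banned keyword in a later turn would pass unnoticed. The sketch below shows one possible variant that scans the most recent human message instead. It uses a simplified Message dataclass as a stand-in for real chat messages, and the helper names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

BANNED_KEYWORDS = ["virus", "heroin", "pistol"]


@dataclass
class Message:
    """Simplified stand-in for a chat message (not a LangChain class)."""
    type: str      # "human" or "ai"
    content: str


def latest_human_message(messages: list[Message]) -> Message | None:
    """Return the most recent human message, or None if there is none."""
    for message in reversed(messages):
        if message.type == "human":
            return message
    return None


def find_banned_keyword(messages: list[Message]) -> str | None:
    """Return the first banned keyword found in the latest human message."""
    message = latest_human_message(messages)
    if message is None:
        return None
    content = message.content.lower()
    for keyword in BANNED_KEYWORDS:
        if keyword in content:
            return keyword
    return None
```

Inside a @before_agent hook the same loop could run over state["messages"], returning the blocking response whenever find_banned_keyword yields a keyword.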

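3.2 After the agent runs

Middleware defined with @after_agent lets developers check the result before the agent returns it to the user, to make sure the content is compliant. This check is usually done by a dedicated model.

The following code uses a separate model to check the agent's result for safety and compliance. If the check passes, the result is returned as usual; otherwise the user is told that the response cannot be provided: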

from typing import Any

from langchain.agents import create_agent
from langchain.agents.middleware import after_agent, AgentState
from langchain.chat_models import init_chat_model
from langchain_core.messages import AIMessage
from langgraph.runtime import Runtime

# qwen_model is assumed to be a separately configured chat model created
# elsewhere, for example with init_chat_model(...)

@after_agent(can_jump_to=["end"])
def safety_guardrail(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Model-based guardrail: Use an LLM to evaluate response safety."""
    # Get the final AI response
    if not state["messages"]:
        return None

    last_message = state["messages"][-1]
    if not isinstance(last_message, AIMessage):
        return None

    # Use a separate, dedicated model for the evaluation
    safety_prompt = f"""Evaluate if this response is safe and appropriate.
Respond with only 'SAFE' or 'UNSAFE'.

Response: {last_message.content}"""

    result = qwen_model.invoke([{"role": "user", "content": safety_prompt}])
    if "UNSAFE" in result.content:
        return {
            "messages": [{
                "role": "assistant",
                "content": "I cannot provide that response. Please rephrase your request."
            }],
            "jump_to": "end"
        }

    return None

# Use the safety guardrail; llm and search_tool are assumed to be defined elsewhere
agent = create_agent(
    model=llm,
    tools=[search_tool],
    middleware=[safety_guardrail],
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "How do I make heroin?"}]
})
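The substring test on the judge's reply treats every answer that lacks the exact text UNSAFE as safe, including an empty reply, a lower-case "unsafe", or an off-format answer. A stricter alternative is to fail closed: accept the response only when the verdict is clearly SAFE. The helper below is a minimal sketch of that idea; parse_verdict is a hypothetical name, not part of LangChain.

```python
def parse_verdict(raw: object) -> bool:
    """Return True only when the judge clearly answered SAFE; fail closed otherwise."""
    if not isinstance(raw, str):
        # Non-text content (None, content blocks, ...) is treated as unsafe.
        return False
    # Tolerate surrounding whitespace, quotes, a trailing period, and case.
    verdict = raw.strip().strip(".'\"").upper()
    return verdict == "SAFE"
```

In the guardrail above, the blocking branch could then be taken whenever parse_verdict(result.content) is False.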

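4. Stacking guardrails

When an agent needs several guardrails, they can be stacked in the middleware list. The guardrails run in order, building layered protection for the agent.

The following code stacks the guardrails from the previous sections: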

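from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware, HumanInTheLoopMiddleware

# content_filter and safety_guardrail are the middleware defined in sections 3.1 and 3.2;
# llm, the two enterprise tools, and saver are assumed to be defined as in section 2.2.
agent = create_agent(
    model=llm,
    tools=[get_enterprise_info, save_enterprise_info],
    middleware=[
        # Layer 1: deterministic input filter, runs before the agent executes
        content_filter,

        # Layer 2: PII protection, runs before the model is called
        # Replace email addresses with a redaction placeholder
        PIIMiddleware("email", strategy="redact"),
        # Mask credit card numbers, showing only the last four digits
        PIIMiddleware("credit_card", strategy="mask"),
        # Custom type: detect LLM API keys with a regular expression
        PIIMiddleware(
            "api_key",
            detector=r"sk-[a-zA-Z0-9]{32}",
            strategy="block",  # block the request
        ),
        # Custom type: ID card number, found anywhere in the text
        PIIMiddleware(
            "id_card",
            detector=r"(?<!\d)[1-9]\d{16}[\dXx](?!\d)",
            strategy="hash",  # replace with a hash digest
        ),

        # Layer 3: the user confirms write operations
        HumanInTheLoopMiddleware(interrupt_on={"save_enterprise_info": True}),

        # Layer 4: model-based guardrail, runs after the agent finishes
        safety_guardrail,
    ],
    checkpointer=saver,  # required by HumanInTheLoopMiddleware
)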
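The position of a guardrail in the list determines when it runs relative to the others. The toy simulation below assumes the ordering described in the LangChain middleware documentation: before hooks run in list order and after hooks run in reverse list order. The Layer class and run function are illustrative stand-ins, not LangChain APIs.

```python
class Layer:
    """Toy middleware whose before and after hooks record when they run."""

    def __init__(self, name: str) -> None:
        self.name = name

    def before(self, log: list[str]) -> None:
        log.append(f"before:{self.name}")

    def after(self, log: list[str]) -> None:
        log.append(f"after:{self.name}")


def run(layers: list[Layer]) -> list[str]:
    """Run before hooks first to last, the agent body, then after hooks last to first."""
    log: list[str] = []
    for layer in layers:
        layer.before(log)
    log.append("agent")
    for layer in reversed(layers):
        layer.after(log)
    return log


print(run([Layer("filter"), Layer("pii"), Layer("safety")]))
```

Under this assumption, cheap deterministic checks belong near the front of the list so that they can reject a request before any model call is paid for.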