In many applications of large language model (LLM) agents, the environment is real (the internet, a database, a REPL, etc.). However, we can also define agents that interact with simulated environments, such as text-based games. Below is an example of building a simple agent-environment interaction loop with Gymnasium (formerly OpenAI Gym).
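Before wiring in an LLM, it can help to see the agent-environment loop in isolation. The sketch below uses a hypothetical toy environment (`CoinFlipEnv`, not part of Gymnasium) that exposes the same `reset()`/`step()` interface real Gymnasium environments use, with `step()` returning the `(observation, reward, terminated, truncated, info)` five-tuple:

```python
import random

# Hypothetical toy environment for illustration; real Gymnasium envs
# expose the same reset()/step() interface and five-tuple step result.
class CoinFlipEnv:
    """Guess a coin flip each step: action is 0 or 1, reward 1.0 if correct."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)

    def reset(self):
        self.steps = 0
        return self.steps, {}  # (observation, info)

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        terminated = self.steps >= self.episode_length
        # (observation, reward, terminated, truncated, info)
        return self.steps, reward, terminated, False, {}


def run_episode(env, policy):
    """Run one episode, picking actions with `policy`, and return the total reward."""
    obs, _ = env.reset()
    total_reward = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
    return total_reward


env = CoinFlipEnv()
ret = run_episode(env, policy=lambda obs: 1)  # always guess "1"
print(ret)
```

The agent defined later plays exactly this role of `policy`, except that it chooses actions by prompting an LLM with the formatted observation, reward, and flags.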
Install Gymnasium
!pip install gymnasium
Import the necessary libraries
import tenacity
from langchain.output_parsers import RegexParser
from langchain.schema import (
    HumanMessage,
    SystemMessage,
)
Define the agent
class GymnasiumAgent:
    @classmethod
    def get_docs(cls, env):
        return env.unwrapped.__doc__

    def __init__(self, model, env):
        self.model = model
        self.env = env
        self.docs = self.get_docs(env)

        self.instructions = """
Your goal is to maximize your return, i.e. the sum of the rewards you receive.
I will give you an observation, reward, termination flag, truncation flag, and the return so far, formatted as:

Observation: <observation>
Reward: <reward>
Termination: <termination>
Truncation: <truncation>
Return: <sum_of_rewards>

You will respond with an action, formatted as:

Action: <action>

where you replace <action> with your actual action.
Do nothing else but return the action.
"""
        self.action_parser = RegexParser(
            regex=r"Action: (.*)",
            output_keys=["action"],
            default_output_key="action",
        )
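The action parser's job is simply to pull the action out of the model's `Action: <action>` reply. A minimal sketch of that behavior using only the standard library `re` module (LangChain's `RegexParser` wraps the same idea; `parse_action` here is a hypothetical helper, not part of the class above):

```python
import re

def parse_action(text: str) -> str:
    """Extract the action from a reply formatted as 'Action: <action>'."""
    match = re.search(r"Action: (.*)", text)
    if match is None:
        raise ValueError(f"Could not parse an action from: {text!r}")
    return match.group(1).strip()

print(parse_action("Action: 2"))  # -> "2"
```

Keeping the expected reply format this rigid is what lets a plain regular expression recover the action reliably; if the model returns anything else, parsing fails loudly rather than silently picking a wrong action.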