In many applications of large language model (LLM) agents, the environment is real (the internet, a database, a REPL, etc.). However, we can also define agents that interact with simulated environments, such as text-based games. Below is an example of building a simple agent-environment interaction loop with Gymnasium (formerly OpenAI Gym).
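Before wiring in an LLM, it can help to see the agent-environment loop in isolation. The sketch below uses a hypothetical toy environment (`CoinFlipEnv`, not part of Gymnasium) that exposes the same `reset()`/`step()` interface real Gymnasium environments use, with `step()` returning the `(observation, reward, terminated, truncated, info)` five-tuple:

```python
import random

# Hypothetical toy environment for illustration; real Gymnasium envs
# expose the same reset()/step() interface and five-tuple step result.
class CoinFlipEnv:
    """Guess a coin flip each step: action is 0 or 1, reward 1.0 if correct."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)

    def reset(self):
        self.steps = 0
        return self.steps, {}  # (observation, info)

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        terminated = self.steps >= self.episode_length
        # (observation, reward, terminated, truncated, info)
        return self.steps, reward, terminated, False, {}


def run_episode(env, policy):
    """Run one episode, picking actions with `policy`, and return the total reward."""
    obs, _ = env.reset()
    total_reward = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
    return total_reward


env = CoinFlipEnv()
ret = run_episode(env, policy=lambda obs: 1)  # always guess "1"
print(ret)
```

The agent defined later plays exactly this role of `policy`, except that it chooses actions by prompting an LLM with the formatted observation, reward, and flags.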
Install Gymnasium
!pip install gymnasium
Import the necessary libraries
import tenacity
from langchain.output_parsers import RegexParser
from langchain.schema import (
    HumanMessage,
    SystemMessage,
)
Define the agent
class GymnasiumAgent:
    @classmethod
    def get_docs(cls, env):
        return env.unwrapped.__doc__

    def __init__(self, model, env):
        self.model = model
        self.env = env
        self.docs = self.get_docs(env)

        self.instructions = """
Your goal is to maximize your return, i.e. the sum of the rewards you receive.
I will give you an observation, reward, termination flag, truncation flag, and the return so far, formatted as:

Observation: <observation>
Reward: <reward>
Termination: <termination>
Truncation: <truncation>
Return: <sum_of_rewards>

You will respond with an action, formatted as:

Action: <action>

where you replace <action> with your actual action.
Do nothing else but return the action.
"""
        self.action_parser = RegexParser(
            regex=r"Action: (.*)",
            output_keys=["action"],
            default_output_key="action",
        )
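The action parser's job is simply to pull the action out of the model's `Action: <action>` reply. A minimal sketch of that behavior using only the standard library `re` module (LangChain's `RegexParser` wraps the same idea; `parse_action` here is a hypothetical helper, not part of the class above):

```python
import re

def parse_action(text: str) -> str:
    """Extract the action from a reply formatted as 'Action: <action>'."""
    match = re.search(r"Action: (.*)", text)
    if match is None:
        raise ValueError(f"Could not parse an action from: {text!r}")
    return match.group(1).strip()

print(parse_action("Action: 2"))  # -> "2"
```

Keeping the expected reply format this rigid is what lets a plain regular expression recover the action reliably; if the model returns anything else, parsing fails loudly rather than silently picking a wrong action.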