[Reinforcement Learning] Agent, Environment, Value, Policy, Actor, Critic, Player, Model

Policy Network (Actor):
The policy network (actor) maps the current state of the system (e.g., robot joint angles, cube’s pose) to an action distribution. The agent samples actions from this distribution to interact with the environment.
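A minimal sketch of such an actor in PyTorch (the class name, hidden sizes, and the Gaussian-with-learned-log-std parameterization are illustrative assumptions, not the exact network that AMPBuilder constructs):

```python
import torch
import torch.nn as nn

class ActorSketch(nn.Module):
    """Illustrative actor: maps a state vector to a Gaussian action distribution."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)               # mean of each action dimension
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std (assumed)

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        h = self.trunk(obs)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

# Sampling during a rollout:
# dist = actor(obs); action = dist.sample(); logp = dist.log_prob(action).sum(-1)
```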

Value Network (Critic):
The critic network uses the current state to estimate the value function (how good it is to be in that state, i.e., the expected return from that state onward). This estimate lets the PPO algorithm compute advantages and update the policy more efficiently.
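A matching critic sketch (again illustrative; in the actual builder the actor and critic may share trunk layers, depending on the config):

```python
import torch
import torch.nn as nn

class CriticSketch(nn.Module):
    """Illustrative critic: maps a state vector to a scalar value estimate V(s)."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),       # single scalar per state
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

# PPO uses these estimates to form advantages, e.g. A = returns - critic(obs),
# which scale the policy-gradient update.
```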

Likely Entry Point:
If you search through amp_models.py or hrl_models.py, you will find a call like AMPBuilder.build(...) or something similar. That is where the network is instantiated from the builder. From there, the code in common_agent.py or amp_players.py uses the constructed network to drive the RL pipeline.
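Schematically, a builder of this kind follows a load-then-build pattern. The sketch below paraphrases that shape; the method bodies, the 'units' config key, and the 'input_shape' keyword are assumptions for illustration, not copied from rl_games:

```python
import torch.nn as nn

class AMPBuilderSketch:
    """Paraphrased builder: load() stores the network config, build() constructs the module."""
    def load(self, params):
        # `params` mirrors the 'network' section of the training YAML
        # (layer sizes, activations, discriminator settings, ...)
        self.params = params

    def build(self, name, **kwargs):
        # Build a simple MLP trunk from the stored config and return it as a
        # torch.nn.Module; the real AMPBuilder also adds mu/sigma, value, and
        # discriminator heads.
        in_dim = kwargs['input_shape'][0]   # assumed keyword, for illustration only
        layers = []
        for units in self.params.get('units', [256, 128]):
            layers += [nn.Linear(in_dim, units), nn.ReLU()]
            in_dim = units
        return nn.Sequential(*layers)
```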

CommonAgent (in common_agent.py) is created and sets up the training run.

The RL process then uses ModelAMPContinuous.build() in amp_models.py, which calls self.network_builder.build('amp', **config) to instantiate the policy network.
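In sketch form, this model-build step delegates to the builder and hands the result back to the agent. Everything here except the quoted build('amp', **config) call is an illustrative stand-in, not the real ModelAMPContinuous implementation:

```python
class ModelSketch:
    """Illustrative model factory: owns a network builder and builds the torch network from a config."""
    def __init__(self, network_builder):
        self.network_builder = network_builder

    def build(self, config):
        # The call quoted above: the builder turns the config into the torch network.
        net = self.network_builder.build('amp', **config)
        # The real ModelAMPContinuous wraps `net` so the agent can pass in an
        # observation batch and get back actions, log-probs, values, and disc logits.
        return net

# Usage (hypothetical): model = ModelSketch(AMPBuilderSketch_instance).build({'input_shape': (obs_dim,)})
```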

Flow of Data (Observations) During Training:

  • The training loop (likely inside Runner or code called by Runner) repeatedly:
    1. Resets the environment(s).
    2. Retrieves Observations from the environment.
    3. Passes these observations to the Player (in this case, AMPPlayerContinuous).
    4. The Player normalizes/preprocesses the observations if needed and then provides them to the Agent.
    5. The Agent (e.g., AMPAgent) uses the Model (e.g., ModelAMPContinuous) which in turn calls the Network built by AMPBuilder.
    6. The Network takes the observation as input and produces action probabilities (the policy) and value estimates (the critic).
    7. The Agent selects actions, steps the environment, and the process repeats (a minimal sketch of this loop follows).
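Putting these steps together, a minimal, generic version of the rollout loop might look like the following. Here env, actor, and critic stand in for the vectorized environment and the heads sketched earlier, a gym-style env.step() API is assumed, and the real loop in common_agent.py additionally collects AMP discriminator observations and runs PPO minibatch updates:

```python
import torch

def collect_rollout(env, actor, critic, horizon: int = 32):
    """Illustrative rollout: gather (obs, action, reward, done, value) tuples for PPO."""
    obs = env.reset()                                    # steps 1-2: reset, get observations
    trajectory = []
    for _ in range(horizon):
        with torch.no_grad():
            dist = actor(obs)                            # steps 5-6: observation -> action distribution
            action = dist.sample()
            value = critic(obs)                          # critic's value estimate for this state
        next_obs, reward, done, info = env.step(action)  # step 7: step the environment
        trajectory.append((obs, action, reward, done, value))
        obs = next_obs
    return trajectory                                    # later consumed by the PPO update
```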