I. Environment Adjustments
- Data collection: `RecordEpisodeStatistics`
- Skip n frames at the start of an episode: `baseSkipFrame`
- Treat the loss of a single life as done: `EpisodicLifeEnv`
- Clip rewards by sign to -1, 0, or +1 (effectively 0 or 1 in games with no negative rewards): `ClipRewardEnv`
- Frame stacking: `FrameStack`
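- Fix for reset handling in the vectorized environment `gym.vector.SyncVectorEnv`: in the original code, the reset is random.
- The subclass `spSyncVectorEnv` overrides this behavior so that every sub-environment in the vector can use the same seed, which helps when training under a fixed seed.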
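
The effect of a shared seed on resets can be illustrated with a small, dependency-free sketch before reading the actual class. `ToyEnv` and `ToyVectorEnv` are hypothetical stand-ins, not gym classes, and the reading that `random_reset=False` means "reset with the stored seed" is an assumption inferred from the parameter names, not something the source confirms.

```python
import random
from typing import List, Optional


class ToyEnv:
    """Hypothetical stand-in for a game env: reset() returns a random start state."""

    def __init__(self) -> None:
        self._rng = random.Random()

    def reset(self, seed: Optional[int] = None) -> int:
        # Mirrors the reset(seed=...) convention: reseed only when a seed is given.
        if seed is not None:
            self._rng = random.Random(seed)
        return self._rng.randrange(1_000_000)


class ToyVectorEnv:
    """Hypothetical miniature of a synchronous vector env with two reset modes."""

    def __init__(self, n_envs: int, random_reset: bool = False,
                 seed: Optional[int] = None) -> None:
        self.envs = [ToyEnv() for _ in range(n_envs)]
        self.random_reset = random_reset
        self.seed = seed

    def reset_one(self, i: int) -> int:
        # Assumed semantics: unseeded reset when random_reset is set or no
        # seed is stored, otherwise reuse the shared seed.
        if self.random_reset or self.seed is None:
            return self.envs[i].reset()
        return self.envs[i].reset(seed=self.seed)

    def reset_all(self) -> List[int]:
        return [self.reset_one(i) for i in range(len(self.envs))]


fixed = ToyVectorEnv(n_envs=4, seed=202)
first = fixed.reset_all()
second = fixed.reset_all()
print(first == second)       # True: the same seed reproduces the start states
print(len(set(first)) == 1)  # True: every sub-environment starts identically
```

With `random_reset=True` the same toy falls back to unseeded resets, so start states generally differ between sub-environments and between runs.
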
from typing import Any, Callable, Iterable, Tuple

import gym
from gym import Env
from gym.spaces import Space
from numpy.typing import NDArray


class spSyncVectorEnv(gym.vector.SyncVectorEnv):
    """
    Overrides step_wait to control how sub-environments whose _terminateds flag is set are reset.
    """

    def __init__(
        self,
        env_fns: Iterable[Callable[[], Env]],
        observation_space: Space = None,
        action_space: Space = None,
        copy: bool = True,
        random_reset: bool = False,
        seed: int = None
    ):
        super().__init__(env_fns, observation_space, action_space, copy)
        self.random_reset = random_reset
        self.seed = seed  # seed shared by all sub-environments

    def step_wait(self) -> Tuple[Any, NDArray[Any], NDArray[Any], NDArray[Any], dict]:
        """Steps through each of the environments returning the batched results.

        Returns:
            The batched environment step results
        """
        observations, infos = [], {}
        for i, (env, action) in enumerate(zip(self.envs, self._actions)):
            (
                observation,
                self._rewards[i],
                self._terminateds[i]