Notes on the Montezuma's Revenge Experiments


ON BONUS-BASED EXPLORATION METHODS IN THE ARCADE LEARNING ENVIRONMENT

The paper's taxonomy of Atari games; I mainly focus on the right-hand column.

Looking at this, CTS actually seems to have the higher ceiling.
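CTS here refers to the count-based bonus of Bellemare et al. (2016): a density model over frames is queried before and after updating on a frame, and the two probabilities are converted into a pseudo-count that drives the bonus. Below is a tiny sketch of that conversion; the β and 0.01 constants are just the commonly quoted defaults, my assumption rather than anything stated in this post.

```python
import math

def pseudo_count(rho_before: float, rho_after: float) -> float:
    """Pseudo-count from a density model (Bellemare et al., 2016 style).

    rho_before: model probability of the frame before updating on it.
    rho_after:  probability of the same frame after one update ("recoding prob").
    """
    # The prediction gain must be positive for the pseudo-count to be defined.
    if rho_after <= rho_before:
        return float("inf")
    return rho_before * (1.0 - rho_after) / (rho_after - rho_before)

def cts_bonus(rho_before: float, rho_after: float, beta: float = 0.05) -> float:
    # Exploration bonus decays roughly like 1 / sqrt(visit count).
    n_hat = pseudo_count(rho_before, rho_after)
    return beta / math.sqrt(n_hat + 0.01)
```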

Pitfall! just seems brutally hard; every method ends up at about the same level.

In more detail:

RND only seems to give a big boost on Montezuma's Revenge; on the other games it is nothing special.
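A quick reminder of why RND counts as a "prediction-error based" bonus (this matters for the stochasticity discussion further down): the intrinsic reward is the error of a trained predictor against a frozen, randomly initialised target network. A minimal sketch of that idea; the network sizes, optimiser and learning rate are my own arbitrary choices, not taken from any of the papers here.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Random Network Distillation: predictor tries to match a frozen random target."""

    def __init__(self, obs_dim: int, feat_dim: int = 128):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
        for p in self.target.parameters():   # the target network stays fixed forever
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def bonus_and_update(self, obs: torch.Tensor) -> torch.Tensor:
        """Return per-state intrinsic reward and train the predictor on the same batch."""
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        error = ((pred_feat - target_feat) ** 2).mean(dim=-1)  # large for rarely seen states
        self.opt.zero_grad()
        error.mean().backward()
        self.opt.step()
        return error.detach()
```

The returned error is the quantity that gets added to the environment reward as the exploration bonus.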

NoisyNet also gives decent improvements on the non-maze-style games.
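For context, NoisyNet does not add a reward bonus at all; it replaces ε-greedy with learnable parametric noise on the network weights, so exploration comes from perturbed Q-values. A rough sketch of a factorised-noise linear layer; the initialisation constants are the commonly used defaults, my choice rather than anything from this post.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learnable, factorised Gaussian weight noise (NoisyNet-style)."""

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 * bound)
        nn.init.constant_(self.sigma_b, sigma0 * bound)

    @staticmethod
    def _f(x: torch.Tensor) -> torch.Tensor:
        # Factorised-noise transform: f(x) = sign(x) * sqrt(|x|)
        return x.sign() * x.abs().sqrt()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        weight = self.mu_w + self.sigma_w * eps_out.outer(eps_in)
        bias = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, weight, bias)
```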

Over the long run, out to one billion frames, RND still holds up reasonably well.

What about the newest method, coin flipping (CFN)? Here is what the CFN paper itself says (its Section 4.5, quoted below; note that it cites the results of ON BONUS-BASED EXPLORATION METHODS IN THE ARCADE LEARNING ENVIRONMENT, i.e. Taiga et al. 2020, as corroboration):

4.5. Performance in MONTEZUMA’S REVENGE

Finally, we test our method on the challenging exploration benchmark: MONTEZUMA’S REVENGE. We follow the experimental design suggested by Machado et al. (2015) and compare CFN to baseline Rainbow, PixelCNN and RND. Figure 7 shows that we comfortably outperform Rainbow in this task. All exploration algorithms perform similarly, a result also corroborated by Taiga et al. (2020). Since all exploration methods perform similarly on the default task, we created more challenging versions of MONTEZUMA’S REVENGE by varying the amount of transition noise (via the “sticky action” probability (Machado et al., 2018)). Figure 7 (right) shows that CFN outperforms RND at higher levels of stochasticity; this supports our hypothesis that count-based bonuses are better suited for stochastic environments than prediction-error based methods. Notably, we find that having a large replay buffer for CFN slightly improves performance, which increases memory requirements for this experiment.
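My reading of how CFN ("coin-flipping network") builds its count-based bonus: every time a state is visited it gets d fresh ±1 labels, and a network is regressed onto them; for a state visited n times the optimal output is the average of n coin flips, so its mean squared value is roughly 1/n, which can be turned into a 1/√n-style bonus. The sketch below is only my illustration of that idea; the architecture, loss and constants are my own guesses, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class CoinFlipBonus(nn.Module):
    """Coin-flipping pseudo-count bonus (CFN-style sketch).

    Each visited state gets d fresh Rademacher labels in {-1, +1}; the network is
    regressed onto them. For a state seen n times the optimal output is the mean of
    n coin flips, so E[mean(output^2)] ≈ 1/n, giving a 1/sqrt(n)-style bonus.
    """

    def __init__(self, obs_dim: int, d: int = 32):
        super().__init__()
        self.d = d
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, d))
        self.opt = torch.optim.Adam(self.net.parameters(), lr=1e-4)

    def update(self, obs: torch.Tensor) -> None:
        """Regress the network onto fresh random ±1 labels for this batch of visits."""
        labels = torch.randint(0, 2, (obs.shape[0], self.d)).to(obs.dtype) * 2 - 1
        loss = ((self.net(obs) - labels) ** 2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    @torch.no_grad()
    def bonus(self, obs: torch.Tensor) -> torch.Tensor:
        # mean(f(s)^2) estimates 1/n(s); the bonus then decays like 1/sqrt(n(s)).
        inv_count = (self.net(obs) ** 2).mean(dim=-1).clamp(min=1e-8, max=1.0)
        return inv_count.sqrt()
```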

Back to Figure 7: judging from the left panel, CFN is worse than RND.

But judging from the right panel, it is stronger than RND.

Turns out the right panel only runs to 100 million frames while the left panel goes all the way to 200 million! What a sneaky, mismatched-axes comparison!!
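For reference, the "sticky actions" knob that produces the transition noise in the right panel (Machado et al., 2018) is simple: with probability p the environment repeats the agent's previous action instead of the newly chosen one. A minimal gym-style wrapper sketch; the class name and the default p = 0.25 are the usual convention, not something taken from this post.

```python
import random
import gymnasium as gym

class StickyActions(gym.Wrapper):
    """With probability p, repeat the previous action instead of the chosen one."""

    def __init__(self, env: gym.Env, p: float = 0.25):
        super().__init__(env)
        self.p = p
        self.last_action = None

    def reset(self, **kwargs):
        self.last_action = None
        return self.env.reset(**kwargs)

    def step(self, action):
        if self.last_action is not None and random.random() < self.p:
            action = self.last_action  # "sticky": the previous action leaks through
        self.last_action = action      # the action actually executed becomes the new "previous"
        return self.env.step(action)
```

Presumably the "higher levels of stochasticity" in the quoted experiment correspond to sweeping p above the default 0.25.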
