Notes on the Montezuma's Revenge Experiments


ON BONUS-BASED EXPLORATION METHODS IN THE ARCADE LEARNING ENVIRONMENT

The paper's taxonomy of Atari games; I mainly focus on the right-hand column.

Looking at this, CTS actually seems to have the higher ceiling.
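CTS here refers to the count-based bonus of Bellemare et al. (2016): a density model over frames is queried before and after updating on a frame, and the two probabilities are converted into a pseudo-count that drives the bonus. Below is a tiny sketch of that conversion; the β and 0.01 constants are just the commonly quoted defaults, my assumption rather than anything stated in this post.

```python
import math

def pseudo_count(rho_before: float, rho_after: float) -> float:
    """Pseudo-count from a density model (Bellemare et al., 2016 style).

    rho_before: model probability of the frame before updating on it.
    rho_after:  probability of the same frame after one update ("recoding prob").
    """
    # The prediction gain must be positive for the pseudo-count to be defined.
    if rho_after <= rho_before:
        return float("inf")
    return rho_before * (1.0 - rho_after) / (rho_after - rho_before)

def cts_bonus(rho_before: float, rho_after: float, beta: float = 0.05) -> float:
    # Exploration bonus decays roughly like 1 / sqrt(visit count).
    n_hat = pseudo_count(rho_before, rho_after)
    return beta / math.sqrt(n_hat + 0.01)
```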

Pitfall! just seems brutally hard; every method ends up at about the same level.

In more detail:

RND only seems to give a big boost on Montezuma's Revenge; on the other games it is nothing special.
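A quick reminder of why RND counts as a "prediction-error based" bonus (this matters for the stochasticity discussion further down): the intrinsic reward is the error of a trained predictor against a frozen, randomly initialised target network. A minimal sketch of that idea; the network sizes, optimiser and learning rate are my own arbitrary choices, not taken from any of the papers here.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Random Network Distillation: predictor tries to match a frozen random target."""

    def __init__(self, obs_dim: int, feat_dim: int = 128):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
        for p in self.target.parameters():   # the target network stays fixed forever
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def bonus_and_update(self, obs: torch.Tensor) -> torch.Tensor:
        """Return per-state intrinsic reward and train the predictor on the same batch."""
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        error = ((pred_feat - target_feat) ** 2).mean(dim=-1)  # large for rarely seen states
        self.opt.zero_grad()
        error.mean().backward()
        self.opt.step()
        return error.detach()
```

The returned error is the quantity that gets added to the environment reward as the exploration bonus.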

NoisyNet also gives decent improvements on the non-maze-style games.
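For context, NoisyNet does not add a reward bonus at all; it replaces ε-greedy with learnable parametric noise on the network weights, so exploration comes from perturbed Q-values. A rough sketch of a factorised-noise linear layer; the initialisation constants are the commonly used defaults, my choice rather than anything from this post.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learnable, factorised Gaussian weight noise (NoisyNet-style)."""

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 * bound)
        nn.init.constant_(self.sigma_b, sigma0 * bound)

    @staticmethod
    def _f(x: torch.Tensor) -> torch.Tensor:
        # Factorised-noise transform: f(x) = sign(x) * sqrt(|x|)
        return x.sign() * x.abs().sqrt()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        weight = self.mu_w + self.sigma_w * eps_out.outer(eps_in)
        bias = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, weight, bias)
```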

Over the long run, out to one billion frames, RND still holds up reasonably well.

What about the newest method, coin flipping (CFN)? Here is what the CFN paper itself says (its Section 4.5, quoted below; note that it cites the results of ON BONUS-BASED EXPLORATION METHODS IN THE ARCADE LEARNING ENVIRONMENT, i.e. Taiga et al. 2020, as corroboration):

4.5. Performance in MONTEZUMA’S REVENGE

Finally, we test our method on the challenging exploration benchmark: MONTEZUMA’S REVENGE. We follow the experimental design suggested by Machado et al. (2015) and compare CFN to baseline Rainbow, PixelCNN and RND. Figure 7 shows that we comfortably outperform Rainbow in this task. All exploration algorithms perform similarly, a result also corroborated by Taiga et al. (2020). Since all exploration methods perform similarly on the default task, we created more challenging versions of MONTEZUMA’S REVENGE by varying the amount of transition noise (via the “sticky action” probability (Machado et al., 2018)). Figure 7 (right) shows that CFN outperforms RND at higher levels of stochasticity; this supports our hypothesis that count-based bonuses are better suited for stochastic environments than prediction-error based methods. Notably, we find that having a large replay buffer for CFN slightly improves performance, which increases memory requirements for this experiment.
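My reading of how CFN ("coin-flipping network") builds its count-based bonus: every time a state is visited it gets d fresh ±1 labels, and a network is regressed onto them; for a state visited n times the optimal output is the average of n coin flips, so its mean squared value is roughly 1/n, which can be turned into a 1/√n-style bonus. The sketch below is only my illustration of that idea; the architecture, loss and constants are my own guesses, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class CoinFlipBonus(nn.Module):
    """Coin-flipping pseudo-count bonus (CFN-style sketch).

    Each visited state gets d fresh Rademacher labels in {-1, +1}; the network is
    regressed onto them. For a state seen n times the optimal output is the mean of
    n coin flips, so E[mean(output^2)] ≈ 1/n, giving a 1/sqrt(n)-style bonus.
    """

    def __init__(self, obs_dim: int, d: int = 32):
        super().__init__()
        self.d = d
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, d))
        self.opt = torch.optim.Adam(self.net.parameters(), lr=1e-4)

    def update(self, obs: torch.Tensor) -> None:
        """Regress the network onto fresh random ±1 labels for this batch of visits."""
        labels = torch.randint(0, 2, (obs.shape[0], self.d)).to(obs.dtype) * 2 - 1
        loss = ((self.net(obs) - labels) ** 2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    @torch.no_grad()
    def bonus(self, obs: torch.Tensor) -> torch.Tensor:
        # mean(f(s)^2) estimates 1/n(s); the bonus then decays like 1/sqrt(n(s)).
        inv_count = (self.net(obs) ** 2).mean(dim=-1).clamp(min=1e-8, max=1.0)
        return inv_count.sqrt()
```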

Back to Figure 7: judging from the left panel, CFN is worse than RND.

But judging from the right panel, it is stronger than RND.

Turns out the right panel only runs to 100 million frames while the left panel goes all the way to 200 million! What a sneaky, mismatched-axes comparison!!
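For reference, the "sticky actions" knob that produces the transition noise in the right panel (Machado et al., 2018) is simple: with probability p the environment repeats the agent's previous action instead of the newly chosen one. A minimal gym-style wrapper sketch; the class name and the default p = 0.25 are the usual convention, not something taken from this post.

```python
import random
import gymnasium as gym

class StickyActions(gym.Wrapper):
    """With probability p, repeat the previous action instead of the chosen one."""

    def __init__(self, env: gym.Env, p: float = 0.25):
        super().__init__(env)
        self.p = p
        self.last_action = None

    def reset(self, **kwargs):
        self.last_action = None
        return self.env.reset(**kwargs)

    def step(self, action):
        if self.last_action is not None and random.random() < self.p:
            action = self.last_action  # "sticky": the previous action leaks through
        self.last_action = action      # the action actually executed becomes the new "previous"
        return self.env.step(action)
```

Presumably the "higher levels of stochasticity" in the quoted experiment correspond to sweeping p above the default 0.25.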
