Notes on the Montezuma's Revenge experiments

This post looks at how different exploration methods (RND, NoisyNet, and the new method CFN) perform on hard-exploration Atari tasks such as Montezuma's Revenge. CFN beats prediction-error-based methods in highly stochastic environments, and a large replay buffer further improves CFN's performance. In the headline comparison, CFN overtakes RND on the harder variants shown in the right panel, but the plot deserves scrutiny as a classic "Lei-style" misleading comparison (see below).

ON BONUS-BASED EXPLORATION METHODS IN THE ARCADE LEARNING ENVIRONMENT

The taxonomy of Atari games; pay attention mainly to the right-hand column.

Looking at it this way, CTS actually has a higher ceiling.

Pitfall! feels really hard; every method performs about the same on it.

In more detail:

RND only seems to give a big boost on Montezuma's Revenge; on everything else it is unremarkable.

NoisyNet also gives decent gains on the non-maze-style games.

Over the long run, out to one billion frames, RND still holds up.
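(For context on what the "prediction-error bonus" family means here: RND trains a predictor network to match a fixed, randomly initialized target network, and uses the prediction error on a state as the intrinsic reward; the error shrinks as a state is revisited. A minimal PyTorch sketch, with made-up layer sizes rather than the paper's exact architecture:)

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Random Network Distillation bonus: prediction error of a trained
    predictor against a fixed, randomly initialized target network."""

    def __init__(self, obs_dim: int, embed_dim: int = 128):
        super().__init__()
        # Target network: random weights, never trained.
        self.target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                    nn.Linear(256, embed_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Predictor network: trained to match the target's outputs.
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                       nn.Linear(256, embed_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Bonus = squared prediction error; large for rarely seen states.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

rnd = RNDBonus(obs_dim=64)
opt = torch.optim.Adam(rnd.predictor.parameters(), lr=1e-4)
obs = torch.randn(32, 64)   # a batch of (flattened) observations
bonus = rnd(obs)            # intrinsic reward, added to the env reward
opt.zero_grad(); bonus.mean().backward(); opt.step()  # error decays on visited states
```

This is the "prediction-error" family that the CFN discussion below contrasts with count-based bonuses.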

What about the newest coin-flipping method, CFN? Let's see what the CFN paper ("Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning", Lobel et al., 2023) says:

4.5. Performance in MONTEZUMA’S REVENGE

Finally, we test our method on the challenging exploration benchmark: MONTEZUMA'S REVENGE. We follow the experimental design suggested by Machado et al. (2018) and compare CFN to baseline Rainbow, PixelCNN and RND. Figure 7 shows that we comfortably outperform Rainbow in this task. All exploration algorithms perform similarly, a result also corroborated by Taiga et al. (2020). Since all exploration methods perform similarly on the default task, we created more challenging versions of MONTEZUMA'S REVENGE by varying the amount of transition noise (via the "sticky action" probability (Machado et al., 2018)). Figure 7 (right) shows that CFN outperforms RND at higher levels of stochasticity; this supports our hypothesis that count-based bonuses are better suited for stochastic environments than prediction-error based methods. Notably, we find that having a large replay buffer for CFN slightly improves performance, which increases memory requirements for this experiment.
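For readers unfamiliar with the knob being swept: "sticky actions" (Machado et al., 2018) inject transition noise by repeating the agent's previous action with some probability instead of executing the chosen one. A minimal Gymnasium-style wrapper sketch; the class name and defaults are mine, not from either paper:

```python
import random
import gymnasium as gym

class StickyActions(gym.Wrapper):
    """With probability `stickiness`, repeat the previous action instead
    of the one the agent chose (Machado et al., 2018)."""

    def __init__(self, env: gym.Env, stickiness: float = 0.25):
        super().__init__(env)
        self.stickiness = stickiness
        self.prev_action = 0  # NOOP before the first step

    def reset(self, **kwargs):
        self.prev_action = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        if random.random() < self.stickiness:
            action = self.prev_action  # ignore the agent; repeat last action
        self.prev_action = action
        return self.env.step(action)

# Higher stickiness = noisier transitions, as in the sweep above. Note that
# ALE v5 environments already apply sticky actions internally (the
# repeat_action_probability setting); this wrapper just shows the mechanism.
# env = StickyActions(gym.make("ALE/MontezumaRevenge-v5"), stickiness=0.5)
```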

Judging from the left panel: worse than RND.

Judging from the right panel: stronger than RND.

Turns out the right panel stops at 100 million frames while the left one runs all the way to 200 million! What a devious Lei-style (mismatched-axes) comparison!!
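To make the coin-flipping idea concrete: CFN draws a fresh vector of d random ±1 "coin flips" every time a state is visited and regresses a network onto them. The MSE-optimal output is the running mean of the flips, whose mean square concentrates at 1/n after n visits, so sqrt(mean(f(s)^2)) recovers the 1/sqrt(n) count bonus without storing counts. A tabular numpy sketch of just that estimator (the real method fits a neural net so the bonus generalizes across states; d here is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100  # coins per state; arbitrary illustrative size

def cfn_bonus(flip_history: np.ndarray) -> float:
    """Bonus from the MSE-optimal prediction (the mean of all flips seen):
    sqrt(mean(pred^2)) concentrates at 1/sqrt(n) after n visits."""
    pred = flip_history.mean(axis=0)           # optimal regression output
    return float(np.sqrt(np.mean(pred ** 2)))

flips = np.empty((0, d))
for n in (1, 10, 100, 1000):
    while len(flips) < n:                      # each visit draws fresh coins
        flips = np.vstack([flips, rng.choice([-1.0, 1.0], size=(1, d))])
    print(f"n={n:4d}  bonus={cfn_bonus(flips):.3f}  1/sqrt(n)={n**-0.5:.3f}")
```

Because the target flips are pure noise with a known distribution, extra transition noise does not inflate this bonus the way it can inflate a prediction error, which is the intuition behind the stochasticity result quoted above.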
