[Note In progress] Model-based Reinforcement Learning

This note looks at model-based methods for control, stressing the role of environment assumptions and approximations. The model can be learned by supervised learning, e.g. by playing the game and then training a world model. Several key approaches are covered: World Models, which use a variational auto-encoder so the agent can learn from its own "dreams"; Imagination-Augmented Agents, which combine prediction and planning; Model-Based Priors for Model-Free Reinforcement Learning, which bridges model-based and model-free learning; and Model-Based Value Expansion, which improves value estimation and thereby reduces the sample complexity of learning.

 

Model-based methods can also be used in control theory. The environment is described through assumptions and approximations.

  1. Learn the model, e.g. by supervised learning: play the game, collect transitions, then train a world model on them (a minimal sketch follows this list).
    1. World models: one of my favorite approaches, in which the agent can learn from its own "dreams" thanks to variational auto-encoders (see the dream-rollout sketch below). See paper and code.
    2. Imagination-Augmented Agents (I2A): learns to interpret predictions from a learned environment model and use them to construct implicit plans in arbitrary ways, by feeding the predictions as additional context into deep policy networks. It is essentially a hybrid method because it combines model-based and model-free learning. Paper and implementation.
    3. Model-Based Priors for Model-Free Reinforcement Learning (MBMF): aims to bridge the gap between model-free and model-based reinforcement learning. See paper and code.
    4. Model-Based Value Expansion (MBVE): the authors state that this method controls for uncertainty in the model by only allowing imagination to a fixed depth. Enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm improves value estimation, which in turn reduces the sample complexity of learning (see the value-expansion sketch below).
  2. Learn given the model
    1. Check AlphaGo Zero.
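
Below is a minimal sketch of step 1, learning the model by supervised learning. The class and function names (`DynamicsModel`, `fit_dynamics`) are my own illustrative choices, not from any of the papers above: transitions (s, a, s') collected by playing are simply regressed with an MSE loss.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts the next state from the current state and action."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def fit_dynamics(model, states, actions, next_states, epochs=50, lr=1e-3):
    """Plain supervised learning: minimise MSE between predicted and observed next states."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(states, actions), next_states)
        loss.backward()
        opt.step()
    return model
```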
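The "dreaming" idea in World Models can be sketched roughly as follows. This is my own simplification, not the paper's code (which uses a convolutional VAE, an MDN-RNN and a CMA-ES controller): observations are compressed to a latent z by a variational auto-encoder, and a learned latent dynamics model is then rolled forward so the agent trains on imagined trajectories instead of the real environment.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """VAE encoder: maps an observation to a sample of the latent z."""
    def __init__(self, obs_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(obs_dim, z_dim)
        self.logvar = nn.Linear(obs_dim, z_dim)

    def forward(self, obs):
        mu, logvar = self.mu(obs), self.logvar(obs)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation trick

def dream(encoder, latent_dynamics, policy, obs, horizon=15):
    """Roll out entirely in latent space using the learned dynamics ("dreaming")."""
    z = encoder(obs)
    trajectory = []
    for _ in range(horizon):
        a = policy(z)                                    # act on the imagined latent state
        z = latent_dynamics(torch.cat([z, a], dim=-1))   # imagined next latent state
        trajectory.append((z, a))
    return trajectory
```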
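And a hedged sketch of the MBVE target (again my own illustrative names, not the authors' code): the learned dynamics model is rolled forward only to a fixed depth H, imagined rewards are summed, and the model-free value function bootstraps the tail, giving a lower-variance target for value learning.

```python
def mbve_target(model, reward_fn, value_fn, policy, state, horizon=3, gamma=0.99):
    """H-step value expansion: sum of imagined rewards plus a bootstrapped value at depth H."""
    target, discount = 0.0, 1.0
    s = state
    for _ in range(horizon):
        a = policy(s)
        target += discount * reward_fn(s, a)   # imagined reward from the learned model
        s = model(s, a)                        # imagined next state
        discount *= gamma
    return target + discount * value_fn(s)     # bootstrap with the model-free value function
```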