A new student has joined our group and needs an introduction to RL, so we are starting from Silver's course.
For myself, I am adding the requirement of carefully reading "Reinforcement Learning: An Introduction".
I did not read it very attentively last time; this time I hope to be more thorough and write a short summary of the corresponding topics.
Reinforcement learning problems involve learning what to do - how to map situations to actions - so as to maximize a numerical reward signal.
RL differs from both supervised and unsupervised learning.
There is no supervisor to tell the agent what is best, only a reward signal; the agent must discover which actions yield the most reward by trying them out.
Actions influence the environment and the subsequent data, so the data distribution is not i.i.d.
Feedback is sometimes delayed, not instantaneous.
There is a trade-off between exploration and exploitation.
For a stochastic task, each action must be tried many times to obtain a reliable estimate of its expected reward.
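The last two points can be sketched with a simple epsilon-greedy multi-armed bandit: the agent mostly exploits the action with the highest estimated reward, but explores at rate epsilon, and each action's value estimate is an incremental sample average over many trials. This is a minimal illustration under my own assumptions (function name, Gaussian rewards, parameter choices), not code from the book:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Estimate each action's expected reward by repeated sampling.

    Mostly exploits the current best estimate, but explores a random
    action with probability epsilon. Rewards are stochastic, so each
    action must be tried many times before its estimate is reliable.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # sample-average value estimates
    counts = [0] * n_actions        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:                       # explore
            a = rng.randrange(n_actions)
        else:                                            # exploit
            a = max(range(n_actions), key=lambda i: estimates[i])
        reward = rng.gauss(true_means[a], 1.0)           # noisy reward
        counts[a] += 1
        # incremental update of the sample mean
        estimates[a] += (reward - estimates[a]) / counts[a]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.8, 0.5])
```

With enough steps, the estimates converge toward the true means and the best action (here the second one) is chosen most often.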

This post summarizes Chapter 1 of Sutton's "Reinforcement Learning: An Introduction". It introduces the basic concepts of RL, such as the differences from supervised and unsupervised learning, interaction with the environment, delayed feedback, and the trade-off between exploration and exploitation. It emphasizes the roles of value functions and policies in RL, contrasts the efficiency of evolutionary methods with RL, and uses the tic-tac-toe example to explain the limitations of classical methods. It also outlines the history of RL and its four main sub-elements: policy, reward signal, value function, and model of the environment.
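In the book's tic-tac-toe example, state values are learned with a temporal-difference backup: V(s) <- V(s) + alpha * (V(s') - V(s)), moving the value of the earlier state a fraction alpha toward the value of the state after a greedy move. A minimal sketch of that single update (the helper name and the 0.5 default for unseen states are my assumptions):

```python
def td_update(V, state, next_state, alpha=0.1, default=0.5):
    """One temporal-difference backup for the tic-tac-toe example.

    V is a dict mapping states to estimated probabilities of winning;
    states not yet seen default to 0.5 (an assumed initialization).
    The value of `state` moves a fraction alpha toward `next_state`'s value.
    """
    v_s = V.get(state, default)
    v_next = V.get(next_state, default)
    V[state] = v_s + alpha * (v_next - v_s)
    return V[state]

# After a move leading to a winning state (value 1.0), the earlier
# state's estimate rises from 0.5 toward 1.0.
V = {"win": 1.0}
td_update(V, "s1", "win")   # V["s1"] becomes 0.55
```

Repeated over many games, these small backups propagate the value of winning positions back to earlier positions, which is what makes the method learn without a teacher.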