Reinforcement Learning
wang2008start
Articles in this column
Reinforcement Learning_By David Silver Notes 6: Value Function Approximation
Value Function Approximation (Original, 2017-12-11 17:06:08 · 352 views · 0 comments)
Reinforcement Learning_By David Silver Notes 7: Policy Gradient Methods
Policy Gradient Methods (Original, 2017-12-11 17:06:57 · 487 views · 0 comments)
Reinforcement Learning_By David Silver Notes 8: Integrating Learning and Planning
Integrating Learning and Planning (Original, 2017-12-11 17:07:40 · 291 views · 0 comments)
Reinforcement Learning_By David Silver Notes 9: Exploration and Exploitation
Exploration and Exploitation (Original, 2017-12-11 17:08:20 · 400 views · 0 comments)
Reinforcement Learning_By David Silver Notes 1: Introduction
Introduction: Agent and Environment; History and state; Agent state; Environment state; Information state; Fully observable environments; Partially observable environments; Policy: … (Original, 2017-12-11 15:52:20 · 328 views · 0 comments)
Reinforcement Learning_By David Silver Notes 2: Markov Decision Processes
Markov Process; Markov Reward Process; Markov Decision Process (a Markov reward process with decisions). A policy is a distribution over actions given states. Given an MDP and policy, … (Original, 2017-12-11 17:00:03 · 321 views · 0 comments)
Reinforcement Learning_By David Silver Notes 3: Planning by Dynamic Programming
Policy Evaluation; Policy Iteration; Value Iteration. 2. Policy Iteration (any optimal policy can be subdivided into two components: an optimal first action A, followed by an optimal poli… (Original, 2017-12-11 17:01:53 · 378 views · 0 comments)
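The decomposition quoted in that snippet (an optimal first action followed by an optimal policy thereafter) is exactly what policy iteration exploits. A minimal tabular sketch, assuming a hypothetical model format where `P[s][a]` lists `(probability, next_state)` pairs and `R[s][a]` gives the expected reward:

```python
def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Alternate policy evaluation and greedy improvement until stable.

    P[s][a] -> list of (prob, next_state); R[s][a] -> expected reward.
    (This model format is an illustrative assumption, not from the notes.)
    """
    states = list(P)
    policy = {s: next(iter(P[s])) for s in states}  # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to convergence.
        while True:
            delta = 0.0
            for s in states:
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to the evaluated V.
        stable = True
        for s in states:
            q = {a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in P[s]}
            best = max(q, key=q.get)
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V
```

Each improvement step greedily picks the optimal first action given the current value estimate; the loop terminates when no action changes.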
Reinforcement Learning_By David Silver Notes 4: Model Free Prediction
The dynamic programming above solves MDPs whose model is known; here the problem is estimating the value function of an MDP when the model/environment is unknown. The main methods: MC methods (need no transition or reward matrix, effective in non-Markov environments) and temporal-difference methods. Monte-Carlo Learning learns directly from episodes of experience, without the MDP's transitions or rewards; key idea: value = mean return … (Original, 2017-12-11 17:03:36 · 270 views · 0 comments)
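The key idea in that snippet, value = mean return, can be sketched as an every-visit Monte-Carlo estimator. The episode format below (a list of `(state, reward)` pairs per episode) is an assumption for illustration:

```python
from collections import defaultdict

def mc_value_estimate(episodes, gamma=1.0):
    """Every-visit Monte-Carlo prediction: V(s) is the mean of all
    returns observed from s under the policy that generated `episodes`.

    episodes: list of episodes, each a list of (state, reward) pairs.
    """
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        # Walk backwards so G accumulates the discounted return from each step.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns[state].append(G)
    # value = mean return
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```

No transition or reward model appears anywhere, which is the point of model-free prediction.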
Reinforcement Learning_By David Silver Notes 5: Model Free Control
(Optimise the value function of an unknown MDP.) On-policy learning: learn about policy π from experience sampled from π. Off-policy learning: learn about policy π from experience sampled from μ. On-Po… (Original, 2017-12-11 17:05:04 · 307 views · 0 comments)
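The on-policy/off-policy distinction in that snippet shows up directly in the two standard TD control updates from that lecture, SARSA and Q-learning. A sketch of the two update rules, assuming `Q` is a dict keyed by `(state, action)` with all needed entries present:

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a2 the behaviour policy
    actually takes next, so Q reflects the policy being followed."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action in s2, learning
    about the greedy target policy while behaviour may explore."""
    best = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
```

The only difference is the bootstrap target: SARSA uses the sampled next action (experience from π), Q-learning maximises over actions (learning about a greedy π from experience generated by some behaviour policy μ).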