
David Silver Reinforcement Learning
SuperFeHanHan
Reinforcement Learning in Practice | DQN and CartPole in OpenAI Gym
Sections: 1. Intuition; 2. Experience replay and fixed Q-targets; 3. Pseudocode; 4. PyTorch implementation (4.1 Introducing CartPole; 4.2 Dummy policy; 4.3 DQN; 4.4 Full code, with comments). Original paper: Playing Atari with Deep Reinforcement Learning. Reference: https://mofanpy.com/tutorials/machine-learning/reinforcement-learning/int
Original post · 2021-04-17 06:00:19 · 931 views · 1 comment
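The two stabilising tricks named in this post, experience replay and fixed Q-targets, can be sketched in a few lines. This is a minimal illustration with a toy tabular Q-function standing in for the post's PyTorch network; the environment, rewards, and hyperparameters here are made up, not taken from the article:

```python
import random
from collections import deque

GAMMA, LR = 0.9, 0.1
N_STATES, N_ACTIONS = 5, 2

# Online Q-table and a periodically synced copy (the "fixed Q-target" trick:
# bootstrap targets come from the frozen copy, which changes only rarely).
q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
q_target = [row[:] for row in q]
buffer = deque(maxlen=1000)  # experience replay memory

def store(s, a, r, s2, done):
    buffer.append((s, a, r, s2, done))

def learn(batch_size=8):
    # Sample a random minibatch of past transitions (experience replay
    # breaks the correlation between consecutive updates).
    batch = random.sample(list(buffer), min(batch_size, len(buffer)))
    for s, a, r, s2, done in batch:
        target = r if done else r + GAMMA * max(q_target[s2])
        q[s][a] += LR * (target - q[s][a])

random.seed(0)
# Made-up environment: only (state 3, action 1) is rewarded and terminal.
for step in range(200):
    s, a = random.randrange(N_STATES), random.randrange(N_ACTIONS)
    r, done = (1.0, True) if (s == 3 and a == 1) else (0.0, False)
    store(s, a, r, (s + 1) % N_STATES, done)
    learn()
    if step % 20 == 0:  # sync the target network every 20 steps
        q_target = [row[:] for row in q]
```

With a real DQN the Q-table becomes a neural network and the target sync becomes a parameter copy, but the data flow is the same.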
Reinforcement Learning in Practice | Sarsa / Sarsa(λ) Examples
Sections: the difference between Q-Learning and Sarsa; Sarsa pseudocode; a 2-D maze example; Sarsa(λ). Reference: https://mofanpy.com/tutorials/machine-learning/reinforcement-learning/tabular-sarsa2/. The two algorithms apply ε-greedy at different moments: Q-Learning uses ε-greedy only to choose the action at the current state s, while Sarsa also uses ε-greedy to choose the next action at s′, and then actually takes it.
Original post · 2021-04-16 22:47:01 · 444 views · 1 comment
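The on-policy/off-policy distinction described in this post shows up directly in the two update rules. A minimal sketch (the toy Q-table and hyperparameters are illustrative, not the post's code):

```python
import random

GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1

def eps_greedy(q_row):
    """ε-greedy selection over one state's action values."""
    if random.random() < EPS:
        return random.randrange(len(q_row))
    return q_row.index(max(q_row))

def q_learning_update(q, s, a, r, s2):
    # Off-policy: bootstrap from the greedy (max) action at s2,
    # regardless of which action the behaviour policy will take there.
    q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])

def sarsa_update(q, s, a, r, s2, a2):
    # On-policy: bootstrap from a2, the ε-greedy action that
    # really will be executed at s2.
    q[s][a] += ALPHA * (r + GAMMA * q[s2][a2] - q[s][a])

random.seed(1)
q = [[0.0, 0.0], [0.0, 1.0]]        # made-up 2-state, 2-action Q-table
q_learning_update(q, 0, 0, 0.0, 1)  # uses max(q[1]) = 1.0
a2 = eps_greedy(q[1])               # Sarsa first picks the actual next action
sarsa_update(q, 0, 1, 0.0, 1, a2)   # uses q[1][a2]
```

The updates only differ in which Q-value at s′ they bootstrap from; when the ε-greedy pick happens to be greedy, the two backups coincide.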
Reinforcement Learning in Practice | A Few Q-Learning Examples
Sections: Q-Learning algorithm review; 1. A 1-D treasure hunt, with a summary; 2. A 2-D version, with a summary. References: example 1: https://mofanpy.com/tutorials/machine-learning/reinforcement-learning/general-rl/; example 2: https://mofanpy.com/tutorials/machine-learning/reinforcement-learning/tabular-q1/. Q-Learning algorithm review: initialize the Q table…
Original post · 2021-04-16 20:58:22 · 739 views · 2 comments
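The 1-D treasure-hunt example can be sketched with a plain Q-table loop, starting from the "initialize the Q table" step the review opens with. This is a rough stand-in under assumed dynamics (corridor length, rewards, and hyperparameters are made up), not the post's code:

```python
import random

N_STATES = 6       # 1-D corridor; the "treasure" sits at the right end
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1
MOVES = [-1, 1]    # actions: step left, step right

# Step 1 of the algorithm: initialize the Q table to zeros.
q = [[0.0, 0.0] for _ in range(N_STATES)]

def choose(s):
    """ε-greedy, with a random pick while the row is still all zeros."""
    if random.random() < EPS or all(v == 0.0 for v in q[s]):
        return random.randrange(len(MOVES))
    return q[s].index(max(q[s]))

random.seed(0)
for _ in range(200):                # episodes
    s = 0
    while s != N_STATES - 1:
        a = choose(s)
        s2 = max(0, s + MOVES[a])   # wall on the left
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-Learning backup: bootstrap from the best action at s2.
        q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
        s = s2
```

After training, "step right" dominates in every non-terminal state, which is exactly what the 1-D example is meant to demonstrate.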
David Silver Lecture 6 | Value Function Approximation
Sections: 1. Introduction; 2. Incremental Methods; 2.1 Value-function prediction via gradient descent; 2.1.1 v̂(S, w): feature vectors and linear function approximation, table-lookup features; 2.1.2 v_π(S)…
Original post · 2021-04-14 21:22:59 · 528 views · 1 comment
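Linear function approximation with table-lookup features, covered in section 2.1, reduces to the SGD rule Δw = α(v_target − v̂(S, w))x(S). A small sketch (the target values and sizes are made up for illustration):

```python
ALPHA = 0.1
N_FEATURES = 4

def features(s):
    # Table-lookup features: a one-hot indicator per state, so the
    # linear approximator degenerates to an ordinary value table.
    return [1.0 if i == s else 0.0 for i in range(N_FEATURES)]

w = [0.0] * N_FEATURES

def v_hat(s):
    # v̂(S, w) = x(S) · w
    return sum(x_i * w_i for x_i, w_i in zip(features(s), w))

def sgd_update(s, v_target):
    # Δw = α (v_target − v̂(S, w)) ∇_w v̂ = α (v_target − v̂) x(S)
    err = v_target - v_hat(s)
    for i, x_i in enumerate(features(s)):
        w[i] += ALPHA * err * x_i

for _ in range(100):
    sgd_update(2, 5.0)  # regress state 2 toward a made-up target of 5
```

Because the feature vector is one-hot, each update touches exactly one weight, which is why table lookup is the special case of linear approximation.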
David Silver Lecture 5 | Model-Free Control
Sections: 1. Introduction (1.1 On-policy learning; 1.2 Off-policy learning); 2. On-policy Monte-Carlo control (2.1 Generalised Policy Iteration / MC policy iteration; 2.2 ε-greedy; 2.3 GLIE (Greedy in the Limit with Infinite Exploration))…
Original post · 2021-04-08 00:11:39 · 599 views · 0 comments
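The ε-greedy policy and the GLIE idea from sections 2.2–2.3 can be made concrete: the greedy action gets probability 1 − ε + ε/m, every other action gets ε/m, and a schedule such as ε_k = 1/k decays exploration over time. A small sketch with made-up Q-values:

```python
def eps_greedy_probs(q_row, eps):
    """Action probabilities under an ε-greedy policy: the greedy action
    gets 1 - eps + eps/m, every other action gets eps/m."""
    m = len(q_row)
    best = q_row.index(max(q_row))
    probs = [eps / m] * m
    probs[best] += 1.0 - eps
    return probs

# GLIE-style schedule eps_k = 1/k: exploration decays toward zero while
# every action keeps nonzero probability at any finite k.
for k in (1, 10, 100):
    print(k, eps_greedy_probs([0.0, 1.0, 0.5], 1.0 / k))
```

Every action always has probability at least ε/m, which is the "infinite exploration" half of GLIE; the decay of ε supplies the "greedy in the limit" half.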
David Silver Lecture 4 | Model-Free Prediction
Sections: 1. Introduction; 2. Monte-Carlo learning (2.1 First-visit MC policy evaluation; 2.2 Every-visit MC policy evaluation; example: Blackjack); 3. Temporal-Difference learning; 4. TD(λ). Reference: https://zhu
Original post · 2021-04-06 19:19:54 · 509 views · 1 comment
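First-visit Monte-Carlo policy evaluation, covered in section 2.1, averages the return G_t observed the first time a state appears in each episode. A sketch on made-up episode data (the (state, reward) pair convention and γ = 1 are assumptions of this illustration, not the lecture's notation):

```python
from collections import defaultdict

GAMMA = 1.0  # undiscounted, as in the Blackjack example

def first_visit_mc(episodes):
    """Average the return G_t seen on the *first* visit to each state.
    Each episode is a list of (state, reward) pairs."""
    returns = defaultdict(list)
    for episode in episodes:
        # Compute returns backwards: G_t = R_{t+1} + gamma * G_{t+1}.
        g, gs = 0.0, []
        for _, r in reversed(episode):
            g = r + GAMMA * g
            gs.append(g)
        gs.reverse()
        seen = set()
        for (s, _), g_t in zip(episode, gs):
            if s not in seen:        # first visit only
                seen.add(s)
                returns[s].append(g_t)
    return {s: sum(v) / len(v) for s, v in returns.items()}

episodes = [
    [("A", 0.0), ("B", 1.0)],              # G(A) = 1, G(B) = 1
    [("A", 1.0), ("A", 0.0), ("B", 2.0)],  # first-visit G(A) = 3, G(B) = 2
]
v = first_visit_mc(episodes)
print(v)  # → {'A': 2.0, 'B': 1.5}
```

Every-visit MC differs only in dropping the `seen` check, so repeated visits within one episode each contribute a return.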
David Silver Lecture 3 | Finding the Optimal Policy with Dynamic Programming
Sections: 1. Introduction; 2. Policy evaluation (the value of each state under a given policy); 2.1 Iterative policy evaluation (example: Small Grid World, estimating state values under a fixed policy); 3. Policy iteration (finding the optimal policy of an MDP); 3.1 How to improve a policy; 4. Value iteration; 5. Extensions…
Original post · 2021-03-26 01:39:22 · 686 views · 0 comments
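Iterative policy evaluation, the core of section 2, repeatedly applies the Bellman expectation backup until the values stop changing. A sketch on a made-up three-state chain rather than the lecture's Small Grid World:

```python
GAMMA = 0.9

# Made-up deterministic chain: state s steps to s + 1; state 2 is terminal.
rewards = [0.0, 1.0, 0.0]   # reward for leaving each state
TERMINAL = 2

def policy_evaluation(theta=1e-8):
    """Sweep the Bellman expectation backup v(s) <- R(s) + gamma * v(s')
    until the largest per-state change falls below theta."""
    v = [0.0, 0.0, 0.0]
    while True:
        new_v = v[:]
        for s in range(TERMINAL):          # terminal value stays 0
            new_v[s] = rewards[s] + GAMMA * v[s + 1]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < theta:
            return v

v = policy_evaluation()
print(v)  # → [0.9, 1.0, 0.0]
```

Policy iteration wraps this evaluation in a greedy-improvement step; value iteration collapses the two by backing up with a max over actions instead.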
David Silver Lecture 2 | Markov Decision Processes
Sections: 1. Markov processes (1.0 Introduction; 1.1 Markov property; 1.2 Markov chains, with an example); 2. Markov Reward Processes (MRP) (2.1 MRPs introduce a reward R and a discount factor, with an example; 2.2 the return G_t, a random variable; 2.3 the value function v(s), the expectation of G_t, with an example; 2.4 …
Original post · 2021-03-23 01:54:01 · 983 views · 0 comments
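The MRP value function of section 2.3, v(s) = E[G_t | S_t = s], solves the Bellman equation v = R + γPv, and for small state spaces it can be found by simply iterating that equation to its fixed point. A sketch with made-up transition and reward numbers:

```python
GAMMA = 0.5

# Made-up 2-state MRP: P[i][j] is the transition probability i -> j,
# R[i] the expected immediate reward in state i.
P = [[0.9, 0.1],
     [0.2, 0.8]]
R = [1.0, -1.0]

def mrp_value(iters=500):
    """Iterate v <- R + gamma * P v; gamma < 1 makes this backup a
    contraction, so it converges to the unique fixed point."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [R[i] + GAMMA * sum(P[i][j] * v[j] for j in range(2))
             for i in range(2)]
    return v

v = mrp_value()
print([round(x, 4) for x in v])  # → [1.6923, -1.3846]
```

The same fixed point comes from solving v = (I − γP)⁻¹R directly, which is practical only for small state spaces.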
David Silver Lecture 1 | An Introduction to Reinforcement Learning
Sections: 0. Characteristics of reinforcement learning; 1.1 Reward; 1.2 Sequential decision making; 1.3 History & state; 1.4 Fully / partially observable environments; 2.0 Major components of an RL agent; 2.1 Policy; 2.2 Value function (evaluating a state under a given policy); 2.3 Model; Maze Example…
Original post · 2021-03-16 17:30:43 · 390 views · 0 comments