Markov Process
Markov Reward Process
- Solving the Bellman equation directly takes O(N^3) time in the number of states. Direct computation is feasible for small MRPs; for large MRPs, use one of the following iterative methods instead: dynamic programming, Monte-Carlo evaluation, or temporal-difference learning.
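The two approaches above can be sketched side by side. This is a minimal illustration on an invented 2-state MRP (the transition matrix, rewards, and discount are made-up numbers): the direct method solves v = (I - γP)^(-1) R, while the iterative method repeatedly applies the Bellman backup v ← R + γPv until it converges to the same fixed point.

```python
# Toy 2-state MRP (all numbers are invented for illustration).
gamma = 0.9
P = [[0.7, 0.3],   # P[s][s'] = transition probability from s to s'
     [0.4, 0.6]]
R = [1.0, -1.0]    # expected immediate reward in each state

# Direct solution: v = (I - gamma * P)^{-1} R, via explicit 2x2 inversion.
a = 1 - gamma * P[0][0]; b = -gamma * P[0][1]
c = -gamma * P[1][0];    d = 1 - gamma * P[1][1]
det = a * d - b * c
v_direct = [( d * R[0] - b * R[1]) / det,
            (-c * R[0] + a * R[1]) / det]

# Iterative solution (the fixed-point view behind dynamic programming):
# repeatedly apply the Bellman backup v <- R + gamma * P v.
v = [0.0, 0.0]
for _ in range(1000):
    v = [R[s] + gamma * sum(P[s][t] * v[t] for t in range(2))
         for s in range(2)]

print(v_direct, v)  # the two solutions agree up to numerical precision
```

Since γ < 1 the backup is a contraction, so the iteration converges geometrically; the direct inverse is exact but its cost grows cubically with the number of states, which is exactly why the iterative methods matter for large MRPs.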
Markov Decision Process (Markov reward process with decisions)
- A policy is a distribution over actions given states. Given an MDP and a policy, the state sequence is a Markov process, and the state-and-reward sequence is a Markov reward process.
- The state-value function of an MDP, v_π(s) = E_π[G_t | S_t = s], is the expected return starting from state s and then following policy π.
- The action-value function, q_π(s, a) = E_π[G_t | S_t = s, A_t = a], is the expected return starting from state s, taking action a, and then following policy π.
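The two value functions above can be computed together by iterative policy evaluation. The sketch below uses an invented 2-state, 2-action MDP under a uniform random policy (all transition probabilities and rewards are assumptions for illustration); it alternates the two Bellman expectation relations q_π(s,a) = R(s,a) + γ Σ_{s'} P(s'|s,a) v_π(s') and v_π(s) = Σ_a π(a|s) q_π(s,a).

```python
# Toy 2-state, 2-action MDP under a uniform random policy
# (all numbers are invented for illustration).
gamma = 0.9
# P[s][a] = list of (next_state, probability); R[s][a] = expected reward
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 2.0, 1: 0.0}}
pi = {s: {a: 0.5 for a in (0, 1)} for s in (0, 1)}  # uniform policy

# Iterative policy evaluation: alternate the two Bellman expectation equations
#   q_pi(s,a) = R(s,a) + gamma * sum_{s'} P(s'|s,a) v_pi(s')
#   v_pi(s)   = sum_a pi(a|s) q_pi(s,a)
v = {0: 0.0, 1: 0.0}
for _ in range(1000):
    q = {s: {a: R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])
             for a in (0, 1)} for s in (0, 1)}
    v = {s: sum(pi[s][a] * q[s][a] for a in (0, 1)) for s in (0, 1)}

print(v, q)
```

At convergence the printed v and q satisfy both relations simultaneously, which matches the definitions above: v_π averages q_π over the policy's action distribution, and q_π looks one step ahead before falling back to v_π.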
Reinforcement Learning_By David Silver, Notes 2: Markov Decision Processes

