Contents
Motivating Examples
TD learning of State Values
TD learning of Action Values: Sarsa
TD learning of Action Values: Expected Sarsa
TD learning of Action Values: n-step Sarsa
TD learning of Optimal Action Values: Q-learning
A Unified Point of View
References
Mathematical Foundations of Reinforcement Learning - Zhao Shiyu
Reinforcement Learning - UCAS graduate course