Reinforcement Learning Exercise 3.29

This exercise reformulates the Bellman equations for the four key value functions in reinforcement learning — $v_\pi$, $v_*$, $q_\pi$, and $q_*$ — in terms of the state-transition probability function $p$ and the expected reward function $r$. A derivation is given for each function, showing how it can be expressed using $p$ and $r$ alone.


Exercise 3.29 Rewrite the four Bellman equations for the four value functions ($v_\pi$, $v_*$, $q_\pi$, and $q_*$) in terms of the three-argument function $p$ (3.4) and the two-argument function $r$ (3.5).

For $v_\pi$:

$$
\begin{aligned}
v_\pi(s) &= \sum_a \pi(a|s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma v_\pi(s') \bigr] \\
&= \sum_a \pi(a|s) \Bigl[ \sum_{s', r} r\, p(s', r \mid s, a) + \sum_{s', r} \gamma\, v_\pi(s')\, p(s', r \mid s, a) \Bigr] \\
&= \sum_a \pi(a|s) \Bigl[ \sum_{r} r\, p(r \mid s, a) + \sum_{s'} \gamma\, v_\pi(s')\, p(s' \mid s, a) \Bigr] \\
&= \sum_a \pi(a|s) \Bigl[ r(s,a) + \gamma \sum_{s'} v_\pi(s')\, p(s' \mid s, a) \Bigr]
\end{aligned}
$$
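The final form lends itself directly to iterative policy evaluation: sweep over states, applying the right-hand side until the values stop changing. Below is a minimal sketch using an invented two-state, two-action MDP (the arrays `p`, `r`, and `pi` are hypothetical, chosen only for illustration; `p[s, a, s']` stores $p(s'|s,a)$ and `r[s, a]` stores $r(s,a)$).

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9

# p[s, a, s'] = p(s'|s,a); each row over s' sums to 1 (invented numbers)
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
# r[s, a] = r(s,a), the expected immediate reward (invented numbers)
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
# pi[s, a] = pi(a|s): a fixed uniform-random policy
pi = np.full((n_states, n_actions), 0.5)

v = np.zeros(n_states)
for _ in range(1000):
    # q[s, a] = r(s,a) + gamma * sum_s' p(s'|s,a) * v(s')
    q = r + gamma * p @ v
    # v_new(s) = sum_a pi(a|s) * q[s, a]
    v_new = (pi * q).sum(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new
```

After convergence, `v` satisfies the last line of the derivation as a fixed point.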
For $v_*$:

$$
\begin{aligned}
v_*(s) &= \max_a \Bigl\{ \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma v_*(s') \bigr] \Bigr\} \\
&= \max_a \Bigl\{ \sum_{s', r} r\, p(s', r \mid s, a) + \sum_{s', r} \gamma\, v_*(s')\, p(s', r \mid s, a) \Bigr\} \\
&= \max_a \Bigl\{ \sum_{r} r\, p(r \mid s, a) + \sum_{s'} \gamma\, v_*(s')\, p(s' \mid s, a) \Bigr\} \\
&= \max_a \Bigl\{ r(s,a) + \gamma \sum_{s'} v_*(s')\, p(s' \mid s, a) \Bigr\}
\end{aligned}
$$
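Replacing the policy average with a max over actions turns the same sweep into value iteration. A sketch, reusing the same invented two-state MDP as above (the `p` and `r` arrays are hypothetical):

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9

# p[s, a, s'] = p(s'|s,a); r[s, a] = r(s,a) (invented numbers)
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

v = np.zeros(n_states)
for _ in range(1000):
    # v_new(s) = max_a { r(s,a) + gamma * sum_s' p(s'|s,a) * v(s') }
    v_new = (r + gamma * p @ v).max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new
```

The converged `v` is a fixed point of the Bellman optimality equation in its $r(s,a)$ / $p(s'|s,a)$ form.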
For $q_\pi$, see Exercise 3.19: https://blog.youkuaiyun.com/ballade2012/article/details/89164995
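For completeness, the same manipulation used for $v_\pi$ above (splitting the sum and marginalizing out $r$ and $s'$) gives $q_\pi$ in terms of $r(s,a)$ and $p(s'|s,a)$:

```latex
\begin{aligned}
q_\pi(s,a) &= \sum_{s', r} p(s', r \mid s, a) \Bigl[ r + \gamma \sum_{a'} \pi(a'|s')\, q_\pi(s', a') \Bigr] \\
&= r(s,a) + \gamma \sum_{s'} p(s' \mid s, a) \sum_{a'} \pi(a'|s')\, q_\pi(s', a')
\end{aligned}
```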

For $q_*$:

$$
\begin{aligned}
q_*(s,a) &= \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \max_{a'} q_*(s', a') \bigr] \\
&= \sum_{s', r} r\, p(s', r \mid s, a) + \sum_{s', r} p(s', r \mid s, a)\, \gamma \max_{a'} q_*(s', a') \\
&= \sum_{r} r\, p(r \mid s, a) + \sum_{s'} p(s' \mid s, a)\, \gamma \max_{a'} q_*(s', a') \\
&= r(s,a) + \gamma \sum_{s'} p(s' \mid s, a) \max_{a'} q_*(s', a')
\end{aligned}
$$
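This last equation is exactly the update behind Q-value iteration: iterate the right-hand side over the full $q$ table. A sketch on the same invented two-state MDP (the `p` and `r` arrays are hypothetical placeholders, not from the exercise):

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9

# p[s, a, s'] = p(s'|s,a); r[s, a] = r(s,a) (invented numbers)
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

q = np.zeros((n_states, n_actions))
for _ in range(1000):
    # q_new(s,a) = r(s,a) + gamma * sum_s' p(s'|s,a) * max_a' q(s',a')
    q_new = r + gamma * p @ q.max(axis=1)
    if np.max(np.abs(q_new - q)) < 1e-10:
        q = q_new
        break
    q = q_new
```

At the fixed point, `q.max(axis=1)` recovers $v_*$, consistent with the $v_*$ derivation above.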
