Reinforcement Learning Exercise 3.12

Latest recommended article published 2024-01-03 16:23:28

YeXiang\^-^/

License: CC 4.0 BY-SA

Category: reinforcement learning. Tags: reinforcement learning

Article link: https://blog.youkuaiyun.com/ballade2012/article/details/90578348

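This post covers a reinforcement learning exercise: deriving the equation that expresses $v_\pi$ in terms of $q_\pi$ and $\pi$. Through a sequence of probability identities and definitions, it arrives at $v_\pi(s) = \sum_{a} \bigl [ \pi(a|s) \cdot q_\pi(s,a) \bigr ]$, which shows how the state-value function, the action-value function, and the policy are related.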

Exercise 3.12 Give an equation for $v_\pi$ in terms of $q_\pi$ and $\pi$.

$\begin{aligned} v_\pi(s) &= \mathbb E_\pi(G_t|S_t=s) \\ &=\sum_{g_t}\bigl [ g_t \cdot p(g_t|s) \bigr ] \\ &=\sum_{g_t}\bigl [ g_t \cdot \frac {p(g_t, s)}{p(s)} \bigr ] \\ &=\sum_{g_t}\bigl [ g_t \cdot \frac{ \sum_{a \in \mathcal A} p(g_t, s, a)}{p(s)} \bigr ] \\ &=\sum_{g_t}\Bigl \{ g_t \cdot \frac{ \sum_{a \in \mathcal A} \bigl [p(g_t| s, a) \cdot p(s, a) \bigr ] }{p(s)} \Bigr \} \\ &=\sum_{g_t}\Bigl \{ g_t \cdot \frac{ \sum_{a \in \mathcal A} \bigl [p(g_t| s, a) \cdot p(a | s) \cdot p(s) \bigr ]}{p(s)} \Bigr \} \\ &=\sum_{g_t}\Bigl \{ g_t \cdot \sum_{a \in \mathcal A} \bigl [p(g_t| s, a) \cdot p(a | s) \bigr ] \Bigr \} \\ &=\sum_{a \in \mathcal A} \Bigl \{ p(a|s) \sum_{g_t} \bigl [ g_t \cdot p(g_t | s, a) \bigr ] \Bigr \} \end{aligned}$
By definition, $p(a|s) = \pi(a|s)$ and $\sum_{g_t} \bigl [ g_t \cdot p(g_t | s, a) \bigr ] = \mathbb E_\pi(G_t|S_t=s, A_t=a) = q_\pi(s,a)$, so:
$v_\pi(s) = \sum_{a \in \mathcal A} \bigl [ \pi(a|s) \cdot q_\pi(s,a) \bigr ]$
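The identity can also be checked numerically. The sketch below is not part of the exercise: it assumes a small hypothetical MDP (the transition table `P`, expected rewards `R`, policy `PI`, and discount `GAMMA` are all made up for illustration), evaluates $v_\pi$ and $q_\pi$ separately by iterating their own Bellman equations, and then compares $v_\pi(s)$ with $\sum_a \pi(a|s) \cdot q_\pi(s,a)$.

```python
# Hypothetical 2-state, 2-action MDP, used only to illustrate the identity.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a] is a list of (next_state, probability) pairs.
P = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(0, 0.1), (1, 0.9)]},
    1: {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.3), (1, 0.7)]},
}
# R[s][a] is the expected immediate reward for taking action a in state s.
R = {0: {0: 1.0, 1: 0.0}, 1: {0: -1.0, 1: 2.0}}
# PI[s][a] = pi(a|s); each row sums to 1.
PI = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}


def evaluate_v(n_iters=2000):
    """Iterative policy evaluation for v_pi using the Bellman equation for v."""
    v = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        v = {
            s: sum(
                PI[s][a]
                * (R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a]))
                for a in ACTIONS
            )
            for s in STATES
        }
    return v


def evaluate_q(n_iters=2000):
    """Iterative policy evaluation for q_pi using the Bellman equation for q."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(n_iters):
        q = {
            (s, a): R[s][a]
            + GAMMA
            * sum(
                p * sum(PI[s2][a2] * q[(s2, a2)] for a2 in ACTIONS)
                for s2, p in P[s][a]
            )
            for s in STATES
            for a in ACTIONS
        }
    return q


def v_from_q(q):
    """Right-hand side of the exercise: sum over a of pi(a|s) * q_pi(s, a)."""
    return {s: sum(PI[s][a] * q[(s, a)] for a in ACTIONS) for s in STATES}


if __name__ == "__main__":
    v = evaluate_v()
    v_check = v_from_q(evaluate_q())
    for s in STATES:
        print(s, v[s], v_check[s])
```

Running the script prints, for each state, the directly evaluated $v_\pi(s)$ next to the value rebuilt from $q_\pi$ and $\pi$; the two columns should agree up to floating-point error.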