Exercise 3.12 Give an equation for $v_\pi$ in terms of $q_\pi$ and $\pi$.
$$
\begin{aligned}
v_\pi(s) &= \mathbb E_\pi(G_t|S_t=s) \\
&=\sum_{g_t}\bigl [ g_t \cdot p(g_t|s) \bigr ] \\
&=\sum_{g_t}\bigl [ g_t \cdot \frac {p(g_t, s)}{p(s)} \bigr ] \\
&=\sum_{g_t}\bigl [ g_t \cdot \frac{ \sum_{a \in \mathcal A} p(g_t, s, a)}{p(s)} \bigr ] \\
&=\sum_{g_t}\Bigl \{ g_t \cdot \frac{ \sum_{a \in \mathcal A} \bigl [p(g_t| s, a) \cdot p(s, a) \bigr ] }{p(s)} \Bigr \} \\
&=\sum_{g_t}\Bigl \{ g_t \cdot \frac{ \sum_{a \in \mathcal A} \bigl [p(g_t| s, a) \cdot p(a | s) \cdot p(s) \bigr ]}{p(s)} \Bigr \} \\
&=\sum_{g_t}\Bigl \{ g_t \cdot \sum_{a \in \mathcal A} \bigl [p(g_t| s, a) \cdot p(a | s) \bigr ] \Bigr \} \\
&=\sum_{a \in \mathcal A} \Bigl \{ p(a|s) \sum_{g_t} \bigl [ g_t \cdot p(g_t | s, a) \bigr ] \Bigr \}
\end{aligned}
$$
By definition, $p(a|s) = \pi(a|s)$ and $\sum_{g_t} \bigl [ g_t \cdot p(g_t | s, a) \bigr ] = q_\pi(s,a)$, so:
$$
v_\pi(s) = \sum_{a \in \mathcal A} \bigl [ \pi(a|s) \cdot q_\pi(s,a) \bigr ]
$$
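As a quick sanity check, the identity can be verified numerically on a toy case. The sketch below assumes a single state with two actions and a small discrete set of returns; the action names, the policy probabilities, and the return distributions are all made-up illustration values, not part of the exercise. It computes the state value in two ways: as the policy-weighted sum of action values, and as the expectation of the return under the marginal distribution $p(g_t|s)$.

```python
# Hypothetical single-state example; every number here is invented for illustration.
pi = {"left": 0.3, "right": 0.7}  # pi(a|s)

# p(g | s, a): distribution over returns for each action (assumed discrete)
p_return = {
    "left": {0.0: 0.5, 2.0: 0.5},
    "right": {1.0: 0.25, 5.0: 0.75},
}


def q_value(action):
    """q_pi(s, a) = sum_g g * p(g | s, a)."""
    return sum(g * p for g, p in p_return[action].items())


def v_from_q():
    """v_pi(s) = sum_a pi(a|s) * q_pi(s, a)."""
    return sum(pi[a] * q_value(a) for a in pi)


def v_from_marginal():
    """v_pi(s) = sum_g g * p(g | s), with p(g | s) = sum_a p(g | s, a) * pi(a|s)."""
    p_g = {}
    for a, dist in p_return.items():
        for g, p in dist.items():
            p_g[g] = p_g.get(g, 0.0) + pi[a] * p
    return sum(g * p for g, p in p_g.items())


if __name__ == "__main__":
    print("v from q:       ", v_from_q())
    print("v from marginal:", v_from_marginal())
```

Both functions should agree up to floating-point rounding (about 3.1 for these numbers), which mirrors the swap of the two sums in the last step of the derivation.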