Reinforcement Learning Exercise 5.10

Exercise 5.10: Derive the weighted-average update rule (5.8) from (5.7). Follow the pattern of the derivation of the unweighted rule (2.3).

Start from the definition of the weighted-average estimate:
$$
V_{n} \doteq \frac{\sum_{k=1}^{n - 1} W_k G_k}{\sum_{k=1}^{n - 1} W_k}, \qquad n \geq 2 \qquad \text{(5.7)}
$$
Let $C_n \doteq \sum_{k=1}^{n} W_k$ denote the cumulative sum of the weights given to the first $n$ returns, with $C_0 = 0$. Formula (5.7) then becomes:
$$
V_{n} \doteq \frac{\sum_{k=1}^{n - 1} W_k G_k}{C_{n-1}}, \qquad n \geq 2
$$
Replacing $n$ with $n + 1$ gives:
$$
V_{n+1} \doteq \frac{\sum_{k=1}^{n} W_k G_k}{C_n}, \qquad n \geq 1
$$
$$
\begin{aligned}
\therefore V_{n+1} &= \frac{\sum_{k=1}^{n - 1} W_k G_k}{C_n} + \frac{W_n G_n}{C_n} \\
&= \frac{C_{n-1}}{C_{n}} V_n + \frac{W_n G_n}{C_n} \\
&= \left(1 - \frac{W_n}{C_n}\right) V_n + \frac{W_n G_n}{C_n} \\
&= V_n + \frac{W_n}{C_n}\left(G_n - V_n\right), \qquad n \geq 1, \qquad \text{(5.8)}
\end{aligned}
$$
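The derivation is straightforward. The only step beyond plain algebra is the recursion $C_n = C_{n-1} + W_n$, which gives $C_{n-1}/C_n = 1 - W_n/C_n$. Since $C_0 = 0$, the case $n = 1$ holds for any initial value $V_1$.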
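As a sanity check on the algebra, the sketch below compares the batch form of the weighted average with the incremental rule (5.8) on random data. The function names, the sample size, and the restriction to strictly positive weights (so that $C_n$ is never zero) are assumptions made for this illustration only.

```python
import random


def batch_weighted_average(weights, returns):
    """Evaluate sum(W_k * G_k) / sum(W_k) directly, as in (5.7)."""
    return sum(w * g for w, g in zip(weights, returns)) / sum(weights)


def incremental_weighted_average(weights, returns, v_init=0.0):
    """Apply the update (5.8) one return at a time."""
    v = v_init  # V_1 is arbitrary; the first update replaces it because C_0 = 0
    c = 0.0     # C_0 = 0
    for w, g in zip(weights, returns):
        c += w                  # C_n = C_{n-1} + W_n
        v += (w / c) * (g - v)  # V_{n+1} = V_n + (W_n / C_n) * (G_n - V_n)
    return v


# Hypothetical test data: positive weights and Gaussian returns.
rng = random.Random(0)
weights = [rng.uniform(0.1, 2.0) for _ in range(1000)]
returns = [rng.gauss(0.0, 1.0) for _ in range(1000)]

batch = batch_weighted_average(weights, returns)
incremental = incremental_weighted_average(weights, returns, v_init=123.0)

# Report whether the two forms agree up to floating-point rounding.
print(abs(batch - incremental) < 1e-9)
```

The deliberately odd starting value `v_init=123.0` illustrates that $V_1$ has no influence on the result. The incremental form needs only the running pair $(V_n, C_n)$ instead of the full history of returns and weights.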
