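Optimizing Deep Q-Networks (DQN): Prioritized Experience Replay and an Improved Implementation
1. The Double DQN Algorithm and Target Networks
The algorithm for Double DQN combined with a target network proceeds as follows: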
Algorithm 5.2: Double DQN with a target network
 1: Initialize learning rate α
 2: Initialize τ
 3: Initialize number of batches per training step, B
 4: Initialize number of updates per batch, U
 5: Initialize batch size N
 6: Initialize experience replay memory with max size K
 7: Initialize target network update frequency F
 8: Randomly initialize the network parameters θ
 9: Initialize the target network parameters ϕ = θ
10: for m = 1 ... MAX_STEPS do
11:     Gather and store h experiences (s_i, a_i, r_i, s'_i) using the current policy
12:     for b = 1 ... B do
13:         Sample a batch, b, of experiences from the experience replay memory
14:         for u
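As a rough illustration of the pieces named in the listing, here is a minimal Python sketch of a bounded replay memory (max size K, batch size N), the Double DQN target, and the periodic copy ϕ = θ every F steps. It is a sketch under assumptions, not the author's implementation: the names `ReplayMemory`, `double_dqn_target`, and `maybe_sync_target` are hypothetical, the discount factor `gamma` and the `done` flag do not appear in the listing, and `q_online` and `q_target` stand in for the networks parameterized by θ and ϕ, each assumed to return a list of Q-values with one entry per action. Sampling is assumed to be uniform.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity experience store; the oldest entries are dropped first."""

    def __init__(self, max_size):
        # max_size plays the role of K in the listing.
        self.buffer = deque(maxlen=max_size)

    def store(self, state, action, reward, next_state, done):
        # `done` is an added assumption; the listing stores (s, a, r, s') only.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement; batch_size plays the role of N.
        return random.sample(list(self.buffer), batch_size)


def double_dqn_target(reward, next_state, done, q_online, q_target, gamma):
    """Return r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    online_values = q_online(next_state)
    # The online network (theta) selects the action ...
    best_action = max(range(len(online_values)), key=online_values.__getitem__)
    # ... and the target network (phi) evaluates it.
    return reward + gamma * q_target(next_state)[best_action]


def maybe_sync_target(step, update_frequency, theta, phi):
    """Copy theta into phi in place every `update_frequency` steps (F)."""
    if step % update_frequency == 0:
        phi.clear()
        phi.update(theta)
```

For example, if `q_online` returns `[1.0, 3.0, 2.0]` and `q_target` returns `[10.0, 20.0, 30.0]` for the next state, the online network selects action 1 and the target uses 20.0 rather than the target network's own maximum of 30.0. Separating action selection from action evaluation in this way is what distinguishes Double DQN from DQN with a target network alone.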