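Improving Deep Q-Networks (DQN): Prioritized Experience Replay (PER)
1. Double DQN with a Target Network
The steps of the Double DQN algorithm with a target network are as follows: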
Algorithm 5.2 Double DQN with a target network
1: Initialize learning rate α
2: Initialize τ
3: Initialize number of batches per training step, B
4: Initialize number of updates per batch, U
5: Initialize batch size N
6: Initialize experience replay memory with max size K
7: Initialize target network update frequency F
8: Randomly initialize the network parameters θ
9: Initialize the target network parameters ϕ = θ
10: for m = 1 . . . MAX_STEPS do
11:     Gather and store h experiences (s_i, a_i, r_i, s′_i) using the current policy
12:     for b = 1 . . . B do
13:         Sample a batch, b, of experiences from the experience replay memory
14:         for u = 1 .
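
As a minimal sketch of the pieces named in steps 6 to 13, the Python below shows a bounded replay memory, the standard Double DQN target (the online network θ selects the next action and the target network ϕ evaluates it), and a periodic copy of θ into ϕ. This is an illustration under stated assumptions, not the article's own implementation: the names `ReplayMemory`, `double_dqn_targets`, and `maybe_update_target` are hypothetical, the `done` flag and the discount factor `gamma` do not appear in the listing, and a hard copy every F steps is only one possible reading of "target network update frequency".

```python
import copy
import random
from collections import deque


class ReplayMemory:
    """Experience store with max size K; the oldest entries are dropped first."""

    def __init__(self, max_size):
        self.buffer = deque(maxlen=max_size)

    def store(self, state, action, reward, next_state, done):
        # `done` is an assumed extra field marking terminal transitions.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def double_dqn_targets(batch, q_online, q_target, gamma):
    """Compute one Double DQN target per experience in the batch.

    q_online(state) and q_target(state) are assumed to return a list of
    Q-values, one per action, from the networks with parameters θ and ϕ.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            # No bootstrapping from a terminal state.
            targets.append(reward)
            continue
        online_q = q_online(next_state)
        # The online network selects the action ...
        best_action = max(range(len(online_q)), key=online_q.__getitem__)
        # ... and the target network evaluates it.
        targets.append(reward + gamma * q_target(next_state)[best_action])
    return targets


def maybe_update_target(step, update_frequency, theta, phi):
    """Return a copy of theta every `update_frequency` steps, else phi unchanged."""
    if step % update_frequency == 0:
        return copy.deepcopy(theta)
    return phi


# Toy usage: the online network prefers action 1, so the target network's
# value for action 1 (2.0) is used, not its own maximum (10.0).
toy_batch = [(0, 0, 1.0, 1, False), (1, 1, 0.5, 2, True)]
toy_targets = double_dqn_targets(
    toy_batch,
    q_online=lambda s: [1.0, 3.0],
    q_target=lambda s: [10.0, 2.0],
    gamma=0.9,
)
```

In the toy batch, the first target is roughly 1.0 + 0.9 × 2.0 = 2.8 and the second is the bare reward 0.5 because that transition is terminal. A plain DQN target would have used the target network's own maximum of 10.0 for the first experience, which is the overestimation that decoupling action selection from action evaluation is meant to reduce.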