13. Improvements to Deep Q-Networks (DQN): Prioritized Experience Replay (PER)

脸先着地天使

Published 2025-09-09 15:31:57

License: CC 4.0 BY-SA

Column: A Practical Guide to Deep Reinforcement Learning. Tags: DQN, Double DQN, target network

Article link: https://blog.youkuaiyun.com/jwt8token/article/details/152340358

1. Double DQN with a Target Network

The steps of the Double DQN algorithm with a target network are as follows:

Algorithm 5.2
Double DQN with a target network
1: Initialize learning rate α
2: Initialize τ
3: Initialize number of batches per training step, B
4: Initialize number of updates per batch, U
5: Initialize batch size N
6: Initialize experience replay memory with max size K
7: Initialize target network update frequency F
8: Randomly initialize the network parameters θ
9: Initialize the target network parameters ϕ = θ
10: for m = 1 . . . MAX_STEPS do
11:     Gather and store h experiences (s_i, a_i, r_i, s′_i) using the current policy
12:     for b = 1 . . . B do
13:         Sample a batch, b, of experiences from the experience replay memory
14:         for u = 1 .