13. Optimizing Deep Q-Networks (DQN): Prioritized Experience Replay and Improved Implementations

脑补型产品

Published 2025-10-18 11:02:07

Copyright: CC 4.0 BY-SA

Column: Deep Reinforcement Learning Beginner's Guide | Tags: DQN, Double DQN, target network

Article link: https://blog.youkuaiyun.com/mongodb5scout/article/details/154598544

1. The Double DQN Algorithm with a Target Network
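The defining step of Double DQN is how it builds the learning target: the online network (parameters θ) selects the greedy next action, a′ = argmax Q_θ(s′, a′), and the target network (parameters ϕ) evaluates that action, y = r + γ·Q_ϕ(s′, a′). The following is a minimal NumPy sketch of that target, together with a bounded replay memory (max size K) and a hard copy ϕ ← θ every F steps. It assumes Q-values are already available as arrays and that network parameters are stored as a dict of arrays. The names ReplayMemory, double_dqn_targets and update_target, and the terminal-state mask `dones`, are illustrative assumptions, not part of the original listing.

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Hypothetical experience replay memory with a maximum size K.

    Once full, appending a new experience evicts the oldest one.
    """

    def __init__(self, max_size):
        self.buffer = deque(maxlen=max_size)

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement (no prioritization here)
        batch = random.sample(list(self.buffer), batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

    def __len__(self):
        return len(self.buffer)


def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN targets: y_i = r_i + gamma * (1 - done_i) * Q_phi(s'_i, a'_i),
    where a'_i = argmax_a' Q_theta(s'_i, a').

    q_online_next: Q_theta(s', .) from the online network, shape (N, num_actions)
    q_target_next: Q_phi(s', .) from the target network, shape (N, num_actions)
    """
    # 1) The online network (theta) selects the greedy next action
    best_actions = np.argmax(q_online_next, axis=1)
    # 2) The target network (phi) evaluates that action
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # 3) Terminal transitions get no bootstrapped value (assumed done mask)
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return np.asarray(rewards, dtype=np.float64) + gamma * not_done * next_values


def update_target(theta, phi, step, update_freq):
    """Hard (replacement) update: copy theta into phi every update_freq steps.

    theta and phi are assumed to be dicts mapping parameter names to arrays.
    """
    if step % update_freq == 0:
        for name, value in theta.items():
            phi[name] = value.copy()


if __name__ == "__main__":
    q_online_next = np.array([[1.0, 3.0], [2.0, 0.5]])
    q_target_next = np.array([[10.0, 4.0], [6.0, 8.0]])
    y = double_dqn_targets(q_online_next, q_target_next,
                           rewards=np.array([1.0, 0.0]),
                           dones=np.array([0.0, 1.0]), gamma=0.9)
    print(y)  # approximately [4.6, 0.0]
```

In the first row the online network prefers action 1, so the target uses Q_ϕ(s′, 1) = 4.0 instead of the target network's own maximum of 10.0 that plain DQN would bootstrap from. Separating action selection from action evaluation in this way is what reduces DQN's overestimation bias.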

The Double DQN algorithm combined with a target network proceeds as follows:

Algorithm 5.2
Double DQN with a target network
1: Initialize learning rate α
2: Initialize τ
3: Initialize number of batches per training step, B
4: Initialize number of updates per batch, U
5: Initialize batch size N
6: Initialize experience replay memory with max size K
7: Initialize target network update frequency F
8: Randomly initialize the network parameters θ
9: Initialize the target network parameters ϕ = θ
10: for m = 1 ... MAX_STEPS do
11:     Gather and store h experiences (s_i, a_i, r_i, s′_i) using the current policy
12:     for b = 1 ... B do
13:         Sample a batch, b, of experiences from the experience replay memory
14:         for u