Reinforcement Learning Series (5) - DQN and Its Improvements

This article introduces Deep Q Networks (DQN) in reinforcement learning through two examples, MountainCar and CartPole, and shows how DQN handles problems with a continuous state space. It walks through how DQN works: learning a mapping from states to action values, updating the neural network weights, replacing the Q-table whose computation and storage costs explode as the state space grows, and DQN improvements such as Double DQN and Prioritized Experience Replay.


Two deep neural networks (DNNs) are used to learn the mapping from states to action values and to update the network weights, which avoids the computational and storage blow-up a Q-table suffers as the state-action space grows. On top of the basic DQN, this post also covers its main improvements: Double DQN (which mitigates the over-estimation of Q-values), Prioritized Experience Replay (which reaches comparable convergence in less computation time by replaying important transitions more often), and Dueling DQN.
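To make the Double DQN change concrete, here is a small NumPy sketch (not the author's code) of how the one-step TD target differs between vanilla DQN and Double DQN for a batch of transitions. reward, q_next (the target network's Q-values for the next states) and q_eval_next (the evaluate network's Q-values for the same next states) are assumed inputs.

import numpy as np

def dqn_target(reward, q_next, gamma=0.9):
    # vanilla DQN: the target net both selects and evaluates the next action,
    # which tends to over-estimate the action values
    return reward + gamma * np.max(q_next, axis=1)

def double_dqn_target(reward, q_next, q_eval_next, gamma=0.9):
    # Double DQN: the evaluate net selects the next action, the target net
    # evaluates it, which reduces the over-estimation bias
    a_star = np.argmax(q_eval_next, axis=1)
    return reward + gamma * q_next[np.arange(len(reward)), a_star]

reward = np.array([1.0, 0.0])
q_next = np.array([[0.2, 0.5], [0.1, 0.3]])
q_eval_next = np.array([[0.4, 0.3], [0.2, 0.6]])
print(dqn_target(reward, q_next))                      # [1.45 0.27]
print(double_dqn_target(reward, q_next, q_eval_next))  # [1.18 0.27]

Prioritized Experience Replay and Dueling DQN change the sampling of the replay memory and the network head respectively; they do not alter this target computation.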

"""
Zoe's RL_brain: Deep Q Network agent
"""
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = '2'
import numpy as np 
import pandas as pd 
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

np.random.seed(1)
tf.set_random_seed(1)

# Deep Q Network off-policy
class DeepQNetwork:
    def __init__(
        self,
        n_actions,
        n_features,
        learning_rate=0.01,
        reward_decay=0.9,
        e_greedy=0.9,
        replace_target_iter=300,
        memory_size=500,
        batch_size=32,
        e_greedy_increment=None,
        output_graph=False,
    ):
        self.n_actions = n_actions
        self.n_features = n_features
        self.lr = learning_rate
        self.gamma = reward_decay
        self.epsilon_max = e_greedy
        self.replace_target_iter = replace_target_iter
        self.memory_size = memory_size
        self.batch_size = batch_size
        self.epsilon_increment = e_greedy_increment
        self.epsilon = 0 if e_greedy_increment is not None else self.epsilon_max
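
        # if e_greedy_increment is given, exploration starts fully random (epsilon = 0)
        # and epsilon is meant to grow towards epsilon_max as training proceeds;
        # otherwise a fixed epsilon_max is used from the start (here epsilon is the
        # probability of acting greedily)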

        # total learning step
        self.learn_step_counter = 0

        # initialize zero memory [s, a, r, s_]
        self.memory = np.zeros((self.memory_size, n_features * 2 + 2))
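        # each row holds one transition laid out as [s (n_features), a, r, s_ (n_features)],
        # hence the row width n_features * 2 + 2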

        # consist of [target_net, evaluate_net]
        self._build_net()
        t_params = tf.get_collection('target_net_params')
        e_params = tf.get_collection('eval_net_params')
        self.replace_target_op = [tf.assign(t, e) for t, e in zip(t_params, e_params)]
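        # hard update: copy every eval_net parameter into its target_net counterpart;
        # this op is intended to be run once every replace_target_iter learning steps
        # so that the TD targets stay fixed between syncs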

        self.sess = tf.Session()

        if output_graph:
            # $ tensorboard --logdir=logs
            # tf.train.SummaryWriter soon be deprecated, use following
            tf.summary.FileWriter("logs/", self.sess.graph)

        self.sess.run(tf.global_variables_initializer())
        self.cost_his = []

        # dump every node name in the default graph to a text file,
        # which is handy when inspecting the graph in TensorBoard
        tensor_name_list = [tensor.name for tensor in tf.get_default_graph().as_graph_def().node]

        txt_path = './txt/节点名称'   # "node names"
        full_path = txt_path + '.txt'

        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, 'a+') as f:
            for tensor_name in tensor_name_list:
                f.write(tensor_name + '\n')
    
    def _build_net(self):
        #--------------------build evaluate_net-----------------
        self.s = tf.placeholder(tf.float32, [None, self.n_features], name='s')  # current state input
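        # NOTE: the original post is truncated at this point. The remainder of
        # _build_net below is a minimal sketch of how this two-network layout is
        # commonly completed (an evaluate net that is trained, and a target net
        # that is periodically synced via replace_target_op). The hidden-layer
        # size n_l1 = 10 and the initializers are assumptions; only the collection
        # names 'eval_net_params' / 'target_net_params' come from __init__ above.
        self.q_target = tf.placeholder(tf.float32, [None, self.n_actions], name='Q_target')  # TD targets
        with tf.variable_scope('eval_net'):
            c_names = ['eval_net_params', tf.GraphKeys.GLOBAL_VARIABLES]
            n_l1 = 10
            w_initializer = tf.random_normal_initializer(0., 0.3)
            b_initializer = tf.constant_initializer(0.1)

            with tf.variable_scope('l1'):
                w1 = tf.get_variable('w1', [self.n_features, n_l1], initializer=w_initializer, collections=c_names)
                b1 = tf.get_variable('b1', [1, n_l1], initializer=b_initializer, collections=c_names)
                l1 = tf.nn.relu(tf.matmul(self.s, w1) + b1)

            with tf.variable_scope('l2'):
                w2 = tf.get_variable('w2', [n_l1, self.n_actions], initializer=w_initializer, collections=c_names)
                b2 = tf.get_variable('b2', [1, self.n_actions], initializer=b_initializer, collections=c_names)
                self.q_eval = tf.matmul(l1, w2) + b2   # Q(s, a) for every action

        with tf.variable_scope('loss'):
            self.loss = tf.reduce_mean(tf.squared_difference(self.q_target, self.q_eval))
        with tf.variable_scope('train'):
            self._train_op = tf.train.RMSPropOptimizer(self.lr).minimize(self.loss)

        #--------------------build target_net-------------------
        self.s_ = tf.placeholder(tf.float32, [None, self.n_features], name='s_')  # next state input
        with tf.variable_scope('target_net'):
            c_names = ['target_net_params', tf.GraphKeys.GLOBAL_VARIABLES]

            with tf.variable_scope('l1'):
                w1 = tf.get_variable('w1', [self.n_features, n_l1], initializer=w_initializer, collections=c_names)
                b1 = tf.get_variable('b1', [1, n_l1], initializer=b_initializer, collections=c_names)
                l1 = tf.nn.relu(tf.matmul(self.s_, w1) + b1)

            with tf.variable_scope('l2'):
                w2 = tf.get_variable('w2', [n_l1, self.n_actions], initializer=w_initializer, collections=c_names)
                b2 = tf.get_variable('b2', [1, self.n_actions], initializer=b_initializer, collections=c_names)
                self.q_next = tf.matmul(l1, w2) + b2   # Q'(s_, a) from the frozen target net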
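For completeness, here is a rough sketch of how an agent class like this is typically driven against one of the two environments mentioned above (CartPole), using the classic gym API that matches this TF1-era code. The choose_action, store_transition and learn methods are not part of the excerpt above, so their names and signatures are assumptions based on the usual layout of this kind of tutorial agent, and the module name RL_brain is hypothetical.

import gym
from RL_brain import DeepQNetwork   # hypothetical module/file name for the class above

env = gym.make('CartPole-v0').unwrapped

RL = DeepQNetwork(
    n_actions=env.action_space.n,
    n_features=env.observation_space.shape[0],
    learning_rate=0.01,
    e_greedy=0.9,
    replace_target_iter=100,
    memory_size=2000,
    e_greedy_increment=0.001,
)

total_steps = 0
for episode in range(100):
    observation = env.reset()
    while True:
        action = RL.choose_action(observation)               # assumed epsilon-greedy over q_eval
        observation_, reward, done, info = env.step(action)  # old 4-tuple gym step API
        RL.store_transition(observation, action, reward, observation_)
        if total_steps > 1000:                               # let the replay memory warm up first
            RL.learn()
        if done:
            break
        observation = observation_
        total_steps += 1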