Deep Reinforcement Learning (DRL) Part 4: DQN in Practice (DQN, Double DQN, Dueling DQN)

This article walks through the DQN algorithm in deep reinforcement learning, covering the environment setup, experience replay, and the implementation of the Q-network, and then discusses the Double DQN and Dueling DQN improvements, which aim to reduce Q-value overestimation and to better separate state value from action advantage. With tuned hyperparameters, the three algorithms are compared on the CartPole-v1 environment.
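To make the two improvements from the summary concrete before diving in: the sketch below (tensor shapes and variable names are illustrative, not the article's code) contrasts the TD target of plain DQN with that of Double DQN. Plain DQN both selects and evaluates the next action with the target network, which tends to overestimate Q-values; Double DQN selects the action with the online network and only evaluates it with the target network.

import torch

GAMMA = 0.9
batch, n_actions = 4, 2
rewards = torch.rand(batch)
q_next_online = torch.rand(batch, n_actions)  # Q(s', .) from the online (eval) network
q_next_target = torch.rand(batch, n_actions)  # Q(s', .) from the target network

# DQN target: y = r + gamma * max_a Q_target(s', a)
y_dqn = rewards + GAMMA * q_next_target.max(dim=1)[0]

# Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))
best_a = q_next_online.argmax(dim=1, keepdim=True)
y_double = rewards + GAMMA * q_next_target.gather(1, best_a).squeeze(1)

Dueling DQN, by contrast, changes the network head rather than the target: Q(s, a) is decomposed into a state value V(s) plus an action advantage A(s, a), with the mean advantage subtracted for identifiability. A minimal sketch of such a head (again with assumed layer sizes):

import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, n_state, n_action, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(n_state, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)             # V(s)
        self.advantage = nn.Linear(hidden, n_action)  # A(s, a)

    def forward(self, x):
        h = self.feature(x)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)    # Q(s, a)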

Full code:

https://github.com/ColinFred/Reinforce_Learning_Pytorch/tree/main/RL/DQN

1. Environment

List the available environments:

from gym import envs
print(envs.registry.all())
ValuesView(├──CartPole: [ v0, v1 ]
├──MountainCar: [ v0 ]
├──MountainCarContinuous: [ v0 ]
├──Pendulum: [ v1 ]
├──Acrobot: [ v1 ]
├──LunarLander: [ v2 ]
├──LunarLanderContinuous: [ v2 ]
├──BipedalWalker: [ v3 ]
├──BipedalWalkerHardcore: [ v3 ]
├──CarRacing: [ v1 ]
├──Blackjack: [ v1 ]
├──FrozenLake: [ v1 ]
├──FrozenLake8x8: [ v1 ]
├──CliffWalking: [ v0 ]
├──Taxi: [ v3 ]
├──Reacher: [ v2 ]
├──Pusher: [ v2 ]
├──Thrower: [ v2 ]
├──Striker: [ v2 ]
├──InvertedPendulum: [ v2 ]
├── ... (output truncated)

As before, we use the CartPole-v1 environment, but reshape the reward value:


import gym

env = gym.make('CartPole-v1')

# print(env.action_space)         # action space
# print(env.observation_space)    # observation (state) space
# print(env.observation_space.high)
# print(env.observation_space.low)

NUM_ACTIONS = env.action_space.n
NUM_STATES = env.observation_space.shape[0]
ENV_A_SHAPE = 0 if isinstance(env.action_space.sample(), int) else env.action_space.sample().shape

RL = DQN(n_action=NUM_ACTIONS, n_state=NUM_STATES, learning_rate=0.01)  # choose algorithm
total_steps = 0
for episode in range(1000):
    state, info = env.reset(return_info=True)
    ep_r = 0
    while True:
        env.render()  # update the rendering
        action = RL.choose_action(state)  # choose an action
        state_, reward, done, info = env.step(action)  # take the action, observe the next state and reward

        # reshape the reward: the closer the cart is to the center and the more
        # upright the pole, the larger the reward
        x, x_dot, theta, theta_dot = state_
        r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8
        r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5
        reward = r1 + r2  # consider both the cart position and the pole angle

        # the original post is cut off here; the rest of the loop is a typical
        # completion -- see the full code in the repository linked above
        RL.store_transition(state, action, reward, state_)
        ep_r += reward
        total_steps += 1

        if total_steps > 100:  # start learning once some transitions have been collected
            RL.learn()

        if done:
            print('episode:', episode, ' episode reward:', round(ep_r, 2))
            break

        state = state_
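The loop above expects a DQN agent that exposes choose_action, store_transition, and learn. Below is a minimal PyTorch sketch of such an agent, with an ε-greedy policy, an experience-replay buffer, and a periodically updated target network; the class layout, network sizes, and hyperparameters are illustrative assumptions rather than the exact code from the repository linked above.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class Net(nn.Module):
    """A small MLP that maps a state to one Q-value per action."""
    def __init__(self, n_state, n_action):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_state, 64), nn.ReLU(),
            nn.Linear(64, n_action),
        )

    def forward(self, x):
        return self.fc(x)

class DQN:
    def __init__(self, n_action, n_state, learning_rate=0.01,
                 gamma=0.9, epsilon=0.1, memory_capacity=2000,
                 batch_size=32, target_update_every=100):
        self.n_action = n_action
        self.gamma = gamma
        self.epsilon = epsilon
        self.batch_size = batch_size
        self.target_update_every = target_update_every
        self.learn_step = 0

        self.eval_net = Net(n_state, n_action)    # online network, updated every learn() call
        self.target_net = Net(n_state, n_action)  # target network, updated periodically
        self.target_net.load_state_dict(self.eval_net.state_dict())

        self.memory = deque(maxlen=memory_capacity)  # experience replay buffer
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=learning_rate)
        self.loss_fn = nn.MSELoss()

    def choose_action(self, state):
        # epsilon-greedy: explore with probability epsilon, otherwise act greedily
        if random.random() < self.epsilon:
            return random.randrange(self.n_action)
        state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            return int(self.eval_net(state).argmax(dim=1).item())

    def store_transition(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def learn(self):
        if len(self.memory) < self.batch_size:
            return  # not enough samples yet

        # periodically copy the online weights into the target network
        if self.learn_step % self.target_update_every == 0:
            self.target_net.load_state_dict(self.eval_net.state_dict())
        self.learn_step += 1

        batch = random.sample(self.memory, self.batch_size)
        states, actions, rewards, next_states = zip(*batch)
        states = torch.as_tensor(np.array(states), dtype=torch.float32)
        actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
        rewards = torch.as_tensor(rewards, dtype=torch.float32)
        next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)

        # Q(s, a) for the actions actually taken
        q_eval = self.eval_net(states).gather(1, actions).squeeze(1)
        # TD target: r + gamma * max_a Q_target(s', a)
        # (Double DQN would instead select the argmax action with eval_net
        #  and evaluate it with target_net)
        with torch.no_grad():
            q_target = rewards + self.gamma * self.target_net(next_states).max(dim=1)[0]

        loss = self.loss_fn(q_eval, q_target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

The replay buffer and the target network are the two stabilizing ingredients of DQN: sampling random minibatches breaks the correlation between consecutive transitions, and freezing the target network between periodic updates keeps the TD target from chasing a moving Q-estimate.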