Deep Reinforcement Learning (DRL) Part 4: DQN in Practice (DQN, Double DQN, Dueling DQN)

This article walks through the DQN algorithm in deep reinforcement learning, covering the environment setup, experience replay, and the implementation of the Q-network, and then discusses the Double DQN and Dueling DQN improvements, which aim to reduce Q-value overestimation and to better separate state value from action advantage. With tuned hyperparameters, the three algorithms are compared on the CartPole-v1 environment.
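To make the two improvements from the summary concrete before diving in: the sketch below (tensor shapes and variable names are illustrative, not the article's code) contrasts the TD target of plain DQN with that of Double DQN. Plain DQN both selects and evaluates the next action with the target network, which tends to overestimate Q-values; Double DQN selects the action with the online network and only evaluates it with the target network.

import torch

GAMMA = 0.9
batch, n_actions = 4, 2
rewards = torch.rand(batch)
q_next_online = torch.rand(batch, n_actions)  # Q(s', .) from the online (eval) network
q_next_target = torch.rand(batch, n_actions)  # Q(s', .) from the target network

# DQN target: y = r + gamma * max_a Q_target(s', a)
y_dqn = rewards + GAMMA * q_next_target.max(dim=1)[0]

# Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))
best_a = q_next_online.argmax(dim=1, keepdim=True)
y_double = rewards + GAMMA * q_next_target.gather(1, best_a).squeeze(1)

Dueling DQN, by contrast, changes the network head rather than the target: Q(s, a) is decomposed into a state value V(s) plus an action advantage A(s, a), with the mean advantage subtracted for identifiability. A minimal sketch of such a head (again with assumed layer sizes):

import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, n_state, n_action, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(n_state, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)             # V(s)
        self.advantage = nn.Linear(hidden, n_action)  # A(s, a)

    def forward(self, x):
        h = self.feature(x)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)    # Q(s, a)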

Full code:

https://github.com/ColinFred/Reinforce_Learning_Pytorch/tree/main/RL/DQN

1. Environment

List the available environments:

from gym import envs
print(envs.registry.all())
ValuesView(├──CartPole: [ v0, v1 ]
├──MountainCar: [ v0 ]
├──MountainCarContinuous: [ v0 ]
├──Pendulum: [ v1 ]
├──Acrobot: [ v1 ]
├──LunarLander: [ v2 ]
├──LunarLanderContinuous: [ v2 ]
├──BipedalWalker: [ v3 ]
├──BipedalWalkerHardcore: [ v3 ]
├──CarRacing: [ v1 ]
├──Blackjack: [ v1 ]
├──FrozenLake: [ v1 ]
├──FrozenLake8x8: [ v1 ]
├──CliffWalking: [ v0 ]
├──Taxi: [ v3 ]
├──Reacher: [ v2 ]
├──Pusher: [ v2 ]
├──Thrower: [ v2 ]
├──Striker: [ v2 ]
├──InvertedPendulum: [ v2 ]
├── ... (output truncated)

As before, we use the CartPole-v1 environment, but reshape the reward value:


import gym

env = gym.make('CartPole-v1')

# print(env.action_space)         # action space
# print(env.observation_space)    # observation (state) space
# print(env.observation_space.high)
# print(env.observation_space.low)

NUM_ACTIONS = env.action_space.n
NUM_STATES = env.observation_space.shape[0]
ENV_A_SHAPE = 0 if isinstance(env.action_space.sample(), int) else env.action_space.sample().shape

RL = DQN(n_action=NUM_ACTIONS, n_state=NUM_STATES, learning_rate=0.01)  # choose algorithm
total_steps = 0
for episode in range(1000):
    state, info = env.reset(return_info=True)
    ep_r = 0
    while True:
        env.render()  # update the rendering
        action = RL.choose_action(state)  # choose an action
        state_, reward, done, info = env.step(action)  # take the action, observe the next state and reward

        # reshape the reward: the closer the cart is to the center and the more
        # upright the pole, the larger the reward
        x, x_dot, theta, theta_dot = state_
        r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8
        r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5
        reward = r1 + r2  # consider both the cart position and the pole angle

        # the original post is cut off here; the rest of the loop is a typical
        # completion -- see the full code in the repository linked above
        RL.store_transition(state, action, reward, state_)
        ep_r += reward
        total_steps += 1

        if total_steps > 100:  # start learning once some transitions have been collected
            RL.learn()

        if done:
            print('episode:', episode, ' episode reward:', round(ep_r, 2))
            break

        state = state_
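The loop above expects a DQN agent that exposes choose_action, store_transition, and learn. Below is a minimal PyTorch sketch of such an agent, with an ε-greedy policy, an experience-replay buffer, and a periodically updated target network; the class layout, network sizes, and hyperparameters are illustrative assumptions rather than the exact code from the repository linked above.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class Net(nn.Module):
    """A small MLP that maps a state to one Q-value per action."""
    def __init__(self, n_state, n_action):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_state, 64), nn.ReLU(),
            nn.Linear(64, n_action),
        )

    def forward(self, x):
        return self.fc(x)

class DQN:
    def __init__(self, n_action, n_state, learning_rate=0.01,
                 gamma=0.9, epsilon=0.1, memory_capacity=2000,
                 batch_size=32, target_update_every=100):
        self.n_action = n_action
        self.gamma = gamma
        self.epsilon = epsilon
        self.batch_size = batch_size
        self.target_update_every = target_update_every
        self.learn_step = 0

        self.eval_net = Net(n_state, n_action)    # online network, updated every learn() call
        self.target_net = Net(n_state, n_action)  # target network, updated periodically
        self.target_net.load_state_dict(self.eval_net.state_dict())

        self.memory = deque(maxlen=memory_capacity)  # experience replay buffer
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=learning_rate)
        self.loss_fn = nn.MSELoss()

    def choose_action(self, state):
        # epsilon-greedy: explore with probability epsilon, otherwise act greedily
        if random.random() < self.epsilon:
            return random.randrange(self.n_action)
        state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            return int(self.eval_net(state).argmax(dim=1).item())

    def store_transition(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def learn(self):
        if len(self.memory) < self.batch_size:
            return  # not enough samples yet

        # periodically copy the online weights into the target network
        if self.learn_step % self.target_update_every == 0:
            self.target_net.load_state_dict(self.eval_net.state_dict())
        self.learn_step += 1

        batch = random.sample(self.memory, self.batch_size)
        states, actions, rewards, next_states = zip(*batch)
        states = torch.as_tensor(np.array(states), dtype=torch.float32)
        actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
        rewards = torch.as_tensor(rewards, dtype=torch.float32)
        next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)

        # Q(s, a) for the actions actually taken
        q_eval = self.eval_net(states).gather(1, actions).squeeze(1)
        # TD target: r + gamma * max_a Q_target(s', a)
        # (Double DQN would instead select the argmax action with eval_net
        #  and evaluate it with target_net)
        with torch.no_grad():
            q_target = rewards + self.gamma * self.target_net(next_states).max(dim=1)[0]

        loss = self.loss_fn(q_eval, q_target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

The replay buffer and the target network are the two stabilizing ingredients of DQN: sampling random minibatches breaks the correlation between consecutive transitions, and freezing the target network between periodic updates keeps the TD target from chasing a moving Q-estimate.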