Full code:
https://github.com/ColinFred/Reinforce_Learning_Pytorch/tree/main/RL/DQN
1. Environment
To list the available environments:
from gym import envs
print(envs.registry.all())
The output lists each environment and its registered versions (truncated here):

CartPole: [ v0, v1 ]
MountainCar: [ v0 ]
MountainCarContinuous: [ v0 ]
Pendulum: [ v1 ]
Acrobot: [ v1 ]
LunarLander: [ v2 ]
LunarLanderContinuous: [ v2 ]
BipedalWalker: [ v3 ]
BipedalWalkerHardcore: [ v3 ]
CarRacing: [ v1 ]
Blackjack: [ v1 ]
FrozenLake: [ v1 ]
FrozenLake8x8: [ v1 ]
CliffWalking: [ v0 ]
Taxi: [ v3 ]
Reacher: [ v2 ]
Pusher: [ v2 ]
Thrower: [ v2 ]
Striker: [ v2 ]
InvertedPendulum: [ v2 ]
...
We again use the CartPole-v1 environment, but rewrite the reward.
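The training snippet below assumes an `env` object already exists. A minimal sketch of how it could be created, assuming the gym API used throughout this post (the linked repository may construct it slightly differently, e.g. via `.unwrapped`):

import gym

# Create the CartPole environment; the raw (unwrapped) env exposes the
# attributes x_threshold and theta_threshold_radians used for reward shaping.
env = gym.make("CartPole-v1").unwrapped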
# print(env.action_space)          # action space
# print(env.observation_space)     # state space
# print(env.observation_space.high)
# print(env.observation_space.low)

NUM_ACTIONS = env.action_space.n
NUM_STATES = env.observation_space.shape[0]
ENV_A_SHAPE = 0 if isinstance(env.action_space.sample(), int) else env.action_space.sample().shape

RL = DQN(n_action=NUM_ACTIONS, n_state=NUM_STATES, learning_rate=0.01)  # choose algorithm

total_steps = 0
for episode in range(1000):
    state, info = env.reset(return_info=True)
    ep_r = 0
    while True:
        env.render()                                   # update env
        action = RL.choose_action(state)               # choose action
        state_, reward, done, info = env.step(action)  # take action, observe next state and reward

        x, x_dot, theta, theta_dot = state_            # reshape the given reward
        r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8
        r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5
        reward = r1 + r2                               # consider both cart position and pole angle

        RL.store_transition(state, action, reward, state_)  # store transition in replay memory
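A full training loop would also call a learning step once the replay memory holds enough transitions, break out of the inner loop when `done` is True, and accumulate `ep_r`; the complete version is in the linked repository. The sketch below only outlines the agent interface the loop above assumes (an epsilon-greedy `choose_action`, a replay memory behind `store_transition`, and a `learn` step with a frozen target network). The method bodies, network size, and hyperparameters here are illustrative assumptions, not the repository's exact code.

import numpy as np
import torch
import torch.nn as nn

class Net(nn.Module):
    # Small MLP mapping a state to one Q-value per action.
    def __init__(self, n_state, n_action):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_state, 64), nn.ReLU(),
            nn.Linear(64, n_action),
        )

    def forward(self, x):
        return self.fc(x)

class DQN:
    def __init__(self, n_action, n_state, learning_rate=0.01,
                 gamma=0.9, epsilon=0.9, memory_size=2000, batch_size=32):
        self.n_action, self.n_state = n_action, n_state
        self.gamma, self.epsilon, self.batch_size = gamma, epsilon, batch_size
        self.eval_net, self.target_net = Net(n_state, n_action), Net(n_state, n_action)
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=learning_rate)
        self.loss_func = nn.MSELoss()
        self.memory = np.zeros((memory_size, n_state * 2 + 2))  # one row: s, a, r, s_
        self.memory_size, self.memory_counter, self.learn_step = memory_size, 0, 0

    def choose_action(self, state):
        # epsilon-greedy: exploit the network most of the time, explore otherwise
        if np.random.uniform() < self.epsilon:
            with torch.no_grad():
                q = self.eval_net(torch.FloatTensor(state).unsqueeze(0))
            return int(q.argmax(dim=1).item())
        return np.random.randint(self.n_action)

    def store_transition(self, s, a, r, s_):
        # ring buffer: overwrite the oldest transition once the memory is full
        self.memory[self.memory_counter % self.memory_size] = np.hstack((s, [a, r], s_))
        self.memory_counter += 1

    def learn(self):
        # sync the frozen target network every 100 learning steps
        if self.learn_step % 100 == 0:
            self.target_net.load_state_dict(self.eval_net.state_dict())
        self.learn_step += 1

        # sample a random minibatch from the replay memory
        idx = np.random.choice(min(self.memory_counter, self.memory_size), self.batch_size)
        batch = self.memory[idx]
        s = torch.FloatTensor(batch[:, :self.n_state])
        a = torch.LongTensor(batch[:, self.n_state:self.n_state + 1])
        r = torch.FloatTensor(batch[:, self.n_state + 1:self.n_state + 2])
        s_ = torch.FloatTensor(batch[:, -self.n_state:])

        q_eval = self.eval_net(s).gather(1, a)                      # Q(s, a)
        q_next = self.target_net(s_).detach()                       # frozen target network
        q_target = r + self.gamma * q_next.max(1, keepdim=True)[0]  # r + gamma * max_a' Q'(s', a')

        loss = self.loss_func(q_eval, q_target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()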

This article walks through the DQN algorithm in deep reinforcement learning, covering environment setup, experience replay, and the neural-network implementation, and then extends it to Double DQN and Dueling DQN, which aim to reduce Q-value overestimation and to better separate state value from action advantage. With tuned hyperparameters, the three algorithms are compared on the CartPole-v1 environment.
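To make those two extensions concrete: Double DQN only changes how the bootstrap target is built (the online network selects the next action, the target network evaluates it, which curbs the overestimation introduced by the max operator), while Dueling DQN only changes the network head (separate state-value and advantage streams). A hedged sketch of both, with names chosen for illustration rather than taken from the repository:

import torch
import torch.nn as nn

def dqn_target(target_net, r, s_, gamma):
    # Vanilla DQN: the target network both selects and evaluates the next
    # action, so the max operator tends to overestimate Q-values.
    with torch.no_grad():
        q_next = target_net(s_).max(dim=1, keepdim=True)[0]
    return r + gamma * q_next

def double_dqn_target(eval_net, target_net, r, s_, gamma):
    # Double DQN: the online network picks the greedy next action,
    # the target network evaluates it.
    with torch.no_grad():
        next_a = eval_net(s_).argmax(dim=1, keepdim=True)  # action selection
        q_next = target_net(s_).gather(1, next_a)          # action evaluation
    return r + gamma * q_next

class DuelingNet(nn.Module):
    # Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    def __init__(self, n_state, n_action):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)
        self.advantage = nn.Linear(64, n_action)

    def forward(self, x):
        h = self.feature(x)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)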