Stable Control of gym Mountain Car with Reinforcement Learning
Dependency versions
gym == 0.21.0
stable-baselines3 == 1.6.2
Environment test
Environment description: Mountain Car
import gym

# Create the environment
env = gym.make("MountainCar-v0")

episodes = 10
for ep in range(episodes):
    obs = env.reset()
    done = False
    rewards = 0
    while not done:
        # Sample a random action to exercise the environment
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        env.render()
        rewards += reward
    # Total reward collected in this episode
    print(rewards)
env.close()
Environment test video: Mountain Car test
Q-learning model
Model training
import gym
import numpy as np
env = gym.make("MountainCar-v0")
# Q-Learning settings
LEARNING_RATE = 0.1
DISCOUNT = 0.95
EPISODES = 25000
SHOW_EVERY = 1000
# Exploration settings
epsilon = 1  # not a constant, going to be decayed
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES//2
epsilon_decay_value = epsilon/(END_EPSILON_DECAYING - START_EPSILON_DECAYING)
DISCRETE_OS_SIZE = [20, 20]
discrete_os_win_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_OS_SIZE
print(discrete_os_win_size)
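
The listing above ends once the bucket size has been computed. As a rough illustration of how that bucket size is typically used in tabular Q-learning, the sketch below maps a continuous observation to bucket indices and applies one Q-learning update. It is not the author's code: the observation bounds are hard-coded here (assumed to match what env.observation_space.low and env.observation_space.high report for MountainCar-v0) so that the sketch runs without gym, and the names OBS_LOW, OBS_HIGH, N_ACTIONS, get_discrete_state and q_update, as well as the initial Q-value range, are hypothetical choices made for this example.

```python
import numpy as np

# Assumed bounds of a MountainCar-v0 observation: [position, velocity].
# In the article these come from env.observation_space.low / .high.
OBS_LOW = np.array([-1.2, -0.07])
OBS_HIGH = np.array([0.6, 0.07])
N_ACTIONS = 3  # push left, no push, push right

LEARNING_RATE = 0.1
DISCOUNT = 0.95
DISCRETE_OS_SIZE = [20, 20]
discrete_os_win_size = (OBS_HIGH - OBS_LOW) / DISCRETE_OS_SIZE

# One Q-value per (position bucket, velocity bucket, action).
# The initial range [-2, 0] is an illustrative choice, not taken from the article.
rng = np.random.default_rng(0)
q_table = rng.uniform(low=-2, high=0, size=DISCRETE_OS_SIZE + [N_ACTIONS])


def get_discrete_state(state):
    """Map a continuous observation to a tuple of bucket indices."""
    idx = ((np.asarray(state) - OBS_LOW) / discrete_os_win_size).astype(int)
    # Clip so an observation exactly on the upper bound stays inside the table.
    idx = np.clip(idx, 0, np.array(DISCRETE_OS_SIZE) - 1)
    return tuple(int(i) for i in idx)


def q_update(state, action, reward, new_state, done):
    """Apply one tabular Q-learning update for a single transition."""
    s = get_discrete_state(state)
    current_q = q_table[s + (action,)]
    # A terminal state has no future return to bootstrap from.
    max_future_q = 0.0 if done else np.max(q_table[get_discrete_state(new_state)])
    target = reward + DISCOUNT * max_future_q
    q_table[s + (action,)] = current_q + LEARNING_RATE * (target - current_q)
```

With these assumed bounds, each position bucket is about 0.09 wide and each velocity bucket about 0.007 wide, so get_discrete_state([-0.5, 0.001]) returns (7, 10). In a training loop, q_update would be called once per env.step with the observation before and after the step.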




