Introduction to simple:
simple is the simplest scenario in multi-particle envs (MPE); it is meant for testing algorithms and getting familiar with the framework. I used the DDPG algorithm in MPE to implement single-agent navigation.
DDPG is an improved version of the actor-critic (AC) algorithm: it adds target networks to stabilize training, and it outputs continuous actions. I won't go into the details here; if it's unfamiliar, see Morvan (莫烦)'s reinforcement learning tutorials.
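To make the target-network trick concrete before the full listing: the critic is regressed toward y = r + γ·Q'(s', μ'(s')) computed with frozen target networks, and those target weights are hard-copied from the eval networks every fixed number of updates. Here is a minimal NumPy sketch of that hard replacement (my illustration of what REPLACE_ITER_A / REPLACE_ITER_C control in the code below, not part of the original article):

import numpy as np

eval_w = np.random.randn(4, 2)    # stand-in for the eval net's weights
target_w = eval_w.copy()          # the target net starts as a copy

REPLACE_ITER = 1000               # same role as REPLACE_ITER_A/C below
for step in range(1, 3001):
    eval_w += 0.001 * np.random.randn(4, 2)   # pretend gradient update
    if step % REPLACE_ITER == 0:
        target_w = eval_w.copy()              # hard target replacement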
Now for the code:
# -*- coding: utf-8 -*-
"""
Created on Tue Feb 26 09:17:43 2019
@author: Jack Lee
"""
from make_env import make_env
import tensorflow as tf
import numpy as np
import os
import shutil
np.random.seed(1)
tf.set_random_seed(1)
MAX_EPISODES = 600
MAX_EP_STEPS = 200
LR_A = 1e-3 # learning rate for actor
LR_C = 1e-3 # learning rate for critic
GAMMA = 0.9 # reward discount
REPLACE_ITER_A = 1100    # hard-replace the actor's target net every 1100 learn steps
REPLACE_ITER_C = 1000    # hard-replace the critic's target net every 1000 learn steps
MEMORY_CAPACITY = 5000   # replay buffer size
BATCH_SIZE = 16
VAR_MIN = 0.1            # floor for the exploration-noise variance
RENDER = True
LOAD = False             # True: restore a saved model instead of training
MODE = ['easy', 'hard']  # used when choosing the model save/load path
n_model = 1
env = make_env('simple')     # one agent, one landmark
STATE_DIM = 4                # observation: agent velocity (2) + landmark position relative to agent (2)
ACTION_DIM = 2               # continuous force along x and y
ACTION_BOUND = [-0.2, 0.2]   # actions are kept in this range
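# For reference (my note, not from the article): MPE uses a list-per-agent API,
# even with a single agent:
#   obs_n = env.reset()                         # 1-element list; obs has shape (4,)
#   obs_n, r_n, done_n, info_n = env.step(a_n)  # a_n: 1-element list of 5-dim actions
# Act225/Act522 below convert between DDPG's 2-D action and that 5-dim format.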
with tf.name_scope('S'):
    S = tf.placeholder(tf.float32, shape=[None, STATE_DIM], name='s')
with tf.name_scope('R'):
    R = tf.placeholder(tf.float32, [None, 1], name='r')
with tf.name_scope('S_'):
    S_ = tf.placeholder(tf.float32, shape=[None, STATE_DIM], name='s_')
def Act225(a):
    # expand the 2-D DDPG action into MPE's 5-slot action format
    a = a[np.newaxis, :]
    a = [[0, a[0][0], 0, a[0][1], 0]]
    return a

def Act522(a):
    # inverse of Act225: pick the 2-D action back out of the 5-slot vector
    return [[a[0][1], a[0][3]]]
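# Quick sanity check of the two adapters (my illustration, not from the article).
# MPE's continuous action for 'simple' has 5 slots: [noop, +x, -x, +y, -y], so:
#   Act225(np.array([0.1, -0.2]))    -> [[0, 0.1, 0, -0.2, 0]]
#   Act522([[0, 0.1, 0, -0.2, 0]])   -> [[0.1, -0.2]]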
class Actor(object):
    def __init__(self, sess, action_dim, action_bound, learning_rate, t_replace_iter):
        self.sess = sess
        self.a_dim = action_dim
        self.action_bound = action_bound
        self.lr = learning_rate
        self.t_replace_iter = t_replace_iter
        self.t_replace_counter = 0
        with tf.variable_scope('Actor'):
            # input s, output a
            self.a = self._build_net(S, scope='eval_net', trainable=True)
            # input s_, output a_ (used by the critic's target)
            self.a_ = self._build_net(S_, scope='target_net', trainable=False)
        self.e_params = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='Actor/eval_net')
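        # --- The published excerpt stops after e_params above. What follows is my
        # sketch of how the Actor continues in the Morvan-style DDPG template this
        # code follows; the layer size (100) and other details are assumptions,
        # not the article's exact values. The training op (built from the critic's
        # action gradients) and the periodic hard replacement driven by
        # t_replace_counter are omitted here.
        self.t_params = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='Actor/target_net')

    def _build_net(self, s, scope, trainable):
        with tf.variable_scope(scope):
            net = tf.layers.dense(s, 100, activation=tf.nn.relu,
                                  name='l1', trainable=trainable)
            a = tf.layers.dense(net, self.a_dim, activation=tf.nn.tanh,
                                name='a', trainable=trainable)
            # squash to (-1, 1), then scale into ACTION_BOUND = [-0.2, 0.2]
            return tf.multiply(a, self.action_bound[1], name='scaled_a')

    def choose_action(self, s):
        return self.sess.run(self.a, feed_dict={S: s[np.newaxis, :]})[0]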
