Learning Notes on Morvan (莫烦)'s Policy Gradient Code

These notes follow Morvan's tutorial to study the Policy Gradient reinforcement learning algorithm in depth. By reading the related paper and working through the source code, they cover how to implement Policy Gradient with TensorFlow and apply it to the CartPole environment.


For details, see Morvan's tutorial page: https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/5-1-A-PG/

Paper: https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf

Source code: https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/7_Policy_gradient_softmax
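
The core result behind the code is the REINFORCE form of the policy gradient theorem from the paper above: the gradient of the expected return with respect to the policy parameters can be estimated from sampled episodes as

\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\big[\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, v_t \,\big]

where v_t is the discounted return from step t onward (normalized in the code below). RL_brain.py therefore builds a softmax policy network and minimizes the surrogate loss -log π_θ(a_t | s_t) · v_t averaged over one episode, which is the negative of this estimator.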

RL_brain.py

import numpy as np
import tensorflow as tf

# reproducible
np.random.seed(1)
tf.set_random_seed(1)


class PolicyGradient:
    def __init__(            # initialization
            self,
            n_actions,
            n_features,
            learning_rate=0.01,
            reward_decay=0.95,
            output_graph=False,
    ):
        self.n_actions = n_actions
        self.n_features = n_features
        self.lr = learning_rate        # learning rate for the policy update
        self.gamma = reward_decay      # reward discount factor

        self.ep_obs, self.ep_as, self.ep_rs = [], [], []    # observations, actions and rewards of the current episode

        self._build_net()

        self.sess = tf.Session()

        if output_graph:
            # $ tensorboard --logdir=logs
            # http://0.0.0.0:6006/
            # tf.train.SummaryWriter will soon be deprecated; use the following
            tf.summary.FileWriter("logs/", self.sess.graph)

        self.sess.run(tf.global_variables_initializer())

    def _build_net(self):               # build the policy gradient network
        with tf.name_scope('inputs'):
            self.tf_obs = tf.placeholder(tf.float32, [None, self.n_features], name="observations")
            self.tf_acts = tf.placeholder(tf.int32, [None, ], name="actions_num")
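            # The remainder of the class below is a hedged sketch that follows the
            # structure of the linked repository; layer sizes, initializers and
            # method names are assumptions taken from that repo, not from this note.
            self.tf_vt = tf.placeholder(tf.float32, [None, ], name="actions_value")   # discounted, normalized return v_t

        # fc1: hidden layer with tanh activation
        layer = tf.layers.dense(
            inputs=self.tf_obs,
            units=10,
            activation=tf.nn.tanh,
            kernel_initializer=tf.random_normal_initializer(mean=0, stddev=0.3),
            bias_initializer=tf.constant_initializer(0.1),
            name='fc1'
        )
        # fc2: one logit per action
        all_act = tf.layers.dense(
            inputs=layer,
            units=self.n_actions,
            activation=None,
            kernel_initializer=tf.random_normal_initializer(mean=0, stddev=0.3),
            bias_initializer=tf.constant_initializer(0.1),
            name='fc2'
        )
        self.all_act_prob = tf.nn.softmax(all_act, name='act_prob')   # action probabilities

        with tf.name_scope('loss'):
            # cross-entropy gives -log pi(a|s) for the chosen actions; weighting it by
            # v_t and minimizing performs gradient ascent on the expected return
            neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=all_act, labels=self.tf_acts)
            loss = tf.reduce_mean(neg_log_prob * self.tf_vt)

        with tf.name_scope('train'):
            self.train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)

    def choose_action(self, observation):
        # sample an action from the softmax distribution output by the network
        prob_weights = self.sess.run(self.all_act_prob,
                                     feed_dict={self.tf_obs: observation[np.newaxis, :]})
        action = np.random.choice(range(prob_weights.shape[1]), p=prob_weights.ravel())
        return action

    def store_transition(self, s, a, r):
        # record one (state, action, reward) step of the current episode
        self.ep_obs.append(s)
        self.ep_as.append(a)
        self.ep_rs.append(r)

    def learn(self):
        # one policy update per finished episode
        discounted_ep_rs_norm = self._discount_and_norm_rewards()
        self.sess.run(self.train_op, feed_dict={
            self.tf_obs: np.vstack(self.ep_obs),             # shape [ep_len, n_features]
            self.tf_acts: np.array(self.ep_as),              # shape [ep_len, ]
            self.tf_vt: discounted_ep_rs_norm,               # shape [ep_len, ]
        })
        self.ep_obs, self.ep_as, self.ep_rs = [], [], []     # clear episode memory
        return discounted_ep_rs_norm

    def _discount_and_norm_rewards(self):
        # compute the discounted return v_t for every step, then normalize it
        discounted_ep_rs = np.zeros_like(self.ep_rs, dtype=np.float64)
        running_add = 0
        for t in reversed(range(len(self.ep_rs))):
            running_add = running_add * self.gamma + self.ep_rs[t]
            discounted_ep_rs[t] = running_add
        discounted_ep_rs -= np.mean(discounted_ep_rs)
        discounted_ep_rs /= np.std(discounted_ep_rs)
        return discounted_ep_rs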
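
With the class above, the CartPole experiment can be driven by a short script. The sketch below is an assumption-laden usage example (it assumes the classic gym API where env.step returns four values, and hyperparameters that are not part of this note): sample a whole episode with choose_action and store_transition, then call learn once when the episode ends.

import gym
from RL_brain import PolicyGradient

env = gym.make('CartPole-v0')
env.seed(1)          # reproducible
env = env.unwrapped

RL = PolicyGradient(
    n_actions=env.action_space.n,
    n_features=env.observation_space.shape[0],
    learning_rate=0.02,
    reward_decay=0.99,
)

for i_episode in range(300):
    observation = env.reset()

    while True:
        action = RL.choose_action(observation)
        observation_, reward, done, info = env.step(action)

        RL.store_transition(observation, action, reward)

        if done:
            ep_rs_sum = sum(RL.ep_rs)    # episode return, read before learn() clears memory
            RL.learn()                   # one policy update per episode
            print('episode:', i_episode, '  reward:', int(ep_rs_sum))
            break

        observation = observation_

Because v_t is computed from the rewards of the whole episode, learn() can only be called after the episode terminates; this is the Monte Carlo (REINFORCE) flavour of policy gradient, in contrast to the step-by-step updates used by value-based methods such as DQN.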