PG-REINFORCE tensorflow 2.0

炸机狂魔

Last modified: 2022-05-02 12:30:38

Category: 记录学习 (Study Notes). Tags: python, artificial intelligence

First published: 2022-04-28 14:33:09

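Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.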

Article link: https://blog.youkuaiyun.com/qq_42412225/article/details/124473240

REINFORCE Algorithm Implementation

REINFORCE is the most basic of the policy gradient algorithms. It is implemented here with TensorFlow 2.0.
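
As a rough illustration of the quantities a REINFORCE update works with, the sketch below computes the discounted return of each time step and the usual surrogate loss (the negative sum of log-probabilities weighted by returns). It is plain Python rather than TensorFlow, and the function names and the `gamma` parameter are assumptions made for this example, not part of the agent class used in this post.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, returns):
    """Surrogate loss whose gradient is the REINFORCE policy gradient estimate.

    log_probs[t] is log pi(a_t | s_t) for the action actually taken.
    Minimizing this loss raises the probability of actions with high return.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    print(discounted_returns(episode_rewards, gamma=0.5))
```

In a TensorFlow 2.0 implementation the same loss would typically be computed inside a `tf.GradientTape` context so that gradients flow back through the log-probabilities into the policy network.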

import tensorflow as tf
import gym
from matplotlib import pyplot as plt
import numpy as np


def PGReinforce_run(PGReinforce_agent=None, episode=1000):
    PGReinforce_agent = PGReinforce_agent.PGReinforce(n_actions=2, n_features=4)
    PGReinforce_agent.net_init()
    score = []
    env = gym.make('CartPole-v1')
    bias = 5
    for i_episode in range(episode):
        # Reset the environment at the start of each episode
        observation = env.reset()
        done = False
        t = 0
        while not done:
            env.render()
            action = PGReinforce_agent.choose_action(observation)
            PGReinforce_agent.traj_store(observation, action)
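            # Classic gym API (gym < 0.26): step() returns four values
            observation_, reward, done, info = env.step(action)
            x, x_dot, theta, theta_dot = observation
            # Shaped reward term: penalize the pole angle
            r2 = - abs(theta)*5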
            # r1 = - abs(x)
            PGReinforce_agent.r_calculate(reward + r2)
            observation = observation_
            t += 1
        # PGReinforce_agent.loss_calculate()
        # t already equals the number of steps taken in this episode
        print("Episode finished after {} time steps".format(t))
        score.append(t)
        PGReinforce_agent.learn(5)
        if (i_episode + 1) % 100 == 0:
            plt.plot(score)  # Plot the episode-length curve
            # plt.draw()
            plt.savefig(f"RL_algorithm_package/img/pic_
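
The training loop above calls `traj_store` and `r_calculate` on an agent whose implementation is not shown in this excerpt. As a minimal sketch of what that per-episode bookkeeping might look like, the code below pairs the angle-penalized reward from the loop with a simple trajectory buffer. The class `TrajectoryBuffer`, the helper `shaped_reward`, and the `angle_weight` parameter are hypothetical names introduced for illustration; only the method names `traj_store` and `r_calculate` and the factor 5 come from the loop.

```python
def shaped_reward(observation, reward, angle_weight=5.0):
    """Environment reward plus a penalty proportional to the pole angle.

    Mirrors `reward + r2` in the loop, where r2 = -abs(theta) * 5.
    """
    _x, _x_dot, theta, _theta_dot = observation
    return reward - abs(theta) * angle_weight


class TrajectoryBuffer:
    """Hypothetical stand-in for the agent's per-episode storage."""

    def __init__(self):
        self.observations = []
        self.actions = []
        self.rewards = []

    def traj_store(self, observation, action):
        # Record the state the action was chosen in, and the action itself.
        self.observations.append(list(observation))
        self.actions.append(action)

    def r_calculate(self, reward):
        # Record the (shaped) reward received for the most recent action.
        self.rewards.append(reward)

    def clear(self):
        # REINFORCE is on-policy, so the buffer is emptied after each update.
        self.observations.clear()
        self.actions.clear()
        self.rewards.clear()


if __name__ == "__main__":
    buf = TrajectoryBuffer()
    obs = [0.0, 0.0, 0.1, 0.0]
    buf.traj_store(obs, 1)
    buf.r_calculate(shaped_reward(obs, 1.0))
    print(buf.rewards)
```

Keeping observations, actions, and rewards in parallel lists means the three stay aligned by time step, which is what a return-weighted log-probability loss needs at the end of the episode.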
