
Reinforcement learning
gz153016
Vision, taste
A really good PyTorch implementation of DDPG
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import gym
import time
##################### hyper parameters ####################
EPISODES = 200
EP_STEPS = 200
LR_ACTOR = 0.001
LR_CRITIC = 0.002
GAMMA = 0.9
TAU = 0…
Original · 2021-12-08 10:45:26
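The `TAU` hyperparameter in this excerpt is DDPG's soft-update rate: after each learning step the target networks move a fraction tau toward the online networks, theta' <- tau * theta + (1 - tau) * theta'. Below is a minimal pure-Python sketch of that update, assuming each network's weights are flattened into a list of floats; the post itself uses PyTorch modules, and `soft_update` is a hypothetical helper name. The excerpt cuts off the actual `TAU` value, so `tau=0.5` in the usage is chosen only to make the arithmetic easy to follow.

```python
def soft_update(target_params, online_params, tau):
    """Polyak-average online parameters into target parameters, in place.

    target <- tau * online + (1 - tau) * target
    (hypothetical helper; weights are assumed to be flat lists of floats)
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    for i, (t, o) in enumerate(zip(target_params, online_params)):
        target_params[i] = tau * o + (1.0 - tau) * t
    return target_params


# Usage: the target weights drift toward the online weights a bit each call.
target = [0.0, 0.0]
online = [1.0, -1.0]
for _ in range(3):
    soft_update(target, online, tau=0.5)
print(target)  # [0.875, -0.875]
```

With PyTorch modules the same rule is usually applied parameter by parameter, for example by iterating over `zip(target_net.parameters(), online_net.parameters())` and calling `t.data.copy_(tau * o.data + (1 - tau) * t.data)`. A small tau keeps the critic's bootstrapped target from chasing a rapidly moving network.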
Reinforcement learning: Double DQN (DDQN)
# -*- coding: utf-8 -*-
# single user
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import gym
# hyperparameters
BATCH_SIZE = 32
LR = 0.01                   # learning rate
EPSILON = 0.9               # probability of choosing the greedy (best) action
GAMMA = 0.9…
Original · 2021-07-26 10:00:09
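Double DQN differs from DQN only in how the bootstrap target is built: the online network chooses the next action and the target network scores it, which reduces the overestimation caused by taking a max over noisy estimates. A minimal sketch in plain Python, assuming the Q-values for the next state are already available as lists; the helper names are hypothetical, not taken from the post.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward, q_target_next, gamma, done):
    """Standard DQN: the target network both selects and evaluates a'."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, q_online_next, q_target_next, gamma, done):
    """Double DQN: the online network selects a', the target network evaluates it."""
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]


# Usage with made-up Q-values for the next state s'.
q_online_next = [1.0, 2.0]   # online network prefers action 1
q_target_next = [3.0, 0.5]   # target network overestimates action 0
print(dqn_target(1.0, q_target_next, 0.9, False))                        # 1 + 0.9 * 3.0
print(double_dqn_target(1.0, q_online_next, q_target_next, 0.9, False))  # 1 + 0.9 * 0.5
```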
DDPG code implementation
"""Deep Deterministic Policy Gradient (DDPG), Reinforcement Learning.DDPG is Actor Critic based algorithm.Pendulum example.View more on my tutorial page: https://morvanzhou.github.io/tutorials/Using:tensorflow 1.0gym 0.8.0"""######################原创 2021-04-05 22:27:55 · 1433 阅读 · 1 评论 -
Reinforcement learning: DQN
import gym
import tensorflow as tf
import numpy as np
import random
from collections import deque
# Hyper Parameters for DQN
GAMMA = 0.9 # discount factor for target Q
INITIAL_EPSILON = 0.5 # starting value of epsilon
FINAL_EPSILON = 0.01 # final value …
Original · 2020-12-01 17:21:25
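The excerpt imports `deque` and defines `INITIAL_EPSILON` and `FINAL_EPSILON`, which points to two standard DQN pieces: an experience-replay memory and an exploration rate that is annealed during training. A minimal sketch of both, assuming a linear annealing schedule over a hypothetical `ANNEAL_STEPS` (the excerpt does not show how epsilon is reduced) and plain Python lists in place of TensorFlow tensors:

```python
import random
from collections import deque

INITIAL_EPSILON = 0.5   # from the excerpt
FINAL_EPSILON = 0.01    # from the excerpt
ANNEAL_STEPS = 10000    # hypothetical: the excerpt does not show the schedule


def epsilon_at(step):
    """Linearly interpolate epsilon from INITIAL_EPSILON to FINAL_EPSILON."""
    frac = min(step / ANNEAL_STEPS, 1.0)
    return (1.0 - frac) * INITIAL_EPSILON + frac * FINAL_EPSILON


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayBuffer:
    """Fixed-capacity experience replay; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # A uniform random minibatch reduces the correlation between consecutive steps.
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1, False)
print(len(buf), epsilon_at(0), epsilon_at(ANNEAL_STEPS // 2))
```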
Reinforcement learning: the Actor-Critic (AC) framework
import gym
import tensorflow as tf
import numpy as np
import random
from collections import deque
# Hyper Parameters
GAMMA = 0.95 # discount factor
LEARNING_RATE=0.01
class Actor():  # PI
    def __init__(self, env, sess):
        # init some parameters…
Original · 2020-12-01 17:20:24
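In the actor-critic framework the critic learns a state-value function V(s), and its TD error delta = r + gamma * V(s') - V(s) tells the actor whether the action it just took turned out better or worse than expected. The sketch below is a tabular stand-in for the post's TensorFlow `Actor` class, not a port of it: it assumes a softmax policy over per-state preferences, and the learning rates and the one-state test problem are hypothetical.

```python
import math
import random

GAMMA = 0.95  # discount factor, as in the excerpt


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class TabularActorCritic:
    """One-step actor-critic: softmax actor, state-value critic (hypothetical)."""

    def __init__(self, n_states, n_actions, lr_actor=0.1, lr_critic=0.1):
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor
        self.values = [0.0] * n_states                              # critic V(s)
        self.lr_actor = lr_actor
        self.lr_critic = lr_critic

    def policy(self, s):
        return softmax(self.prefs[s])

    def act(self, s, rng=random):
        probs = self.policy(s)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def learn(self, s, a, r, s_next, done):
        # Critic: the TD error, which also serves as the actor's advantage estimate.
        target = r if done else r + GAMMA * self.values[s_next]
        td_error = target - self.values[s]
        self.values[s] += self.lr_critic * td_error
        # Actor: for a softmax policy, grad log pi(a|s) = onehot(a) - pi(.|s).
        probs = self.policy(s)
        for b in range(len(probs)):
            grad = (1.0 if b == a else 0.0) - probs[b]
            self.prefs[s][b] += self.lr_actor * td_error * grad
        return td_error


# Usage: a one-state problem where action 1 pays 1 and action 0 pays 0.
rng = random.Random(0)
ac = TabularActorCritic(n_states=1, n_actions=2)
for _ in range(500):
    a = ac.act(0, rng)
    ac.learn(0, a, float(a == 1), 0, done=True)
print(ac.policy(0))  # probability mass should have shifted toward action 1
```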
Reinforcement learning: policy gradient
#######################################################################
# Copyright (C)
# 2016 - 2019 Pinard Liu(liujianping-ok@163.com)
# https://www.cnblogs.com/pinard
…
Original · 2020-11-30 21:26:06
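The excerpt shows only the copyright header of the referenced code, so the following is not that code. It is a generic sketch of the REINFORCE flavor of policy gradient, assuming one episode's log-probabilities and rewards have already been collected: each log pi(a_t|s_t) is weighted by the discounted return G_t, normalized to reduce variance. In a framework such as PyTorch or TensorFlow, `log_probs` would be tensors and automatic differentiation of this loss would give the policy gradient; here the helpers are hypothetical and work on plain floats.

```python
def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def normalize(values, eps=1e-8):
    """Zero-mean, unit-variance scaling, a common variance-reduction trick."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / (std + eps) for v in values]


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss whose gradient is the REINFORCE policy gradient.

    log_probs[t] is log pi(a_t | s_t) for the action actually taken.
    """
    weights = normalize(discounted_returns(rewards, gamma))
    return -sum(lp * w for lp, w in zip(log_probs, weights))


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```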
Meta-learning: MAML few-shot learning, hands-on code
First file: my_miniimagenet_train.py
import os
os.environ['CUDA_VISIBLE_DEVICES']='0'
import torch
from my_MiniImagenet import MiniImagenet
import numpy as np
from my_meta import Meta
import argparse
from torch.utils.data import DataLoader
def main():
    …
Original · 2020-11-21 15:45:24
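MAML learns an initialization theta that performs well after a few gradient steps on each new task: an inner loop adapts theta per task, and an outer loop updates theta using the loss measured after adaptation. The post's code trains a network on MiniImagenet; as a hypothetical toy that keeps the second-order term visible, the sketch below uses one-dimensional quadratic tasks L_i(theta) = (theta - c_i)^2, for which the meta-gradient can be written out by hand.

```python
def maml_1d(task_centers, theta=0.0, inner_lr=0.1, outer_lr=0.1, steps=200):
    """MAML on hypothetical 1-D quadratic tasks L_i(theta) = (theta - c_i) ** 2.

    Inner loop: one gradient step per task, theta_i' = theta - inner_lr * L_i'(theta).
    Outer loop: gradient of mean_i L_i(theta_i') w.r.t. theta, including the
    second-order factor d(theta_i')/d(theta) = 1 - 2 * inner_lr.
    """
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_centers:
            grad_inner = 2.0 * (theta - c)            # L_i'(theta)
            adapted = theta - inner_lr * grad_inner   # theta_i'
            grad_outer = 2.0 * (adapted - c)          # L_i'(theta_i')
            meta_grad += grad_outer * (1.0 - 2.0 * inner_lr)  # chain rule
        theta -= outer_lr * meta_grad / len(task_centers)
    return theta


theta = maml_1d([1.0, 3.0])
print(round(theta, 4))  # 2.0
```

For these quadratics the meta-objective is minimized at the mean of the task centers. With neural networks the chain-rule factor is obtained by automatic differentiation instead, for example by calling `torch.autograd.grad(..., create_graph=True)` in the inner loop so the outer loss can backpropagate through the adaptation step.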
Reinforcement learning: DQN
Main components: TD learning, Q-learning, a neural network that represents the Q-function, and the Bellman equation.
from __future__ import print_function
import tensorflow as tf
import numpy as np
import cv2
import sys
sys.path.append("game/")
import game.wrapped_flappy_bird as game
import random
# game name
GAME = 'flappy bird'
ACTIONS = 2…
Original · 2020-11-15 16:45:09
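The components listed above fit together in the DQN training step: the network outputs one Q-value per action, the Q-value of the action actually taken is picked out with a one-hot mask, and it is regressed toward the Bellman target r + gamma * max_a' Q(s', a'), or just r when the episode ended. A minimal sketch in plain Python with a hypothetical `GAMMA` and made-up rewards and Q-values; in the post these would come from the TensorFlow network and the Flappy Bird emulator.

```python
GAMMA = 0.99  # hypothetical: the excerpt does not show the discount used
ACTIONS = 2   # from the excerpt


def one_hot(action, n=ACTIONS):
    vec = [0.0] * n
    vec[action] = 1.0
    return vec


def td_targets(rewards, next_q_batch, terminals, gamma=GAMMA):
    """Bellman targets: y_j = r_j if terminal, else r_j + gamma * max_a Q(s'_j, a)."""
    return [
        r if done else r + gamma * max(next_q)
        for r, next_q, done in zip(rewards, next_q_batch, terminals)
    ]


def q_readout(q_batch, actions):
    """Q(s_j, a_j) as a dot product with the one-hot action vector."""
    return [
        sum(q * a for q, a in zip(q_values, one_hot(act)))
        for q_values, act in zip(q_batch, actions)
    ]


def mse_loss(predictions, targets):
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


# Two made-up transitions; the second one ends the episode.
rewards = [0.1, -1.0]
next_q = [[0.5, 2.0], [3.0, 4.0]]
terminals = [False, True]
y = td_targets(rewards, next_q, terminals)
pred = q_readout([[1.0, 0.0], [0.0, -0.5]], actions=[0, 1])
print(y, pred, mse_loss(pred, y))
```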
Reinforcement learning: PPO
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import tensorflow as tf
import numpy as np
import gym
import matplotlib.pyplot as plt
RENDER = False
# Sample with the current policy to generate data
class Sample():
    def __init__(self,env, policy_net):
        self.env = env…
Original · 2020-11-14 17:11:44
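The excerpt stops at the sampling class, before the policy update. The core of the PPO update is commonly the clipped surrogate objective, which limits how far the probability ratio between the new and old policies can push the objective in a single update. A minimal sketch of that objective, assuming per-sample log-probabilities and advantage estimates are already computed; `clip_eps=0.2` is a commonly used default, not a value from the post.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective L^CLIP (to be maximized).

    ratio_t = pi_new(a_t|s_t) / pi_old(a_t|s_t)
    L = mean_t min(ratio_t * A_t, clip(ratio_t, 1 - eps, 1 + eps) * A_t)
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A ratio of 1.5 with a positive advantage is capped at 1.2 * A.
old = [math.log(0.2)]
new = [math.log(0.3)]
print(ppo_clip_objective(new, old, advantages=[1.0]))  # approximately 1.2
```

Taking the minimum makes the objective pessimistic: the clip removes the incentive to move the ratio further once it leaves [1 - eps, 1 + eps] in the direction the advantage favors, but never hides a change that makes the objective worse.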
pytorch Process finished with exit code 132 (interrupted by signal 4: SIGILL)
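The cause is a version incompatibility between PyTorch and the code being run. Rolling back from PyTorch 1.1 to PyTorch 1.0 fixed it. For installing PyTorch, see my earlier blog post.
Original · 2020-10-05 15:20:33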
Reinforcement learning: implementing SARSA and Q-learning
SARSA:
from yuanyang_env_td import YuanYangEnv
import numpy as np
import random
class TD_RL:
    def __init__(self, yuanyang):
        self.gamma = yuanyang.gamma
        self.yuanyang = yuanyang
        # initial values of the value function
        self.qvalue = np.zeros((len(self.y…
Original · 2020-10-02 21:38:22
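SARSA and Q-learning share the same tabular update Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)) and differ only in the target: SARSA bootstraps from the action the behavior policy actually takes next (on-policy), while Q-learning bootstraps from the greedy action (off-policy). A minimal sketch, assuming Q is stored as a list of per-state action-value lists, similar in spirit to the post's `self.qvalue` array; the function names are hypothetical.

```python
def sarsa_update(qvalue, s, a, r, s_next, a_next, alpha, gamma, done):
    """On-policy: bootstrap from the action a_next actually chosen next."""
    target = r if done else r + gamma * qvalue[s_next][a_next]
    qvalue[s][a] += alpha * (target - qvalue[s][a])


def qlearning_update(qvalue, s, a, r, s_next, alpha, gamma, done):
    """Off-policy: bootstrap from the greedy action in s_next."""
    target = r if done else r + gamma * max(qvalue[s_next])
    qvalue[s][a] += alpha * (target - qvalue[s][a])


# Same transition, different targets: in s_next the greedy action is 1 (value 1.0),
# but the exploring behavior policy happened to pick action 0 (value 0.0).
q_sarsa = [[0.0, 0.0], [0.0, 1.0]]
q_ql = [[0.0, 0.0], [0.0, 1.0]]
sarsa_update(q_sarsa, 0, 0, 0.0, 1, 0, alpha=0.5, gamma=0.9, done=False)
qlearning_update(q_ql, 0, 0, 0.0, 1, alpha=0.5, gamma=0.9, done=False)
print(q_sarsa[0][0], q_ql[0][0])  # 0.0 0.45
```

Because SARSA's target includes the exploratory actions it will really take, it tends to learn safer paths near hazards than Q-learning does under the same epsilon-greedy behavior.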
Monte Carlo: on-policy
import pygame
import time
import random
import numpy as np
import matplotlib.pyplot as plt
from yuanyang_env_mc import YuanYangEnv
class MC_RL:
    def __init__(self, yuanyang):
        # initialize the action-value function
        self.qvalue = np.zeros((len(yuanyang.states),len(y…
Original · 2020-10-02 10:23:14
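On-policy Monte Carlo control alternates two steps: estimate Q(s,a) by averaging the returns observed after each state-action pair, then make the policy epsilon-greedy with respect to those estimates. The sketch below assumes first-visit averaging and stores Q in a dictionary rather than the post's NumPy array; the helper names and the episode are made up.

```python
from collections import defaultdict


def first_visit_mc_update(qvalue, counts, episode, gamma):
    """Update Q(s, a) toward the first-visit returns of one episode.

    episode: list of (state, action, reward) tuples in time order.
    Uses an incremental mean: Q <- Q + (G - Q) / N(s, a).
    """
    g = 0.0
    returns = []
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        returns.append((state, action, g))
    returns.reverse()  # back to time order
    seen = set()
    for state, action, g in returns:
        if (state, action) in seen:
            continue  # only the first visit of each pair counts
        seen.add((state, action))
        counts[(state, action)] += 1
        q = qvalue[(state, action)]
        qvalue[(state, action)] = q + (g - q) / counts[(state, action)]


def epsilon_soft_probs(qvalue, state, n_actions, epsilon):
    """On-policy improvement: epsilon-greedy w.r.t. the current Q estimates."""
    greedy = max(range(n_actions), key=lambda a: qvalue[(state, a)])
    probs = [epsilon / n_actions] * n_actions
    probs[greedy] += 1.0 - epsilon
    return probs


qvalue = defaultdict(float)
counts = defaultdict(int)
# Made-up episode: state 0 is visited twice with action 1.
episode = [(0, 1, 0.0), (0, 1, 0.0), (1, 0, 1.0)]
first_visit_mc_update(qvalue, counts, episode, gamma=0.5)
print(qvalue[(0, 1)], qvalue[(1, 0)])  # 0.25 1.0
```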
Reinforcement learning: dynamic programming
Policy evaluation, policy improvement
import pygame
from load import *
import math
import time
import random
import numpy as np
class YuanYangEnv:
    def __init__(self):
        self.states=[]
        for i in range(0,100):
            self.states.append(i)
        self.actions = […
Original · 2020-09-22 19:37:13
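Policy evaluation and policy improvement combine into policy iteration: evaluate the current policy until its value function converges, act greedily with respect to those values, and stop once the policy no longer changes. To keep it short, the sketch below runs on a hypothetical four-cell corridor instead of the 100-state Yuanyang grid; the reward of -1 per move, gamma = 0.9, and all helper names are assumptions.

```python
N = 4  # hypothetical corridor: cells 0..3, moving east from cell 3 reaches the goal
ACTIONS = ["w", "e"]


def corridor_step(s, a):
    """Toy deterministic MDP: reward -1 per move, walls at both ends."""
    if a == "e":
        return s + 1, -1.0, s == N - 1  # done when stepping past the last cell
    return max(s - 1, 0), -1.0, False


def q_value(step, values, gamma, s, a):
    """One-step lookahead r + gamma * V(s'), with nothing added after a terminal step."""
    s2, r, done = step(s, a)
    return r + (0.0 if done else gamma * values[s2])


def policy_iteration(n_states, actions, step, gamma=0.9, theta=1e-8):
    policy = {s: actions[0] for s in range(n_states)}
    values = [0.0] * n_states
    while True:
        # Policy evaluation: in-place sweeps until V stops changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                new_v = q_value(step, values, gamma, s, policy[s])
                delta = max(delta, abs(new_v - values[s]))
                values[s] = new_v
            if delta < theta:
                break
        # Policy improvement: switch only on a strict gain, so the loop terminates.
        stable = True
        for s in range(n_states):
            best = max(actions, key=lambda a: q_value(step, values, gamma, s, a))
            current = q_value(step, values, gamma, s, policy[s])
            if q_value(step, values, gamma, s, best) > current + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


policy, values = policy_iteration(N, ACTIONS, corridor_step)
print(policy)  # {0: 'e', 1: 'e', 2: 'e', 3: 'e'}
```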
Markov decision process implementation: the Yuanyang (mandarin duck) system
main.py
import pygame
import numpy as np
from load import *
import random
class YuanYangEnv:
    def __init__(self):
        self.states = []  # states
        for i in range(0,100):
            self.states.append(i)
        self.actions = ['e', 's', 'w', 'n']  # …
Original · 2020-09-20 15:59:18
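The `YuanYangEnv` excerpt defines 100 states and the actions `'e'`, `'s'`, `'w'`, `'n'`. A minimal sketch of how such a grid MDP's transition function can work, assuming states are numbered row by row on a 10 x 10 grid; the index mapping, the rewards, the goal handling, and the `step` helper are hypothetical and not taken from the post, which also handles obstacles and pygame rendering in its own way.

```python
GRID = 10                       # 10 x 10 grid -> states 0..99
STATES = list(range(GRID * GRID))
ACTIONS = ["e", "s", "w", "n"]  # east, south, west, north (as in the excerpt)
# (dx, dy) per action; hypothetical convention where y grows downward
MOVES = {"e": (1, 0), "s": (0, 1), "w": (-1, 0), "n": (0, -1)}


def state_to_position(state):
    """Hypothetical mapping: column = state % 10, row = state // 10."""
    return state % GRID, state // GRID


def position_to_state(x, y):
    return y * GRID + x


def step(state, action, obstacles=frozenset(), goal=None):
    """Deterministic transition returning (next_state, reward, done).

    Moving off the grid or into an obstacle leaves the agent in place.
    The reward values are made up for illustration.
    """
    x, y = state_to_position(state)
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < GRID and 0 <= ny < GRID):
        return state, -1.0, False
    next_state = position_to_state(nx, ny)
    if next_state in obstacles:
        return state, -1.0, False
    if next_state == goal:
        return next_state, 10.0, True
    return next_state, 0.0, False


print(step(0, "w"), step(0, "e"), step(9, "s", goal=19))
# (0, -1.0, False) (1, 0.0, False) (19, 10.0, True)
```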
Testing the greedy, Boltzmann, and UCB policies in a multi-armed bandit environment
"""在多臂赌博机的实际环境下测试贪心策略,玻尔兹曼策略,UCB策略。第一步:要定义一些多臂赌博机系统的基本信息。第二步:训练 a,选动作,以怎么样的方式选策略。例如:贪心策略,玻尔兹曼策略,UCB策略。 b,将动作发给环境,环境给出即时奖励。第三步:做图,x:玩家玩游戏的总的次数,y:累加奖励。 i, sum_reward(i)"""# 第一步:要定义一些多臂赌博机系统的基本信息。import matplotl...原创 2020-09-18 22:13:55 · 1091 阅读 · 0 评论