A3C in PyTorch

The code in this section comes from 莫烦 (Morvan). Some of the code blocks in the book 《白话强化学习》 were hard for me to follow, so I turned to other teachers' implementations. Morvan's code is plain and easy to read, although its syntax differs somewhat from what the book uses, so I still worked through this code and added some notes for future reference.
main.py

"""
Reinforcement Learning (A3C) using PyTorch + multiprocessing.
The simplest implementation for continuous action.

View more on my Chinese tutorial page [莫烦Python](https://morvanzhou.github.io/).
"""

import torch
import torch.nn as nn
from utils import v_wrap, set_init, push_and_pull, record
import torch.nn.functional as F
import torch.multiprocessing as mp
from shared_adam import SharedAdam
import gym
import math, os
os.environ["OMP_NUM_THREADS"] = "1"
# limit each process to one OpenMP thread so the parallel workers don't oversubscribe the CPU

UPDATE_GLOBAL_ITER = 5   # push local gradients to the global net every 5 steps
GAMMA = 0.9              # reward discount factor
MAX_EP = 3000            # total number of training episodes
MAX_EP_STEP = 200        # max steps per episode (Pendulum's time limit)

env = gym.make('Pendulum-v0')
N_S = env.observation_space.shape[0]
# dimension of the environment's observation space
N_A = env.action_space.shape[0]
# dimension of the environment's action space
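# Quick sanity check of the spaces (my own note, not part of the original file;
# the values below are what Gym reports for Pendulum-v0):
#   observation = [cos(theta), sin(theta), theta_dot]  -> N_S = 3
#   action      = a torque in [-2, 2]                  -> N_A = 1
# The [-2, 2] bound is why the actor's mu output is scaled by 2 in Net.forward below.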


class Net(nn.Module):
    def __init__(self, s_dim, a_dim):
        super(Net, self).__init__()
        self.s_dim = s_dim
        self.a_dim = a_dim
        self.a1 = nn.Linear(s_dim, 200)      # actor hidden layer
        self.mu = nn.Linear(200, a_dim)      # actor head: mean of the action distribution
        self.sigma = nn.Linear(200, a_dim)   # actor head: std of the action distribution
        self.c1 = nn.Linear(s_dim, 100)      # critic hidden layer
        self.v = nn.Linear(100, 1)           # critic head: state value V(s)
        set_init([self.a1, self.mu, self.sigma, self.c1, self.v])
        # initialize the layer weights
        self.distribution = torch.distributions.Normal  # Gaussian policy for continuous actions

    def forward(self, x):
        a1 = F.relu6(self.a1(x))
        mu = 2 * F.tanh(self.mu(a1))                    # action mean, scaled to [-2, 2]
        sigma = F.softplus(self.sigma(a1)) + 0.001      # action std, kept strictly positive
        c1 = F.relu6(self.c1(x))
        values = self.v(c1)
        return mu, sigma, values
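
# A sketch of the actor-critic loss these three heads support (my own illustration,
# not 莫烦's original loss function): the critic regresses the n-step target v_t,
# and the actor maximises the log-probability of the taken action weighted by the
# TD advantage, plus a small entropy bonus for exploration.
def a3c_loss_sketch(net, s, a, v_t, entropy_beta=0.005):
    mu, sigma, values = net(s)
    td = v_t - values                    # TD advantage estimate
    c_loss = td.pow(2)                   # critic loss: squared TD error
    dist = net.distribution(mu, sigma)
    exp_v = dist.log_prob(a) * td.detach() + entropy_beta * dist.entropy()
    a_loss = -exp_v                      # actor loss: maximise expected advantage
    return (c_loss + a_loss).mean()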
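
# A quick usage check (again my own sketch, not part of the original main.py):
# build a Normal(mu, sigma) from the actor head, sample an action, and clip it
# to Pendulum's action range [-2, 2].
if __name__ == '__main__':
    net = Net(N_S, N_A)
    s = torch.from_numpy(env.reset()[None, :].astype('float32'))   # shape (1, N_S)
    mu, sigma, value = net(s)
    dist = net.distribution(mu.detach(), sigma.detach())
    action = dist.sample().numpy().clip(-2, 2)                     # shape (1, N_A)
    print(action, value.item())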