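Implementing the DQN reinforcement learning algorithm in PyTorch to play a maze game (a PyTorch version of the DQN chapter in Mofan's reinforcement learning tutorial)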

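This article implements the DQN reinforcement learning algorithm in PyTorch and applies it to a maze game. It follows Mofan's tutorial and provides the complete code, so readers can run DQN_new.py directly to try it out. For any unfamiliar parts, consult the relevant references.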


1. For detailed background, see Mofan's website.

2. Implementing DQN in PyTorch and using it to play the maze

# -*- coding: utf-8 -*-


import math
import random
import matplotlib.pyplot as plt
from collections import namedtuple, deque
from itertools import count
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from maze_env import Maze

random.seed(1)
torch.manual_seed(1)
np.random.seed(1)
BATCH_SIZE = 128  # BATCH_SIZE is the number of transitions sampled from the replay buffer
GAMMA = 0.9     # GAMMA is the discount factor applied to future rewards
EPS_START = 0.9   # EPS_START is the starting value of epsilon
EPS_END = 0.05    # EPS_END is the final value of epsilon
EPS_DECAY = 1000  # EPS_DECAY controls the rate of exponential decay of epsilon, higher means a slower decay
TAU = 0.005   # TAU is the update rate of the target network
LR = 1e-4    # LR is the learning rate of the AdamW optimizer
env = Maze()
# Get the number of actions from the maze environment
n_actions = env.n_actions
# Get the number of state observations
state = env.reset()
n_observations = len(state)
steps_done = 0
episode_durations = []

# if gpu is to be used
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))


class ReplayMemory(object):

    def __init__(self, capacity):
        self.memory = deque([], maxlen=capacity)

    def push(self, *args):
        """Save a transition"""
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)
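
The listing above stops at the replay buffer, so the following is a minimal, self-contained sketch of how the pieces shown so far are typically used, not the author's remaining code. It repeats `Transition` and `ReplayMemory` so it can run on its own, using only the standard library. Several parts are assumptions: the helper `epsilon_at` is a hypothetical name, its formula is one common exponential schedule consistent with the comment on `EPS_DECAY`, the `__len__` method is an addition, and the transitions pushed into the buffer are dummy values rather than observations from `Maze`.

```python
import math
import random
from collections import deque, namedtuple

# Same values as the constants defined in the listing above.
EPS_START, EPS_END, EPS_DECAY = 0.9, 0.05, 1000

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))


def epsilon_at(step):
    """Exploration rate after `step` steps (hypothetical helper, assumed schedule)."""
    return EPS_END + (EPS_START - EPS_END) * math.exp(-step / EPS_DECAY)


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped when it is full."""

    def __init__(self, capacity):
        self.memory = deque([], maxlen=capacity)

    def push(self, *args):
        """Save a transition"""
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        """Return `batch_size` distinct transitions chosen uniformly at random."""
        return random.sample(self.memory, batch_size)

    def __len__(self):
        # Assumed addition: lets a training loop check that enough data is stored.
        return len(self.memory)


memory = ReplayMemory(capacity=100)
for i in range(150):
    # Dummy transitions; a real agent would store observations from the maze here.
    memory.push((i, i), i % 4, (i + 1, i + 1), 0.0)

batch = memory.sample(8)
# Transpose a list of Transitions into one Transition of tuples,
# so that all states, all actions, etc. can be batched together.
states, actions, next_states, rewards = Transition(*zip(*batch))
```

With `EPS_DECAY = 1000`, this schedule starts at `EPS_START` and approaches `EPS_END` as the step count grows, so the agent explores heavily at first and acts mostly greedily later. Because the buffer is a `deque` with `maxlen`, pushing 150 transitions into a buffer of capacity 100 keeps only the 100 most recent ones.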