Fixing "ValueError: setting an array element with a sequence." when storing the observation returned by env.reset() in DQN code

This post walks through a DQN implementation for a simulated-game learning task. Running the code raised an error complaining that the data dimensions did not match. I suspected the observation returned by reset() was the problem; after printing it and keeping only the array part of the return value as the observation, the network worked normally.
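As the summary says, the key diagnostic step was simply printing the return value of env.reset(). A minimal check of what it looks like, assuming gym >= 0.26 (where reset() returns an (observation, info) tuple rather than a bare array):

import gym

env = gym.make("LunarLander-v2")
ret = env.reset()
print(type(ret))
print(ret)
# gym >= 0.26 prints: <class 'tuple'>, i.e. (array([...], dtype=float32), {})
# older gym versions print a plain numpy.ndarray instead

Seeing a tuple here instead of an ndarray is exactly what later makes an array assignment fail with "setting an array element with a sequence".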

While implementing the DQN network, the learning task targets the LunarLander-v2 simulated game in the gym environment. Part of my code is shown below, starting with the Agent's state-storage setup:

def __init__(self, gamma, epsilon, lr, input_dims, batch_size, n_actions,
             max_mem_size=100000):   # buffer-size parameter assumed; the original signature was cut off here
    # .... earlier assignments omitted ....
    self.mem_size = max_mem_size
    # pre-allocated replay-memory arrays: one row per stored observation
    self.state_memory = np.zeros((self.mem_size, *input_dims), dtype=np.float32)
    self.new_state_memory = np.zeros((self.mem_size, *input_dims), dtype=np.float32)

    self.action_memory = np.zeros(self.mem_size, dtype=np.int32)
    self.reward_memory = np.zeros(self.mem_size, dtype=np.float32)
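The error itself surfaces when the tuple returned by env.reset() is written into one of these pre-allocated arrays. Below is a minimal, self-contained reproduction and the fix; the buffer length of 10 and the direct indexing are only illustrative, not my actual storage code:

import gym
import numpy as np

env = gym.make("LunarLander-v2")
input_dims = env.observation_space.shape             # (8,) for LunarLander-v2
state_memory = np.zeros((10, *input_dims), dtype=np.float32)

ret = env.reset()                                     # gym >= 0.26: a tuple (observation, info)
# state_memory[0] = ret                               # ValueError: setting an array element with a sequence.
observation = ret[0]                                  # keep only the ndarray part
state_memory[0] = observation                         # shapes now match, the assignment works

The same kind of unpacking is needed after env.step() in gym >= 0.26, which returns (observation, reward, terminated, truncated, info) rather than the old four-value tuple.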
File "E:/ML/PythonFiles/ML/Double DQN.py", line 82, in update states = torch.FloatTensor(np.array(states)) ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (64,) + i 出现这个错误是是什么原因, 整体代码在下面 import torch import gym import torch.nn as nn import torch.optim as optim import numpy as np from collections import deque import random BTACH_SIZE = 64 GAMMA = 0.99 EPSILON_START = 1.0 EPSILON_END = 0.01 EPSILON_DECAY = 0.995 TARGET_UPDATE = 10 MEMORY_SIZE = 10000 LEARNING_RATE = 0.001 HIDDEN_SIZE = 64 UPDATE_FRE = 100 EPISODES = 500 class QNetwork(nn.Module): def __init__(self, state_size, action_size, hidden_size): super(QNetwork, self).__init__() self.fc1 = nn.Linear(state_size, hidden_size) self.fc2 = nn.Linear(hidden_size, hidden_size) self.fc3 = nn.Linear(hidden_size, action_size) def forward(self, x): x = torch.relu(self.fc1(x)) x = torch.relu(self.fc2(x)) return self.fc3(x) class ReplayBuffer: def __init__(self, capacity): self.buffer = deque(maxlen=capacity) def push(self, state, action, reward, next_state, done): self.buffer.append((state, action, reward, next_state, done)) def sample(self, bach_size): return random.sample(self.buffer, bach_size) def __len__(self): return len(self.buffer) class Agent: def __init__(self, env): self.env = env self.state_size = env.observation_space.shape[0] self.action_size = env.action_space.n self.eval_net = QNetwork(self.state_size, self.action_size, HIDDEN_SIZE) self.target_net = QNetwork(self.state_size, self.action_size, HIDDEN_SIZE) self.target_net.load_state_dict(self.eval_net.state_dict()) self.optimizer = optim.Adam(self.eval_net.parameters(), lr=LEARNING_RATE) self.buffer = ReplayBuffer(MEMORY_SIZE) self.batch_size = BTACH_SIZE