Fixing the DQN error "ValueError: setting an array element with a sequence." when storing the observation returned by env.reset()

This post walks through a reinforcement-learning DQN implementation for a simulated game task. Running the code raises an error about mismatched data dimensions. The observation returned by reset() was suspected; after printing it for inspection and assigning only the array part of the observation, the network worked normally.


While implementing a DQN network for reinforcement learning, I used the LunarLander-v2 environment from gym as the learning task. Part of my code is shown below; first, the Agent's transition-storage logic:

import numpy as np

class Agent:
    def __init__(self, gamma, epsilon, lr, input_dims, batch_size, n_actions,
                 max_mem_size=100000, eps_end=0.01):  # remaining parameters abridged
        # .... earlier initialization omitted ....
        self.state_memory = np.zeros((self.mem_size, *input_dims), dtype=np.float32)
        self.new_state_memory = np.zeros((self.mem_size, *input_dims), dtype=np.float32)

        self.action_memory = np.zeros(self.mem_size, dtype=np.int32)
        self.reward_memory = np.zeros(self.mem_size, dtype=np.float32)
        self.terminal_memory = np.zeros(self.mem_size, dtype=bool)  # np.bool is removed in NumPy 1.24+

    def store_transitons(self, state, action, reward, state_, done):
        index = self.mem_cntr % self.mem_size
        self.state_memory[index] = state
        self.action_memory[index] = action
        self.new_state_memory[index] = state_
        self.terminal_memory[index] = done
        self.reward_memory[index] = reward
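To make the expected memory layout concrete, here is a minimal sketch (the `mem_size` and `input_dims` values are illustrative, chosen to match the `input_dims=[8]` used below): each row of `state_memory` holds one flat observation, so whatever is assigned to a row must itself be a plain length-8 array.

```python
import numpy as np

mem_size, input_dims = 100, (8,)  # illustrative values
state_memory = np.zeros((mem_size, *input_dims), dtype=np.float32)

# one observation per row: the right-hand side must be a flat array of length 8
state_memory[0] = np.arange(8, dtype=np.float32)
print(state_memory.shape)  # (100, 8)
```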

Next, the training loop in main:

import gym

if __name__ == '__main__':
    env = gym.make('LunarLander-v2')
    agent = Agent(gamma=0.99, epsilon=1.0, batch_size=64, n_actions=4,
                  eps_end=0.01, input_dims=[8], lr=0.003)
    scores, eps_history = [], []
    n_games = 10
    for i in range(n_games):
        score = 0
        done = False
        observation = env.reset()
        while not done:
            action = agent.choose_action(observation)
            # gym >= 0.26 returns five values: (obs, reward, terminated, truncated, info)
            observation_, reward, done, truncated, info = env.step(action)
            score += reward
            agent.store_transitons(observation, action, reward, observation_, done)
            agent.learn()
            observation = observation_
        scores.append(score)
        eps_history.append(agent.epsilon)
        # ................. #

Running this code raises the following error:

Traceback (most recent call last):
  File ".\main_Lunar_lander.py", line 21, in <module>
    agent.store_transitons(observation,action,reward,observation_,done)
  File "D:\College\Projects\Person_Research\DQN_From_Yotube\DQN.py", line 59, in store_transitons
    self.state_memory[index] = state
ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1.

The error message points to mismatched data dimensions: the variable observation obtained from observation = env.reset() does not fit the np.float32 array slot the Agent stores it into. The gym documentation describes reset() as follows:
(screenshot of the gym documentation for reset())
So I suspected the returned observation and inspected it with print('observation:', observation), which showed:
(screenshot of the printed observation: a tuple containing the observation array and an info dict)
reset() does not return the observation array alone; it returns a tuple of the observation array and an info dict!
The fix is simply to unpack the array part we actually need: observation, __ = env.reset()
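The failure is easy to reproduce in isolation. Assigning a tuple of (array, dict) to a float32 row makes NumPy try to build an array out of an inhomogeneous sequence, which raises exactly this ValueError; assigning the bare array works. A minimal sketch:

```python
import numpy as np

state_memory = np.zeros((10, 8), dtype=np.float32)
obs_tuple = (np.zeros(8, dtype=np.float32), {})  # same shape as gym's (observation, info) return

try:
    state_memory[0] = obs_tuple      # tuple of array + dict: inhomogeneous, fails
except ValueError as e:
    print('ValueError:', e)

state_memory[0] = obs_tuple[0]       # the bare observation array assigns cleanly
```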

    for i in range(n_games):
        score = 0
        done = False
        observation, __ = env.reset()
        while not done:
            action = agent.choose_action(observation)
            observation_, reward, done, truncated, info = env.step(action)
            score += reward
            agent.store_transitons(observation, action, reward, observation_, done)
            agent.learn()
            observation = observation_
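If the same script has to run under both an older gym (where reset() returns just the observation) and gym ≥ 0.26 (where it returns (observation, info)), a small helper can hide the difference. This is only a sketch; `unwrap_reset` is a name of my own, and it assumes the old-style observation is never itself a 2-tuple:

```python
def unwrap_reset(reset_result):
    """Return only the observation, whether reset() gave obs or (obs, info)."""
    if isinstance(reset_result, tuple) and len(reset_result) == 2:
        observation, _info = reset_result
        return observation
    return reset_result

# usage: observation = unwrap_reset(env.reset())
```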

Running the code again after this change, the network works normally:
(screenshot of the training output)
