
(1) This work addresses the efficiency and safety of ship scheduling in bidirectional port channels and proposes a cooperative maneuvering method based on reinforcement learning. Port waters are narrow and ship density is high, so traditional scheduling methods struggle to cope with a dynamically changing traffic environment. A port channel network model is first established, covering the main channel, branch channels, anchorage areas, and berth areas, with a graph structure representing the channel topology. Based on the characteristics of ship traffic, the state space of the scheduling problem is defined to include each ship's position, speed, heading, and destination, together with environmental factors such as channel traffic density and weather conditions. The action space is a discrete decision set consisting of basic maneuvering commands (accelerate, decelerate, turn, wait) plus communication commands for cooperative collision avoidance. To respect the COLREGs and port-specific regulations, rule constraints are imposed on action selection so that scheduling decisions remain compliant with navigational practice. The reward function follows a multi-objective design: an efficiency reward (encouraging on-time arrival), a safety penalty (avoiding collisions and close-quarters situations), a rule-compliance reward, and an energy penalty (discouraging unnecessary maneuvers).
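To make the multi-objective reward concrete, here is a minimal sketch of how the four terms could be combined into one scalar reward per ship. The weights, thresholds, and helper quantities (delay, closest-approach distance, rule-violation flag, maneuver effort) are illustrative assumptions, not values from the study.

def combined_reward(delay_minutes, min_distance_m, rule_violation, maneuver_effort,
                    w_eff=1.0, w_safe=5.0, w_rule=2.0, w_energy=0.1, safe_distance_m=500.0):
    """Illustrative multi-objective reward: efficiency, safety, rule compliance, energy.
    All weights and thresholds are assumed values for demonstration only."""
    r_eff = -w_eff * max(delay_minutes, 0.0)                      # efficiency: penalize lateness
    # safety: penalty grows as the closest approach drops below the assumed safe distance
    r_safe = -w_safe * max(0.0, (safe_distance_m - min_distance_m) / safe_distance_m)
    r_rule = -w_rule if rule_violation else 0.1 * w_rule          # small bonus for compliant behavior
    r_energy = -w_energy * maneuver_effort                        # discourage unnecessary maneuvering
    return r_eff + r_safe + r_rule + r_energy

# Example: a ship 3 minutes late, 350 m closest approach, compliant, moderate maneuvering
print(combined_reward(delay_minutes=3.0, min_distance_m=350.0,
                      rule_violation=False, maneuver_effort=0.4))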
(2) The reinforcement learning algorithm contributes four innovations. First, a centralized-training, decentralized-execution architecture is adopted: during training the agents have access to global state information and learn a cooperative policy, while at execution time each ship acts only on its local observation, which improves scalability. Because the number of ships is not fixed, an attention mechanism handles variable-length inputs, letting each agent focus on the key surrounding ships and ignore irrelevant ones. Second, model predictive control is integrated with reinforcement learning: a ship motion model predicts short-term trajectories to assess the safety of a candidate action, and dangerous actions are corrected before execution. Third, a curriculum learning strategy starts training in simple scenarios (few ships, open water) and gradually increases the difficulty (more ships, complex channels), improving both training efficiency and final performance. Finally, a communication mechanism allows ships to exchange intention information; a cooperative reward built on it encourages coordinated behaviors such as alternating passage and convoy sailing, raising overall traffic efficiency.
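The safety check described above can be illustrated with a minimal sketch: a candidate action is rolled forward with a simple constant-rate kinematic model and replaced by a deceleration fallback if the predicted track passes too close to another ship. The horizon, safety radius, and the assumption that other ships stay put over the short horizon are simplifications for illustration, not the study's MPC formulation; the ship dictionary layout matches the environment code below.

import numpy as np

def predict_positions(ship, action, horizon=5):
    # Roll a candidate action forward with a simple kinematic model
    # (an illustrative stand-in for the MPC-based safety check described above).
    x, y, speed, heading = ship['x'], ship['y'], ship['speed'], ship['heading']
    if action == 0:
        speed = min(speed + 0.1, 3.0)      # accelerate
    elif action == 1:
        speed = max(speed - 0.1, 0.5)      # decelerate
    elif action == 2:
        heading += 0.1                     # turn (increase heading)
    elif action == 3:
        heading -= 0.1                     # turn (decrease heading)
    positions = []
    for _ in range(horizon):
        x += np.cos(heading) * speed
        y += np.sin(heading) * speed
        positions.append((x, y))
    return positions

def safe_action(ship, other_ships, proposed_action, safety_radius=8.0, fallback_action=1):
    # Veto the proposed action if the predicted track comes too close to any other ship
    # (other ships are assumed static over the short horizon; thresholds are illustrative).
    for px, py in predict_positions(ship, proposed_action):
        for other in other_ships:
            if other.get('completed'):
                continue
            if np.hypot(px - other['x'], py - other['y']) < safety_radius:
                return fallback_action
    return proposed_action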
(3) For simulation and validation, a simulation environment was built from real port data, including typical channel layouts of Shanghai's Yangshan Port and the Port of Hamburg. Several test scenarios were set up: low-density traffic (5-10 ships), medium-density traffic (10-20 ships), and high-density traffic (20-30 ships). The reinforcement learning scheduler was compared against rule-based scheduling, genetic-algorithm optimization, and manual scheduling, using average transit time, ship delay rate, number of conflicts, and system throughput as performance metrics. In the medium-density scenario the RL method reduces average transit time by 18% and the number of conflicts by 65%; in the high-density scenario the advantage is even clearer, with system throughput increasing by 32%. In the crossing area of the bidirectional channel in particular, the RL agents learned to form traffic flows spontaneously, reducing head-on encounters and achieving efficient autonomous scheduling without central coordination. Robustness to contingencies (e.g., ship breakdown, deteriorating weather) was also tested: with an emergency policy layer, the system quickly adjusts the schedule and maintains safety. Finally, field tests with three real vessels (the research vessel DENEB, the small vessel BELA, and the unmanned surface vehicle MESSIN) confirmed the reliability of the simulation results, with a cooperative maneuvering success rate above 95%.
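As a rough illustration of how the evaluation metrics above (average transit time, delay rate, conflict count, throughput) could be computed from logged trajectories, the sketch below assumes a simple per-ship log format ('arrival_step', 'planned_steps', 'positions') that is not taken from the study.

import numpy as np

def evaluate_schedule(ship_logs, horizon_steps, conflict_distance=10.0):
    """Compute scheduling metrics from assumed per-ship logs (illustrative format)."""
    transit_times = [log['arrival_step'] for log in ship_logs if log['arrival_step'] is not None]
    avg_transit = float(np.mean(transit_times)) if transit_times else float('nan')
    delay_rate = float(np.mean([1.0 if (log['arrival_step'] is None
                                        or log['arrival_step'] > log['planned_steps'])
                                else 0.0 for log in ship_logs]))
    # conflicts: count time steps where any pair of ships is closer than the threshold
    conflicts = 0
    for t in range(horizon_steps):
        pos = [log['positions'][t] for log in ship_logs if t < len(log['positions'])]
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                if np.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]) < conflict_distance:
                    conflicts += 1
    throughput = len(transit_times) / max(horizon_steps, 1)  # completed passages per time step
    return {'avg_transit_time': avg_transit, 'delay_rate': delay_rate,
            'conflicts': conflicts, 'throughput': throughput}

A self-contained simulation environment and training script follows.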
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
from collections import deque
import matplotlib.pyplot as plt
class AttentionLayer(nn.Module):
    def __init__(self, hidden_dim):
        super(AttentionLayer, self).__init__()
        self.hidden_dim = hidden_dim
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x, mask=None):
        Q = self.query(x)
        K = self.key(x)
        V = self.value(x)
        attention_weights = torch.matmul(Q, K.transpose(-2, -1)) / (self.hidden_dim ** 0.5)
        if mask is not None:
            attention_weights = attention_weights.masked_fill(mask == 0, -1e9)
        attention_weights = self.softmax(attention_weights)
        output = torch.matmul(attention_weights, V)
        return output, attention_weights
class ShipSchedulingRL(nn.Module):
    def __init__(self, state_dim, action_dim, n_ships, hidden_dim=128):
        super(ShipSchedulingRL, self).__init__()
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.n_ships = n_ships
        self.hidden_dim = hidden_dim
        self.encoder = nn.Linear(state_dim, hidden_dim)
        self.attention = AttentionLayer(hidden_dim)
        self.decoder = nn.Linear(hidden_dim, hidden_dim)
        self.actor = nn.Linear(hidden_dim, action_dim)   # per-ship action logits
        self.critic = nn.Linear(hidden_dim, 1)           # centralized state value
        self.activation = nn.ReLU()

    def forward(self, states, masks=None):
        # states: [batch, n_ships, state_dim], masks: [batch, n_ships, n_ships]
        batch_size = states.shape[0]
        encoded = self.activation(self.encoder(states))
        if masks is None:
            masks = torch.ones(batch_size, self.n_ships, self.n_ships, device=states.device)
        attended, attention_weights = self.attention(encoded, masks)
        decoded = self.activation(self.decoder(attended))
        action_logits = self.actor(decoded)
        state_values = self.critic(decoded.mean(dim=1))  # pool over ships for the centralized critic
        return action_logits, state_values, attention_weights
class CollaborativeSchedulingEnv:
    def __init__(self, width=200, height=100, n_ships=5):
        self.width = width
        self.height = height
        self.n_ships = n_ships
        self.ships = []
        self.waypoints = []
        self.obstacles = []
        self.max_steps = 200
        self.current_step = 0

    def reset(self):
        self.ships = []
        self.waypoints = []
        self.obstacles = []
        self.current_step = 0
        main_channel_y = self.height / 2
        channel_width = 20
        for i in range(self.n_ships):
            # Alternate directions: even ships eastbound, odd ships westbound,
            # each on its own side of the two-way channel
            if i % 2 == 0:
                start_x = 10
                end_x = self.width - 10
                lane = main_channel_y - channel_width / 4
            else:
                start_x = self.width - 10
                end_x = 10
                lane = main_channel_y + channel_width / 4
            speed = np.random.uniform(1.0, 3.0)
            self.ships.append({
                'id': i,
                'x': start_x,
                'y': lane,
                'speed': speed,
                'heading': 0 if i % 2 == 0 else np.pi,
                'destination_x': end_x,
                'destination_y': lane,
                'completed': False,
                'progress': abs(start_x - end_x) / self.width  # remaining normalized distance
            })
        for i in range(3):
            self.obstacles.append({
                'x': np.random.uniform(self.width * 0.3, self.width * 0.7),
                'y': np.random.uniform(main_channel_y - channel_width / 2, main_channel_y + channel_width / 2),
                'radius': np.random.uniform(5, 10)
            })
        return self.get_state()
    def get_state(self):
        state = []
        masks = np.ones((self.n_ships, self.n_ships))
        for i, ship in enumerate(self.ships):
            # Own-ship features: normalized position, speed, heading, destination, active flag
            ship_state = [
                ship['x'] / self.width,
                ship['y'] / self.height,
                ship['speed'] / 3.0,
                np.sin(ship['heading']),
                np.cos(ship['heading']),
                ship['destination_x'] / self.width,
                ship['destination_y'] / self.height,
                1.0 if not ship['completed'] else 0.0
            ]
            # Relative position of static obstacles
            for obstacle in self.obstacles:
                dx = ship['x'] - obstacle['x']
                dy = ship['y'] - obstacle['y']
                distance = np.sqrt(dx * dx + dy * dy)
                ship_state.append(distance / 50.0)
                ship_state.append(dx / 50.0)
                ship_state.append(dy / 50.0)
            # Neighbouring ships within 50 units; distant ships are masked out of the attention
            for j, other_ship in enumerate(self.ships):
                if i == j:
                    continue
                dx = ship['x'] - other_ship['x']
                dy = ship['y'] - other_ship['y']
                distance = np.sqrt(dx * dx + dy * dy)
                if distance > 50:
                    masks[i, j] = 0
                else:
                    ship_state.append(dx / 50.0)
                    ship_state.append(dy / 50.0)
                    ship_state.append(other_ship['speed'] / 3.0)
                    ship_state.append(np.sin(other_ship['heading']))
                    ship_state.append(np.cos(other_ship['heading']))
            # Pad or truncate to the fixed state dimension
            while len(ship_state) < 50:
                ship_state.append(0.0)
            state.append(ship_state[:50])
        return np.array(state), masks
    def step(self, actions):
        rewards = np.zeros(self.n_ships)
        dones = np.zeros(self.n_ships)
        for i, ship in enumerate(self.ships):
            if ship['completed']:
                dones[i] = 1
                continue
            action = actions[i]
            # Discrete maneuvers: 0 accelerate, 1 decelerate, 2/3 turn, 4 hold course and speed
            if action == 0:
                ship['speed'] = min(ship['speed'] + 0.1, 3.0)
            elif action == 1:
                ship['speed'] = max(ship['speed'] - 0.1, 0.5)
            elif action == 2:
                ship['heading'] += 0.1
            elif action == 3:
                ship['heading'] -= 0.1
            elif action == 4:
                pass
            dx = np.cos(ship['heading']) * ship['speed']
            dy = np.sin(ship['heading']) * ship['speed']
            new_x = ship['x'] + dx
            new_y = ship['y'] + dy
            if self.is_collision(i, new_x, new_y):
                rewards[i] -= 10
                new_x = ship['x']
                new_y = ship['y']
            else:
                ship['x'] = new_x
                ship['y'] = new_y
            # Progress reward: positive when the remaining distance to the destination shrinks
            progress = abs(ship['x'] - ship['destination_x']) / self.width
            rewards[i] += (ship['progress'] - progress) * 10
            ship['progress'] = progress
            if abs(ship['x'] - ship['destination_x']) < 5 and abs(ship['y'] - ship['destination_y']) < 5:
                rewards[i] += 20
                ship['completed'] = True
                dones[i] = 1
            # Proximity penalty for close-quarters situations
            for j, other_ship in enumerate(self.ships):
                if i != j and not other_ship['completed']:
                    dx = ship['x'] - other_ship['x']
                    dy = ship['y'] - other_ship['y']
                    distance = np.sqrt(dx * dx + dy * dy)
                    if distance < 10:
                        rewards[i] -= 5
                        rewards[j] -= 5
            # Keep ships inside the simulated water area
            if ship['x'] < 0 or ship['x'] > self.width or ship['y'] < 0 or ship['y'] > self.height:
                rewards[i] -= 5
                ship['x'] = np.clip(ship['x'], 0, self.width)
                ship['y'] = np.clip(ship['y'], 0, self.height)
        self.current_step += 1
        if self.current_step >= self.max_steps:
            dones = np.ones(self.n_ships)
        next_state, next_masks = self.get_state()
        return next_state, next_masks, rewards, dones
    def is_collision(self, ship_id, x, y):
        for i, obstacle in enumerate(self.obstacles):
            dx = x - obstacle['x']
            dy = y - obstacle['y']
            distance = np.sqrt(dx * dx + dy * dy)
            if distance < obstacle['radius'] + 2:
                return True
        for i, ship in enumerate(self.ships):
            if i == ship_id or ship['completed']:
                continue
            dx = x - ship['x']
            dy = y - ship['y']
            distance = np.sqrt(dx * dx + dy * dy)
            if distance < 5:
                return True
        return False
class MultiAgentPPO:
    def __init__(self, state_dim, action_dim, n_ships, lr=3e-4, gamma=0.99, clip_epsilon=0.2):
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.n_ships = n_ships
        self.gamma = gamma
        self.clip_epsilon = clip_epsilon
        self.policy = ShipSchedulingRL(state_dim, action_dim, n_ships)
        self.optimizer = optim.Adam(self.policy.parameters(), lr=lr)
        self.memory = deque(maxlen=10000)

    def select_action(self, states, masks, evaluate=False):
        states_tensor = torch.FloatTensor(states).unsqueeze(0)
        masks_tensor = torch.FloatTensor(masks).unsqueeze(0)
        with torch.no_grad():
            action_logits, state_values, attention_weights = self.policy(states_tensor, masks_tensor)
        actions = []
        log_probs = []
        for i in range(self.n_ships):
            dist = torch.distributions.Categorical(logits=action_logits[0, i])
            # Greedy action at evaluation time, sampling during training
            action = torch.argmax(action_logits[0, i]) if evaluate else dist.sample()
            log_probs.append(dist.log_prob(action).item())
            actions.append(action.item())
        if evaluate:
            return actions, state_values, attention_weights
        return actions, log_probs, state_values, attention_weights

    def store_transition(self, states, masks, actions, log_probs, rewards, next_states, next_masks, dones):
        self.memory.append((states, masks, actions, log_probs, rewards, next_states, next_masks, dones))
    def train(self, batch_size=256):
        if len(self.memory) < batch_size:
            return None
        batch = random.sample(self.memory, batch_size)
        states, masks, actions, old_log_probs, rewards, next_states, next_masks, dones = zip(*batch)
        states = torch.FloatTensor(np.array(states))
        masks = torch.FloatTensor(np.array(masks))
        actions = torch.LongTensor(np.array(actions))
        old_log_probs = torch.FloatTensor(np.array(old_log_probs))
        rewards = torch.FloatTensor(np.array(rewards))
        next_states = torch.FloatTensor(np.array(next_states))
        next_masks = torch.FloatTensor(np.array(next_masks))
        dones = torch.FloatTensor(np.array(dones))
        # Bootstrap a team-level TD target with the centralized critic (no gradient through the target)
        with torch.no_grad():
            _, next_state_values, _ = self.policy(next_states, next_masks)
        team_rewards = rewards.mean(dim=1)
        episode_done = dones.min(dim=1).values  # 1 only when every ship is done
        target_values = team_rewards + (1 - episode_done) * self.gamma * next_state_values.squeeze(-1)
        action_logits, state_values, _ = self.policy(states, masks)
        dist = torch.distributions.Categorical(logits=action_logits)
        new_log_probs = dist.log_prob(actions)
        ratio = torch.exp(new_log_probs - old_log_probs)
        # Shared advantage broadcast to every ship; clipped surrogate objective (PPO)
        advantages = (target_values - state_values.squeeze(-1)).detach().unsqueeze(-1)
        surrogate1 = ratio * advantages
        surrogate2 = torch.clamp(ratio, 1 - self.clip_epsilon, 1 + self.clip_epsilon) * advantages
        actor_loss = -torch.min(surrogate1, surrogate2).mean()
        critic_loss = F.mse_loss(state_values.squeeze(-1), target_values)
        total_loss = actor_loss + 0.5 * critic_loss
        self.optimizer.zero_grad()
        total_loss.backward()
        self.optimizer.step()
        return total_loss.item()
def train_ship_scheduling():
    n_ships = 5
    state_dim = 50
    action_dim = 5
    env = CollaborativeSchedulingEnv(n_ships=n_ships)
    agent = MultiAgentPPO(state_dim, action_dim, n_ships)
    episodes = 1000
    episode_returns = []
    for episode in range(episodes):
        states, masks = env.reset()
        episode_return = 0
        loss = None
        done = False
        while not done:
            actions, log_probs, state_values, attention_weights = agent.select_action(states, masks)
            next_states, next_masks, rewards, dones = env.step(actions)
            agent.store_transition(states, masks, actions, log_probs, rewards, next_states, next_masks, dones)
            loss = agent.train()
            states, masks = next_states, next_masks
            episode_return += np.sum(rewards)
            if np.all(dones):
                done = True
        episode_returns.append(episode_return)
        if episode % 50 == 0:
            print(f"Episode {episode}, Return: {episode_return:.2f}, Loss: {loss if loss is not None else 0:.4f}")
    return episode_returns
if __name__ == "__main__":
    returns = train_ship_scheduling()
    plt.plot(returns)
    plt.title('Ship Scheduling Training Returns')
    plt.xlabel('Episode')
    plt.ylabel('Total Return')
    plt.show()
