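Research on Routing Algorithms in Intelligent Traffic Control
1. QMIX-Assisted Routing Algorithm
1.1 QMIX Algorithm Pseudocode
QMIX is a cooperative multi-agent reinforcement learning (MARL) algorithm, applied here to social-based delay-tolerant networks (DTNs). Its pseudocode is as follows: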
Algorithm 4.2 Cooperative MARL algorithm for social-based DTN
1: Initialize replay buffer
2: for episode = 1, M do
3:     Initialize network environment
4:     for step = 1, T do
5:         Each agent obtains its observation o
6:         Execute the action to get the new observation o′ and reward r of each agent
7:         Store (o, a, r, o′) to replay buffer
8:     end for
9:     for agent i = 1, N do
10:         Randomly extract a batch from replay buffer
11:         Calculate $Q_i(\tau_i, a_i; \theta_i)$ and $\max_{a_i'} \bar{Q}_i(\tau_i', a_i'; \theta_i')$ by DRQN
12:     end for
13:     Input all $Q_i(\tau, a,$ …
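
The data-collection part of the listing (steps 1 to 10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyEnv` is a hypothetical stand-in for the DTN network environment, its reward is an arbitrary placeholder, and actions are drawn uniformly at random because the listing does not say how each agent selects its action. The DRQN networks and the later steps of the algorithm are not covered.

```python
import random
from collections import deque
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One joint transition (o, a, r, o') covering all agents."""
    obs: List[List[float]]       # o: one observation vector per agent
    actions: List[int]           # a: one discrete action per agent
    rewards: List[float]         # r: one reward per agent
    next_obs: List[List[float]]  # o': one next observation per agent


class ReplayBuffer:
    """Fixed-capacity FIFO buffer: create (step 1), store (step 7), sample (step 10)."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)

    def store(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the current size.
        k = min(batch_size, len(self._storage))
        return random.sample(list(self._storage), k)

    def __len__(self) -> int:
        return len(self._storage)


class ToyEnv:
    """Hypothetical placeholder environment; NOT the DTN simulator from the text."""

    def __init__(self, n_agents: int, obs_dim: int, n_actions: int,
                 rng: random.Random) -> None:
        self.n_agents = n_agents
        self.obs_dim = obs_dim
        self.n_actions = n_actions
        self._rng = rng

    def _observe(self) -> List[List[float]]:
        return [[self._rng.random() for _ in range(self.obs_dim)]
                for _ in range(self.n_agents)]

    def reset(self) -> List[List[float]]:
        return self._observe()

    def step(self, actions: List[int]) -> Tuple[List[List[float]], List[float]]:
        # Arbitrary placeholder reward: 1.0 when an agent picks action 0.
        rewards = [float(a == 0) for a in actions]
        return self._observe(), rewards


def collect(env: ToyEnv, buffer: ReplayBuffer, episodes: int, steps: int,
            rng: random.Random) -> None:
    """Run the episode/step loops of the listing and fill the buffer."""
    for _ in range(episodes):                  # step 2
        obs = env.reset()                      # steps 3 and 5
        for _ in range(steps):                 # step 4
            # Assumption: random action choice stands in for the agents' policy.
            actions = [rng.randrange(env.n_actions) for _ in range(env.n_agents)]
            next_obs, rewards = env.step(actions)                      # step 6
            buffer.store(Transition(obs, actions, rewards, next_obs))  # step 7
            obs = next_obs
```

Calling `collect(env, buffer, episodes=M, steps=T, rng=rng)` and then `buffer.sample(batch_size)` once per agent mirrors the order of the listing. Storing one joint transition for all agents, rather than one record per agent, is a design choice of this sketch that keeps each sampled batch aligned across agents.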