Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning (Zhang et al., 2020) uses deep reinforcement learning to learn priority dispatching rules (PDRs) for the Job Shop Scheduling Problem (JSP). The model generalizes well: a policy trained on small-size instances still performs well on larger instances. The source code is open at https://github.com/zcajiayin/L2D. This post walks through the graph embedding process used for dispatching, based on that source code.
1. Input data format
Test file format: each .npy file contains 10 instances. Each instance is a two-dimensional list: List[0] holds the processing times and List[1] holds the machine assignments.
Input data format: variable `Data`; type: array list; `Data[0]`: set of processing times; `Data[1]`: set of machine assignments.
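As a minimal sketch of reading one instance under the layout above (the file name is hypothetical, and the array shape `(10, 2, n_jobs, n_machines)` is an assumption based on that description):

```python
import numpy as np

# hypothetical file name; assumed layout: (num_instances, 2, n_jobs, n_machines)
dataset = np.load('generatedData6_6_Seed200.npy')

durations = dataset[0][0]   # Data[0]: processing time of each operation
machines = dataset[0][1]    # Data[1]: machine assigned to each operation

print(durations.shape, machines.shape)   # e.g. (6, 6) (6, 6)
```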
2. The GNN formula in the paper

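The formula image from the original post did not survive. As a reconstruction (based on the GIN-style aggregation that the GraphCNN code analyzed below implements, not copied verbatim from the paper), the node embedding update at iteration $k$ and the graph-level pooling can be written as:

$$
h_v^{(k)} = \mathrm{MLP}^{(k)}\Big((1+\epsilon^{(k)})\, h_v^{(k-1)} + \sum_{u\in\mathcal{N}(v)} h_u^{(k-1)}\Big),
\qquad
h_G = \frac{1}{|\mathcal{V}|}\sum_{v\in\mathcal{V}} h_v^{(K)}
$$

where $\mathcal{N}(v)$ are the neighbors of node $v$; with `learn_eps=False` the $\epsilon^{(k)}$ term is not learned, so the node's own feature is simply added to the neighbor sum, and $h_G$ is the pooled graph embedding.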
3. GraphCNN encoder inputs and outputs
```python
self.feature_extract = GraphCNN(num_layers=num_layers,                          # 3
                                num_mlp_layers=num_mlp_layers_feature_extract,  # 2
                                input_dim=input_dim,                            # 2
                                hidden_dim=hidden_dim,                          # 64
                                learn_eps=learn_eps,                            # False
                                neighbor_pooling_type=neighbor_pooling_type,    # sum
                                device=device).to(device)
```
Table 1. GraphCNN constructor parameters
| Variable | Value | Type | Description |
| --- | --- | --- | --- |
| num_layers | 3 | GraphCNN constructor arg | number of GNN layers |
| num_mlp_layers | 2 | GraphCNN constructor arg | number of layers K in each MLP |
| input_dim | 2 | GraphCNN constructor arg | input feature dimension: 2 features per node (a boolean scheduled flag and an integer time value) |
| hidden_dim | 64 | GraphCNN constructor arg | hidden layer dimension |
| learn_eps | False | GraphCNN constructor arg | whether to learn the GIN epsilon weighting of a node's own feature (not a learning rate) |
| neighbor_pooling_type | sum | GraphCNN constructor arg | how neighbors are aggregated (sum, mean, or max) |
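With `neighbor_pooling_type='sum'` and `learn_eps=False`, one GraphCNN layer essentially adds each node's own feature to the sum of its neighbors' features (via the adjacency matrix) and pushes the result through an MLP followed by batch norm and ReLU. A minimal sketch of that update, not the repository code itself (`h`, `adj`, `mlp`, `bn` are stand-ins, and `adj` is assumed to contain self-loops):

```python
import torch
import torch.nn.functional as F

def gin_layer_sum(h, adj, mlp, bn):
    """One sum-aggregation step: with self-loops in adj, adj @ h adds each
    node's own feature to the sum over its neighbors."""
    pooled = torch.mm(adj, h)    # [n_nodes, dim] neighbor + self sum
    out = mlp(pooled)            # per-node MLP (Linear -> ... -> Linear)
    return F.relu(bn(out))       # batch norm then ReLU, as in the structure printed below
```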
4. GNN network structure
```
GraphCNN(   # three layers in total, counting the input layer
(mlps): ModuleList(
(0): MLP( # first layer (the input layer)
(linears): ModuleList(
(0): Linear(in_features=2, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=64, bias=True)
)
(batch_norms): ModuleList( # Normalization
(0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# note: pytorch nn.BatchNorm1d differs from a naive manual Python implementation;
# see https://blog.youkuaiyun.com/qq_23262411/article/details/100175943
)
)
(1): MLP( # second layer
(linears): ModuleList(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=64, bias=True)
)
(batch_norms): ModuleList(
(0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(batch_norms): ModuleList( # Normalization
(0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
```
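Note that although `num_layers=3`, the printout shows only two MLPs: in the GIN-style implementation this code follows, the constructor builds `num_layers - 1` MLP/batch-norm pairs (the input features count as the first layer). A sketch of that construction under the same convention (the `nn.Sequential` is a stand-in for the repository's MLP class):

```python
import torch.nn as nn

num_layers, input_dim, hidden_dim = 3, 2, 64
mlps, batch_norms = nn.ModuleList(), nn.ModuleList()

for layer in range(num_layers - 1):        # 2 MLPs when num_layers == 3
    in_dim = input_dim if layer == 0 else hidden_dim
    mlps.append(nn.Sequential(             # stand-in for the 2-layer MLP above
        nn.Linear(in_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim),
    ))
    batch_norms.append(nn.BatchNorm1d(hidden_dim))
```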
5. GraphCNN forward pass
```python
h_pooled, h_nodes = self.feature_extract(x=x,
                                         graph_pool=graph_pool,
                                         padded_nei=padded_nei,
                                         adj=adj)
```
Table 2. GraphCNN forward-pass arguments
| Variable | Value | Type | Description |
| --- | --- | --- | --- |
| feature_extract | GraphCNN | GraphCNN instance | holds the GraphCNN structure |
| graph_pool | tensor | forward-pass input | pooling weight (probability) assigned to each node of the graph |
| padded_nei | None | forward-pass input | None when neighbor_pooling_type is sum |
| adj | tensor | forward-pass input | adjacency matrix of the graph; an entry is 1 where an edge is present and 0 otherwise (edges are updated as operations are dispatched) |
| x | tensor | forward-pass input | feature vector of every node |
Table 3. GraphCNN outputs
| Variable | Value | Type | Description |
| --- | --- | --- | --- |
| pooled_h | tensor | GraphCNN output | graph-level embedding obtained by pooling the node features |
| h_nodes | tensor | GraphCNN output | features of every node |
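Assuming a single instance with `n` operation nodes, the shapes are roughly: `x` is `[n, 2]`, `adj` is `[n, n]`, `graph_pool` is `[1, n]`, so `h_nodes` comes out as `[n, 64]` and `pooled_h` as `[1, 64]`. The graph-level feature is just the weighted sum of the node features; a shape-level sketch with illustrative values (not taken from a real run, using a dense stand-in for the pooling tensor the repository builds):

```python
import torch

n, hidden_dim = 36, 64                   # e.g. a 6x6 instance has 36 operation nodes
h_nodes = torch.randn(n, hidden_dim)     # per-node embeddings from the last GNN layer

graph_pool = torch.full((1, n), 1.0 / n)      # average pooling: each node weighted 1/n
pooled_h = torch.mm(graph_pool, h_nodes)      # [1, 64] graph-level embedding

print(pooled_h.shape)   # torch.Size([1, 64])
```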
After the GNN produces the graph features (h_nodes, pooled_h), an Actor-Critic network yields the policy pi. The relevant ActorCritic code:
```python
concateFea = torch.cat((candidate_feature, h_pooled_repeated), dim=-1)
candidate_scores = self.actor(concateFea)
# perform mask
mask_reshape = mask.reshape(candidate_scores.size())
candidate_scores[mask_reshape] = float('-inf')
pi = F.softmax(candidate_scores, dim=1)   # policy distribution over candidate operations
v = self.critic(h_pooled)
return pi, v
```
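Given `pi`, the surrounding loop then picks one candidate operation from the distribution. A minimal sketch of the sampling step (not the exact repository code; a greedy argmax could be used at evaluation time instead):

```python
import torch
from torch.distributions.categorical import Categorical

def select_action(pi, candidate):
    """Sample a candidate index from the policy and map it back to an operation id."""
    dist = Categorical(pi.squeeze())      # pi: probabilities over candidate operations
    idx = dist.sample()
    return candidate[idx.item()], dist.log_prob(idx)
```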
6. Actor-Critic network structure
```
ActorCritic(
(feature_extract): GraphCNN(
(mlps): ModuleList(
(0): MLP(
(linears): ModuleList(
(0): Linear(in_features=2, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=64, bias=True)
)
(batch_norms): ModuleList(
(0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): MLP(
(linears): ModuleList(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=64, bias=True)
)
(batch_norms): ModuleList(
(0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(batch_norms): ModuleList(
(0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(actor): MLPActor(
(linears): ModuleList(
(0): Linear(in_features=128, out_features=32, bias=True)
(1): Linear(in_features=32, out_features=1, bias=True)
)
)
(critic): MLPCritic(
(linears): ModuleList(
(0): Linear(in_features=64, out_features=32, bias=True)
(1): Linear(in_features=32, out_features=1, bias=True)
)
)
)
```
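The actor's 128-dim input in the structure above is the concatenation of each candidate operation's 64-dim node embedding with the 64-dim pooled graph embedding (repeated once per candidate), matching the `torch.cat` call shown earlier; the critic only sees the 64-dim graph embedding. A shape-only illustration (tensors are random placeholders):

```python
import torch

n_candidates, hidden_dim = 6, 64
candidate_feature = torch.randn(1, n_candidates, hidden_dim)   # gathered per-candidate node embeddings
h_pooled = torch.randn(1, hidden_dim)                          # pooled graph embedding

h_pooled_repeated = h_pooled.unsqueeze(1).expand(1, n_candidates, hidden_dim)
concateFea = torch.cat((candidate_feature, h_pooled_repeated), dim=-1)

print(concateFea.shape)   # torch.Size([1, 6, 128]) -> actor input; critic input stays [1, 64]
```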
The feature-extraction forward pass is invoked as below; tracing the inputs backwards, `x` is `fea_tensor`, which is built from the `fea` array returned by `env.reset`:

```python
h_pooled, h_nodes = self.feature_extract(x=x, graph_pool=graph_pool, padded_nei=padded_nei, adj=adj)
x = fea_tensor
fea_tensor = torch.from_numpy(np.copy(fea)).to(device)
adj, fea, candidate, mask = env.reset(data)   # env.reset is commented in detail in JSSP_ENV
```
Hints (see the sketch below):
- The start times are all initialized to -99.
- The mask is initialized to all False.
- Each candidate becomes a list; the list corresponds to the running time on each machine.
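A minimal sketch of what that reset state looks like for a 6x6 instance (attribute names are illustrative, not necessarily the exact fields of the environment class):

```python
import numpy as np

n_jobs, n_machines = 6, 6

start_times = -99 * np.ones((n_jobs, n_machines), dtype=np.int64)   # -99 means "not scheduled yet"
mask = np.full(n_jobs, False, dtype=bool)                           # nothing is masked at reset
candidate = np.arange(n_jobs) * n_machines                          # one candidate per job, e.g. its first operation
```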
First version of the random data generation (a sketch follows the list):
- Data[0]: processing times randomly drawn from 1-100, matching Data[1].
- Data[1]: machine assignments, expanded from a single machine to a uniform set of 5 machines.
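A sketch of that first-version generator under the assumptions above (the function name and exact ranges are illustrative, not the repository's generator):

```python
import numpy as np

def generate_instance(n_jobs=6, n_machines=5, low=1, high=100, seed=None):
    """Illustrative sketch: Data[0] holds processing times drawn uniformly from
    [low, high]; Data[1] holds, for each job, a random permutation of the 5 machines."""
    rng = np.random.default_rng(seed)
    durations = rng.integers(low, high + 1, size=(n_jobs, n_machines))              # Data[0]
    machines = np.stack([rng.permutation(n_machines) + 1 for _ in range(n_jobs)])   # Data[1], 1-based ids
    return durations, machines
```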
This post has shown how GraphCNN and an Actor-Critic model from deep reinforcement learning are combined to solve the Job Shop Scheduling Problem (JSP): the data format, the GNN formula, the GraphCNN structure and forward pass, and the Actor-Critic network that yields the scheduling policy.