Graph Neural Networks
In this article, we divide graph neural networks into five categories: Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), Graph Autoencoders, Graph Generative Networks, and Graph Spatial-temporal Networks.
3. Graph Autoencoders
Graph autoencoders are a family of graph embedding methods whose goal is to use a neural network to represent graph vertices as low-dimensional vectors. A typical solution uses a multilayer perceptron as the encoder to obtain node embeddings, while a decoder reconstructs the node's neighborhood statistics.
RGCN: Paper Reading and Practice
Paper Topic
Relational Graph Convolutional Networks (R-GCNs) are suited to link prediction (recovering missing facts, i.e., subject-predicate-object triples) and to node classification (filling in missing node attributes). R-GCN targets high-dimensional, multi-relational data structures; used as an encoder together with a factorization model as the decoder, it yields a clear improvement over decoder-only baselines. For example, given the diversity of relations among P2P platforms (multiple paths formed from different attributes), it can perform node prediction well.
GCN, GAT, and GraphSAGE: Principles and Implementation for Homogeneous Graphs
GCN
H^{(l+1)}=\sigma(D^{-1/2}AD^{-1/2}H^{(l)}W^{(l)})
Here H^{(l)} denotes the node features at layer l, D the degree matrix, and A the adjacency matrix. The process resembles CNN convolution: it is a weighted sum, in which the degree and adjacency matrices determine the weight of each edge over a node's neighbors, followed by the weighted aggregation.
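As an illustrative sketch (not a library implementation), the propagation rule above can be written with dense tensors in a few lines of PyTorch; the toy graph and weight shapes here are made up for demonstration:

```python
import torch

def gcn_layer(A, H, W):
    """One GCN step: relu(D^{-1/2} A D^{-1/2} H W).
    A: (N, N) adjacency with self-loops, H: (N, F_in), W: (F_in, F_out)."""
    deg = A.sum(dim=1)                    # node degrees
    d_inv_sqrt = deg.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0
    A_norm = d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)
    return torch.relu(A_norm @ H @ W)     # weighted sum, then activation

# toy graph: 3 nodes on a path, self-loops added
A = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
H = torch.eye(3)
W = torch.randn(3, 2)
out = gcn_layer(A, H, W)
print(out.shape)  # torch.Size([3, 2])
```

Real implementations use sparse matrix products, but the normalization and weighted-sum structure is the same.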
Main drawbacks: 1. The edge weights used in aggregation are fixed, which is inflexible. 2. Poor scalability: convolution and gradient updates run over the full graph, which becomes far too slow when the graph is large. 3. As depth grows, the outputs over-smooth easily and every node's features become very similar.
GAT addresses problem 1, GraphSAGE addresses problem 2, and a line of work such as DeepGCN discusses problem 3.
GAT
\alpha_{ij}=\frac{\exp(\text{LeakyReLU}(\vec{a}^{T}[W\vec{h}_i\,\|\,W\vec{h}_j]))}{\sum_{k\in\mathcal{N}_i}\exp(\text{LeakyReLU}(\vec{a}^{T}[W\vec{h}_i\,\|\,W\vec{h}_k]))}
Here h_i, h_j, h_k are node features, and α_{ij} is the attention coefficient between nodes i and j.
The attention-fused feature of node i can then be written as the formula below. It is essentially still a weighted sum of features, except that the weights are learned during training; a nonlinear activation is applied at the end for the downstream task.
\vec{h}_i'=\sigma\Big(\sum_{j\in\mathcal{N}_i}\alpha_{ij}W\vec{h}_j\Big)
To make the attention mechanism more expressive, multi-head attention is defined, where K is the number of attention heads; the heads can be aggregated in different ways (e.g., concatenation or averaging). The attention mechanism in GAT is quite intuitive: each edge gets a learnable coefficient α_{ij}, node features are fused according to these coefficients, and the parameters adapt to the task, which makes the learned weighting more effective.
GraphSAGE
- Transductive means the data to be predicted is visible to the model at training time. In other words, the graph structure is fixed before training: the nodes or edges you want to predict are already in the graph, and the graph is identical at training and prediction time.
- Inductive means the data to be predicted need not be seen during training, which is how we usually build models: training and prediction data are separate, i.e., the graph structure is not fixed and new nodes can be added.
GraphSAGE samples random subgraphs and updates node embeddings from those subgraphs. Because the sampled subgraph structure itself varies, the model learns a sampling-and-aggregation scheme rather than fixed per-node embeddings. This effectively handles unseen nodes and avoids having to update the embeddings of the whole graph at once, greatly improving scalability.
- Subgraph sampling: during training, a subgraph is cut for each node by randomly sampling a subset of its neighbors as the feature nodes to aggregate. For a center node, two hops are taken and a subset of neighbors is sampled to form the training subgraph.
- Aggregation: after sampling the subgraph, aggregate features from the outermost layer inward, as in GCN, to obtain the center node's embedding. There is a lot of room for variation here, such as changing the aggregation function (typically mean, sum, or pooling) or adding edge weights.
- Task prediction: with node embeddings in hand, downstream tasks follow; e.g., for node classification, attach a linear layer plus softmax to the embedding.
GraphSAGE mainly solves two problems: 1. unseen nodes at prediction time (the original GCN needs to see all nodes' graph data during training); 2. the high memory cost and slow computation of full-graph gradient updates on large graphs.
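The sample-and-aggregate loop above can be sketched as below, using a mean aggregator over a hypothetical neighbor dictionary; real implementations batch this and sample per hop:

```python
import random
import torch

def sage_layer(h, neighbors, W_self, W_neigh, num_samples=2):
    """One GraphSAGE step with mean aggregation (illustrative sketch).
    h: (N, F) features; neighbors: dict node -> list of neighbor ids."""
    out = []
    for v in range(h.size(0)):
        nbrs = neighbors[v]
        sampled = random.sample(nbrs, min(num_samples, len(nbrs)))  # sample a subset
        agg = h[sampled].mean(dim=0)                                # mean aggregator
        out.append(torch.relu(h[v] @ W_self + agg @ W_neigh))       # combine self + neighbors
    return torch.stack(out)

h = torch.randn(4, 3)
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
W_self = torch.randn(3, 2)
W_neigh = torch.randn(3, 2)
out = sage_layer(h, neighbors, W_self, W_neigh)
print(out.shape)  # torch.Size([4, 2])
```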
RGCN
RGCN can be seen as a simple extension of GCN to multi-relational graphs. ***Moving from homogeneous to heterogeneous graphs, the core problem RGCN must solve is how multiple relation types interact.*** RGCN uses a single generic GNN encoder that computes entity embeddings by encoding edges of different relation types (with different downstream heads).
In RGCN, under each relation type, both incoming and outgoing neighbors contribute, and a self-loop feature is added; these are fused to update the center node.
h_i^{(l+1)}=\sigma\Big(\sum_{r\in\mathcal{R}}\sum_{j\in\mathcal{N}_i^r}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)}\Big)
The double sum iterates over each relation type and aggregates the features of each neighbor under that relation; the previous layer's center-node feature is then added, and an activation produces the center node's output. W_r is a dimension-transforming matrix, i.e., a model parameter; \mathcal{R} is the set of relations, \mathcal{N}_i^r the neighbors of i under relation r, and c_{i,r} a problem-specific normalization constant. Whereas GCN uses the degree and adjacency matrices as fixed aggregation weights, RGCN learns more of this weighting during training.
Because multiple relations blow up the number of parameters, two regularization schemes for the W_r matrices are defined:
- Basis decomposition (shared transformation-matrix parameters):

W_r^{(l)}=\sum_{b=1}^{B}a_{rb}^{(l)}V_b^{(l)}
- Block-diagonal decomposition (the weight matrix W_r is assembled from small basis matrices, keeping W_r sparse):

W_r^{(l)}=\bigoplus_{b=1}^{B}Q_{br}^{(l)}
In (1), B is the (constant) number of basis matrices; the V_b are the basis parameter matrices, each paired with a coefficient a_{rb} that depends on the relation type r; the V_b themselves are shared across relations.
(2) expresses W_r as a direct sum of low-dimensional matrices.
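The basis decomposition in (1) can be formed with a single einsum; the dimensions here are illustrative:

```python
import torch

B, num_rels, d_in, d_out = 2, 4, 3, 3
V = torch.randn(B, d_in, d_out)        # shared basis matrices V_b
a = torch.randn(num_rels, B)           # per-relation coefficients a_rb
W = torch.einsum("rb,bio->rio", a, V)  # W_r = sum_b a_rb V_b
print(W.shape)  # torch.Size([4, 3, 3])
```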
With RGCN as the encoder producing node embeddings, node classification is straightforward: attach a logistic-regression or linear layer to the embedding and train with cross-entropy.
For link prediction, after encoding node embeddings, a score is computed for each triple (s, r, o), similar to TransE. (The paper uses DistMult; the idea is much the same.)
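A DistMult-style decoder scores a triple as e_s^T diag(r) e_o; a minimal sketch (embedding sizes are arbitrary):

```python
import torch

def distmult_score(e_s, r_diag, e_o):
    """DistMult triple score: e_s^T diag(r) e_o, one score per triple.
    Sketch of a KG decoder on top of RGCN-encoded embeddings."""
    return (e_s * r_diag * e_o).sum(dim=-1)

e_s = torch.randn(5, 8)   # subject embeddings from the encoder
r = torch.randn(5, 8)     # one diagonal relation vector per triple
e_o = torch.randn(5, 8)   # object embeddings
scores = distmult_score(e_s, r, e_o)
print(scores.shape)  # torch.Size([5])
```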
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl import DGLGraph
import dgl.function as fn
from functools import partial
class RGCNLayer(nn.Module):
    def __init__(self, in_feat, out_feat, num_rels, num_bases=-1, bias=None,
                 activation=None, is_input_layer=False):
        super(RGCNLayer, self).__init__()
        self.in_feat = in_feat
        self.out_feat = out_feat
        self.num_rels = num_rels
        self.num_bases = num_bases
        self.bias = bias
        self.activation = activation
        self.is_input_layer = is_input_layer
        # sanity check
        if self.num_bases <= 0 or self.num_bases > self.num_rels:
            self.num_bases = self.num_rels
        # weight bases in equation (3)
        self.weight = nn.Parameter(torch.Tensor(self.num_bases, self.in_feat,
                                                self.out_feat))
        if self.num_bases < self.num_rels:
            # linear combination coefficients in equation (3)
            self.w_comp = nn.Parameter(torch.Tensor(self.num_rels, self.num_bases))
        # add bias
        if self.bias:
            self.bias = nn.Parameter(torch.Tensor(out_feat))
        # init trainable parameters
        nn.init.xavier_uniform_(self.weight,
                                gain=nn.init.calculate_gain('relu'))
        if self.num_bases < self.num_rels:
            nn.init.xavier_uniform_(self.w_comp,
                                    gain=nn.init.calculate_gain('relu'))
        if self.bias:
            nn.init.xavier_uniform_(self.bias,
                                    gain=nn.init.calculate_gain('relu'))

    def forward(self, g):
        if self.num_bases < self.num_rels:
            # generate all weights from bases (equation (3))
            weight = self.weight.view(self.in_feat, self.num_bases, self.out_feat)
            weight = torch.matmul(self.w_comp, weight).view(self.num_rels,
                                                            self.in_feat, self.out_feat)
        else:
            weight = self.weight

        if self.is_input_layer:
            def message_func(edges):
                # for the input layer, the matrix multiply can be converted
                # to an embedding lookup using the source node id
                embed = weight.view(-1, self.out_feat)
                index = edges.data['rel_type'] * self.in_feat + edges.src['id']
                return {'msg': embed[index] * edges.data['norm']}
        else:
            def message_func(edges):
                w = weight[edges.data['rel_type']]
                msg = torch.bmm(edges.src['h'].unsqueeze(1), w).squeeze()
                msg = msg * edges.data['norm']
                return {'msg': msg}

        def apply_func(nodes):
            h = nodes.data['h']
            if self.bias:
                h = h + self.bias
            if self.activation:
                h = self.activation(h)
            return {'h': h}

        g.update_all(message_func, fn.sum(msg='msg', out='h'), apply_func)
class Model(nn.Module):
    def __init__(self, num_nodes, h_dim, out_dim, num_rels,
                 num_bases=-1, num_hidden_layers=1):
        super(Model, self).__init__()
        self.num_nodes = num_nodes
        self.h_dim = h_dim
        self.out_dim = out_dim
        self.num_rels = num_rels
        self.num_bases = num_bases
        self.num_hidden_layers = num_hidden_layers
        # create rgcn layers
        self.build_model()
        # create initial features
        self.features = self.create_features()

    def build_model(self):
        self.layers = nn.ModuleList()
        # input to hidden
        i2h = self.build_input_layer()
        self.layers.append(i2h)
        # hidden to hidden
        for _ in range(self.num_hidden_layers):
            h2h = self.build_hidden_layer()
            self.layers.append(h2h)
        # hidden to output
        h2o = self.build_output_layer()
        self.layers.append(h2o)

    # initialize feature for each node
    def create_features(self):
        features = torch.arange(self.num_nodes)
        return features

    def build_input_layer(self):
        return RGCNLayer(self.num_nodes, self.h_dim, self.num_rels, self.num_bases,
                         activation=F.relu, is_input_layer=True)

    def build_hidden_layer(self):
        return RGCNLayer(self.h_dim, self.h_dim, self.num_rels, self.num_bases,
                         activation=F.relu)

    def build_output_layer(self):
        return RGCNLayer(self.h_dim, self.out_dim, self.num_rels, self.num_bases,
                         activation=partial(F.softmax, dim=1))

    def forward(self, g):
        if self.features is not None:
            g.ndata['id'] = self.features
        for layer in self.layers:
            layer(g)
        return g.ndata.pop('h')
# load graph data
from dgl.contrib.data import load_data
data = load_data(dataset='aifb')
num_nodes = data.num_nodes
num_rels = data.num_rels
num_classes = data.num_classes
labels = data.labels
train_idx = data.train_idx
# split training and validation set
val_idx = train_idx[:len(train_idx) // 5]
train_idx = train_idx[len(train_idx) // 5:]
# edge type and normalization factor
edge_type = torch.from_numpy(data.edge_type)
edge_norm = torch.from_numpy(data.edge_norm).unsqueeze(1)
labels = torch.from_numpy(labels).view(-1)

# configurations
n_hidden = 16        # number of hidden units
n_bases = -1         # use number of relations as number of bases
n_hidden_layers = 0  # use 1 input layer, 1 output layer, no hidden layer
n_epochs = 25        # epochs to train
lr = 0.01            # learning rate
l2norm = 0           # L2 norm coefficient

# create graph
g = DGLGraph((data.edge_src, data.edge_dst))
g.edata.update({'rel_type': edge_type, 'norm': edge_norm})

# create model
model = Model(g.num_nodes(),
              n_hidden,
              num_classes,
              num_rels,
              num_bases=n_bases,
              num_hidden_layers=n_hidden_layers)

# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2norm)

print("start training...")
model.train()
for epoch in range(n_epochs):
    optimizer.zero_grad()
    logits = model.forward(g)
    loss = F.cross_entropy(logits[train_idx], labels[train_idx])
    loss.backward()
    optimizer.step()

    train_acc = torch.sum(logits[train_idx].argmax(dim=1) == labels[train_idx])
    train_acc = train_acc.item() / len(train_idx)
    val_loss = F.cross_entropy(logits[val_idx], labels[val_idx])
    val_acc = torch.sum(logits[val_idx].argmax(dim=1) == labels[val_idx])
    val_acc = val_acc.item() / len(val_idx)
    print("Epoch {:05d} | ".format(epoch) +
          "Train Accuracy: {:.4f} | Train Loss: {:.4f} | ".format(
              train_acc, loss.item()) +
          "Validation Accuracy: {:.4f} | Validation loss: {:.4f}".format(
              val_acc, val_loss.item()))
Capsules in Practice
Dynamic Routing Between Capsules
(1) Standard neural networks have too few structural levels: only neurons, layers, and the whole network. Capsules group the neurons within a layer, so that a capsule can perform substantial internal computation and output a compressed result.
(2) In a traditional neuron, scalar inputs x_i are weighted and summed into a_j, and a nonlinear activation (sigmoid, tanh, ReLU, etc.) produces a scalar output. In a capsule, the inputs u_i are vectors; the matrix multiply is a simple affine transformation, and the weighted sum over i is a sum of vectors rather than scalars. The squash function is the nonlinear counterpart of the activation and outputs a vector. (1) The length of the output vector represents the probability that the entity exists, and its direction encodes the entity's properties. (2) A capsule's output is routed to higher-level (parent) capsules according to how well it agrees with them. During training, routing is performed iteratively: each iteration adjusts the routing weights between capsules based on the observed agreement, in a way similar to k-means or competitive learning.
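The squash nonlinearity can be sketched on its own: it rescales a vector's length into (0, 1) while preserving its direction (the eps term is an added numerical-safety assumption):

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """Capsule squash: |v| = |s|^2 / (1 + |s|^2), direction unchanged."""
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

v = squash(torch.randn(3, 4))
lengths = v.norm(dim=-1)
print((lengths < 1).all().item())  # True: every output length is below 1
```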
import matplotlib.pyplot as plt
import numpy as np
import torch as th
import torch.nn as nn
import torch.nn.functional as F
import dgl

def init_graph(in_nodes, out_nodes, f_size):
    u = np.repeat(np.arange(in_nodes), out_nodes)
    v = np.tile(np.arange(in_nodes, in_nodes + out_nodes), in_nodes)
    g = dgl.DGLGraph((u, v))
    # init states
    g.ndata["v"] = th.zeros(in_nodes + out_nodes, f_size)
    g.edata["b"] = th.zeros(in_nodes * out_nodes, 1)
    return g
import dgl.function as fn

class DGLRoutingLayer(nn.Module):
    def __init__(self, in_nodes, out_nodes, f_size):
        super(DGLRoutingLayer, self).__init__()
        self.g = init_graph(in_nodes, out_nodes, f_size)
        self.in_nodes = in_nodes
        self.out_nodes = out_nodes
        self.in_indx = list(range(in_nodes))
        self.out_indx = list(range(in_nodes, in_nodes + out_nodes))

    def forward(self, u_hat, routing_num=1):
        self.g.edata["u_hat"] = u_hat
        for r in range(routing_num):
            # step 1 (line 4): normalize over out edges
            edges_b = self.g.edata["b"].view(self.in_nodes, self.out_nodes)
            self.g.edata["c"] = F.softmax(edges_b, dim=1).view(-1, 1)
            self.g.edata["c u_hat"] = self.g.edata["c"] * self.g.edata["u_hat"]
            # execute steps 1 & 2
            self.g.update_all(fn.copy_e("c u_hat", "m"), fn.sum("m", "s"))
            # step 3 (line 6)
            self.g.nodes[self.out_indx].data["v"] = self.squash(
                self.g.nodes[self.out_indx].data["s"], dim=1
            )
            # step 4 (line 7)
            v = th.cat(
                [self.g.nodes[self.out_indx].data["v"]] * self.in_nodes, dim=0
            )
            self.g.edata["b"] = self.g.edata["b"] + (
                self.g.edata["u_hat"] * v
            ).sum(dim=1, keepdim=True)

    @staticmethod
    def squash(s, dim=1):
        sq = th.sum(s**2, dim=dim, keepdim=True)
        s_norm = th.sqrt(sq)
        s = (sq / (1.0 + sq)) * (s / s_norm)
        return s
# test
in_nodes = 20
out_nodes = 10
f_size = 4
u_hat = th.randn(in_nodes * out_nodes, f_size)
routing = DGLRoutingLayer(in_nodes, out_nodes, f_size)
entropy_list = []
dist_list = []
for i in range(10):
    routing(u_hat)
    dist_matrix = routing.g.edata["c"].view(in_nodes, out_nodes)
    entropy = (-dist_matrix * th.log(dist_matrix)).sum(dim=1)
    entropy_list.append(entropy.data.numpy())
    dist_list.append(dist_matrix.data.numpy())
stds = np.std(entropy_list, axis=1)
means = np.mean(entropy_list, axis=1)
plt.errorbar(np.arange(len(entropy_list)), means, stds, marker="o")
plt.ylabel("Entropy of Weight Distribution")
plt.xlabel("Number of Routing")
plt.xticks(np.arange(len(entropy_list)))
plt.close()
import matplotlib.animation as animation
import seaborn as sns

fig = plt.figure(dpi=150)
fig.clf()
ax = fig.subplots()

def dist_animate(i):
    ax.cla()
    sns.distplot(dist_list[i].reshape(-1), kde=False, ax=ax)
    ax.set_xlabel("Weight Distribution Histogram")
    ax.set_title("Routing: %d" % (i))

ani = animation.FuncAnimation(
    fig, dist_animate, frames=len(entropy_list), interval=500
)
plt.close()
import networkx as nx
from networkx.algorithms import bipartite

g = routing.g.to_networkx()
X, Y = bipartite.sets(g)
height_in = 10
height_out = height_in * 0.8
height_in_y = np.linspace(0, height_in, in_nodes)
height_out_y = np.linspace((height_in - height_out) / 2, height_out, out_nodes)
pos = dict()
fig2 = plt.figure(figsize=(8, 3), dpi=150)
fig2.clf()
ax = fig2.subplots()
pos.update(
    (n, (i, 1)) for i, n in zip(height_in_y, X)
)  # put nodes from X at x=1
pos.update(
    (n, (i, 2)) for i, n in zip(height_out_y, Y)
)  # put nodes from Y at x=2

def weight_animate(i):
    ax.cla()
    ax.axis("off")
    ax.set_title("Routing: %d " % i)
    dm = dist_list[i]
    nx.draw_networkx_nodes(
        g, pos, nodelist=range(in_nodes), node_color="r", node_size=100, ax=ax
    )
    nx.draw_networkx_nodes(
        g,
        pos,
        nodelist=range(in_nodes, in_nodes + out_nodes),
        node_color="b",
        node_size=100,
        ax=ax,
    )
    for edge in g.edges():
        nx.draw_networkx_edges(
            g,
            pos,
            edgelist=[edge],
            width=dm[edge[0], edge[1] - in_nodes] * 1.5,
            ax=ax,
        )

ani2 = animation.FuncAnimation(
    fig2, weight_animate, frames=len(dist_list), interval=500
)
plt.close()
Tree-LSTM in DGL
Paper Practice
The core idea of the model is to inject syntactic information into language tasks by extending the chain-structured LSTM to a tree-structured LSTM, using dependency-tree and constituency-tree techniques to obtain "latent trees."
Since different trees usually differ in structure, DGL places these trees into one simple graph and performs message passing across the structures of the different trees.
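For illustration, a Child-Sum Tree-LSTM cell (one variant from the Tree-LSTM paper) might look like the sketch below; this is not the DGL batched implementation, and the layer layout is an assumption:

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Child-Sum Tree-LSTM cell sketch: gates depend on the sum of the
    children's hidden states, with one forget gate per child."""
    def __init__(self, x_dim, h_dim):
        super().__init__()
        self.W = nn.Linear(x_dim, 4 * h_dim)            # input projections for i, o, u, f
        self.U_iou = nn.Linear(h_dim, 3 * h_dim, bias=False)
        self.U_f = nn.Linear(h_dim, h_dim, bias=False)
        self.h_dim = h_dim

    def forward(self, x, child_h, child_c):
        # child_h, child_c: (num_children, h_dim)
        h_tilde = child_h.sum(dim=0)                    # sum over children
        wx = self.W(x)
        w_iou, w_f = wx[: 3 * self.h_dim], wx[3 * self.h_dim:]
        i, o, u = (w_iou + self.U_iou(h_tilde)).chunk(3)
        f = torch.sigmoid(w_f + self.U_f(child_h))      # per-child forget gate
        c = torch.sigmoid(i) * torch.tanh(u) + (f * child_c).sum(dim=0)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = ChildSumTreeLSTMCell(x_dim=5, h_dim=3)
h, c = cell(torch.randn(5), torch.randn(2, 3), torch.randn(2, 3))
print(h.shape)  # torch.Size([3])
```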
Generative Models of Graphs
Paper Practice
Generative models are used to train on and generate graphs, forming graph structure through a graph generative model. Intuitively, this resembles reinforcement learning.
Properties currently used to characterize real graph data:
- Degree distribution: the probability that a randomly chosen node has degree k, obtainable as the normalized histogram of node degrees.
- Clustering coefficient: measures how tightly a node's neighbors are connected. For node i with degree k_i and e_i edges among its neighbors, it is the fraction of neighbor pairs that are actually connected out of all possible pairs; the clustering coefficient of the whole graph is the average over all nodes:

C_i=\frac{2e_i}{k_i(k_i-1)}
- Connected components: connectivity is the size of the largest subgraph in which any two nodes are joined by a path. To find the connected components: run BFS from a random node and mark every visited node; if all nodes are reached, the whole network is connected; otherwise pick an unvisited node and repeat the BFS.
- Path length
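The statistics above (degree distribution, clustering coefficient, connected components) can be computed for a small undirected graph with plain NumPy and BFS; a sketch, assuming an adjacency-matrix input:

```python
from collections import deque
import numpy as np

def degree_distribution(adj):
    """Normalized histogram of node degrees P(k)."""
    degrees = adj.sum(axis=1).astype(int)
    counts = np.bincount(degrees)
    return counts / counts.sum()

def clustering_coefficient(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)): fraction of connected neighbor pairs."""
    nbrs = np.flatnonzero(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = adj[np.ix_(nbrs, nbrs)].sum() / 2   # edges among the neighbors
    return 2 * e / (k * (k - 1))

def connected_components(adj):
    """Repeated BFS from unvisited nodes, as described above."""
    n = adj.shape[0]
    seen = [False] * n
    comps = []
    for s in range(n):
        if seen[s]:
            continue
        comp, q = [], deque([s])
        seen[s] = True
        while q:
            u = q.popleft()
            comp.append(u)
            for v in np.flatnonzero(adj[u]):
                if not seen[v]:
                    seen[v] = True
                    q.append(v)
        comps.append(comp)
    return comps

# toy graph: triangle 0-1-2 plus a separate edge 3-4
adj = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    adj[u, v] = adj[v, u] = 1
print(clustering_coefficient(adj, 0))  # 1.0 (node 0's two neighbors are connected)
print(len(connected_components(adj)))  # 2
```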
Basic steps of graph generation:
- Encode the graph as it evolves
- Sample a random action to take
- During training, collect the error signal and optimize the model parameters
DGMG (Deep Generative Models of Graphs)
At each time step, either 1. add a new node to the graph, or 2. pick two existing nodes and add an edge between them.
Optimization objective
As in language modeling, DGMG assumes generation proceeds as a sequence of actions a_1, \cdots, a_T; the model follows these steps, computes the joint probability of the sequence, and minimizes the resulting MLE loss.
p(a_{1},\cdots,a_{T}) = p(a_{1})\,p(a_{2}\mid a_{1})\cdots p(a_{T}\mid a_{1},\cdots,a_{T-1}).
Our goal is to minimize the MLE loss:
-\log p(a_{1},\cdots,a_{T}) = -\sum_{t=1}^{T}\log p(a_{t}\mid a_{1},\cdots,a_{t-1}).
def forward_train(self, actions):
    """
    - actions: list
        Contains a_1, ..., a_T described above
    - self.prepare_for_train()
        - Initializes self.action_step to be 0, which will get
          incremented by 1 every time it is called.
        - Initializes objects recording log p(a_t|a_1,...a_{t-1})

    Returns
    -------
    - self.get_log_prob(): log p(a_1, ..., a_T)
    """
    self.prepare_for_train()
    stop = self.add_node_and_update(a=actions[self.action_step])
    while not stop:
        to_add_edge = self.add_edge_or_not(a=actions[self.action_step])
        while to_add_edge:
            self.choose_dest_and_update(a=actions[self.action_step])
            to_add_edge = self.add_edge_or_not(a=actions[self.action_step])
        stop = self.add_node_and_update(a=actions[self.action_step])
    return self.get_log_prob()
The DGMG skeleton to implement:
import dgl
import torch.nn as nn

class DGMGSkeleton(nn.Module):
    def __init__(self, v_max):
        """
        Parameters
        ----------
        v_max: int
            Max number of nodes considered
        """
        super(DGMGSkeleton, self).__init__()
        # graph configuration
        self.v_max = v_max

    def add_node_and_update(self, a=None):
        """Decide whether to add a new node.
        If a new node should be added, update the graph."""
        return NotImplementedError

    def add_edge_or_not(self, a=None):
        """Decide whether a new edge should be added."""
        return NotImplementedError

    def choose_dest_and_update(self, a=None):
        """Choose a destination and connect it to the latest node.
        Add edges in both directions and update the graph."""
        return NotImplementedError

    def forward_train(self, actions):
        """Forward at training time. Records the probability
        of generating a ground-truth graph following the actions."""
        return NotImplementedError

    def forward_inference(self):
        """Forward at inference time. Generates graphs on the fly."""
        return NotImplementedError

    def forward(self, actions=None):
        # the graph you will work on
        self.g = dgl.DGLGraph()
        # if nodes and edges have features, zero tensors will be
        # set for those of new nodes and edges
        self.g.set_n_initializer(dgl.frame.zero_initializer)
        self.g.set_e_initializer(dgl.frame.zero_initializer)
        if self.training:
            return self.forward_train(actions=actions)
        else:
            return self.forward_inference()
Implementing dynamic graph encoding
Since the graphs produced by the actions above are all samples from a probability distribution, the structured data must be projected into a Euclidean space. The main challenge is that this process needs to be repeated as the graph changes.
\textbf{h}_{G} = \sum_{v\in V}\text{Sigmoid}(g_m(\textbf{h}_{v}))\,f_{m}(\textbf{h}_{v})
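The gated readout above can be sketched as a small module; the layer names gate/proj stand in for g_m and f_m, and the sizes are illustrative:

```python
import torch
import torch.nn as nn

class GatedGraphReadout(nn.Module):
    """Gated readout h_G = sum_v sigmoid(g(h_v)) * f(h_v) over all nodes."""
    def __init__(self, node_dim, graph_dim):
        super().__init__()
        self.gate = nn.Linear(node_dim, graph_dim)  # plays the role of g_m
        self.proj = nn.Linear(node_dim, graph_dim)  # plays the role of f_m

    def forward(self, h_v):
        # h_v: (num_nodes, node_dim) -> graph embedding (graph_dim,)
        return (torch.sigmoid(self.gate(h_v)) * self.proj(h_v)).sum(dim=0)

readout = GatedGraphReadout(node_dim=4, graph_dim=6)
h_G = readout(torch.randn(5, 4))
print(h_G.shape)  # torch.Size([6])
```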
Summary:
This article covered graph neural networks: the principles and implementations of the homogeneous-graph models GCN, GAT, and GraphSAGE; GCN's drawbacks and the improvements made by GAT and GraphSAGE; RGCN for multi-relational graphs; and hands-on practice with Capsules, Tree-LSTM in DGL, and graph generative models, including their optimization objectives and dynamic graph encoding.