Graph Neural Networks
In this article, we divide graph neural networks into five categories: Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), Graph Autoencoders, Graph Generative Networks, and Graph Spatial-temporal Networks.
3. Graph Autoencoders
Graph autoencoders are a family of graph embedding methods whose goal is to use a neural network to represent graph vertices as low-dimensional vectors. A typical solution uses a multilayer perceptron as the encoder to obtain node embeddings, while a decoder reconstructs the node's neighborhood statistics.
RGCN: Paper Reading and Practice
Paper Topic
Relational Graph Convolutional Networks (R-GCNs) are suited to link prediction (recovering missing facts, i.e., subject-predicate-object triples) and to node classification (filling in missing node attributes). R-GCN targets high-dimensional, multi-relational data structures; used as an encoder together with a factorization model as the decoder, it yields a clear improvement over decoder-only baselines. For example, given the diversity of relations among P2P platforms (multiple paths formed from different attributes), it can perform node prediction well.
GCN, GAT, and GraphSAGE: Principles and Implementation for Homogeneous Graphs
GCN
H^{(l+1)}=\sigma(D^{-1/2}AD^{-1/2}H^{(l)}W^{(l)})
Here H^{(l)} denotes the node features at layer l, D the degree matrix, and A the adjacency matrix. The process resembles CNN convolution: it is a weighted sum, in which the degree and adjacency matrices determine the weight of each edge over a node's neighbors, followed by the weighted aggregation.
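As an illustrative sketch (not a library implementation), the propagation rule above can be written with dense tensors in a few lines of PyTorch; the toy graph and weight shapes here are made up for demonstration:

```python
import torch

def gcn_layer(A, H, W):
    """One GCN step: relu(D^{-1/2} A D^{-1/2} H W).
    A: (N, N) adjacency with self-loops, H: (N, F_in), W: (F_in, F_out)."""
    deg = A.sum(dim=1)                    # node degrees
    d_inv_sqrt = deg.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0
    A_norm = d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)
    return torch.relu(A_norm @ H @ W)     # weighted sum, then activation

# toy graph: 3 nodes on a path, self-loops added
A = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
H = torch.eye(3)
W = torch.randn(3, 2)
out = gcn_layer(A, H, W)
print(out.shape)  # torch.Size([3, 2])
```

Real implementations use sparse matrix products, but the normalization and weighted-sum structure is the same.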
Main drawbacks: 1. The edge weights used in aggregation are fixed, which is inflexible. 2. Poor scalability: convolution and gradient updates run over the full graph, which becomes far too slow when the graph is large. 3. As depth grows, the outputs over-smooth easily and every node's features become very similar.
GAT addresses problem 1, GraphSAGE addresses problem 2, and a line of work such as DeepGCN discusses problem 3.
GAT
\alpha_{ij}=\frac{\exp(\text{LeakyReLU}(\vec{a}^{T}[W\vec{h}_i\,\|\,W\vec{h}_j]))}{\sum_{k\in\mathcal{N}_i}\exp(\text{LeakyReLU}(\vec{a}^{T}[W\vec{h}_i\,\|\,W\vec{h}_k]))}
Here h_i, h_j, h_k are node features, and α_{ij} is the attention coefficient between nodes i and j.
The attention-fused feature of node i can then be written as the formula below. It is essentially still a weighted sum of features, except that the weights are learned during training; a nonlinear activation is applied at the end for the downstream task.
\vec{h}_i'=\sigma\Big(\sum_{j\in\mathcal{N}_i}\alpha_{ij}W\vec{h}_j\Big)
To make the attention mechanism more expressive, multi-head attention is defined, where K is the number of attention heads; the heads can be aggregated in different ways (e.g., concatenation or averaging). The attention mechanism in GAT is quite intuitive: each edge gets a learnable coefficient α_{ij}, node features are fused according to these coefficients, and the parameters adapt to the task, which makes the learned weighting more effective.
GraphSAGE
- Transductive means the data to be predicted is visible to the model at training time. In other words, the graph structure is fixed before training: the nodes or edges you want to predict are already in the graph, and the graph is identical at training and prediction time.
- Inductive means the data to be predicted need not be seen during training, which is how we usually build models: training and prediction data are separate, i.e., the graph structure is not fixed and new nodes can be added.
GraphSAGE samples random subgraphs and updates node embeddings from those subgraphs. Because the sampled subgraph structure itself varies, the model learns a sampling-and-aggregation scheme rather than fixed per-node embeddings. This effectively handles unseen nodes and avoids having to update the embeddings of the whole graph at once, greatly improving scalability.
- Subgraph sampling: during training, a subgraph is cut for each node by randomly sampling a subset of its neighbors as the feature nodes to aggregate. For a center node, two hops are taken and a subset of neighbors is sampled to form the training subgraph.
- Aggregation: after sampling the subgraph, aggregate features from the outermost layer inward, as in GCN, to obtain the center node's embedding. There is a lot of room for variation here, such as changing the aggregation function (typically mean, sum, or pooling) or adding edge weights.
- Task prediction: with node embeddings in hand, downstream tasks follow; e.g., for node classification, attach a linear layer plus softmax to the embedding.
GraphSAGE mainly solves two problems: 1. unseen nodes at prediction time (the original GCN needs to see all nodes' graph data during training); 2. the high memory cost and slow computation of full-graph gradient updates on large graphs.
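The sample-and-aggregate loop above can be sketched as below, using a mean aggregator over a hypothetical neighbor dictionary; real implementations batch this and sample per hop:

```python
import random
import torch

def sage_layer(h, neighbors, W_self, W_neigh, num_samples=2):
    """One GraphSAGE step with mean aggregation (illustrative sketch).
    h: (N, F) features; neighbors: dict node -> list of neighbor ids."""
    out = []
    for v in range(h.size(0)):
        nbrs = neighbors[v]
        sampled = random.sample(nbrs, min(num_samples, len(nbrs)))  # sample a subset
        agg = h[sampled].mean(dim=0)                                # mean aggregator
        out.append(torch.relu(h[v] @ W_self + agg @ W_neigh))       # combine self + neighbors
    return torch.stack(out)

h = torch.randn(4, 3)
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
W_self = torch.randn(3, 2)
W_neigh = torch.randn(3, 2)
out = sage_layer(h, neighbors, W_self, W_neigh)
print(out.shape)  # torch.Size([4, 2])
```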
RGCN
RGCN can be seen as a simple extension of GCN to multi-relational graphs. ***Moving from homogeneous to heterogeneous graphs, the core problem RGCN must solve is how multiple relation types interact.*** RGCN uses a single generic GNN encoder that computes entity embeddings by encoding edges of different relation types (with different downstream heads).
In RGCN, under each relation type, both incoming and outgoing neighbors contribute, and a self-loop feature is added; these are fused to update the center node.
h_i^{(l+1)}=\sigma\Big(\sum_{r\in\mathcal{R}}\sum_{j\in\mathcal{N}_i^r}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)}\Big)
The double sum iterates over each relation type and aggregates the features of each neighbor under that relation; the previous layer's center-node feature is then added, and an activation produces the center node's output. W_r is a dimension-transforming matrix, i.e., a model parameter; \mathcal{R} is the set of relations, \mathcal{N}_i^r the neighbors of i under relation r, and c_{i,r} a problem-specific normalization constant. Whereas GCN uses the degree and adjacency matrices as fixed aggregation weights, RGCN learns more of this weighting during training.
Because multiple relations blow up the number of parameters, two regularization schemes for the W_r matrices are defined:
- Basis decomposition (shared transformation-matrix parameters):

W_r^{(l)}=\sum_{b=1}^{B}a_{rb}^{(l)}V_b^{(l)}
- Block-diagonal decomposition (the weight matrix W_r is assembled from small basis matrices, keeping W_r sparse):

W_r^{(l)}=\bigoplus_{b=1}^{B}Q_{br}^{(l)}
In (1), B is the (constant) number of basis matrices; the V_b are the basis parameter matrices, each paired with a coefficient a_{rb} that depends on the relation type r; the V_b themselves are shared across relations.
(2) expresses W_r as a direct sum of low-dimensional matrices.
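The basis decomposition in (1) can be formed with a single einsum; the dimensions here are illustrative:

```python
import torch

B, num_rels, d_in, d_out = 2, 4, 3, 3
V = torch.randn(B, d_in, d_out)        # shared basis matrices V_b
a = torch.randn(num_rels, B)           # per-relation coefficients a_rb
W = torch.einsum("rb,bio->rio", a, V)  # W_r = sum_b a_rb V_b
print(W.shape)  # torch.Size([4, 3, 3])
```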
With RGCN as the encoder producing node embeddings, node classification is straightforward: attach a logistic-regression or linear layer to the embedding and train with cross-entropy.
For link prediction, after encoding node embeddings, a score is computed for each triple (s, r, o), similar to TransE. (The paper uses DistMult; the idea is much the same.)
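A DistMult-style decoder scores a triple as e_s^T diag(r) e_o; a minimal sketch (embedding sizes are arbitrary):

```python
import torch

def distmult_score(e_s, r_diag, e_o):
    """DistMult triple score: e_s^T diag(r) e_o, one score per triple.
    Sketch of a KG decoder on top of RGCN-encoded embeddings."""
    return (e_s * r_diag * e_o).sum(dim=-1)

e_s = torch.randn(5, 8)   # subject embeddings from the encoder
r = torch.randn(5, 8)     # one diagonal relation vector per triple
e_o = torch.randn(5, 8)   # object embeddings
scores = distmult_score(e_s, r, e_o)
print(scores.shape)  # torch.Size([5])
```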
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl import DGLGraph
import dgl.function as fn
from functools import partial
class RGCNLayer(nn.Module):
    def __init__(self, in_feat, out_feat, num_rels, num_bases=-1, bias=None,
                 activation=None, is_input_layer=False):
        super(RGCNLayer, self).__init__()
        self.in_feat = in_feat
        self.out_feat = out_feat
        self.num_rels = num_rels
        self.num_bases = num_bases
        self.bias = bias
        self.activation = activation
        self.is_input_layer = is_input_layer
        # sanity check
        if self.num_bases <= 0 or self.num_bases > self.num_rels:
            self.num_bases = self.num_rels
        # weight bases in equation (3)
        self.weight = nn.Parameter(torch.Tensor(self.num_bases, self.in_feat,
                                                self.out_feat))
        if self.num_bases < self.num_rels:
            # linear combination coefficients in equation (3)
            self.w_comp = nn.Parameter(torch.Tensor(self.num_rels, self.num_bases))
        # add bias
        if self.bias:
            self.bias = nn.Parameter(torch.Tensor(out_feat))
        # init trainable parameters
        nn.init.xavier_uniform_(self.weight,
                                gain=nn.init.calculate_gain('relu'))
        if self.num_bases < self.num_rels:
            nn.init.xavier_uniform_(self.w_comp,
                                    gain=nn.init.calculate_gain('relu'))
        if self.bias:
            nn.init.xavier_uniform_(self.bias,
                                    gain=nn.init.calculate_gain('relu'))

    def forward(self, g):
        if self.num_bases < self.num_rels:
            # generate all weights from bases (equation (3))
            weight = self.weight.view(self.in_feat, self.num_bases, self.out_feat)
            weight = torch.matmul(self.w_comp, weight).view(self.num_rels,
                                                            self.in_feat, self.out_feat)
        else:
            weight = self.weight

        if self.is_input_layer:
            def message_func(edges):
                # for the input layer, the matrix multiply can be converted
                # to an embedding lookup using the source node id
                embed = weight.view(-1, self.out_feat)
                index = edges.data['rel_type'] * self.in_feat + edges.src['id']
                return {'msg': embed[index] * edges.data['norm']}
        else:
            def message_func(edges):
                w = weight[edges.data['rel_type']]
                msg = torch.bmm(edges.src['h'].unsqueeze(1), w).squeeze()
                msg = msg * edges.data['norm']
                return {'msg': msg}

        def apply_func(nodes):
            h = nodes.data['h']
            if self.bias:
                h = h + self.bias
            if self.activation:
                h = self.activation(h)
            return {'h': h}

        g.update_all(message_func, fn.sum(msg='msg', out='h'), apply_func)
class Model(nn.Module):
    def __init__(self, num_nodes, h_dim, out_dim, num_rels,
                 num_bases=-1, num_hidden_layers=1):
        super(Model, self).__init__()
        self.num_nodes = num_nodes
        self.h_dim = h_dim
        self.out_dim = out_dim
        self.num_rels = num_rels
        self.num_bases = num_bases
        self.num_hidden_layers = num_hidden_layers
        # create rgcn layers
        self.build_model()
        # create initial features
        self.features = self.create_features()

    def build_model(self):
        self.layers = nn.ModuleList()
        # input to hidden
        i2h = self.build_input_layer()
        self.layers.append(i2h)
        # hidden to hidden
        for _ in range(self.num_hidden_layers):
            h2h = self.build_hidden_layer()
            self.layers.append(h2h)
        # hidden to output
        h2o = self.build_output_layer()
        self.layers.append(h2o)

    # initialize feature for each node
    def create_features(self):
        features = torch.arange(self.num_nodes)
        return features

    def build_input_layer(self):
        return RGCNLayer(self.num_nodes, self.h_dim, self.num_rels, self.num_bases,
                         activation=F.relu, is_input_layer=True)

    def build_hidden_layer(self):
        return RGCNLayer(self.h_dim, self.h_dim, self.num_rels, self.num_bases,
                         activation=F.relu)

    def build_output_layer(self):
        return RGCNLayer(self.h_dim, self.out_dim, self.num_rels, self.num_bases,
                         activation=partial(F.softmax, dim=1))

    def forward(self, g):
        if self.features is not None:
            g.ndata['id'] = self.features
        for layer in self.layers:
            layer(g)
        return g.ndata.pop('h')
# load graph data
from dgl.contrib.data import load_data
data = load_data(dataset='aifb')
num_nodes = data.num_nodes
num_rels = data.num_rels
num_classes = data.num_classes
labels = data.labels
train_idx = data.train_idx
# split training and validation set
val_idx = train_idx[:len(train_idx) // 5]
train_idx = train_idx[len(train_idx) // 5:]
# edge type and normalization factor
edge_type = torch.from_numpy(data.edge_type)
edge_norm = torch.from_numpy(data.edge_norm).unsqueeze(1)
labels = torch.from_numpy(labels).view(-1)

# configurations
n_hidden = 16        # number of hidden units
n_bases = -1         # use number of relations as number of bases
n_hidden_layers = 0  # use 1 input layer, 1 output layer, no hidden layer
n_epochs = 25        # epochs to train
lr = 0.01            # learning rate
l2norm = 0           # L2 norm coefficient

# create graph
g = DGLGraph((data.edge_src, data.edge_dst))
g.edata.update({'rel_type': edge_type, 'norm': edge_norm})

# create model
model = Model(g.num_nodes(),
              n_hidden,
              num_classes,
              num_rels,
              num_bases=n_bases,
              num_hidden_layers=n_hidden_layers)

# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2norm)

print("start training...")
model.train()
for epoch in range(n_epochs):
    optimizer.zero_grad()
    logits = model.forward(g)
    loss = F.cross_entropy(logits[train_idx], labels[train_idx])
    loss.backward()
    optimizer.step()

    train_acc = torch.sum(logits[train_idx].argmax(dim=1) == labels[train_idx])
    train_acc = train_acc.item() / len(train_idx)
    val_loss = F.cross_entropy(logits[val_idx], labels[val_idx])
    val_acc = torch.sum(logits[val_idx].argmax(dim=1) == labels[val_idx])
    val_acc = val_acc.item() / len(val_idx)
    print("Epoch {:05d} | ".format(epoch) +
          "Train Accuracy: {:.4f} | Train Loss: {:.4f} | ".format(
              train_acc, loss.item()) +
          "Validation Accuracy: {:.4f} | Validation loss: {:.4f}".format(
              val_acc, val_loss.item()))
Capsules in Practice
Dynamic Routing Between Capsules
(1) Standard neural networks have too few structural levels: only neurons, layers, and the whole network. Capsules group the neurons within a layer, so that a capsule can perform substantial internal computation and output a compressed result.
(2) In a traditional neuron, scalar inputs x_i are weighted and summed into a_j, and a nonlinear activation (sigmoid, tanh, ReLU, etc.) produces a scalar output. In a capsule, the inputs u_i are vectors; the matrix multiply is a simple affine transformation, and the weighted sum over i is a sum of vectors rather than scalars. The squash function is the nonlinear counterpart of the activation and outputs a vector. (1) The length of the output vector represents the probability that the entity exists, and its direction encodes the entity's properties. (2) A capsule's output is routed to higher-level (parent) capsules according to how well it agrees with them. During training, routing is performed iteratively: each iteration adjusts the routing weights between capsules based on the observed agreement, in a way similar to k-means or competitive learning.
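The squash nonlinearity can be sketched on its own: it rescales a vector's length into (0, 1) while preserving its direction (the eps term is an added numerical-safety assumption):

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """Capsule squash: |v| = |s|^2 / (1 + |s|^2), direction unchanged."""
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

v = squash(torch.randn(3, 4))
lengths = v.norm(dim=-1)
print((lengths < 1).all().item())  # True: every output length is below 1
```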
import matplotlib.pyplot as plt
import numpy as np
import torch as th
import torch.nn as nn
import torch.nn.functional as F
import dgl

def init_graph(in_nodes, out_nodes, f_size):
    u = np.repeat(np.arange(in_nodes), out_nodes)
    v = np.tile(np.arange(in_nodes, in_nodes + out_nodes), in_nodes)
    g = dgl.DGLGraph((u, v))
    # init states
    g.ndata["v"] = th.zeros(in_nodes + out_nodes, f_size)
    g.edata["b"] = th.zeros(in_nodes * out_nodes, 1)
    return g
import dgl.function as fn

class DGLRoutingLayer(nn.Module):
    def __init__(self, in_nodes, out_nodes, f_size):
        super(DGLRoutingLayer, self).__init__()
        self.g = init_graph(in_nodes, out_nodes, f_size)
        self.in_nodes = in_nodes
        self.out_nodes = out_nodes
        self.in_indx = list(range(in_nodes))
        self.out_indx = list(range(in_nodes, in_nodes + out_nodes))

    def forward(self, u_hat, routing_num=1):
        self.g.edata["u_hat"] = u_hat
        for r in range(routing_num):
            # step 1 (line 4): normalize over out edges
            edges_b = self.g.edata["b"].view(self.in_nodes, self.out_nodes)
            self.g.edata["c"] = F.softmax(edges_b, dim=1).view(-1, 1)
            self.g.edata["c u_hat"] = self.g.edata["c"] * self.g.edata["u_hat"]
            # execute steps 1 & 2
            self.g.update_all(fn.copy_e("c u_hat", "m"), fn.sum("m", "s"))
            # step 3 (line 6)
            self.g.nodes[self.out_indx].data["v"] = self.squash(
                self.g.nodes[self.out_indx].data["s"], dim=1
            )
            # step 4 (line 7)
            v = th.cat(
                [self.g.nodes[self.out_indx].data["v"]] * self.in_nodes, dim=0
            )
            self.g.edata["b"] = self.g.edata["b"] + (
                self.g.edata["u_hat"] * v
            ).sum(dim=1, keepdim=True)

    @staticmethod
    def squash(s, dim=1):
        sq = th.sum(s**2, dim=dim, keepdim=True)
        s_norm = th.sqrt(sq)
        s = (sq / (1.0 + sq)) * (s / s_norm)
        return s
# test
in_nodes = 20
out_nodes = 10
f_size = 4
u_hat = th.randn(in_nodes * out_nodes, f_size)
routing = DGLRoutingLayer(in_nodes, out_nodes, f_size)
entropy_list = []
dist_list = []
for i in range(10):
    routing(u_hat)
    dist_matrix = routing.g.edata["c"].view(in_nodes, out_nodes)
    entropy = (-dist_matrix * th.log(dist_matrix)).sum(dim=1)
    entropy_list.append(entropy.data.numpy())
    dist_list.append(dist_matrix.data.numpy())
stds = np.std(entropy_list, axis=1)
means = np.mean(entropy_list, axis=1)
plt.errorbar(np.arange(len(entropy_list)), means, stds, marker="o")
plt.ylabel("Entropy of Weight Distribution")
plt.xlabel("Number of Routing")
plt.xticks(np.arange(len(entropy_list)))
plt.close()
import matplotlib.animation as animation
import seaborn as sns

fig = plt.figure(dpi=150)
fig.clf()
ax = fig.subplots()

def dist_animate(i):
    ax.cla()
    sns.distplot(dist_list[i].reshape(-1), kde=False, ax=ax)
    ax.set_xlabel("Weight Distribution Histogram")
    ax.set_title("Routing: %d" % (i))

ani = animation.FuncAnimation(
    fig, dist_animate, frames=len(entropy_list), interval=500
)
plt.close()
import networkx as nx
from networkx.algorithms import bipartite

g = routing.g.to_networkx()
X, Y = bipartite.sets(g)
height_in = 10
height_out = height_in * 0.8
height_in_y = np.linspace(0, height_in, in_nodes)
height_out_y = np.linspace((height_in - height_out) / 2, height_out, out_nodes)
pos = dict()
fig2 = plt.figure(figsize=(8, 3), dpi=150)
fig2.clf()
ax = fig2.subplots()
pos.update(
    (n, (i, 1)) for i, n in zip(height_in_y, X)
)  # put nodes from X at x=1
pos.update(
    (n, (i, 2)) for i, n in zip(height_out_y, Y)
)  # put nodes from Y at x=2

def weight_animate(i):
    ax.cla()
    ax.axis("off")
    ax.set_title("Routing: %d " % i)
    dm = dist_list[i]
    nx.draw_networkx_nodes(
        g, pos, nodelist=range(in_nodes), node_color="r", node_size=100, ax=ax
    )
    nx.draw_networkx_nodes(
        g,
        pos,
        nodelist=range(in_nodes, in_nodes + out_nodes),
        node_color="b",
        node_size=100,
        ax=ax,
    )
    for edge in g.edges():
        nx.draw_networkx_edges(
            g,
            pos,
            edgelist=[edge],
            width=dm[edge[0], edge[1] - in_nodes] * 1.5,
            ax=ax,
        )

ani2 = animation.FuncAnimation(
    fig2, weight_animate, frames=len(dist_list), interval=500
)
plt.close()
Tree-LSTM in DGL
Paper Practice
The core idea of the model is to inject syntactic information into language tasks by extending the chain-structured LSTM to a tree-structured LSTM, using dependency-tree and constituency-tree techniques to obtain "latent trees."
Since different trees usually differ in structure, DGL places these trees into one simple graph and performs message passing across the structures of the different trees.
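For illustration, a Child-Sum Tree-LSTM cell (one variant from the Tree-LSTM paper) might look like the sketch below; this is not the DGL batched implementation, and the layer layout is an assumption:

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Child-Sum Tree-LSTM cell sketch: gates depend on the sum of the
    children's hidden states, with one forget gate per child."""
    def __init__(self, x_dim, h_dim):
        super().__init__()
        self.W = nn.Linear(x_dim, 4 * h_dim)            # input projections for i, o, u, f
        self.U_iou = nn.Linear(h_dim, 3 * h_dim, bias=False)
        self.U_f = nn.Linear(h_dim, h_dim, bias=False)
        self.h_dim = h_dim

    def forward(self, x, child_h, child_c):
        # child_h, child_c: (num_children, h_dim)
        h_tilde = child_h.sum(dim=0)                    # sum over children
        wx = self.W(x)
        w_iou, w_f = wx[: 3 * self.h_dim], wx[3 * self.h_dim:]
        i, o, u = (w_iou + self.U_iou(h_tilde)).chunk(3)
        f = torch.sigmoid(w_f + self.U_f(child_h))      # per-child forget gate
        c = torch.sigmoid(i) * torch.tanh(u) + (f * child_c).sum(dim=0)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = ChildSumTreeLSTMCell(x_dim=5, h_dim=3)
h, c = cell(torch.randn(5), torch.randn(2, 3), torch.randn(2, 3))
print(h.shape)  # torch.Size([3])
```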
Generative Models of Graphs
Paper Practice
Generative models are used to train on and generate graphs, forming graph structure through a graph generative model. Intuitively, this resembles reinforcement learning.
Properties currently used to characterize real graph data:
- Degree distribution: the probability that a randomly chosen node has degree k, obtainable as the normalized histogram of node degrees.
- Clustering coefficient: measures how tightly a node's neighbors are connected. For node i with degree k_i and e_i edges among its neighbors, it is the fraction of neighbor pairs that are actually connected out of all possible pairs; the clustering coefficient of the whole graph is the average over all nodes:

C_i=\frac{2e_i}{k_i(k_i-1)}
- Connected components: connectivity is the size of the largest subgraph in which any two nodes are joined by a path. To find the connected components: run BFS from a random node and mark every visited node; if all nodes are reached, the whole network is connected; otherwise pick an unvisited node and repeat the BFS.
- Path length
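The statistics above (degree distribution, clustering coefficient, connected components) can be computed for a small undirected graph with plain NumPy and BFS; a sketch, assuming an adjacency-matrix input:

```python
from collections import deque
import numpy as np

def degree_distribution(adj):
    """Normalized histogram of node degrees P(k)."""
    degrees = adj.sum(axis=1).astype(int)
    counts = np.bincount(degrees)
    return counts / counts.sum()

def clustering_coefficient(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)): fraction of connected neighbor pairs."""
    nbrs = np.flatnonzero(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = adj[np.ix_(nbrs, nbrs)].sum() / 2   # edges among the neighbors
    return 2 * e / (k * (k - 1))

def connected_components(adj):
    """Repeated BFS from unvisited nodes, as described above."""
    n = adj.shape[0]
    seen = [False] * n
    comps = []
    for s in range(n):
        if seen[s]:
            continue
        comp, q = [], deque([s])
        seen[s] = True
        while q:
            u = q.popleft()
            comp.append(u)
            for v in np.flatnonzero(adj[u]):
                if not seen[v]:
                    seen[v] = True
                    q.append(v)
        comps.append(comp)
    return comps

# toy graph: triangle 0-1-2 plus a separate edge 3-4
adj = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    adj[u, v] = adj[v, u] = 1
print(clustering_coefficient(adj, 0))  # 1.0 (node 0's two neighbors are connected)
print(len(connected_components(adj)))  # 2
```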
Basic steps of graph generation:
- Encode the graph as it evolves
- Sample a random action to take
- During training, collect the error signal and optimize the model parameters
DGMG (Deep Generative Models of Graphs)
At each time step, either 1. add a new node to the graph, or 2. pick two existing nodes and add an edge between them.
Optimization objective
As in language modeling, DGMG assumes generation proceeds as a sequence of actions a_1, \cdots, a_T; the model follows these steps, computes the joint probability of the sequence, and minimizes the resulting MLE loss.
p(a_{1},\cdots,a_{T}) = p(a_{1})\,p(a_{2}\mid a_{1})\cdots p(a_{T}\mid a_{1},\cdots,a_{T-1}).
Our goal is to minimize the MLE loss:
-\log p(a_{1},\cdots,a_{T}) = -\sum_{t=1}^{T}\log p(a_{t}\mid a_{1},\cdots,a_{t-1}).
def forward_train(self, actions):
    """
    - actions: list
        Contains a_1, ..., a_T described above
    - self.prepare_for_train()
        - Initializes self.action_step to be 0, which will get
          incremented by 1 every time it is called.
        - Initializes objects recording log p(a_t|a_1,...a_{t-1})

    Returns
    -------
    - self.get_log_prob(): log p(a_1, ..., a_T)
    """
    self.prepare_for_train()
    stop = self.add_node_and_update(a=actions[self.action_step])
    while not stop:
        to_add_edge = self.add_edge_or_not(a=actions[self.action_step])
        while to_add_edge:
            self.choose_dest_and_update(a=actions[self.action_step])
            to_add_edge = self.add_edge_or_not(a=actions[self.action_step])
        stop = self.add_node_and_update(a=actions[self.action_step])
    return self.get_log_prob()
The DGMG skeleton to implement:
import dgl
import torch.nn as nn

class DGMGSkeleton(nn.Module):
    def __init__(self, v_max):
        """
        Parameters
        ----------
        v_max: int
            Max number of nodes considered
        """
        super(DGMGSkeleton, self).__init__()
        # graph configuration
        self.v_max = v_max

    def add_node_and_update(self, a=None):
        """Decide whether to add a new node.
        If a new node should be added, update the graph."""
        return NotImplementedError

    def add_edge_or_not(self, a=None):
        """Decide whether a new edge should be added."""
        return NotImplementedError

    def choose_dest_and_update(self, a=None):
        """Choose a destination and connect it to the latest node.
        Add edges in both directions and update the graph."""
        return NotImplementedError

    def forward_train(self, actions):
        """Forward at training time. Records the probability
        of generating a ground-truth graph following the actions."""
        return NotImplementedError

    def forward_inference(self):
        """Forward at inference time. Generates graphs on the fly."""
        return NotImplementedError

    def forward(self, actions=None):
        # the graph you will work on
        self.g = dgl.DGLGraph()
        # if nodes and edges have features, zero tensors will be
        # set for those of new nodes and edges
        self.g.set_n_initializer(dgl.frame.zero_initializer)
        self.g.set_e_initializer(dgl.frame.zero_initializer)
        if self.training:
            return self.forward_train(actions=actions)
        else:
            return self.forward_inference()
Implementing dynamic graph encoding
Since the graphs produced by the actions above are all samples from a probability distribution, the structured data must be projected into a Euclidean space. The main challenge is that this process needs to be repeated as the graph changes.
\textbf{h}_{G} = \sum_{v\in V}\text{Sigmoid}(g_m(\textbf{h}_{v}))\,f_{m}(\textbf{h}_{v})
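The gated readout above can be sketched as a small module; the layer names gate/proj stand in for g_m and f_m, and the sizes are illustrative:

```python
import torch
import torch.nn as nn

class GatedGraphReadout(nn.Module):
    """Gated readout h_G = sum_v sigmoid(g(h_v)) * f(h_v) over all nodes."""
    def __init__(self, node_dim, graph_dim):
        super().__init__()
        self.gate = nn.Linear(node_dim, graph_dim)  # plays the role of g_m
        self.proj = nn.Linear(node_dim, graph_dim)  # plays the role of f_m

    def forward(self, h_v):
        # h_v: (num_nodes, node_dim) -> graph embedding (graph_dim,)
        return (torch.sigmoid(self.gate(h_v)) * self.proj(h_v)).sum(dim=0)

readout = GatedGraphReadout(node_dim=4, graph_dim=6)
h_G = readout(torch.randn(5, 4))
print(h_G.shape)  # torch.Size([6])
```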
Summary:
This article covered graph neural networks: the principles and implementations of the homogeneous-graph models GCN, GAT, and GraphSAGE; GCN's drawbacks and the improvements made by GAT and GraphSAGE; RGCN for multi-relational graphs; and hands-on practice with Capsules, Tree-LSTM in DGL, and graph generative models, including their optimization objectives and dynamic graph encoding.