Task03: Node Representation Learning with Graph Neural Networks

Preface

This post is based on the DataWhale open-source tutorial on graph neural networks, Task03: Node Representation Learning with Graph Neural Networks.
The previous lesson covered the message-passing paradigm; this one focuses on learning node representations with graph neural networks.
Node representation learning: use the known information about nodes (node labels, node attributes, attributes of incident edges, etc.) to predict the labels of nodes whose labels are unknown.
Using the Cora dataset, this post compares the node-representation power of three networks: an MLP, a GCN, and a GAT.

1. Preparation

1.1 Getting the dataset

As in earlier lessons, we first download a Planetoid dataset. The NormalizeFeatures transform row-normalizes the node features so that each node's feature vector sums to one.
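To make the normalization concrete, here is a minimal plain-Python sketch with made-up feature values; PyG's NormalizeFeatures applies the same row normalization to data.x:

```python
# Toy feature matrix (hypothetical values): one row per node.
X = [[2.0, 1.0, 1.0],
     [0.5, 0.5, 4.0]]

# Row-normalize so each node's features sum to one,
# mirroring torch_geometric.transforms.NormalizeFeatures.
X_norm = [[v / sum(row) for v in row] for row in X]

for row in X_norm:
    print(row)
```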

from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

if __name__ == '__main__':
    # Download the dataset
    dataset = Planetoid(root='../dataset/Planetoid', name='Cora', transform=NormalizeFeatures())
    # Dataset-level statistics
    print(f'Dataset: {dataset}:')
    print(f'Number of graphs: {len(dataset)}')
    print(f'Number of features: {dataset.num_features}')
    print(f'Number of classes: {dataset.num_classes}')
    # Graph-level statistics
    data = dataset[0]
    print(data)
    print(f'Number of nodes: {data.num_nodes}')
    print(f'Number of edges: {data.num_edges}')
Output:

Dataset: Cora():
Number of graphs: 1
Number of features: 1433
Number of classes: 7
Data(edge_index=[2, 10556], test_mask=[2708], train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])
Number of nodes: 2708
Number of edges: 10556

The Cora graph has 2708 nodes, but only 140 nodes with ground-truth labels are used for training, about 5% of the graph!
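A quick sanity check on that ratio (the node and mask counts come from the printout above):

```python
num_nodes = 2708   # data.num_nodes on Cora
num_train = 140    # int(data.train_mask.sum()) on Cora
ratio = num_train / num_nodes
print(f"labelled training nodes: {ratio:.1%}")  # ≈ 5.2%
```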

1.2 Visualization

To visualize the representations, we first embed the high-dimensional node features into a 2-D plane, using the t-SNE manifold-learning method. The code is as follows:

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(out, color):
    # Project the high-dimensional embeddings into 2-D with t-SNE
    z = TSNE(n_components=2).fit_transform(out.detach().cpu().numpy())
    plt.figure(figsize=(10, 10))
    # Note: matplotlib colormap names are case-sensitive ("Set2", not "set2")
    plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
    plt.show()

# e.g. visualize(out, color=data.y), where out is a model's output

2. Comparing the three methods on node classification

2.1 Node classification with an MLP

We build a two-layer network: two linear layers with a dropout layer in between.
The code is below.
The final test accuracy is only 58.5%:

Test Acc: 0.5850
import torch
from torch.nn import Linear
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

class MyMLPNet(torch.nn.Module):
    def __init__(self, dataset, hidden_channels):
        super(MyMLPNet, self).__init__()
        self.Lin1 = Linear(dataset.num_features, hidden_channels)
        self.Lin2 = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x):
        x = self.Lin1(x)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.Lin2(x)
        return x

def MLPTrain():
    mlp_model.train()
    optimizer.zero_grad()
    out = mlp_model(data.x)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss

def MLPTest():
    mlp_model.eval()
    out = mlp_model(data.x)
    pred = out.argmax(dim=1)
    test_cor = pred[data.test_mask] == data.y[data.test_mask]
    test_acc = int(test_cor.sum()) / int(data.test_mask.sum())
    return test_acc

if __name__ == '__main__':
    torch.manual_seed(2021)
    # Download the dataset
    dataset = Planetoid(root='../dataset/Planetoid', name='Cora', transform=NormalizeFeatures())
    # Take the (single) graph
    data = dataset[0]
    # Build the model
    mlp_model = MyMLPNet(dataset, hidden_channels=16)
    print(mlp_model)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(mlp_model.parameters(), lr=0.01, weight_decay=5e-4)
    # Train
    for epoch in range(1, 201):
        loss = MLPTrain()
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
    # Evaluate test accuracy
    test_acc = MLPTest()
    print(f'Test Acc: {test_acc:.4f}')

2.2 GCN (Graph Convolutional Network)

1) Node update rule of the graph convolutional network:

$$\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},$$

where $\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$ is the symmetrically normalized adjacency matrix that, together with the weight matrix $\mathbf{\Theta}$, updates the node features. Specifically, $\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with self-loops inserted, and $\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$ is the diagonal degree matrix.
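To make the normalization concrete, here is a minimal plain-Python sketch on a hypothetical 3-node path graph: it builds A_hat, the degrees D_hat_ii, and the symmetrically normalized matrix that a GCN layer multiplies the features by.

```python
import math

# Hypothetical undirected path graph 0-1-2, as an adjacency matrix A.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
n = len(A)

# A_hat = A + I: insert self-loops.
A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]

# D_hat_ii = sum_j A_hat_ij: node degrees including the self-loop.
deg = [sum(row) for row in A_hat]

# S = D_hat^{-1/2} A_hat D_hat^{-1/2}: entry (i, j) is A_hat_ij / sqrt(d_i * d_j).
S = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
     for i in range(n)]

# A GCN layer then computes S @ X @ Theta: each node mixes its own and its
# neighbours' features with these weights before the linear map Theta.
print(S)
```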
Reference: "Semi-Supervised Classification with Graph Convolutional Networks".
2) Implementation with PyG's GCNConv
PyG provides the GCNConv class, which makes it easy to build a GCN. The network here simply stacks two GCNConv layers; the code is below.
My run printed a test accuracy of only 33.97%, apparently even lower than the MLP! The culprit is that the original script evaluated by calling GCNTrain() instead of GCNTest(), so the printed number is really the final training loss, not an accuracy. With the call corrected, a two-layer GCN typically reaches around 81% on Cora.

Test Acc: 0.3397
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures
from torch_geometric.nn import GCNConv

class MyGCNNet(torch.nn.Module):
    def __init__(self, dataset, hidden_channels):
        super(MyGCNNet, self).__init__()
        self.conv1 = GCNConv(dataset.num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

def GCNTrain():
    gcn_model.train()
    optimizer.zero_grad()
    out = gcn_model(data.x, data.edge_index)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss

def GCNTest():
    gcn_model.eval()
    out = gcn_model(data.x, data.edge_index)
    pred = out.argmax(dim=1)
    test_cor = pred[data.test_mask] == data.y[data.test_mask]
    test_acc = int(test_cor.sum()) / int(data.test_mask.sum())
    return test_acc

if __name__ == '__main__':
    torch.manual_seed(2020)
    # Download the dataset
    dataset = Planetoid(root='../dataset/Planetoid', name='Cora', transform=NormalizeFeatures())
    # Take the (single) graph
    data = dataset[0]
    # Build the model
    gcn_model = MyGCNNet(dataset, hidden_channels=16)
    print(gcn_model)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(gcn_model.parameters(), lr=0.01, weight_decay=5e-4)

    # Train
    for epoch in range(1, 201):
        loss = GCNTrain()
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
    # Evaluate test accuracy (the original version mistakenly called GCNTrain() here)
    test_acc = GCNTest()
    print(f'Test Acc: {test_acc:.4f}')

2.3 GAT (Graph Attention Network)

Reference: "Graph Attention Networks".
1) Node update rule:

$$\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},$$

where the attention coefficients are

$$\alpha_{i,j} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j]\right)\right)}{\sum_{k \in \mathcal{N}(i) \cup \{i\}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k]\right)\right)}.$$
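As a toy illustration of this formula, the sketch below softmax-normalizes hypothetical raw scores e_ik = a^T[Θx_i ‖ Θx_k] after a LeakyReLU, exactly as in the equation; the score values are made up:

```python
import math

def leaky_relu(x, negative_slope=0.2):
    # LeakyReLU with the default slope used by GAT.
    return x if x >= 0 else negative_slope * x

# Hypothetical raw scores a^T [Theta x_i || Theta x_k] for node i,
# over itself and two neighbours j1, j2.
e = {"i": 1.0, "j1": 0.5, "j2": -1.0}

# alpha_ik = softmax over LeakyReLU(e_ik), k in N(i) ∪ {i}.
logits = {k: leaky_relu(v) for k, v in e.items()}
denom = sum(math.exp(v) for v in logits.values())
alpha = {k: math.exp(v) / denom for k, v in logits.items()}

# The coefficients sum to one and weight the neighbour messages.
print(alpha)
```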
2) Implementation with PyG's GATConv
PyG likewise provides a GATConv layer interface. The GAT implementation is below:

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures
from torch_geometric.nn import GATConv

class MyGATNet(torch.nn.Module):
    def __init__(self, dataset, hidden_channels):
        super(MyGATNet, self).__init__()
        self.gat1 = GATConv(dataset.num_features, hidden_channels)
        self.gat2 = GATConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.gat1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.gat2(x, edge_index)
        return x

def GATTrain():
    gat_model.train()
    optimizer.zero_grad()
    out = gat_model(data.x, data.edge_index)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss

def GATTest():
    gat_model.eval()
    out = gat_model(data.x, data.edge_index)
    pred = out.argmax(dim=1)
    test_cor = pred[data.test_mask] == data.y[data.test_mask]
    test_acc = int(test_cor.sum()) / int(data.test_mask.sum())
    return test_acc

if __name__ == '__main__':
    torch.manual_seed(2020)
    # Download the dataset
    dataset = Planetoid(root='../dataset/Planetoid', name='Cora', transform=NormalizeFeatures())
    # Take the (single) graph
    data = dataset[0]
    # Build the model
    gat_model = MyGATNet(dataset, hidden_channels=16)
    print(gat_model)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(gat_model.parameters(), lr=0.01, weight_decay=5e-4)

    # Train
    for epoch in range(1, 201):
        loss = GATTrain()
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
    # Evaluate test accuracy (the original version mistakenly called the train function here)
    test_acc = GATTest()
    print(f'Test Acc: {test_acc:.4f}')

References

[1] DataWhale, Task03: Node Representation Learning with Graph Neural Networks.
[2] Kipf, T. N. & Welling, M., "Semi-Supervised Classification with Graph Convolutional Networks".
[3] Veličković, P. et al., "Graph Attention Networks".
