[Paper Close Reading] Towards Deeper Graph Neural Networks

Paper link: Towards Deeper Graph Neural Networks | Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

The English parts are typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This post reads more like personal notes, so take it with a grain of salt.

Contents

1. TL;DR

1.1. Thoughts

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction

2.3. Background and related works

2.3.1. Graph convolution operations

2.3.2. Related works

2.4. Empirical and theoretical analysis of deep GNNs

2.4.1. Quantitative metric for smoothness

2.4.2. Why deeper GNNs fail?

2.4.3. Theoretical analysis of very deep models

2.5. Deep adaptive graph neural network

2.6. Experimental studies

2.6.1. Datasets and setup

2.6.2. Overall results

2.6.3. Training set sizes

2.6.4. Model Depths

2.7. Conclusion

3. Reference


1. TL;DR

1.1. Thoughts

(1) Hard to comment on; honestly, this feels like a fairly simple idea to me.

2. Section-by-section close reading

2.1. Abstract

        ①They propose the Deep Adaptive Graph Neural Network (DAGNN) to address the over-smoothing problem that arises when a GNN becomes too deep

2.2. Introduction

        ①DAGNN adaptively integrates information from different propagation depths (receptive fields) to learn node representations

2.3. Background and related works

        ①Graph representation: G=\left ( V, E \right )

        ②V is the node set with n=\left | V \right | nodes, and E is the edge set with m=\left | E \right | edges

        ③Unweighted adjacency matrix A \in \mathbb{R}^{n \times n} (entries are 0 or 1 only)

        ④Degree matrix D\in\mathbb{R}^{n\times n} and D_{(i,i)}=\sum_{j}A_{(i,j)}

        ⑤The set of neighbors of node i is denoted N_i

        ⑥Node feature matrix X\in\mathbb{R}^{n\times d}

2.3.1. Graph convolution operations

        ①Traditional message passing:

\begin{aligned}&a_{i}^{(\ell)}=\mathrm{PROPAGATION}^{(\ell)}\left(\left\{\mathbf{x}_{i}^{(\ell-1)},\{\mathbf{x}_{j}^{(\ell-1)}|j\in\mathcal{N}_{i}\}\right\}\right)\\&x_{i}^{(\ell)}=\mathrm{TRANSFORMATION}^{(\ell)}\left(a_{i}^{(\ell)}\right).\end{aligned}
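
A minimal NumPy sketch of this two-step scheme (the function name, the mean aggregator, and ReLU are my own illustrative choices, not the paper's):

import numpy as np

def message_passing_layer(X, neighbors, W):
    """One generic GNN layer: PROPAGATION over {x_i} and {x_j : j in N_i}, then a shared TRANSFORMATION."""
    out = np.zeros(X.shape)
    for i in range(X.shape[0]):
        msgs = [X[i]] + [X[j] for j in neighbors[i]]   # node i together with its neighbors
        out[i] = np.mean(msgs, axis=0)                 # a_i = PROPAGATION(...), here a simple mean
    return np.maximum(out @ W, 0.0)                    # x_i = TRANSFORMATION(a_i), here ReLU(a_i W)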

        ②Convolution operator:

X^{(\ell)}=\sigma\left(\widehat{A}X^{(\ell-1)}W^{(\ell)}\right)

where {\widehat A}={\widetilde D}^{-{\frac{1}{2}}}{\widetilde A}{\widetilde D}^{-{\frac{1}{2}}}, \tilde{A}=A+I, and \widetilde{D} is the diagonal degree matrix of \widetilde{A}
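
A minimal NumPy sketch of this convolution operator (dense matrices for clarity, \sigma taken as ReLU; the function name is mine):

import numpy as np

def gcn_layer(A, X, W):
    """X^(l) = sigma(A_hat X^(l-1) W) with A_hat = D~^{-1/2} A~ D~^{-1/2} and A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))        # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_hat @ X @ W, 0.0)         # sigma = ReLU here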

2.3.2. Related works

        ①Stacking many layers can make the representations of nodes in different classes indistinguishable

        ②They list some models designed to capture long-range (multi-hop) information

 demystify  v. to make less mysterious; to clarify; to enlighten

2.4. Empirical and theoretical analysis of deep GNNs

2.4.1. Quantitative metric for smoothness

        ①Distance metric between node i and node j (Euclidean distance between their normalized representations):

D(x_{i},x_{j})=\frac{1}{2}\left\|\frac{x_{i}}{\|x_{i}\|}-\frac{x_{j}}{\|x_{j}\|}\right\|

where \left \|\cdot \right \| denotes the Euclidean norm

        ②Smoothness metric:

SMV_i=\frac{1}{n-1}\sum_{j\in V,j\neq i}D(x_i,x_j)

Why do the authors think leaf nodes have larger smoothness metric values? Isn't this n the total number of nodes? Leaf nodes have fewer neighbors, so shouldn't their smoothness metric value be smaller? Or does n here denote the number of neighbors? I don't really think it is the latter.

        ③Smoothness metric of the whole graph G:

SMV_{G}=\frac{1}{n}\sum_{i\in V}SMV_{i}
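
A small NumPy sketch of both metrics as defined above (the function name is mine; the pairwise distance D and the node/graph averages follow the formulas directly):

import numpy as np

def smoothness_metric(X, eps=1e-12):
    """Returns per-node SMV_i and the graph-level SMV_G."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)            # x_i / ||x_i||
    D = 0.5 * np.linalg.norm(Xn[:, None, :] - Xn[None, :, :], axis=2)    # D(x_i, x_j)
    n = X.shape[0]
    smv_node = D.sum(axis=1) / (n - 1)   # the j = i term is 0, so this effectively sums over j != i
    smv_graph = smv_node.mean()          # SMV_G: average over all nodes
    return smv_node, smv_graph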

 periphery  n. the outer edge or boundary; circumference; the lateral surface of a cylinder

2.4.2. Why deeper GNNs fail?

        ①Datasets: Cora, CiteSeer and PubMed

        ②t-SNE visualization on different layers on Cora:

        ③Accuracy on Cora with different layers:

        ④Over-smoothing mainly occurs in sparse graphs

        ⑤Decoupling transformation from propagation:

Z=\mathrm{MLP}\left(X\right)\\X_{out}=\mathrm{softmax}\left(\widehat{A}^{k}Z\right)
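
A short sketch of this decoupled scheme, assuming A_hat is the normalized adjacency defined earlier and mlp is any feature-transformation callable (names are mine):

import numpy as np

def decoupled_forward(A_hat, X, mlp, k):
    """Z = MLP(X), then X_out = softmax(A_hat^k Z); the propagation adds no extra parameters."""
    Z = mlp(X)                                 # feature transformation, no graph involved
    H = Z
    for _ in range(k):                         # propagate k hops
        H = A_hat @ H
    H = H - H.max(axis=1, keepdims=True)       # numerically stable softmax over classes
    expH = np.exp(H)
    return expH / expH.sum(axis=1, keepdims=True)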

        ⑥Statistics of datasets, where the edge density is calculated by \frac{2m}{n^2}:

        ⑦Results with deeper models on Cora:

        ⑧Accuracy and smoothness on Cora:

2.4.3. Theoretical analysis of very deep models

        ①Two propagation mechanisms:

\hat{A}_{\oplus}=\tilde{D}^{-1}\tilde{A}

used mainly in GraphSAGE and DGCNN, and 

\widehat{A}_{\odot}=\widetilde{D}^{-\frac{1}{2}}\widetilde{A}\widetilde{D}^{-\frac{1}{2}}

in GCN
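
For comparison, a small sketch that builds both normalized matrices from a raw adjacency matrix (self-loops added as \widetilde{A}=A+I; the function name is mine):

import numpy as np

def normalized_adjacencies(A):
    """Row-stochastic D~^{-1} A~ (GraphSAGE/DGCNN style) vs. symmetric D~^{-1/2} A~ D~^{-1/2} (GCN)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    A_rw = A_tilde / d[:, None]                                      # D~^{-1} A~
    A_sym = A_tilde * (d ** -0.5)[:, None] * (d ** -0.5)[None, :]    # D~^{-1/2} A~ D~^{-1/2}
    return A_rw, A_sym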

        ②They define \Psi(x)=\frac{x}{\mathrm{sum}(x)}, \Phi(x)=\frac{x}{\|x\|}, and e=[1,1,\cdots,1]\in\mathbb{R}^{1\times n}

        ③They introduce two theorems to prove the convergence of repeated propagation:

        ④The remaining theorems and proofs are omitted here

2.5. Deep adaptive graph neural network

        ①Steps of DAGNN:

\begin{aligned}&Z=\mathrm{MLP}\left(X\right)&&\in\mathbb{R}^{n\times c}\\&H_{\ell}=\widehat{A}^{\ell}Z,\ell=1,2,\cdots,k&&\in\mathbb{R}^{n\times c}\\&H=\mathrm{stack}\left(Z,H_{1},\cdots,H_{k}\right)&&\in\mathbb{R}^{n\times(k+1)\times c}\\&S=\sigma\left(Hs\right)&&\in\mathbb{R}^{n\times(k+1)\times1}\\&\widetilde{S}=\mathrm{reshape}\left(S\right)&&\in\mathbb{R}^{n\times1\times(k+1)}\\&X_{out}=\mathrm{softmax}\left(\mathrm{squeeze}\left(\widetilde{S}H\right)\right)&&\in\mathbb{R}^{n\times c},\end{aligned}

where c is the number of node classes, Z \in \mathbb{R}^{n \times c} is the transformed feature matrix, s\in\mathbb{R}^{c\times1} is a trainable projection vector, \sigma is set to the sigmoid function, and k is the maximum number of propagation hops (model depth)
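
A compact NumPy sketch of these steps (mlp and the variable names are placeholders; dropout and other training details are omitted):

import numpy as np

def dagnn_forward(A_hat, X, mlp, s, k):
    """DAGNN forward pass: mlp maps X to Z in R^{n x c}; s is the trainable projection vector of shape (c,)."""
    Z = mlp(X)                                   # Z = MLP(X)
    hops = [Z]
    for _ in range(k):                           # H_l = A_hat^l Z, l = 1..k
        hops.append(A_hat @ hops[-1])
    H = np.stack(hops, axis=1)                   # n x (k+1) x c
    S = 1.0 / (1.0 + np.exp(-(H @ s)))           # S = sigmoid(H s): retention score per hop, n x (k+1)
    out = np.einsum('nk,nkc->nc', S, H)          # squeeze(S~ H): adaptively weighted sum over hops
    out = out - out.max(axis=1, keepdims=True)   # softmax over the c classes
    expO = np.exp(out)
    return expO / expO.sum(axis=1, keepdims=True)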

        ②Since the appropriate number of hops is hard to fix in advance, they design an adaptive scoring mechanism based on the trainable projection vector

        ③⭐There is no extra fully connected classification layer at the end; X_{out} is used directly as the prediction

        ④The loss function can be:

L=-\sum_{i\in V_L}\sum_{p=1}^cY_{[i,p]}\ln X_{out[i,p]}

where V_L denotes the set of labeled nodes and Y\in\mathbb{R}^{n\times c} stores the ground-truth (one-hot) labels
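
A sketch of this masked cross-entropy loss, assuming one-hot labels Y and an index array for V_L (names are mine):

import numpy as np

def masked_cross_entropy(X_out, Y, labeled_idx, eps=1e-12):
    """L = - sum over i in V_L and p of Y[i, p] * ln X_out[i, p]."""
    return -np.sum(Y[labeled_idx] * np.log(X_out[labeled_idx] + eps))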

        ⑤Workflow of DAGNN:

(To be honest, why does this feel so hand-wavy to me? After all, each step incorporates all of the earlier hops, so by that logic one could simply assign lower weights to the deeper hops... since they are over-smoothed... still, it is better than nothing.)

2.6. Experimental studies

2.6.1. Datasets and setup

        ①Datasets: Cora, CiteSeer, PubMed, Coauthor CS, Coauthor Physics, Amazon Computers, Amazon Photo:

        ②Baselines: Logistic Regression (LogReg), Multilayer Perceptron (MLP), Label Propagation (LabelProp), Normalized Laplacian Label Propagation (LabelProp NL), ChebNet, Graph Convolutional Network (GCN), Graph Attention Network (GAT), Mixture Model Network (MoNet), GraphSAGE, APPNP, SGC

        ③Grid search over hyperparameters: k \in \left \{ 5,10,20 \right \}, weight\, decay \in \left \{ 0,2e-2,5e-3,5e-4,5e-5 \right \}, dropout\, rate \in \left \{ 0.5,0.8 \right \}
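
A tiny sketch that just enumerates this grid (the dictionary keys are my own naming); each of the 30 settings would then be evaluated on the validation set:

from itertools import product

search_space = {
    "k": [5, 10, 20],
    "weight_decay": [0, 2e-2, 5e-3, 5e-4, 5e-5],
    "dropout": [0.5, 0.8],
}
configs = [dict(zip(search_space, v)) for v in product(*search_space.values())]
print(len(configs))  # 30 candidate hyperparameter settings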

2.6.2. Overall results

        ①Uniform setting on Cora, CiteSeer, and PubMed: 20 labeled nodes per class for training, 500 nodes for validation, and 1000 nodes for testing. On the co-authorship and co-purchase datasets: 20 labeled nodes per class for training, 30 nodes per class for validation, and the rest for testing

        ②100 runs each for the fixed split and the random splits

        ③Performance comparison table on Cora, CiteSeer and PubMed:

        ④Performance comparison table on co-authorship and co-purchase:

2.6.3. Training set sizes

        ①They fix the number of propagation layers/hops in APPNP, SGC, and DAGNN to 10 and evaluate them on the relatively large dataset Cora under different training set sizes:

2.6.4. Model Depths

        ①To test how depths/hops influence smoothness:

        ②Fewer hops should be used for denser graphs

2.7. Conclusion

        Honestly, I don't quite get the "decoupling" the authors keep emphasizing. What exactly is decoupled? It feels just like an attention or scoring mechanism to me.

3. Reference

Liu, M., Gao, H. & Ji, S. (2020) 'Towards Deeper Graph Neural Networks', in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20), pp. 338-348.
