CS224W: Machine Learning with Graphs - 06 Graph Neural Networks (GNN) 1: GNN Model

This post discusses the limitations of traditional shallow embedding methods: every node needs its own parameters, nodes unseen during training cannot be handled, and node features are not used. It then introduces deep graph encoders, which learn node representations through multiple layers of non-linear transformations based on the graph structure. It covers the challenges that complex networks pose for the modern deep learning toolbox, and how deep learning is applied to graphs. It explains how graph convolutional networks (GCNs) work, including message passing, the matrix form of the update rule, and how they are trained. Finally, it discusses the inductive capability of GNNs: the same aggregation parameters are shared across all nodes, which allows generalization to new nodes.


GNN Model

0. Limitations of shallow embedding methods

  • $O(|V|)$ parameters are needed: no sharing of parameters between nodes, so every node has its own unique embedding
  • Inherently “transductive”: cannot generate embeddings for nodes not seen during training
  • Do not incorporate node features: features should be leveraged

1. Deep Graph Encoders

0). Deep Methods based on GNN

$\text{ENC}(v) =$ multiple layers of non-linear transformations based on graph structure
Note: all deep encoders can be combined with node similarity functions

1). Modern ML Toolbox

The modern deep learning toolbox is designed for simple sequences and grids, but networks are far more complex:

  • Arbitrary size and complex topological structure (i.e., no spatial locality like grids)
  • No fixed node ordering or reference point
  • Often dynamic and have multimodal features

2. Basics of Deep Learning

To be updated

3. Deep Learning for Graphs

1). A Naive Approach

Join the adjacency matrix and features, then feed them into a deep neural network
Issues:

  • $O(|V|)$ parameters
  • Not applicable to graphs of different sizes
  • Sensitive to node ordering

2). Convolutional Networks

a). From images to graphs

Goal: generalize convolutions beyond simple lattices and leverage node features/attributes
Problem:

  • There is no fixed notion of locality or sliding window on the graph
  • Graph is permutation invariant

Idea: transform information at the neighbors and combine it:

  • Transform “message” $h_i$ from neighbors: $W_i h_i$
  • Add them up: $\sum_i W_i h_i$

b). Graph convolutional networks

Idea: a node’s neighborhood defines its computation graph (determine the node’s computation graph; propagate and transform information)
Basic approach: average information from neighbors and apply a neural network; a minimal code sketch follows the definitions below
$h_v^0 = x_v$
$h_v^{l+1} = \sigma\left(W_l \sum_{u\in N(v)} \dfrac{h_u^l}{|N(v)|} + B_l h_v^l\right), \quad \forall l \in \{0, \dots, L-1\}$
$z_v = h_v^L$
where

  • $h_v^l$: hidden representation of node $v$ at layer $l$
  • $W_l$: weight matrix for neighborhood aggregation
  • $B_l$: weight matrix for transforming the hidden vector of the node itself
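
A minimal per-node PyTorch sketch of this update rule (the layer sizes, the toy adjacency dictionary `neighbors`, and the use of ReLU as $\sigma$ are illustrative assumptions, not prescribed by the lecture):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One layer of h_v^{l+1} = sigma(W_l * mean_{u in N(v)} h_u^l + B_l * h_v^l)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # W_l: neighborhood aggregation weights
        self.B = nn.Linear(in_dim, out_dim, bias=False)  # B_l: self-transformation weights

    def forward(self, H, neighbors):
        # H: [|V|, in_dim] hidden states h_v^l; neighbors: dict v -> list of neighbor ids
        out = []
        for v in range(H.size(0)):
            nbr_mean = H[neighbors[v]].mean(dim=0)       # (1/|N(v)|) * sum_{u in N(v)} h_u^l
            out.append(torch.relu(self.W(nbr_mean) + self.B(H[v])))
        return torch.stack(out)                          # h_v^{l+1} for every node

# toy example: 3 nodes forming a triangle, 4-dimensional input features x_v
H0 = torch.randn(3, 4)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
layer = GCNLayer(4, 8)
H1 = layer(H0, neighbors)   # shape [3, 8]
```
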
c). Matrix formulation

Many aggregations can be performed efficiently by (sparse) matrix operations
Let $H^l = [h_1^l \cdots h_{|V|}^l]^T$; then $\sum_{u\in N(v)} h_u^l = A_v H^l$, where $A_v$ is the $v$-th row of the adjacency matrix $A$
Let $D$ be the diagonal degree matrix with $D_{vv} = \mathrm{Deg}(v) = |N(v)|$; then $D_{vv}^{-1} = 1/|N(v)|$
Rewriting the update function in matrix form:
$H^{l+1} = \sigma(\tilde{A} H^l W_l^T + H^l B_l^T)$
where $\tilde{A} = D^{-1} A$
This implies that efficient sparse matrix multiplication can be used ($\tilde{A}$ is sparse)
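
A sketch of the same layer in matrix form, using a sparse COO adjacency matrix; the toy 4-node cycle and random weights are illustrative assumptions:

```python
import torch

num_nodes, in_dim, out_dim = 4, 4, 8

# sparse adjacency matrix A for a 4-node cycle 0-1-2-3-0
edges = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 0],
                      [1, 0, 2, 1, 3, 2, 0, 3]])
A = torch.sparse_coo_tensor(edges, torch.ones(edges.size(1)),
                            (num_nodes, num_nodes)).coalesce()

deg = torch.sparse.sum(A, dim=1).to_dense()       # D_vv = |N(v)|
H = torch.randn(num_nodes, in_dim)                # H^l
W = torch.randn(in_dim, out_dim)                  # plays the role of W_l^T (applied on the right)
B = torch.randn(in_dim, out_dim)                  # plays the role of B_l^T

# A_tilde H^l = D^{-1} A H^l: sparse matmul followed by row-wise scaling by 1/|N(v)|
AH = torch.sparse.mm(A, H) / deg.unsqueeze(1)
H_next = torch.relu(AH @ W + H @ B)               # H^{l+1}, shape [num_nodes, out_dim]
```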

d). How to train a GNN
  • Node embedding $z_v$ is a function of the input graph
  • Supervised setting: minimize the loss $L$
    $\min_\theta L(y, f(z_v))$
    Example: node classification
    $L = -\sum_{v\in V}\left[y_v\log(\sigma(z_v^T\theta)) + (1-y_v)\log(1-\sigma(z_v^T\theta))\right]$
    where $\theta$ denotes the classification weights
  • Unsupervised setting: no node labels are available, so the graph structure itself is used as supervision (both settings are sketched in code after this list).
    Similar nodes have similar embeddings:
    $L = \sum_{z_u, z_v} \text{CrossEntropy}(y_{uv}, \text{DEC}(z_u, z_v))$
    where $y_{uv} = 1$ when nodes $u$ and $v$ are similar and DEC is the decoder (e.g., inner product)
    Node similarity can be anything, such as random walks (node2vec, DeepWalk, struc2vec), matrix factorization, or node proximity in the graph
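
A minimal sketch of both loss settings, assuming the final-layer embeddings `z` have already been produced by the GNN; the classifier weights `theta`, the binary labels `y`, and the similar-pairs tensor are illustrative stand-ins:

```python
import torch
import torch.nn.functional as F

num_nodes, dim = 5, 8
z = torch.randn(num_nodes, dim, requires_grad=True)   # z_v = h_v^L (normally produced by the GNN)

# Supervised: binary node classification, L = BCE(y_v, sigma(z_v^T theta))
theta = torch.randn(dim, requires_grad=True)          # classification weights
y = torch.randint(0, 2, (num_nodes,)).float()         # node labels y_v
logits = z @ theta
supervised_loss = F.binary_cross_entropy_with_logits(logits, y)

# Unsupervised: graph structure as supervision, DEC(z_u, z_v) = inner product,
# y_uv = 1 for "similar" node pairs (e.g., co-occurring on random walks), 0 otherwise
pairs = torch.tensor([[0, 1], [1, 2], [0, 3]])        # candidate (u, v) pairs
y_uv = torch.tensor([1., 1., 0.])                     # similarity labels
scores = (z[pairs[:, 0]] * z[pairs[:, 1]]).sum(dim=1) # inner-product decoder
unsupervised_loss = F.binary_cross_entropy_with_logits(scores, y_uv)

(supervised_loss + unsupervised_loss).backward()      # gradients flow back into z (and the GNN)
```
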
e). Model design: overview
  1. Define a neighborhood aggregation function
  2. Define a loss function on the embeddings
  3. Train on a set of nodes
  4. Generate embeddings for nodes as needed
f). Inductive Capability

The same aggregation parameters ($W_l$ and $B_l$) are shared for all nodes: the number of model parameters is sublinear in $|V|$, and we can generalize to unseen nodes (new graphs or new nodes).
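Because the parameters depend only on the feature dimensions, a trained layer can be applied as-is to graphs of other sizes. A small illustration reusing the `GCNLayer` sketch from above (the graph sizes and adjacency lists are arbitrary):

```python
import torch

# GCNLayer is the per-node sketch defined earlier; its parameters depend only on
# the feature dimensions, not on the number of nodes |V|
layer = GCNLayer(in_dim=4, out_dim=8)

# "training" graph: 3 nodes
H_train = torch.randn(3, 4)
nbrs_train = {0: [1], 1: [0, 2], 2: [1]}
print(layer(H_train, nbrs_train).shape)   # torch.Size([3, 8])

# a new, larger graph with unseen nodes: same weights, no retraining needed
H_new = torch.randn(5, 4)
nbrs_new = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
print(layer(H_new, nbrs_new).shape)       # torch.Size([5, 8])
```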
