CS224W: Machine Learning with Graphs - 06 Graph Neural Networks (GNN) 1: GNN Model

This post discusses the limitations of traditional shallow embedding methods: every node needs its own parameters, nodes unseen during training cannot be handled, and node features are not used. It then introduces deep graph encoders, which learn node representations through multiple layers of non-linear transformations based on the graph structure. It covers the challenges that complex networks pose for the modern deep learning toolbox, and how deep learning is applied to graphs. It explains how graph convolutional networks (GCNs) work, including message passing, the matrix form of the update rule, and how they are trained. Finally, it discusses the inductive capability of GNNs: the same aggregation parameters are shared across all nodes, which allows generalization to new nodes.


GNN Model

0. Limitations of shallow embedding methods

  • $O(|V|)$ parameters are needed: no sharing of parameters between nodes, so every node has its own unique embedding
  • Inherently “transductive”: cannot generate embeddings for nodes not seen during training
  • Do not incorporate node features: features should be leveraged

1. Deep Graph Encoders

0). Deep Methods based on GNN

$\text{ENC}(v) =$ multiple layers of non-linear transformations based on graph structure
Note: all deep encoders can be combined with node similarity functions

1). Modern ML Toolbox

The modern deep learning toolbox is designed for simple sequences and grids, but networks are far more complex:

  • Arbitrary size and complex topological structure (i.e., no spatial locality like grids)
  • No fixed node ordering or reference point
  • Often dynamic and have multimodal features

2. Basics of Deep Learning

To be updated

3. Deep Learning for Graphs

1). A Naive Approach

Join the adjacency matrix and features, then feed them into a deep neural network
Issues:

  • $O(|V|)$ parameters
  • Not applicable to graphs of different sizes
  • Sensitive to node ordering

2). Convolutional Networks

a). From images to graphs

Goal: generalize convolutions beyond simple lattices and leverage node features/attributes
Problem:

  • There is no fixed notion of locality or sliding window on the graph
  • Graph is permutation invariant

Idea: transform information at the neighbors and combine it:

  • Transform “message” $h_i$ from neighbors: $W_i h_i$
  • Add them up: $\sum_i W_i h_i$

b). Graph convolutional networks

Idea: a node’s neighborhood defines its computation graph (determine the node’s computation graph; propagate and transform information)
Basic approach: average information from neighbors and apply a neural network; a minimal code sketch follows the definitions below
$h_v^0 = x_v$
$h_v^{l+1} = \sigma\left(W_l \sum_{u\in N(v)} \dfrac{h_u^l}{|N(v)|} + B_l h_v^l\right), \quad \forall l \in \{0, \dots, L-1\}$
$z_v = h_v^L$
where

  • $h_v^l$: hidden representation of node $v$ at layer $l$
  • $W_l$: weight matrix for neighborhood aggregation
  • $B_l$: weight matrix for transforming the hidden vector of the node itself
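
A minimal per-node PyTorch sketch of this update rule (the layer sizes, the toy adjacency dictionary `neighbors`, and the use of ReLU as $\sigma$ are illustrative assumptions, not prescribed by the lecture):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One layer of h_v^{l+1} = sigma(W_l * mean_{u in N(v)} h_u^l + B_l * h_v^l)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # W_l: neighborhood aggregation weights
        self.B = nn.Linear(in_dim, out_dim, bias=False)  # B_l: self-transformation weights

    def forward(self, H, neighbors):
        # H: [|V|, in_dim] hidden states h_v^l; neighbors: dict v -> list of neighbor ids
        out = []
        for v in range(H.size(0)):
            nbr_mean = H[neighbors[v]].mean(dim=0)       # (1/|N(v)|) * sum_{u in N(v)} h_u^l
            out.append(torch.relu(self.W(nbr_mean) + self.B(H[v])))
        return torch.stack(out)                          # h_v^{l+1} for every node

# toy example: 3 nodes forming a triangle, 4-dimensional input features x_v
H0 = torch.randn(3, 4)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
layer = GCNLayer(4, 8)
H1 = layer(H0, neighbors)   # shape [3, 8]
```
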
c). Matrix formulation

Many aggregations can be performed efficiently by (sparse) matrix operations
Let $H^l = [h_1^l \cdots h_{|V|}^l]^T$; then $\sum_{u\in N(v)} h_u^l = A_v H^l$, where $A_v$ is the $v$-th row of the adjacency matrix $A$
Let $D$ be the diagonal degree matrix with $D_{vv} = \mathrm{Deg}(v) = |N(v)|$; then $D_{vv}^{-1} = 1/|N(v)|$
Rewriting the update function in matrix form:
$H^{l+1} = \sigma(\tilde{A} H^l W_l^T + H^l B_l^T)$
where $\tilde{A} = D^{-1} A$
This implies that efficient sparse matrix multiplication can be used ($\tilde{A}$ is sparse)
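
A sketch of the same layer in matrix form, using a sparse COO adjacency matrix; the toy 4-node cycle and random weights are illustrative assumptions:

```python
import torch

num_nodes, in_dim, out_dim = 4, 4, 8

# sparse adjacency matrix A for a 4-node cycle 0-1-2-3-0
edges = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 0],
                      [1, 0, 2, 1, 3, 2, 0, 3]])
A = torch.sparse_coo_tensor(edges, torch.ones(edges.size(1)),
                            (num_nodes, num_nodes)).coalesce()

deg = torch.sparse.sum(A, dim=1).to_dense()       # D_vv = |N(v)|
H = torch.randn(num_nodes, in_dim)                # H^l
W = torch.randn(in_dim, out_dim)                  # plays the role of W_l^T (applied on the right)
B = torch.randn(in_dim, out_dim)                  # plays the role of B_l^T

# A_tilde H^l = D^{-1} A H^l: sparse matmul followed by row-wise scaling by 1/|N(v)|
AH = torch.sparse.mm(A, H) / deg.unsqueeze(1)
H_next = torch.relu(AH @ W + H @ B)               # H^{l+1}, shape [num_nodes, out_dim]
```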

d). How to train a GNN
  • Node embedding $z_v$ is a function of the input graph
  • Supervised setting: minimize the loss $L$
    $\min_\theta L(y, f(z_v))$
    Example: node classification
    $L = -\sum_{v\in V}\left[y_v\log(\sigma(z_v^T\theta)) + (1-y_v)\log(1-\sigma(z_v^T\theta))\right]$
    where $\theta$ denotes the classification weights
  • Unsupervised setting: no node labels are available, so the graph structure itself is used as supervision (both settings are sketched in code after this list).
    Similar nodes have similar embeddings:
    $L = \sum_{z_u, z_v} \text{CrossEntropy}(y_{uv}, \text{DEC}(z_u, z_v))$
    where $y_{uv} = 1$ when nodes $u$ and $v$ are similar and DEC is the decoder (e.g., inner product)
    Node similarity can be anything, such as random walks (node2vec, DeepWalk, struc2vec), matrix factorization, or node proximity in the graph
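
A minimal sketch of both loss settings, assuming the final-layer embeddings `z` have already been produced by the GNN; the classifier weights `theta`, the binary labels `y`, and the similar-pairs tensor are illustrative stand-ins:

```python
import torch
import torch.nn.functional as F

num_nodes, dim = 5, 8
z = torch.randn(num_nodes, dim, requires_grad=True)   # z_v = h_v^L (normally produced by the GNN)

# Supervised: binary node classification, L = BCE(y_v, sigma(z_v^T theta))
theta = torch.randn(dim, requires_grad=True)          # classification weights
y = torch.randint(0, 2, (num_nodes,)).float()         # node labels y_v
logits = z @ theta
supervised_loss = F.binary_cross_entropy_with_logits(logits, y)

# Unsupervised: graph structure as supervision, DEC(z_u, z_v) = inner product,
# y_uv = 1 for "similar" node pairs (e.g., co-occurring on random walks), 0 otherwise
pairs = torch.tensor([[0, 1], [1, 2], [0, 3]])        # candidate (u, v) pairs
y_uv = torch.tensor([1., 1., 0.])                     # similarity labels
scores = (z[pairs[:, 0]] * z[pairs[:, 1]]).sum(dim=1) # inner-product decoder
unsupervised_loss = F.binary_cross_entropy_with_logits(scores, y_uv)

(supervised_loss + unsupervised_loss).backward()      # gradients flow back into z (and the GNN)
```
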
e). Model design: overview
  1. Define a neighborhood aggregation function
  2. Define a loss function on the embeddings
  3. Train on a set of nodes
  4. Generate embeddings for nodes as needed
f). Inductive Capability

The same aggregation parameters ($W_l$ and $B_l$) are shared for all nodes: the number of model parameters is sublinear in $|V|$, and we can generalize to unseen nodes (new graphs or new nodes).
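Because the parameters depend only on the feature dimensions, a trained layer can be applied as-is to graphs of other sizes. A small illustration reusing the `GCNLayer` sketch from above (the graph sizes and adjacency lists are arbitrary):

```python
import torch

# GCNLayer is the per-node sketch defined earlier; its parameters depend only on
# the feature dimensions, not on the number of nodes |V|
layer = GCNLayer(in_dim=4, out_dim=8)

# "training" graph: 3 nodes
H_train = torch.randn(3, 4)
nbrs_train = {0: [1], 1: [0, 2], 2: [1]}
print(layer(H_train, nbrs_train).shape)   # torch.Size([3, 8])

# a new, larger graph with unseen nodes: same weights, no retraining needed
H_new = torch.randn(5, 4)
nbrs_new = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
print(layer(H_new, nbrs_new).shape)       # torch.Size([5, 8])
```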
