Note
This is the second post of the Graph Neural Networks (GNNs) series.
Convolutional graph neural networks (ConvGNNs)
Convolutional graph neural networks (ConvGNNs) generalize the operation of convolution from grid data to graph data. The main idea is to generate a node $v$'s representation by aggregating its own features $\mathbf{x}_v$ and its neighbors' features $\mathbf{x}_u$, where $u \in N(v)$. Different from RecGNNs, ConvGNNs stack a fixed number of graph convolutional layers with different weights to extract high-level node representations.
ConvGNNs fall into two categories:
- spatial-based GCN: Spatial-based approaches inherit ideas from RecGNNs and define graph convolutions by information propagation.
- spectral-based GCN: Spectral-based approaches define graph convolutions by introducing filters from the perspective of graph signal processing, where the graph convolutional operation is interpreted as removing noise from graph signals.
Spatial-based methods have developed rapidly in recent years due to their attractive efficiency, flexibility, and generality. In this post we focus mainly on spatial-based GCNs and leave spectral-based GCNs to the next post. Let's get started.
GCN Framework

As shown in the figure above, the input to a GCN is the entire graph. In each convolutional layer, a convolution operation is performed over the neighbors of each node, and the center node's representation is updated with the result. An activation function such as ReLU is then applied before the output is passed to the next convolutional layer. This process repeats until the number of layers reaches the desired depth (a hyper-parameter).
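The layer-by-layer process above can be sketched in a few lines of numpy. This is a minimal illustration, not the exact formulation from any specific paper: it uses simple mean aggregation over each node's neighborhood (with self-loops), and the toy graph, feature sizes, and helper name `gcn_layer` are all made up for the example.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: each node averages its own and its
    neighbors' features, applies a linear map W, then a ReLU."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # degree normalization
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)

# Toy graph: 3 nodes, edges 0-1 and 1-2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.random.randn(3, 4)    # input node features
W1 = np.random.randn(4, 8)   # layer-1 weights
W2 = np.random.randn(8, 2)   # layer-2 weights (distinct from W1)

H1 = gcn_layer(A, H, W1)
H2 = gcn_layer(A, H1, W2)    # stacked layers, each with its own weights
print(H2.shape)              # (3, 2)
```

Note that each call to `gcn_layer` receives a different weight matrix, which is exactly the point the next section makes about GCN versus RecGNN.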
GCN vs. RecGNN
The main difference between GCN and RecGNN is that each convolutional layer of a GCN has its own unique weights, whereas in a RecGNN the same weights are shared across all layers.
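The parameter-sharing distinction can be made concrete with a tiny sketch (the layer count and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, dim = 3, 4

# GCN: every layer gets its own, independently trained weight matrix
gcn_weights = [rng.standard_normal((dim, dim)) for _ in range(num_layers)]

# RecGNN: one weight matrix is reused at every propagation step
shared_W = rng.standard_normal((dim, dim))
recgnn_weights = [shared_W] * num_layers

print(len({id(W) for W in gcn_weights}))     # 3 distinct parameter sets
print(len({id(W) for W in recgnn_weights}))  # 1 shared parameter set
```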
What is Convolution
In mathematics, convolution is an operation on two functions $f$ and $g$ that produces a third function $(f * g)$ expressing how the shape of one is modified by the other.
The term convolution is defined as the integral of the product of the two functions after one is reversed and shifted. The mathematical definition is the following:
$$
\begin{array}{c}
(f * g)(t)=\int_{-\infty}^{\infty} f(\tau)\, g(t-\tau)\, d\tau \quad (\text{continuous}) \\
(f * g)(t)=\sum_{\tau=-\infty}^{\infty} f(\tau)\, g(t-\tau) \quad (\text{discrete})
\end{array}
$$
The convolution formula can be described as a weighted average of the function $f(\tau)$ at the moment $t$, where the weighting is given by $g(-\tau)$ shifted by the amount $t$. As $t$ changes, the weighting function emphasizes different parts of the input function.
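The discrete formula can be checked directly in a few lines. The nested loop below is a literal transcription of the sum $\sum_\tau f(\tau)\, g(t-\tau)$, and its output matches `np.convolve`, which implements the same flip-and-shift sum (the input values are arbitrary):

```python
import numpy as np

f = np.array([1., 2., 3.])
g = np.array([0., 1., 0.5])

def conv(f, g):
    """Direct evaluation of (f*g)(t) = sum_tau f(tau) * g(t - tau)."""
    n = len(f) + len(g) - 1
    out = np.zeros(n)
    for t in range(n):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):   # skip terms outside g's support
                out[t] += f[tau] * g[t - tau]
    return out

print(conv(f, g))         # [0.  1.  2.5 4.  1.5]
print(np.convolve(f, g))  # same result
```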

As shown in the figure above, the filter is moved over one pixel at a time, and this process is repeated until all possible locations in the image have been filtered. At each step, the convolution takes a weighted average of the pixel values of the center pixel and its neighbors. Since the center pixel changes at each step, the convolution emphasizes different parts of the image.
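That sliding-window process can be sketched on a toy 2-D "image" with a $3 \times 3$ averaging filter. The function name `conv2d` and the ramp image are illustrative only; real CNN libraries implement this far more efficiently.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over every valid position and take the weighted
    sum of the pixel neighborhood (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # a 5x5 linear ramp
mean_filter = np.full((3, 3), 1.0 / 9)            # uniform weights -> local average
out = conv2d(image, mean_filter)
print(out.shape)   # (3, 3)
```

Because the toy image is a linear ramp, averaging each $3 \times 3$ neighborhood reproduces its center pixel, so `out` equals `image[1:4, 1:4]`.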
Spatial-based ConvGNNs

Analogous to the convolutional operation of a conventional CNN on an image, spatial-based methods define graph convolutions based on a node’s spatial relations.
Images can be considered a special form of graph, with each pixel representing a node. Each pixel is directly connected to its nearby pixels, as illustrated in the figure above (left). A filter is applied to a $3 \times 3$ patch by taking the weighted aver