Contents
- Graph Auto-Encoders (GAE)
- Variational graph auto-encoders (VGAE)
- References
Definitions
- Given an undirected, unweighted graph $\mathcal G=(\mathcal V,\mathcal E)$, let $N=|\mathcal V|$ be the number of nodes, $\boldsymbol A\in\mathbb R^{N\times N}$ the adjacency matrix (with diagonal elements set to 1, i.e. every node has a self-loop), and $\boldsymbol X\in\mathbb R^{N\times D}$ the matrix of node feature vectors.
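As a concrete illustration, here is a toy instance of these inputs (the tensors below are made-up examples, not from the paper):

```python
import torch

# A 4-node undirected graph: A is symmetric with diagonal set to 1
# (self-loops); X holds a D-dimensional feature vector per node.
A = torch.tensor([[1., 1., 0., 0.],
                  [1., 1., 1., 0.],
                  [0., 1., 1., 1.],
                  [0., 0., 1., 1.]])   # N x N
X = torch.randn(4, 8)                  # N x D, here D = 8
```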
Graph Auto-Encoders
- The encoder of GAE is a GCN: it encodes the adjacency matrix and node features into an embedding vector $z_i$ for each node ($i=1,\dots,N$); these vectors form the node embedding matrix $\boldsymbol Z\in\mathbb R^{N\times F}$.
- The decoder of GAE is a simple inner product decoder: it reconstructs the adjacency matrix $\hat{\boldsymbol A}$ from the node embeddings $\boldsymbol Z$, computing $\hat{\boldsymbol A}_{ij}=\sigma(z_i^\top z_j)$, where $\sigma(\cdot)$ is the logistic sigmoid. A minimal sketch of the full model follows.
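The sketch below is a minimal GAE in PyTorch under the definitions above (class and function names are our own, not from the reference implementation): a two-layer GCN encoder followed by the inner product decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adjacency(a):
    # A_tilde = D^{-1/2} A D^{-1/2}; `a` is dense with diagonal set to 1,
    # so every degree is >= 1 and the inverse square root is well defined.
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

class GraphConv(nn.Module):
    """One GCN layer: A_tilde @ X @ W (no bias, as in the paper's GCN)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x, a_norm):
        return a_norm @ x @ self.weight

class GAE(nn.Module):
    def __init__(self, in_dim, hidden_dim, embed_dim):
        super().__init__()
        self.gc1 = GraphConv(in_dim, hidden_dim)
        self.gc2 = GraphConv(hidden_dim, embed_dim)

    def encode(self, x, a_norm):
        return self.gc2(F.relu(self.gc1(x, a_norm)), a_norm)

    def decode(self, z):
        # Inner product decoder: A_hat[i, j] = sigmoid(z_i^T z_j)
        return torch.sigmoid(z @ z.t())
```

With the toy `A` and `X` from above, `normalize_adjacency(A)` feeds the encoder, and `decode` returns an $N\times N$ matrix of edge probabilities.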
A framework for unsupervised learning on graph-structured data
- GAE brings the auto-encoder framework to graph data, enabling unsupervised learning on graph-structured data.
Variational graph auto-encoders (VGAE)
- VGAE extends GAE with the idea of the variational auto-encoder (VAE), imposing regularization on the latent space to obtain a regular, well-structured latent space.
VGAE
- VGAE assumes a standard normal prior:
$$p(\boldsymbol Z)=\prod_{i=1}^N p(z_i)=\prod_{i=1}^N \mathcal N(z_i\mid 0,\boldsymbol I)$$
The paper notes, however: "A Gaussian prior is potentially a poor choice in combination with an inner product decoder, as the latter tries to push embeddings away from the zero-center (see Figure 1)."
- The likelihood is given by the inner product decoder: $p(\boldsymbol A_{ij}=1\mid z_i,z_j)=\sigma(z_i^\top z_j)$.
- The posterior is approximated by variational inference, with a variational family of normal distributions whose covariance matrices are diagonal:
$$q(\boldsymbol Z\mid\boldsymbol X,\boldsymbol A)=\prod_{i=1}^N q(z_i\mid\boldsymbol X,\boldsymbol A),\qquad q(z_i\mid\boldsymbol X,\boldsymbol A)=\mathcal N(z_i\mid\mu_i,\operatorname{diag}(\sigma_i^2))$$
where $\mu=\text{GCN}_\mu(\boldsymbol X,\boldsymbol A)$ is the matrix of posterior means output by a GCN, and $\log\sigma=\text{GCN}_\sigma(\boldsymbol X,\boldsymbol A)$ is the matrix of log standard deviations. Here $\text{GCN}$ is a simple two-layer GCN,
$$\text{GCN}(\boldsymbol X,\boldsymbol A)=\tilde{\boldsymbol A}\,\text{ReLU}(\tilde{\boldsymbol A}\boldsymbol X\boldsymbol W_0)\boldsymbol W_1$$
where the $\boldsymbol W_i$ are layer weight matrices and $\tilde{\boldsymbol A}=\boldsymbol D^{-\frac12}\boldsymbol A\boldsymbol D^{-\frac12}$ is the normalized adjacency matrix, with $\boldsymbol D$ the degree matrix (a diagonal matrix whose diagonal entries are the node degrees). Left-multiplying by $\boldsymbol D^{-\frac12}$ divides row $i$ of $\boldsymbol A$ by the square root of node $i$'s degree, and right-multiplying by $\boldsymbol D^{-\frac12}$ divides column $j$ by the square root of node $j$'s degree, so $\tilde{\boldsymbol A}_{ij}=\boldsymbol A_{ij}/\sqrt{\boldsymbol D_{ii}\boldsymbol D_{jj}}$; in effect, the adjacency matrix is normalized by node degree. The product $\tilde{\boldsymbol A}\boldsymbol X=\begin{bmatrix}\tilde a_1^\top\boldsymbol X\\\vdots\\\tilde a_N^\top\boldsymbol X\end{bmatrix}$ aggregates information over each node's neighborhood. $\text{GCN}_\mu(\boldsymbol X,\boldsymbol A)$ and $\text{GCN}_\sigma(\boldsymbol X,\boldsymbol A)$ share the first-layer weights $\boldsymbol W_0$ (see the sketch after this list).
- Variational inference yields the optimization problem of maximizing the evidence lower bound:
$$\mathcal L=\mathbb E_{q(\boldsymbol Z\mid\boldsymbol X,\boldsymbol A)}\left[\log p(\boldsymbol A\mid\boldsymbol Z)\right]-\text{KL}\left[q(\boldsymbol Z\mid\boldsymbol X,\boldsymbol A)\,\|\,p(\boldsymbol Z)\right]$$
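A minimal sketch of this encoder, reusing `GraphConv` and `normalize_adjacency` from the GAE sketch above (names are ours, not the reference implementation's): the two output heads share the first-layer weights $\boldsymbol W_0$, and a latent sample is drawn with the reparameterization trick.

```python
class VGAEEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim, embed_dim):
        super().__init__()
        self.gc_shared = GraphConv(in_dim, hidden_dim)       # shared W0
        self.gc_mu = GraphConv(hidden_dim, embed_dim)        # head for mu
        self.gc_logsigma = GraphConv(hidden_dim, embed_dim)  # head for log(sigma)

    def forward(self, x, a_norm):
        h = F.relu(self.gc_shared(x, a_norm))
        mu = self.gc_mu(h, a_norm)
        log_sigma = self.gc_logsigma(h, a_norm)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(log_sigma) * torch.randn_like(mu)
        return z, mu, log_sigma
```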
The VGAE loss function
- In $\mathcal L$, the expected likelihood can be approximated by Monte Carlo sampling; with a single sample $\boldsymbol Z\sim q(\boldsymbol Z\mid\boldsymbol X,\boldsymbol A)$ it becomes
$$\sum_{i=1}^N\sum_{j=1}^N\left[\delta_{\boldsymbol A_{ij}=1}\log\sigma(z_i^\top z_j)+\delta_{\boldsymbol A_{ij}=0}\log\left[1-\sigma(z_i^\top z_j)\right]\right]$$
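This term transcribes directly into code; a hedged sketch with our own helper name, continuing the imports above (`z` is an $N\times F$ embedding matrix sampled from $q(\boldsymbol Z\mid\boldsymbol X,\boldsymbol A)$):

```python
def reconstruction_log_likelihood(z, a_target):
    logits = z @ z.t()               # z_i^T z_j for all node pairs
    log_p = F.logsigmoid(logits)     # log sigma(z_i^T z_j)
    log_1mp = F.logsigmoid(-logits)  # log(1 - sigma(z_i^T z_j))
    return (a_target * log_p + (1 - a_target) * log_1mp).sum()
```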
- The regularization term is
$$\begin{aligned} \text{KL}[q(\boldsymbol Z|\boldsymbol X,\boldsymbol A)\,\|\,p(\boldsymbol Z)] &=\mathbb E_{q(\boldsymbol Z|\boldsymbol X,\boldsymbol A)}[\log q(\boldsymbol Z|\boldsymbol X,\boldsymbol A)-\log p(\boldsymbol Z)] \\&=\sum_{i=1}^N\mathbb E_{q(\boldsymbol Z|\boldsymbol X,\boldsymbol A)}\left[\log q(z_i|\boldsymbol X,\boldsymbol A)-\log p(z_i)\right] \\&=\sum_{i=1}^N\mathbb E_{q(z_i|\boldsymbol X,\boldsymbol A)}\left[\log q(z_i|\boldsymbol X,\boldsymbol A)-\log p(z_i)\right] \\&=\sum_{i=1}^N \text{KL}[q(z_i|\boldsymbol X,\boldsymbol A)\,\|\,p(z_i)] \\&=\sum_{i=1}^N \text{KL}[\mathcal N(z_i|\mu_i,\operatorname{diag}(\sigma_i^2))\,\|\,\mathcal N(z_i|0,\boldsymbol I)] \end{aligned}$$
- KL divergence between two multivariate Gaussians: let
$$p(x)=\frac{1}{(2\pi)^{\frac n2}|\Sigma|^{\frac12}}\exp\left(-\frac12(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right),\qquad q(x)=\frac{1}{(2\pi)^{\frac n2}|L|^{\frac12}}\exp\left(-\frac12(x-m)^{\top}L^{-1}(x-m)\right)$$
We now derive the KL divergence between them:
$$KL(p\,\|\,q)=\mathbb E_{p}\left[\log\frac{p(x)}{q(x)}\right]$$
$$\begin{aligned} \frac{p(x)}{q(x)} &=\frac{\frac{1}{(2\pi)^{\frac n2}|\Sigma|^{\frac12}}\exp\left(-\frac12(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)}{\frac{1}{(2\pi)^{\frac n2}|L|^{\frac12}}\exp\left(-\frac12(x-m)^{\top}L^{-1}(x-m)\right)} \\ &=\left(\frac{|L|}{|\Sigma|}\right)^{\frac12}\exp\left(\frac12(x-m)^{\top}L^{-1}(x-m)-\frac12(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right) \end{aligned}$$
$$\log\frac{p(x)}{q(x)}=\frac12\log\frac{|L|}{|\Sigma|}+\frac12(x-m)^{\top}L^{-1}(x-m)-\frac12(x-\mu)^{\top}\Sigma^{-1}(x-\mu)$$
$$\begin{aligned} \mathbb E_{p}\left[\log\frac{p(x)}{q(x)}\right]&=\frac12\log\frac{|L|}{|\Sigma|}+\frac12\mathbb E_{p}\left[(x-m)^{\top}L^{-1}(x-m)-(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right] \\&=\frac12\log\frac{|L|}{|\Sigma|}+\frac12\mathbb E_{p}\left[\operatorname{tr}\left((x-m)^{\top}L^{-1}(x-m)\right)\right]-\frac12\mathbb E_{p}\left[\operatorname{tr}\left((x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)\right] \\&=\frac12\log\frac{|L|}{|\Sigma|}+\frac12\mathbb E_{p}\left[\operatorname{tr}\left(L^{-1}(x-m)(x-m)^{\top}\right)\right]-\frac12\mathbb E_{p}\left[\operatorname{tr}\left(\Sigma^{-1}(x-\mu)(x-\mu)^{\top}\right)\right] \\&\quad(\operatorname{tr}(AB)=\operatorname{tr}(BA)) \\&=\frac12\log\frac{|L|}{|\Sigma|}+\frac12\operatorname{tr}\left[\mathbb E_{p}\left(L^{-1}(x-m)(x-m)^{\top}\right)\right]-\frac12\operatorname{tr}\left[\mathbb E_{p}\left(\Sigma^{-1}(x-\mu)(x-\mu)^{\top}\right)\right] \\&\quad(\operatorname{tr}[\mathbb E_{x}(f(x))]=\mathbb E_{x}[\operatorname{tr}(f(x))]) \\&=\frac12\log\frac{|L|}{|\Sigma|}+\frac12\operatorname{tr}\left[L^{-1}\mathbb E_{p}\left(xx^{\top}-mx^{\top}-xm^{\top}+mm^{\top}\right)\right]-\frac12\operatorname{tr}\left[\Sigma^{-1}\mathbb E_{p}\left((x-\mu)(x-\mu)^{\top}\right)\right] \\&=\frac12\log\frac{|L|}{|\Sigma|}+\frac12\operatorname{tr}\left[L^{-1}\mathbb E_{p}\left(xx^{\top}-mx^{\top}-xm^{\top}+mm^{\top}\right)\right]-\frac12\operatorname{tr}\left[\Sigma^{-1}\Sigma\right] \\&=\frac12\log\frac{|L|}{|\Sigma|}+\frac12\operatorname{tr}\left[L^{-1}\mathbb E_{p}\left(\Sigma+\mu\mu^{\top}-mx^{\top}-xm^{\top}+mm^{\top}\right)\right]-\frac n2 \\&\quad(\mathbb E_{p}(xx^{\top})=\Sigma+\mu\mu^{\top},\ \text{which follows from}\ \Sigma=\mathbb E_{p}[(x-\mu)(x-\mu)^{\top}]) \\&=\frac12\log\frac{|L|}{|\Sigma|}+\frac12\operatorname{tr}\left[L^{-1}\left(\Sigma+\mu\mu^{\top}-m\mu^{\top}-\mu m^{\top}+mm^{\top}\right)\right]-\frac n2 \\&=\frac12\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}(L^{-1}\Sigma)+\operatorname{tr}\left[L^{-1}\left(\mu\mu^{\top}-m\mu^{\top}-\mu m^{\top}+mm^{\top}\right)\right]\right\} \\&=\frac12\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}(L^{-1}\Sigma)+\operatorname{tr}\left[\mu^{\top}L^{-1}\mu-m^{\top}L^{-1}\mu-\mu^{\top}L^{-1}m+m^{\top}L^{-1}m\right]\right\} \\&=\frac12\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}(L^{-1}\Sigma)+\left(\mu^{\top}L^{-1}\mu-m^{\top}L^{-1}\mu-\mu^{\top}L^{-1}m+m^{\top}L^{-1}m\right)\right\} \\&=\frac12\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}(L^{-1}\Sigma)+(\mu-m)^{\top}L^{-1}(\mu-m)\right\} \end{aligned}$$
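As a sanity check on this closed form, the snippet below (an illustrative check, not from the paper) compares it against PyTorch's built-in KL for diagonal Gaussians with $m=0$, $L=\boldsymbol I$:

```python
import torch
from torch.distributions import Normal, kl_divergence

n = 16
mu = torch.randn(n)
sigma = torch.rand(n) + 0.1  # keep scales strictly positive

# Closed form with m = 0, L = I, Sigma = diag(sigma^2):
# 1/2 * { -log|Sigma| - n + tr(Sigma) + mu^T mu }
closed_form = 0.5 * (-(sigma ** 2).log().sum() - n
                     + (sigma ** 2).sum() + mu @ mu)
reference = kl_divergence(Normal(mu, sigma), Normal(0.0, 1.0)).sum()
assert torch.allclose(closed_form, reference, atol=1e-5)
```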
- Substituting this result into the regularization term (with $n=F$, $\mu=\mu_i$, $\Sigma=\operatorname{diag}(\sigma_i^2)$, $m=0$, $L=\boldsymbol I$) gives
$$\begin{aligned} \text{KL}[q(\boldsymbol Z|\boldsymbol X,\boldsymbol A)\,\|\,p(\boldsymbol Z)] &=\sum_{i=1}^N \text{KL}[\mathcal N(z_i|\mu_i,\operatorname{diag}(\sigma_i^2))\,\|\,\mathcal N(z_i|0,\boldsymbol I)] \\&=\sum_{i=1}^N \frac12\left(-\sum_{j=1}^F\log\sigma_{ij}^2-F+\sum_{j=1}^F\sigma_{ij}^2+\mu_i^{\top}\mu_i\right) \end{aligned}$$
- Putting everything together, the VGAE loss (the negative ELBO) is
$$L=-\sum_{i=1}^N\sum_{j=1}^N\left[\delta_{\boldsymbol A_{ij}=1}\log\sigma(z_i^\top z_j)+\delta_{\boldsymbol A_{ij}=0}\log\left[1-\sigma(z_i^\top z_j)\right]\right]+\sum_{i=1}^N\frac12\left(-\sum_{j=1}^F\log\sigma_{ij}^2-F+\sum_{j=1}^F\sigma_{ij}^2+\mu_i^{\top}\mu_i\right)$$
(When $\boldsymbol A$ is very sparse, VGAE re-weights the terms with $\boldsymbol A_{ij}=1$.)
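A hedged sketch of this loss in PyTorch, continuing the imports above (`pos_weight` stands in for the paper's re-weighting of the $\boldsymbol A_{ij}=1$ terms; consult the reference implementation under References for the exact weighting):

```python
def vgae_loss(z, mu, log_sigma, a_target, pos_weight=1.0):
    # Negative expected likelihood: weighted binary cross-entropy over all
    # node pairs, using the logits z_i^T z_j of the inner product decoder.
    logits = z @ z.t()
    recon = F.binary_cross_entropy_with_logits(
        logits, a_target,
        pos_weight=torch.as_tensor(pos_weight),
        reduction="sum",
    )
    # Closed-form KL term: 1/2 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
    kl = 0.5 * torch.sum(torch.exp(2 * log_sigma) + mu ** 2
                         - 1 - 2 * log_sigma)
    return recon + kl
```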
References
- Kipf, Thomas N., and Max Welling. “Variational graph auto-encoders.” arXiv preprint arXiv:1611.07308 (2016).
- Wu, Zonghan, et al. "A comprehensive survey on graph neural networks." IEEE Transactions on Neural Networks and Learning Systems 32.1 (2020): 4-24.
- code: https://github.com/DaehanKim/vgae_pytorch
- Kullback-Leibler divergence (KL divergence) between two multivariate Gaussian distributions