1. GCN
A graph signal assigns one value to each node of the graph, e.g.:
$$x=[1,2,3,4,4]$$
Each layer first filters the signal, then applies a nonlinear transformation:
$$x' = g_{\theta} * x_{k},\qquad x_{k+1} = \sigma(wx')$$
The filtering step is a spectral graph convolution:
$$g_{\theta} * x = U g_{\theta} U^T x$$
where $L=I_N-D^{-1/2}AD^{-1/2}=U\Lambda U^T$ is the eigendecomposition of the normalized graph Laplacian, and $g_{\theta}=\mathrm{diag}(\theta)$.
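The spectral filter above can be sketched in numpy as follows. This is a minimal illustration that assumes a hypothetical 5-node graph: the edge list is invented so that each entry of $x=[1,2,3,4,4]$ sits on one node. `spectral_conv` is an illustrative helper name, not a library API. The sketch builds $L$, eigendecomposes it with `numpy.linalg.eigh`, and applies $U\,\mathrm{diag}(\theta)\,U^Tx$ with an example $\theta$ that keeps only the zero-frequency component.

```python
import numpy as np

# Hypothetical 5-node undirected graph (an assumption for illustration),
# one node per entry of the example signal x = [1, 2, 3, 4, 4].
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Normalized Laplacian L = I_N - D^{-1/2} A D^{-1/2}
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# L is real symmetric: eigh returns ascending real eigenvalues and an orthonormal U
lam, U = np.linalg.eigh(L)

def spectral_conv(theta, x):
    """g_theta * x = U diag(theta) U^T x, one free parameter per eigenvalue."""
    return U @ np.diag(theta) @ U.T @ x

x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])
# Example filter (arbitrary choice): keep only the lowest-frequency component
theta = np.zeros(N)
theta[0] = 1.0
print(spectral_conv(theta, x))
```

Since $\theta$ has one free entry per eigenvalue, this filter needs the full eigendecomposition and $N$ parameters. The approximations below remove that cost.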
Simplifying the filtering process:
1.1 Chebyshev approximation
Following Hammond et al. (2011), the filter can be approximated by a truncated expansion in Chebyshev polynomials:
$$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{\Lambda})$$
where $\tilde{\Lambda}=\frac{2}{\lambda_{max}}\Lambda-I_N$ rescales the eigenvalues into $[-1,1]$, the interval on which the Chebyshev approximation applies. The polynomials follow the recurrence $T_k(x)=2xT_{k-1}(x)-T_{k-2}(x)$, with $T_0(x)=1$ and $T_1(x)=x$.
Therefore:
$$
\begin{aligned}
g_{\theta'} * x &= U g_{\theta'}(\Lambda) U^T x \\
&\approx U \sum_{k=0}^{K} \theta'_k T_k(\tilde{\Lambda}) U^T x \\
&= \sum_{k=0}^{K} \theta'_k\, U T_k(\tilde{\Lambda}) U^T x \\
&= \sum_{k=0}^{K} \theta'_k\, T_k(U\tilde{\Lambda} U^T) x \\
&= \sum_{k=0}^{K} \theta'_k\, T_k(\tilde{L}) x
\end{aligned}
$$
where $\tilde{L}=\frac{2}{\lambda_{max}}L-I_N$. The fourth equality uses $(U\tilde{\Lambda}U^T)^k=U\tilde{\Lambda}^kU^T$, which holds because $U^TU=I_N$. The final form needs only powers of $\tilde{L}$, not the eigendecomposition.
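The last two equalities can be checked numerically. The sketch below uses a hypothetical 5-node graph and arbitrary example coefficients $\theta'_k$ (both are assumptions). It evaluates $\sum_k\theta'_kT_k(\tilde L)x$ with the three-term recurrence, using only matrix-vector products, and compares the result with $U\big(\sum_k\theta'_kT_k(\tilde\Lambda)\big)U^Tx$ computed from the eigendecomposition. `cheb_filter` is a hypothetical helper. `numpy.polynomial.chebyshev.chebval` evaluates a Chebyshev series at scalar points.

```python
import numpy as np

# Hypothetical 5-node graph (an assumption for illustration)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
lam, U = np.linalg.eigh(L)

lam_max = lam.max()
L_tilde = 2.0 / lam_max * L - np.eye(N)      # rescaled Laplacian, spectrum in [-1, 1]

def cheb_filter(theta, M, x):
    """Return sum_k theta[k] T_k(M) x using T_k = 2 M T_{k-1} - T_{k-2}.
    Only matrix-vector products are used, so U is never needed."""
    t_prev, t_curr = x, M @ x                 # T_0(M) x = x, T_1(M) x = M x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * M @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])
theta = np.array([0.5, -0.3, 0.2, 0.1])       # arbitrary example coefficients, K = 3

# Spectral route: evaluate the scalar Chebyshev series on each rescaled eigenvalue
lam_tilde = 2.0 / lam_max * lam - 1.0
g = np.polynomial.chebyshev.chebval(lam_tilde, theta)
spectral = U @ (g * (U.T @ x))
print(np.allclose(cheb_filter(theta, L_tilde, x), spectral))   # True
```

The recurrence route never forms $U$. That is the practical point of the Chebyshev approximation.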
1.2 Restricting the order to $K=1$
Setting $K=1$ and using $T_0(x)=1$, $T_1(x)=x$ (and writing $\theta_0,\theta_1$ for $\theta'_0,\theta'_1$ from here on):
$$
\begin{aligned}
g_{\theta'} * x &\approx \sum_{k=0}^{1} \theta'_k T_k(\tilde{L}) x \\
&= (\theta_0T_0(\tilde{L})+\theta_1T_1(\tilde{L}))x \\
&= (\theta_0+\theta_1\tilde{L})x
\end{aligned}
$$
where $\tilde{L}=\frac{2}{\lambda_{max}}L-I_N$.
1.3 Assuming $\lambda_{max}=2$
Assuming $\lambda_{max}=2$ (the largest possible eigenvalue of the normalized Laplacian, whose spectrum lies in $[0,2]$) gives $\tilde{L}=L-I_N$, so:
$$
\begin{aligned}
g_{\theta'} * x &= (\theta_0+\theta_1\tilde{L})x \\
&= (\theta_0+\theta_1 (L-I_N))x
\end{aligned}
$$
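To see what the $K=1$, $\lambda_{max}=2$ filter does locally, the following sketch applies $(\theta_0+\theta_1(L-I_N))$ to a unit impulse at node 4. It assumes a hypothetical 5-node graph in which node 4's only neighbor is node 3, and arbitrary values $\theta_0=0.7$, $\theta_1=0.4$. `first_order_filter` is an illustrative helper name.

```python
import numpy as np

# Hypothetical 5-node graph (an assumption); node 4 is connected only to node 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def first_order_filter(theta0, theta1, x):
    """K = 1, lambda_max = 2: (theta0 + theta1 (L - I_N)) x."""
    return theta0 * x + theta1 * ((L - np.eye(N)) @ x)

delta = np.zeros(N)
delta[4] = 1.0                                 # unit impulse at node 4
y = first_order_filter(0.7, 0.4, delta)
print(np.flatnonzero(np.abs(y) > 1e-12))       # [3 4]
```

The response is nonzero only at node 4 and its neighbor, node 3, because $L-I_N=-D^{-1/2}AD^{-1/2}$ mixes each node only with its direct neighbors. The simplified filter is therefore 1-hop local.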
1.4 Constraining $\theta_0$ and $\theta_1$
Using a single parameter $\theta=\theta_0=-\theta_1$ gives
$$
\begin{aligned}
g_{\theta'} * x &= (\theta_0+\theta_1 (L-I_N))x \\
&= \theta(I_N-L+I_N)x \\
&= \theta(I_N-(I_N-D^{-1/2}AD^{-1/2})+I_N)x \\
&= \theta(I_N+D^{-1/2}AD^{-1/2})x
\end{aligned}
$$
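A quick numerical check of this derivation follows. It assumes a hypothetical 5-node graph and an arbitrary $\theta=0.8$. The two-parameter form with $\theta_0=\theta$, $\theta_1=-\theta$ should equal $\theta(I_N+D^{-1/2}AD^{-1/2})x$. The eigenvalues of $I_N+D^{-1/2}AD^{-1/2}$ (which are $2-\lambda$) should lie in $[0,2]$.

```python
import numpy as np

# Hypothetical 5-node graph (an assumption for illustration)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]    # D^{-1/2} A D^{-1/2}
I = np.eye(N)
L = I - S
x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])

theta = 0.8                       # the single shared parameter (arbitrary value)
theta0, theta1 = theta, -theta    # the constraint theta = theta0 = -theta1
two_params = theta0 * x + theta1 * ((L - I) @ x)     # (theta0 + theta1 (L - I_N)) x
one_param = theta * ((I + S) @ x)                    # theta (I_N + D^{-1/2} A D^{-1/2}) x
print(np.allclose(two_params, one_param))            # True

# Eigenvalues of I_N + D^{-1/2} A D^{-1/2} equal 2 - lambda, so they lie in [0, 2]
ev = np.linalg.eigvalsh(I + S)
print(ev.min(), ev.max())
```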
1.5 Renormalization trick
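$$
\begin{aligned}
g_{\theta'} * x &= \theta(I_N+D^{-1/2}AD^{-1/2})x \\
&\approx \theta(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2})x
\end{aligned}
$$
where $\tilde{A}=A+I_N$ adds self-loops and $\tilde{D}_{ii}=\sum_j\tilde{A}_{ij}$. Because $I_N+D^{-1/2}AD^{-1/2}$ has eigenvalues in $[0,2]$, applying it repeatedly in a deep model can be numerically unstable. The renormalized matrix avoids this.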
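The following sketch contrasts the two matrices by applying each one ten times to $x$. It assumes a hypothetical 5-node graph; `sym_norm` is an illustrative helper. $I_N+D^{-1/2}AD^{-1/2}$ has an eigenvalue of 2, so repeated application can blow up. The eigenvalues of $\tilde D^{-1/2}\tilde A\tilde D^{-1/2}$ lie in $(-1,1]$, so the norm cannot grow.

```python
import numpy as np

def sym_norm(M):
    """Return D^{-1/2} M D^{-1/2}, where D holds the row sums of M."""
    d = 1.0 / np.sqrt(M.sum(axis=1))
    return d[:, None] * M * d[None, :]

# Hypothetical 5-node graph (an assumption for illustration)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

S_first = np.eye(N) + sym_norm(A)     # I_N + D^{-1/2} A D^{-1/2}, eigenvalues in [0, 2]
A_hat = sym_norm(A + np.eye(N))       # D~^{-1/2} A~ D~^{-1/2} with A~ = A + I_N

x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])
norms = {}
for name, P in [("first-order", S_first), ("renormalized", A_hat)]:
    # ||P^10 x||: ten linear propagation steps, as in a deep linear model
    norms[name] = np.linalg.norm(np.linalg.matrix_power(P, 10) @ x)
    print(f"{name}: ||P^10 x|| = {norms[name]:.2f}")
print("eigenvalues of A_hat:", np.round(np.linalg.eigvalsh(A_hat), 3))
```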
1.6 Summary
Putting it together:
$$x' = g_{\theta'} * x \approx \theta(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2})x$$
Substituting this into the nonlinear update, with $\theta$ absorbed into the weight $w$:
$$
\begin{aligned}
x_{k+1} &= \sigma(wx') \\
&\approx \sigma(w(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2})x_{k})
\end{aligned}
$$
For a feature matrix $X_k\in\mathbb{R}^{N\times C}$ (one row per node) and a weight matrix $\Theta\in\mathbb{R}^{C\times F}$:
$$X_{k+1}\approx\sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}X_{k}\Theta)$$
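Here is a minimal numpy sketch of one such layer. It assumes $\sigma=\mathrm{ReLU}$, a hypothetical 5-node graph, and random features $X$ and weights $\Theta$ (in practice $\Theta$ is learned). `normalized_adjacency` and `gcn_layer` are illustrative names, not a library API.

```python
import numpy as np

def normalized_adjacency(A):
    """Renormalization trick: D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d[:, None] * A_tilde * d[None, :]

def gcn_layer(A_hat, X, Theta):
    """One layer X_{k+1} = ReLU(A_hat X_k Theta); sigma = ReLU is an assumption."""
    return np.maximum(A_hat @ X @ Theta, 0.0)

# Hypothetical 5-node graph, random features and weights (illustration only)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N, C, F = 5, 3, 2
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(N, C))        # node features, one row per node
Theta = rng.normal(size=(C, F))    # weights (learned in practice, random here)
A_hat = normalized_adjacency(A)
H = gcn_layer(A_hat, X, Theta)
print(H.shape)                     # (5, 2)
```

$\tilde D^{-1/2}\tilde A\tilde D^{-1/2}$ depends only on the graph, so it is computed once and reused by every layer.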
2. SGC
2.1 First-order Chebyshev filter
In GCN, after the approximations above, the first-order ($K=1$) Chebyshev filter reduces to the propagation matrix:
$$S_{1\text{-order}} = I_N+D^{-1/2}AD^{-1/2}$$
Since $L=I_N-D^{-1/2}AD^{-1/2}$, we have
$$S_{1\text{-order}}=2I_N-L$$
$$
\begin{aligned}
x'=S_{1\text{-order}}\,x &= (I_N+D^{-1/2}AD^{-1/2})x \\
&= (2I_N-L)x \\
&= (2I_N-U\Lambda U^T)x \\
&= (2UU^{-1}-U\Lambda U^T)x \\
&= U(2I-\Lambda)U^T x
\end{aligned}
$$
Here $U^T=U^{-1}$ because $L$ is a real symmetric matrix, so its eigenvectors can be chosen orthonormal.
Hence this propagation is equivalent to the spectral filter
$$g_\theta(\Lambda)=2I-\Lambda$$
or, per eigenvalue,
$$g_\theta(\lambda)=2-\lambda$$
where $\lambda$, an eigenvalue of the Laplacian $L$, plays the role of a frequency.
After $K$ propagation steps ($K$ layers, ignoring the nonlinearities in between; here $K$ counts layers, not the Chebyshev order):
$$g_\theta(\lambda)^K=(2-\lambda)^K$$
[Figure: $g_\theta(\lambda)^K=(2-\lambda)^K$ plotted against $\lambda$]
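The identity $S_{1\text{-order}}^Kx=U(2I-\Lambda)^KU^Tx$ and the shape of $(2-\lambda)^K$ can be checked numerically. This sketch assumes a hypothetical 5-node graph and $K=3$, drops the nonlinearities as in the derivation above, and tabulates the response at a few values of $\lambda$.

```python
import numpy as np

# Hypothetical 5-node graph (an assumption for illustration)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
lam, U = np.linalg.eigh(L)
S1 = 2.0 * np.eye(N) - L                   # S_{1-order} = 2 I_N - L

K = 3                                      # number of propagation steps (layers)
x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])
propagated = np.linalg.matrix_power(S1, K) @ x       # S^K x, nonlinearities dropped
spectral = U @ ((2.0 - lam) ** K * (U.T @ x))        # U (2I - Lambda)^K U^T x
print(np.allclose(propagated, spectral))             # True

# Tabulate the filter response (2 - lambda)^K over the spectrum [0, 2]
for l in np.linspace(0.0, 2.0, 5):
    print(f"lambda = {l:.1f}   (2 - lambda)^{K} = {(2.0 - l) ** K:.3f}")
```

The tabulated response falls from $2^3=8$ at $\lambda=0$ to $0$ at $\lambda=2$. This is a low-pass shape, but its zero-frequency gain of $2^K$ grows with depth.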
2.2 Augmented normalized adjacency matrix
When GCN applies the renormalization trick, the propagation matrix changes from $S_{1\text{-order}}$ to $\tilde{S}_{adj}$:
$$\tilde{S}_{adj} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$$
where $\tilde{A}=A+I$ and $\tilde{D}=D+I$.
Correspondingly, define the augmented normalized Laplacian $\tilde{L}=I_N-\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ (not the rescaled $\tilde{L}$ of Section 1.1). Its eigenvalues are $\tilde{\lambda}$ and its eigendecomposition is $\tilde{L}=\tilde{U}\tilde{\Lambda}\tilde{U}^T$. Using $\tilde{S}_{adj}$ as the propagation matrix:
$$
\begin{aligned}
x'=\tilde{S}_{adj}\,x &= (\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2})x \\
&= (I_N-\tilde{L})x \\
&= (I_N-\tilde{U}\tilde{\Lambda}\tilde{U}^T)x \\
&= (\tilde{U}\tilde{U}^{-1}-\tilde{U}\tilde{\Lambda}\tilde{U}^T)x \\
&= \tilde{U}(I-\tilde{\Lambda})\tilde{U}^T x
\end{aligned}
$$
i.e.
$$g_\theta(\tilde{\lambda})=1-\tilde{\lambda}$$
where $\tilde{\lambda}$ is an eigenvalue of the augmented Laplacian $\tilde{L}$.
After $K$ propagation steps ($K$ layers):
$$g_\theta(\tilde{\lambda})^K=(1-\tilde{\lambda})^K$$
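The same check works for the augmented matrix. This sketch assumes a hypothetical 5-node graph and $K=3$. $\tilde S_{adj}^Kx$ should equal $\tilde U(I-\tilde\Lambda)^K\tilde U^Tx$, where $\tilde U$ and $\tilde\Lambda$ come from the eigendecomposition of the augmented Laplacian $\tilde L$.

```python
import numpy as np

# Hypothetical 5-node graph (an assumption for illustration)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

A_tilde = A + np.eye(N)                         # add self-loops
d = 1.0 / np.sqrt(A_tilde.sum(axis=1))          # D~^{-1/2}, with D~ = D + I
S_adj_tilde = d[:, None] * A_tilde * d[None, :]
L_aug = np.eye(N) - S_adj_tilde                 # augmented normalized Laplacian
lam_t, U_t = np.linalg.eigh(L_aug)

K = 3
x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])
propagated = np.linalg.matrix_power(S_adj_tilde, K) @ x
spectral = U_t @ ((1.0 - lam_t) ** K * (U_t.T @ x))   # U~ (I - Lambda~)^K U~^T x
print(np.allclose(propagated, spectral))              # True
print("augmented eigenvalues:", np.round(lam_t, 3))
```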
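The SGC paper proves that, for a graph without isolated nodes,
$$0=\lambda_0=\tilde{\lambda}_0<\tilde{\lambda}_n<\lambda_n\le 2$$
where $\lambda_0,\lambda_n$ and $\tilde{\lambda}_0,\tilde{\lambda}_n$ are the smallest and largest eigenvalues of $L$ and $\tilde{L}$, respectively.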
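In other words, the renormalization trick shrinks the largest eigenvalue of the Laplacian underlying the propagation matrix, to around 1.6 instead of the original 2.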
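The eigenvalue inequality can be probed empirically. The sketch below draws a few random graphs and compares the largest eigenvalue of $L$ with that of $\tilde L$. It assumes 8 nodes and edge probability 0.3, and adds a path through all nodes so that none is isolated, as the theorem requires. `largest_laplacian_eig` is an illustrative helper. This is a sanity check on examples, not a proof.

```python
import numpy as np

def largest_laplacian_eig(M):
    """Largest eigenvalue of I - D^{-1/2} M D^{-1/2}, with D the row sums of M."""
    d = 1.0 / np.sqrt(M.sum(axis=1))
    lap = np.eye(M.shape[0]) - d[:, None] * M * d[None, :]
    return np.linalg.eigvalsh(lap).max()

rng = np.random.default_rng(0)
N = 8
results = []
for trial in range(5):
    upper = np.triu((rng.random((N, N)) < 0.3).astype(float), k=1)
    A = np.maximum(upper, upper.T)            # random symmetric 0/1 adjacency
    for i in range(N - 1):                    # add a path so no node is isolated
        A[i, i + 1] = A[i + 1, i] = 1.0
    lam_n = largest_laplacian_eig(A)
    lam_n_aug = largest_laplacian_eig(A + np.eye(N))
    results.append((lam_n, lam_n_aug))
    print(f"trial {trial}: lambda_n = {lam_n:.3f}, augmented = {lam_n_aug:.3f}")
```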
2.3 Normalized adjacency matrix
To see the advantage of the augmented matrix, first consider using $S_{adj}=D^{-1/2}AD^{-1/2}$ (no self-loops) as the propagation matrix:
$$
\begin{aligned}
x'=S_{adj}\,x &= (D^{-1/2}AD^{-1/2})x \\
&= (I_N-L)x \\
&= U(I-\Lambda)U^T x
\end{aligned}
$$
i.e.
$$g_\theta(\lambda)=1-\lambda$$
After $K$ propagation steps ($K$ layers):
$$g_\theta(\lambda)^K=(1-\lambda)^K$$
In summary, using $S_{1\text{-order}}$, $S_{adj}$ and $\tilde{S}_{adj}$ as the propagation matrix yields the filters $(2-\lambda)^K$, $(1-\lambda)^K$ and $(1-\tilde{\lambda})^K$, respectively.
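The three filters can be compared on an actual spectrum. This sketch uses a hypothetical bipartite path graph 0-1-2-3-4, chosen deliberately because bipartite graphs have $\lambda_n=2$, and sets $K=2$. Each array lists the filter's gain at each eigenvalue, from the zero frequency up to the highest one. `sym_norm` is an illustrative helper.

```python
import numpy as np

# Hypothetical bipartite path graph 0-1-2-3-4 (an assumption, chosen because
# bipartite graphs have lambda_n = 2, the worst case for S_adj)
N = 5
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

def sym_norm(M):
    """Return D^{-1/2} M D^{-1/2}, where D holds the row sums of M."""
    d = 1.0 / np.sqrt(M.sum(axis=1))
    return d[:, None] * M * d[None, :]

lam = np.linalg.eigvalsh(np.eye(N) - sym_norm(A))                 # eigenvalues of L
lam_t = np.linalg.eigvalsh(np.eye(N) - sym_norm(A + np.eye(N)))   # eigenvalues of L~

K = 2
responses = {
    "S_1-order: (2 - lambda)^K": (2.0 - lam) ** K,
    "S_adj: (1 - lambda)^K": (1.0 - lam) ** K,
    "S~_adj: (1 - lambda~)^K": (1.0 - lam_t) ** K,
}
for name, r in responses.items():
    # Entry 0 is the zero frequency, the last entry the highest frequency
    print(f"{name:28s}", np.round(r, 3))
```

On this graph, $(1-\lambda)^K$ passes the highest frequency $\lambda_n=2$ with the same gain of 1 as the zero frequency. $(2-\lambda)^K$ has a zero-frequency gain of $2^K=4$. $(1-\tilde\lambda)^K$ keeps a gain of 1 at zero frequency while attenuating every nonzero frequency.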
3. FAGCN
FAGCN designs two propagation matrices:
a low-pass $\mathcal{F}_L$ and a high-pass $\mathcal{F}_H$:
$$\mathcal{F}_L=\alpha I+D^{-1/2}AD^{-1/2}=(\alpha+1)I-L$$
$$\mathcal{F}_H=\alpha I-D^{-1/2}AD^{-1/2}=(\alpha-1)I+L$$
These correspond to the filters
$$g_1(\lambda)=1-\lambda+\alpha,\qquad g_2(\lambda)=\lambda-1+\alpha$$
$g_1$ decreases as $\lambda$ grows (low-pass), while $g_2$ increases (high-pass).
[Figure: $g_1(\lambda)$ and $g_2(\lambda)$ plotted against $\lambda$]
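This sketch checks that the spectra of the two FAGCN matrices are $g_1(\lambda)$ and $g_2(\lambda)$. It assumes a hypothetical 5-node graph and an example value $\alpha=0.3$ (the value is arbitrary).

```python
import numpy as np

# Hypothetical 5-node graph (an assumption for illustration)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
lam = np.linalg.eigvalsh(np.eye(N) - S)             # eigenvalues of L, ascending

alpha = 0.3                          # example value (arbitrary)
F_L = alpha * np.eye(N) + S          # low-pass:  (alpha + 1) I - L
F_H = alpha * np.eye(N) - S          # high-pass: (alpha - 1) I + L

# The spectra should match g1(lambda) = 1 - lambda + alpha and g2(lambda) = lambda - 1 + alpha
g1 = 1.0 - lam + alpha
g2 = lam - 1.0 + alpha
print(np.allclose(np.sort(np.linalg.eigvalsh(F_L)), np.sort(g1)))   # True
print(np.allclose(np.sort(np.linalg.eigvalsh(F_H)), np.sort(g2)))   # True
```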