[Close Reading] Multi-Channel Graph Neural Network for Entity Alignment

Paper: Multi-Channel Graph Neural Network for Entity Alignment (aclanthology.org)

Code: https://github.com/thunlp/MuGNN

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may occur; if you spot any, feel free to point them out in the comments! This article reads more like notes, so take it with a grain of salt.

Table of Contents

1. Thoughts

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Preliminaries and Framework

2.3.1. Preliminaries

2.3.2. Framework

2.4. KG Completion

2.4.1. Rule Inference and Transfer

2.4.2. Rule Grounding

2.5. Multi-Channel Graph Neural Network

2.5.1. Relation Weighting

2.5.2. Multi-Channel GNN Encoder

2.5.3. Align Model

2.6. Experiment

2.6.1. Experiment Settings

2.6.2. Overall Performance

2.6.3. Impact of Two Channels and Rule Transfer

2.6.4. Impact of Seed Alignments

2.6.5. Qualitative Analysis

2.7. Related Work

2.8. Conclusions

3. Supplementary Knowledge

3.1. Adagrad Optimizer

4. Reference


1. Thoughts

(1) This paper is relatively easy to understand

2. Section-by-Section Close Reading

2.1. Abstract

        ①Limitations of entity alignment: structural heterogeneity and limited seed alignments

        ②They proposed Multi-channel Graph Neural Network model (MuGNN)

2.2. Introduction

        ①A knowledge graph (KG) stores information as a directed graph, where the nodes are entities and the edges denote relations

        ②A KG in an entity's native language usually stores richer information about that entity:

(The authors believe Jilin in KG1 should align to Jilin City in KG2 because they share a similar dialect and are both connected to Changchun. I don't think this necessarily holds; doesn't it depend on the specific model? These two entities still feel quite different to me, and their structures aren't very similar either.)

        ③To solve this problem, missing entities and relations need to be filled in, and unnecessary (exclusive) ones pruned

2.3. Preliminaries and Framework

2.3.1. Preliminaries

(1)KG

        ①Defining a directed graph G=\left ( E,R,T \right ), which contains entity set E, relation set R and triplets T
        ②Triplet t=(e_{i},r_{ij},e_{j})\in T

(2)Rule knowledge

        ①For a rule k=(r_{c}|r_{s1},\cdots,r_{sp}) in the rule set \mathcal{K}=\{k\}, it means \forall x,y\in E:(x,r_{s1},y)\wedge\cdots\wedge(x,r_{sp},y)\Rightarrow (x,r_{c},y), i.e., the premise relations jointly imply the conclusion relation r_{c}

(3)Rule Grounding

        ①Through the implication above, grounding a rule with concrete entities derives further relations between entities

(4)Entity alignment

        ①Entity alignments between the two KGs: \mathcal{A}_{e}=\{(e,e^{\prime}) \in E\times E^{\prime}|e \leftrightarrow e^{\prime}\}

        ②Seed relation alignments: \mathcal{A}_{r}^{s}=\{(r,r^{\prime})\in R\times R'|r\leftrightarrow r'\}

2.3.2. Framework

        ①Workflow of MuGNN:

(1)KG completion

        ①Adopts the rule mining system AMIE+ to induce rules from each KG

(2)Multi-channel Graph Neural Network

        ①Encodes each KG through multiple channels

2.4. KG Completion

2.4.1. Rule Inference and Transfer

        ①Rules are mined in each KG with AMIE+; a rule is transferred to the other KG when every relation it contains has an aligned counterpart in the seed relation alignments \mathcal{A}_{r}^{s} (a sketch follows below)
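
Below is a hypothetical sketch of this transfer step; the relation names, alignment map, and helper function are my own illustration, not the authors' code:

```python
# A hypothetical sketch of rule transfer: a rule mined in KG2 is transferred
# to KG1 only if every relation it mentions has an aligned counterpart.
seed_rel_alignments = {"province'": "province", "dialect'": "dialect"}  # KG2 -> KG1

def transfer_rule(premises, conclusion, rel_map):
    """premises: KG2 relation names; conclusion: KG2 relation name.
    Returns the rule mapped into KG1's vocabulary, or None if unalignable."""
    if any(r not in rel_map for r in premises + [conclusion]):
        return None
    return [rel_map[r] for r in premises], rel_map[conclusion]

# Transfers province'(x,y) ^ dialect'(y,z) => dialect'(x,z) into KG1.
print(transfer_rule(["province'", "dialect'"], "dialect'", seed_rel_alignments))
```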

2.4.2. Rule Grounding

        ①For example, the rule province(x,y) \wedge dialect(y,z) \Rightarrow dialect(x,z) found in KG2 can be grounded in KG1 to add the missing triplets, as in the sketch below
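
A minimal grounding sketch for this example, assuming the two-premise rule form above; the toy triplets are illustrative, not the dataset's actual contents:

```python
# Ground the transferred rule (x, r1, y) ^ (y, r2, z) => (x, r_c, z) over KG1.
triplets_kg1 = {
    ("Jilin", "province", "Jilin_City"),
    ("Jilin_City", "dialect", "Northeastern_Mandarin"),
}

def ground(triplets, r1, r2, r_c):
    """Collect new triplets implied by the rule's groundings."""
    new = set()
    for (x, ra, y) in triplets:
        if ra != r1:
            continue
        for (y2, rb, z) in triplets:
            if rb == r2 and y2 == y:
                new.add((x, r_c, z))
    return new - triplets  # keep only newly inferred triplets

# Adds ("Jilin", "dialect", "Northeastern_Mandarin") to KG1.
print(ground(triplets_kg1, "province", "dialect", "dialect"))
```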

2.5. Multi-Channel Graph Neural Network

2.5.1. Relation Weighting

        ①A weighted relation (connectivity) matrix is generated for each KG

        ②They construct a self-attention adjacency matrix and a cross-KG attention adjacency matrix, one for each channel

(1)KG Self-Attention (this one is for completion)

        ①Normalized connection weights:

a_{ij}=softmax(c_{ij})=\frac{exp(c_{ij})}{\sum_{e_{k}\in N_{e_{i}}\cup\{e_{i}\}}exp(c_{ik})}

where N_{e_{i}} denotes the neighbors of e_i, and taking the union with \{e_{i}\} adds a self-loop

        ②c_{ij} denotes the attention coefficient between two entities:

\begin{aligned} c_{ij}& =attn(\mathbf{We_{i}},\mathbf{We_{j}}) \\ &=LeakyReLU(\mathbf{p[We_{i}\|We_{j}]}) \end{aligned}

where \mathbf{W} and \mathbf{p} are trainable parameters
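
A small sketch of these two equations, using random stand-ins for the trainable \mathbf{W} and \mathbf{p} (in the real model they are learned jointly):

```python
import torch
import torch.nn.functional as F

n, d = 3, 4
W = torch.randn(d, d)          # trainable projection W (stand-in)
p = torch.randn(2 * d)         # trainable attention vector p (stand-in)
e = torch.randn(n, d)          # entity embeddings
We = e @ W

def attn_coeff(i, j):
    """c_ij = LeakyReLU(p . [We_i || We_j])."""
    return F.leaky_relu(p @ torch.cat([We[i], We[j]]))

# Softmax-normalize over the neighborhood of e_0, self-loop included.
neighborhood = [0, 1, 2]       # N_{e_0} plus e_0 itself
c = torch.stack([attn_coeff(0, k) for k in neighborhood])
a_0 = torch.softmax(c, dim=0)  # weights a_0k, summing to 1
```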

(2)Cross-KG Attention (this one is for pruning; it forms the other adjacency matrix)

        ①Pruning operation:

a_{ij}=\max\limits_{r\in R,r'\in R'}\mathbf{1}((e_i,r,e_j)\in T)sim(r,r')

where \mathbf{1}((e_i,r,e_j)\in T) is 1 if (e_i,r,e_j)\in T and 0 otherwise, and sim\left ( \cdot \right ) denotes the inner-product similarity measure sim(r,r')=\mathbf{r}^{T}\mathbf{r}^{\prime}
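
A hedged sketch of this pruning weight; the relation embeddings here are random stand-ins for the trained ones:

```python
import torch

R1 = {"province": torch.randn(4), "dialect": torch.randn(4)}    # KG1 relations
R2 = {"province'": torch.randn(4), "dialect'": torch.randn(4)}  # KG2 relations
T1 = {("Jilin", "province", "Jilin_City")}                      # KG1 triplets

def cross_kg_weight(e_i, e_j):
    """a_ij = max over r, r' of 1((e_i, r, e_j) in T) * sim(r, r')."""
    sims = [
        float(r_vec @ r2_vec)        # sim(r, r') = r^T r'
        for r, r_vec in R1.items()
        if (e_i, r, e_j) in T1       # the indicator term
        for r2_vec in R2.values()
    ]
    return max(sims, default=0.0)    # 0 when no relation connects e_i to e_j
```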

2.5.2. Multi-Channel GNN Encoder

       ①Propagation of GNN:

\mathrm{GNN}(A,H,W)=\sigma(\mathbf{AHW})

and they chose \sigma \left ( \cdot \right ) as ReLU

        ②Multi GNN encoder:

\mathrm{MultiGNN}(H^{l};A_{1},\cdots,A_{c})=\mathrm{Pooling}(H_{1}^{l+1},\cdots,H_{c}^{l+1})

where c denotes the number of channels

        ③Updating function:

\mathbf{H}_i^{l+1}=\mathrm{GNN}(A_i,H^l,W_i)

        ④Pooling strategy: mean pooling
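
A compact sketch combining ①–④ above, with random matrices standing in for trained parameters (a sketch, not the official implementation):

```python
import torch

n, d = 5, 8
A1 = torch.rand(n, n)    # self-attention adjacency (completion channel)
A2 = torch.rand(n, n)    # cross-KG attention adjacency (pruning channel)
H = torch.randn(n, d)    # input entity embeddings H^l
W1 = torch.randn(d, d)   # channel-specific weights W_i (stand-ins)
W2 = torch.randn(d, d)

def gnn(A, H, W):
    return torch.relu(A @ H @ W)              # GNN(A, H, W) = ReLU(AHW)

def multi_gnn(H, adjs, weights):
    channels = [gnn(A, H, W) for A, W in zip(adjs, weights)]
    return torch.stack(channels).mean(dim=0)  # mean pooling over channels

H_next = multi_gnn(H, [A1, A2], [W1, W2])     # H^{l+1}
```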

2.5.3. Align Model

        ①The two KGs are embedded into the same vector space, and distance is measured to judge equivalence:

\mathcal{L}_{a}=\sum_{(e,e^{'})\in\mathcal{A}_{e}^{s}}\sum_{(e_{-},e_{-}^{'})\in\mathcal{A}_{e}^{s-}}[d(e,e^{'})+\gamma_{1}-d(e_{-},e_{-}^{'})]_{+}+\\\sum_{(r,r^{'})\in\mathcal{A}_{r}^{s}}\sum_{(r_{-},r_{-}^{'})\in\mathcal{A}_{r}^{s-}}[d(r,r^{'})+\gamma_{2}-d(r_{-},r_{-}^{'})]_{+}

where [\cdot]_{+}=\max\{0,\cdot\}, d(\cdot)=\|\cdot\|_{2}, \mathcal{A}_e^{s-} and \mathcal{A}_r^{s-} are sets of negative pairs sampled for the original seed sets, and \gamma _1> 0 and \gamma _2> 0 are margin hyper-parameters separating positive and negative entity and relation alignments
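
A toy sketch of the entity half of \mathcal{L}_a (the relation half is analogous); the embedding pairs are stand-ins:

```python
import torch

gamma1 = 1.0  # margin for entity alignments

def entity_align_loss(pos_pairs, neg_pairs):
    """pos_pairs, neg_pairs: lists of (e, e') embedding pairs."""
    loss = torch.tensor(0.0)
    for e, e_p in pos_pairs:
        for e_n, e_np in neg_pairs:
            d_pos = torch.norm(e - e_p, p=2)     # d(e, e')
            d_neg = torch.norm(e_n - e_np, p=2)  # d(e_-, e'_-)
            loss = loss + torch.clamp(d_pos + gamma1 - d_neg, min=0)  # [.]_+
    return loss
```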

        ②Rule and triplet (knowledge) loss:

\begin{gathered} \mathcal{L}_{r} =\sum_{g^{+}\in\mathcal{G}(\mathcal{K}),\,g^{-}\in\mathcal{G}^{-}(\mathcal{K})}[\gamma_{r}-I(g^{+})+I(g^{-})]_{+} \\ +\sum_{t^{+}\in T,\,t^{-}\in T^{-}}[\gamma_{r}-I(t^{+})+I(t^{-})]_{+} \end{gathered}

        ③I\left ( \cdot \right ) denotes the truth value function; for a triplet t:

I(t)=1-\frac{1}{3\sqrt{d}}\|\mathbf{e}_{i}+\mathbf{r}_{ij}-\mathbf{e}_{j}\|_{2}

then it can be recursively transformed into:

I(t_{s})=I(t_{s1}\wedge t_{s2})=I(t_{s1})\cdot I(t_{s2})\\I(t_{s}\Rightarrow t_{c})=I(t_{s})\cdot I(t_{c})-I(t_{s})+1

where d is the embedding size
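
A short sketch of I(t) and its compositions, using the paper's embedding size d = 128; the tensor inputs are assumed stand-ins:

```python
import math
import torch

d = 128  # embedding size

def I_triplet(e_i, r_ij, e_j):
    """I(t) = 1 - ||e_i + r_ij - e_j||_2 / (3 sqrt(d))."""
    return 1 - torch.norm(e_i + r_ij - e_j, p=2) / (3 * math.sqrt(d))

def I_and(i_s1, i_s2):
    return i_s1 * i_s2              # I(t_s1 ^ t_s2) = I(t_s1) * I(t_s2)

def I_implies(i_s, i_c):
    return i_s * i_c - i_s + 1      # I(t_s => t_c)
```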

        ④The overall loss:

\mathcal{L}=\mathcal{L}_a+\mathcal{L}_r'+\mathcal{L}_r

where \mathcal{L}_r and \mathcal{L}_r^{\prime} denote the rule/triplet losses of the two KGs respectively

2.6. Experiment

2.6.1. Experiment Settings

(1)Datasets

        ①Datasets: DBP15K (containing DBPZH-EN (Chinese to English), DBPJA-EN (Japanese to English), and DBPFR-EN (French to English)) and DWY100K (containing DWY-WD (DBpedia to Wikidata) and DWY-YG (DBpedia to YAGO3))

        ②Statistics of datasets:

        ③Statistics of KG in datasets:

(2)Baselines

        ①MTransE

        ②JAPE

        ③GCN-Align

        ④AlignEA

(3)Training Details

        ①Data split: 30% of seed alignments for training and 70% for testing

        ②Embedding size (for all embeddings): 128

        ③Number of GNN layers: 2

        ④Optimizer: Adagrad

        ⑤Hyperparameter: \gamma _1=1.0,\gamma _2=1.0,\gamma _r=0.12

        ⑥Grid search over learning rate in {0.1, 0.01, 0.001}, L2 regularization in {0.01, 0.001, 0.0001}, and dropout rate in {0.1, 0.2, 0.5}; the optimal values were 0.001, 0.01, and 0.2 respectively
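
A small sketch of this grid search; score() is a hypothetical stand-in for training MuGNN and returning a validation metric:

```python
from itertools import product

def score(cfg):
    """Hypothetical placeholder: train with cfg and return validation Hits@1."""
    return 0.0

grid = {
    "lr": [0.1, 0.01, 0.001],
    "l2": [0.01, 0.001, 0.0001],
    "dropout": [0.1, 0.2, 0.5],
}

configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
best = max(configs, key=score)  # reported optimum: lr=0.001, l2=0.01, dropout=0.2
```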

2.6.2. Overall Performance

2.6.3. Impact of Two Channels and Rule Transfer

        ①Module ablation:

2.6.4. Impact of Seed Alignments

        ①Ratio of seeds:

2.6.5. Qualitative Analysis

        ①Two examples of how the rule works:

2.7. Related Work

        Introduces related work

2.8. Conclusions

        They plan to further study word ambiguity in future work

3. Supplementary Knowledge

3.1. Adagrad Optimizer

(1) Further reading: Deep Learning 最优化方法之AdaGrad - 知乎 (zhihu.com)
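
Beyond the link above, here is a minimal sketch of the Adagrad update rule (my own illustration): each parameter accumulates the sum of its squared gradients, so frequently-updated parameters get smaller effective learning rates.

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.01, eps=1e-8):
    """One Adagrad update for parameters theta given gradient grad."""
    accum = accum + grad ** 2                        # G_t = G_{t-1} + g_t^2
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum
```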

4. Reference

Cao, Y. et al. (2019) 'Multi-Channel Graph Neural Network for Entity Alignment', Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, doi: 10.18653/v1/P19-1140
