[Close Reading] Multi-Channel Graph Neural Network for Entity Alignment

Paper: Multi-Channel Graph Neural Network for Entity Alignment (aclanthology.org)

Code: https://github.com/thunlp/MuGNN

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may occur; if you spot any, feel free to point them out in the comments! This article reads more like notes, so take it with a grain of salt.

Table of Contents

1. Thoughts

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Preliminaries and Framework

2.3.1. Preliminaries

2.3.2. Framework

2.4. KG Completion

2.4.1. Rule Inference and Transfer

2.4.2. Rule Grounding

2.5. Multi-Channel Graph Neural Network

2.5.1. Relation Weighting

2.5.2. Multi-Channel GNN Encoder

2.5.3. Align Model

2.6. Experiment

2.6.1. Experiment Settings

2.6.2. Overall Performance

2.6.3. Impact of Two Channels and Rule Transfer

2.6.4. Impact of Seed Alignments

2.6.5. Qualitative Analysis

2.7. Related Work

2.8. Conclusions

3. Supplementary Knowledge

3.1. Adagrad Optimizer

4. Reference


1. Thoughts

(1) This paper is relatively easy to understand

2. Section-by-Section Close Reading

2.1. Abstract

        ①Limitations of entity alignment: structural heterogeneity and limited seed alignments

        ②They proposed Multi-channel Graph Neural Network model (MuGNN)

2.2. Introduction

        ①A knowledge graph (KG) stores information as a directed graph, where the nodes are entities and the edges denote relations

        ②A KG in an entity's native language usually stores richer information about that entity:

(The authors believe Jilin in KG1 should align to Jilin City in KG2 because they share a similar dialect and are both connected to Changchun. I don't think this necessarily holds; doesn't it depend on the specific model? These two entities still feel quite different to me, and their structures aren't very similar either.)

        ③To solve this problem, missing entities and relations need to be filled in, and unnecessary (exclusive) ones pruned

2.3. Preliminaries and Framework

2.3.1. Preliminaries

(1)KG

        ①Defining a directed graph G=\left ( E,R,T \right ), which contains entity set E, relation set R and triplets T
        ②Triplet t=(e_{i},r_{ij},e_{j})\in T

(2)Rule knowledge

        ①For a rule k=(r_{c}|r_{s1},\cdots,r_{sp}) in the rule set \mathcal{K}=\{k\}, it means \forall x,y\in E:(x,r_{s1},y)\wedge\cdots\wedge(x,r_{sp},y)\Rightarrow (x,r_{c},y), i.e., the premise relations jointly imply the conclusion relation r_{c}

(3)Rule Grounding

        ①Through the implication above, grounding a rule with concrete entities derives further relations between entities

(4)Entity alignment

        ①Entity alignments between the two KGs: \mathcal{A}_{e}=\{(e,e^{\prime}) \in E\times E^{\prime}|e \leftrightarrow e^{\prime}\}

        ②Seed relation alignments: \mathcal{A}_{r}^{s}=\{(r,r^{\prime})\in R\times R'|r\leftrightarrow r'\}

2.3.2. Framework

        ①Workflow of MuGNN:

(1)KG completion

        ①Adopts the rule mining system AMIE+ to induce rules from each KG

(2)Multi-channel Graph Neural Network

        ①Encodes each KG through multiple channels

2.4. KG Completion

2.4.1. Rule Inference and Transfer

        ①Rules are mined in each KG with AMIE+; a rule is transferred to the other KG when every relation it contains has an aligned counterpart in the seed relation alignments \mathcal{A}_{r}^{s} (a sketch follows below)
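
Below is a hypothetical sketch of this transfer step; the relation names, alignment map, and helper function are my own illustration, not the authors' code:

```python
# A hypothetical sketch of rule transfer: a rule mined in KG2 is transferred
# to KG1 only if every relation it mentions has an aligned counterpart.
seed_rel_alignments = {"province'": "province", "dialect'": "dialect"}  # KG2 -> KG1

def transfer_rule(premises, conclusion, rel_map):
    """premises: KG2 relation names; conclusion: KG2 relation name.
    Returns the rule mapped into KG1's vocabulary, or None if unalignable."""
    if any(r not in rel_map for r in premises + [conclusion]):
        return None
    return [rel_map[r] for r in premises], rel_map[conclusion]

# Transfers province'(x,y) ^ dialect'(y,z) => dialect'(x,z) into KG1.
print(transfer_rule(["province'", "dialect'"], "dialect'", seed_rel_alignments))
```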

2.4.2. Rule Grounding

        ①For example, the rule province(x,y) \wedge dialect(y,z) \Rightarrow dialect(x,z) found in KG2 can be grounded in KG1 to add the missing triplets, as in the sketch below
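
A minimal grounding sketch for this example, assuming the two-premise rule form above; the toy triplets are illustrative, not the dataset's actual contents:

```python
# Ground the transferred rule (x, r1, y) ^ (y, r2, z) => (x, r_c, z) over KG1.
triplets_kg1 = {
    ("Jilin", "province", "Jilin_City"),
    ("Jilin_City", "dialect", "Northeastern_Mandarin"),
}

def ground(triplets, r1, r2, r_c):
    """Collect new triplets implied by the rule's groundings."""
    new = set()
    for (x, ra, y) in triplets:
        if ra != r1:
            continue
        for (y2, rb, z) in triplets:
            if rb == r2 and y2 == y:
                new.add((x, r_c, z))
    return new - triplets  # keep only newly inferred triplets

# Adds ("Jilin", "dialect", "Northeastern_Mandarin") to KG1.
print(ground(triplets_kg1, "province", "dialect", "dialect"))
```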

2.5. Multi-Channel Graph Neural Network

2.5.1. Relation Weighting

        ①A weighted relation (connectivity) matrix is generated for each KG

        ②They construct a self-attention adjacency matrix and a cross-KG attention adjacency matrix, one for each channel

(1)KG Self-Attention (this one is for completion)

        ①Normalized connection weights:

a_{ij}=softmax(c_{ij})=\frac{exp(c_{ij})}{\sum_{e_{k}\in N_{e_{i}}\cup\{e_{i}\}}exp(c_{ik})}

where N_{e_{i}} denotes the neighbors of e_i, and taking the union with \{e_{i}\} adds a self-loop

        ②c_{ij} denotes the attention coefficient between two entities:

\begin{aligned} c_{ij}& =attn(\mathbf{We_{i}},\mathbf{We_{j}}) \\ &=LeakyReLU(\mathbf{p[We_{i}\|We_{j}]}) \end{aligned}

where \mathbf{W} and \mathbf{p} are trainable parameters
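
A small sketch of these two equations, using random stand-ins for the trainable \mathbf{W} and \mathbf{p} (in the real model they are learned jointly):

```python
import torch
import torch.nn.functional as F

n, d = 3, 4
W = torch.randn(d, d)          # trainable projection W (stand-in)
p = torch.randn(2 * d)         # trainable attention vector p (stand-in)
e = torch.randn(n, d)          # entity embeddings
We = e @ W

def attn_coeff(i, j):
    """c_ij = LeakyReLU(p . [We_i || We_j])."""
    return F.leaky_relu(p @ torch.cat([We[i], We[j]]))

# Softmax-normalize over the neighborhood of e_0, self-loop included.
neighborhood = [0, 1, 2]       # N_{e_0} plus e_0 itself
c = torch.stack([attn_coeff(0, k) for k in neighborhood])
a_0 = torch.softmax(c, dim=0)  # weights a_0k, summing to 1
```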

(2)Cross-KG Attention (this one is for pruning; it forms the other adjacency matrix)

        ①Pruning operation:

a_{ij}=\max\limits_{r\in R,r'\in R'}\mathbf{1}((e_i,r,e_j)\in T)sim(r,r')

where \mathbf{1}((e_i,r,e_j)\in T) is 1 if (e_i,r,e_j)\in T and 0 otherwise, and sim\left ( \cdot \right ) denotes the inner-product similarity measure sim(r,r')=\mathbf{r}^{T}\mathbf{r}^{\prime}
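
A hedged sketch of this pruning weight; the relation embeddings here are random stand-ins for the trained ones:

```python
import torch

R1 = {"province": torch.randn(4), "dialect": torch.randn(4)}    # KG1 relations
R2 = {"province'": torch.randn(4), "dialect'": torch.randn(4)}  # KG2 relations
T1 = {("Jilin", "province", "Jilin_City")}                      # KG1 triplets

def cross_kg_weight(e_i, e_j):
    """a_ij = max over r, r' of 1((e_i, r, e_j) in T) * sim(r, r')."""
    sims = [
        float(r_vec @ r2_vec)        # sim(r, r') = r^T r'
        for r, r_vec in R1.items()
        if (e_i, r, e_j) in T1       # the indicator term
        for r2_vec in R2.values()
    ]
    return max(sims, default=0.0)    # 0 when no relation connects e_i to e_j
```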

2.5.2. Multi-Channel GNN Encoder

       ①Propagation of GNN:

\mathrm{GNN}(A,H,W)=\sigma(\mathbf{AHW})

and they chose \sigma \left ( \cdot \right ) as ReLU

        ②Multi GNN encoder:

\mathrm{MultiGNN}(H^{l};A_{1},\cdots,A_{c})=\mathrm{Pooling}(H_{1}^{l+1},\cdots,H_{c}^{l+1})

where c denotes the number of channels

        ③Updating function:

\mathbf{H}_i^{l+1}=\mathrm{GNN}(A_i,H^l,W_i)

        ④Pooling strategy: mean pooling
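
A compact sketch combining ①–④ above, with random matrices standing in for trained parameters (a sketch, not the official implementation):

```python
import torch

n, d = 5, 8
A1 = torch.rand(n, n)    # self-attention adjacency (completion channel)
A2 = torch.rand(n, n)    # cross-KG attention adjacency (pruning channel)
H = torch.randn(n, d)    # input entity embeddings H^l
W1 = torch.randn(d, d)   # channel-specific weights W_i (stand-ins)
W2 = torch.randn(d, d)

def gnn(A, H, W):
    return torch.relu(A @ H @ W)              # GNN(A, H, W) = ReLU(AHW)

def multi_gnn(H, adjs, weights):
    channels = [gnn(A, H, W) for A, W in zip(adjs, weights)]
    return torch.stack(channels).mean(dim=0)  # mean pooling over channels

H_next = multi_gnn(H, [A1, A2], [W1, W2])     # H^{l+1}
```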

2.5.3. Align Model

        ①The two KGs are embedded into the same vector space, and distance is measured to judge equivalence:

\mathcal{L}_{a}=\sum_{(e,e^{'})\in\mathcal{A}_{e}^{s}}\sum_{(e_{-},e_{-}^{'})\in\mathcal{A}_{e}^{s-}}[d(e,e^{'})+\gamma_{1}-d(e_{-},e_{-}^{'})]_{+}+\\\sum_{(r,r^{'})\in\mathcal{A}_{r}^{s}}\sum_{(r_{-},r_{-}^{'})\in\mathcal{A}_{r}^{s-}}[d(r,r^{'})+\gamma_{2}-d(r_{-},r_{-}^{'})]_{+}

where [\cdot]_{+}=\max\{0,\cdot\}, d(\cdot)=\|\cdot\|_{2}, \mathcal{A}_e^{s-} and \mathcal{A}_r^{s-} are sets of negative pairs sampled for the original seed sets, and \gamma _1> 0 and \gamma _2> 0 are margin hyper-parameters separating positive and negative entity and relation alignments
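
A toy sketch of the entity half of \mathcal{L}_a (the relation half is analogous); the embedding pairs are stand-ins:

```python
import torch

gamma1 = 1.0  # margin for entity alignments

def entity_align_loss(pos_pairs, neg_pairs):
    """pos_pairs, neg_pairs: lists of (e, e') embedding pairs."""
    loss = torch.tensor(0.0)
    for e, e_p in pos_pairs:
        for e_n, e_np in neg_pairs:
            d_pos = torch.norm(e - e_p, p=2)     # d(e, e')
            d_neg = torch.norm(e_n - e_np, p=2)  # d(e_-, e'_-)
            loss = loss + torch.clamp(d_pos + gamma1 - d_neg, min=0)  # [.]_+
    return loss
```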

        ②Rule and triplet (knowledge) loss:

\begin{gathered} \mathcal{L}_{r} =\sum_{g^{+}\in\mathcal{G}(\mathcal{K}),\,g^{-}\in\mathcal{G}^{-}(\mathcal{K})}[\gamma_{r}-I(g^{+})+I(g^{-})]_{+} \\ +\sum_{t^{+}\in T,\,t^{-}\in T^{-}}[\gamma_{r}-I(t^{+})+I(t^{-})]_{+} \end{gathered}

        ③I\left ( \cdot \right ) denotes the truth value function; for a triplet t:

I(t)=1-\frac{1}{3\sqrt{d}}\|\mathbf{e}_{i}+\mathbf{r}_{ij}-\mathbf{e}_{j}\|_{2}

then it can be recursively transformed into:

I(t_{s})=I(t_{s1}\wedge t_{s2})=I(t_{s1})\cdot I(t_{s2})\\I(t_{s}\Rightarrow t_{c})=I(t_{s})\cdot I(t_{c})-I(t_{s})+1

where d is the embedding size
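
A short sketch of I(t) and its compositions, using the paper's embedding size d = 128; the tensor inputs are assumed stand-ins:

```python
import math
import torch

d = 128  # embedding size

def I_triplet(e_i, r_ij, e_j):
    """I(t) = 1 - ||e_i + r_ij - e_j||_2 / (3 sqrt(d))."""
    return 1 - torch.norm(e_i + r_ij - e_j, p=2) / (3 * math.sqrt(d))

def I_and(i_s1, i_s2):
    return i_s1 * i_s2              # I(t_s1 ^ t_s2) = I(t_s1) * I(t_s2)

def I_implies(i_s, i_c):
    return i_s * i_c - i_s + 1      # I(t_s => t_c)
```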

        ④The overall loss:

\mathcal{L}=\mathcal{L}_a+\mathcal{L}_r'+\mathcal{L}_r

where \mathcal{L}_r and \mathcal{L}_r^{\prime} denote the rule/triplet losses of the two KGs respectively

2.6. Experiment

2.6.1. Experiment Settings

(1)Datasets

        ①Datasets: DBP15K (containing DBPZH-EN (Chinese to English), DBPJA-EN (Japanese to English), and DBPFR-EN (French to English)) and DWY100K (containing DWY-WD (DBpedia to Wikidata) and DWY-YG (DBpedia to YAGO3))

        ②Statistics of datasets:

        ③Statistics of KG in datasets:

(2)Baselines

        ①MTransE

        ②JAPE

        ③GCN-Align

        ④AlignEA

(3)Training Details

        ①Data split: 30% of seed alignments for training and 70% for testing

        ②Embedding size (for all embeddings): 128

        ③Number of GNN layers: 2

        ④Optimizer: Adagrad

        ⑤Hyperparameter: \gamma _1=1.0,\gamma _2=1.0,\gamma _r=0.12

        ⑥Grid search over learning rate in {0.1, 0.01, 0.001}, L2 regularization in {0.01, 0.001, 0.0001}, and dropout rate in {0.1, 0.2, 0.5}; the optimal values were 0.001, 0.01, and 0.2 respectively
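
A small sketch of this grid search; score() is a hypothetical stand-in for training MuGNN and returning a validation metric:

```python
from itertools import product

def score(cfg):
    """Hypothetical placeholder: train with cfg and return validation Hits@1."""
    return 0.0

grid = {
    "lr": [0.1, 0.01, 0.001],
    "l2": [0.01, 0.001, 0.0001],
    "dropout": [0.1, 0.2, 0.5],
}

configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
best = max(configs, key=score)  # reported optimum: lr=0.001, l2=0.01, dropout=0.2
```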

2.6.2. Overall Performance

2.6.3. Impact of Two Channels and Rule Transfer

        ①Module ablation:

2.6.4. Impact of Seed Alignments

        ①Ratio of seeds:

2.6.5. Qualitative Analysis

        ①Two examples of how the rule works:

2.7. Related Work

        Introduces related work

2.8. Conclusions

        They plan to further study word ambiguity in future work

3. Supplementary Knowledge

3.1. Adagrad Optimizer

(1) Further reading: Deep Learning 最优化方法之AdaGrad - 知乎 (zhihu.com)
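
Beyond the link above, here is a minimal sketch of the Adagrad update rule (my own illustration): each parameter accumulates the sum of its squared gradients, so frequently-updated parameters get smaller effective learning rates.

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.01, eps=1e-8):
    """One Adagrad update for parameters theta given gradient grad."""
    accum = accum + grad ** 2                        # G_t = G_{t-1} + g_t^2
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum
```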

4. Reference

Cao, Y. et al. (2019) 'Multi-Channel Graph Neural Network for Entity Alignment', Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, doi: 10.18653/v1/P19-1140
