[ACL 2024] PokeMQA: Programmable knowledge editing for Multi-hop Question Answering

Paper: PokeMQA: Programmable knowledge editing for Multi-hop Question Answering - ACL Anthology

Code: GitHub - Hengrui-Gu/PokeMQA: [ACL 2024] PokeMQA: Programmable knowledge editing for Multi-hop Question Answering

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may appear; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with care.

Table of Contents

1. Takeaways

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Multi-hop Question Answering under Knowledge Editing

2.4. Programmable Editing in Memory of Multi-hop Question Answering

2.4.1. Workflow of PokeMQA

2.4.2. Programmable Scope Detector

2.4.3. Knowledge Prompt Generator

2.5. Experimental Setup

2.5.1. Evaluation Metrics

2.5.2. Baseline Methods & Language Models

2.5.3. Implementation Details

2.6. Performance Analysis

2.6.1. Main Results

2.6.2. Ablation Study

2.7. Related Work

2.8. Conclusion

3. Background Notes

3.1. Knowledge editing

3.2. A BCE variant for contrastive learning and discriminative models

4. Reference


1. Takeaways

(1) Not being a specialist in this direction, I suspect some of the descriptions below are clearer when read alongside the code

2. 论文逐段精读

2.1. Abstract

        ①Existing problems: current methods handle cascading knowledge updates with a single mixed-up prompt that covers question decomposition, answer generation, and conflict checking via comparison with edited facts. The tight coupling of these functions may cause them to interfere with one another

        ②So they proposed Programmable knowledge editing for Multi-hop Question Answering (PokeMQA)

2.2. Introduction

        ①Example of Multi-hop question answering (MQA):

where the blue lines denote the correct reasoning path

        ②Methods for fixing outdated knowledge (the authors adopt the second one): 

Parameter-modification based editing: modifies the internal model weights according to edited facts through meta-learning, fine-tuning, or knowledge locating
Memory-based editing: leverages an external memory to explicitly store the edited facts (also termed edits) and reason over them, while leaving LLM parameters unchanged

        ③Existing challenges for MQA: a) conflict detection, b) the incorporated knowledge-editing instructions introduce noise

2.3. Multi-hop Question Answering under Knowledge Editing

(1)Notations

        ①A triplet \left ( s,r,o \right ) with subject s, object o and relation r, such as: 

\left ( Messi, play\, for,Inter\, Miami \right )

        ②To update this fact:

\left ( Messi, play\, for,Boca\, Juniors \right )

        ③Multi hop question: Q, where the answer of Q needs sequentially querying and retrieving multiple facts

        ④Chain of facts: 

\langle(s_1,r_1,o_1),\ldots,(s_n,r_n,o_n)\rangle

where s_{i+1}=o_{i}, and o_n is the final answer

        ⑤The unique inter-entity path \mathcal{P}=\langle s_{1},o_{1},\ldots,o_{n}\rangle

        ⑥Except for s_1, none of the other entities o_{1},\ldots,o_{n} are allowed to appear in Q

        ⑦Edited facts: a single change to one fact, such as from

(s_i,r_i,o_i)\rightarrow e=(s_{i},r_{i},o_{i}^{*})

causes cascaded changes consequently:

\langle(s_{1},r_{1},o_{1}),\ldots,(s_{i},r_{i},o_{i}^{*}),\ldots(s_{n}^{*},r_{n},o_{n}^{*})\rangle

and the inter-entity path will be:

\mathcal{P}^{*}\quad=\langle s_{1},o_{1},\ldots,o_{i}^{*},\ldots,o_{n}^{*}\rangle
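As a toy illustration of this cascading, here is a minimal sketch (hypothetical helper functions, not the paper's code) of rebuilding a fact chain after one edit; every hop downstream of the edit must be re-derived, since s_{i+1}=o_{i}:

```python
def apply_edit(chain, edit, kb):
    """Rebuild the chain after one edit; kb maps (subject, relation) -> object."""
    s, r, o_star = edit
    new_chain = []
    for (si, ri, oi) in chain:
        if si == s and ri == r:
            new_chain.append((s, r, o_star))          # the edited hop
        elif new_chain and si != new_chain[-1][2]:
            si = new_chain[-1][2]                     # cascade: new subject
            new_chain.append((si, ri, kb[(si, ri)]))  # re-query downstream hop
        else:
            new_chain.append((si, ri, oi))            # unaffected hop
    return new_chain

chain = [("Messi", "country of citizenship", "Argentina"),
         ("Argentina", "head of state", "Alberto Fernandez")]
kb = {("United States", "head of state"): "Joe Biden"}
edit = ("Messi", "country of citizenship", "United States")
new_chain = apply_edit(chain, edit, kb)
# new_chain[-1] == ("United States", "head of state", "Joe Biden")
```

The names and the `kb` lookup table are made up for illustration; the point is only that editing hop i invalidates the subjects of all later hops.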

(2)MQA under knowledge editing

        ①A set of edits: \mathcal{E}=\{e_{1},\ldots,e_{m}\}

        ②A language model: f

        ③Edited language model: f_{\mathrm{edit}}

(3)Edit scope

        ①The edit scope S(e) denotes the set of semantically similar questions that correspond to the same answer

syntactic: adj., relating to syntax

2.4. Programmable Editing in Memory of Multi-hop Question Answering

2.4.1. Workflow of PokeMQA

        ①Illustration of PokeMQA:

where the Prompt Generator utilizes an external knowledge base to help decompose the original Q, and the Scope Detector then guides answer generation

        ②When receiving a set of edits \mathcal{E}=\{e_{1},\ldots,e_{m}\}, PokeMQA first uses a manually-defined template to convert each edit triple e into a natural language statement t (in the figure, this seems to correspond to decomposing the original question into two subquestions:


Original question: Who is the head of state of the country where Messi (s_1) holds citizenship?

Subquestion 1 (t_1): What is the country of citizenship (r_1) of Messi (s_1)?

Answer 1: United States (o_1)

Subquestion 2 (t_2): Who is the head of state (r_2) of the United States (o_1)?

Answer 2: Joe Biden (o_2)


), then explicitly stores them in an external memory \mathcal{M}=\{t_{1},\ldots,t_{m}\} for query and retrieval.
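The triple-to-statement conversion above can be sketched in a few lines (the template wording is an illustrative assumption; the paper only says the templates are manually defined):

```python
# Build the external edit memory M from edit triples via a template.
def triple_to_statement(s, r, o):
    return f"The {r} of {s} is {o}."

edits = [("Messi", "country of citizenship", "United States"),
         ("United States", "head of state", "Joe Biden")]
memory = [triple_to_statement(*e) for e in edits]
# memory == ["The country of citizenship of Messi is United States.",
#            "The head of state of United States is Joe Biden."]
```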

        ③Models are taught to execute 3 tasks via few-shot prompting:

1. Identify the next subquestion (i.e., atomic question) conditioned on the input question and the current inference state in LLMs
2. Detect whether this subquestion falls within the edit scope and generate an answer
3. Extract the answer entity for this subquestion in LLMs

        ④⭐Previous work always had the model generate a tentative answer and then retrieved edited facts for every subquestion, which is unreliable under few-shot prompting. They change this to: first retrieve the subquestion against \mathcal{M}=\{t_{1},\ldots,t_{m}\}; if it falls in some edit scope, take the answer from \mathcal{M}, otherwise let the model generate it itself

        ⑤The key entity extracted from Q is kept as an auxiliary knowledge prompt, since decomposing the input into the first subquestion is difficult

2.4.2. Programmable Scope Detector

(1)Architectures

        ①Scope detector: g(t,q):{\mathcal{T}}\times\mathcal{Q}\to[0,1], which predicts the probability that an atomic question q falls into the scope of the edit statement t (in terms of the edit e)

        ②They employ 2 complementary models for expressiveness and computational efficiency: g_\phi and g_\psi

        ③g_\phi (predetector M_{\mathrm{pre}}) calculates the embeddings for t and q separately and models the log-likelihood by the negative squared Euclidean distance in the embedding space, which filters irrelevant edits

        ④g_\psi (conflict disambiguator M_{\mathrm{dis}}) concatenates t and q into a unified input for a sequence classification task, achieving accurate scope classification
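A toy sketch of how the two detectors could compose: g_\phi scores every edit cheaply by embedding distance, then g_\psi re-ranks only the survivors. The vectors stand in for DistilBERT embeddings, and the 0.5 threshold and g_\psi scores are illustrative assumptions, not the paper's values:

```python
import math

def g_phi(emb_t, emb_q):
    """Pre-detector: log-likelihood modeled as negative squared Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(emb_t, emb_q))
    return math.exp(-d2)

def scope_detect(edit_embs, q_emb, g_psi, tau=0.5):
    """Filter edits with g_phi, then let the disambiguator pick among survivors."""
    survivors = [i for i, t in enumerate(edit_embs) if g_phi(t, q_emb) >= tau]
    if not survivors:
        return None        # the question falls outside every edit scope
    return max(survivors, key=g_psi)

edit_embs = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
q_emb = [0.9, 0.1]
idx = scope_detect(edit_embs, q_emb, g_psi=lambda i: [0.2, 0.9, 0.1][i])
# idx == 1: only edit 1 survives the cheap filter, so g_psi confirms it
```

The two-stage design trades off cost and accuracy: the distance-based g_\phi is computed per edit independently (fast, filterable), while the concatenation-based g_\psi is more expressive but only runs on the few candidates that pass the filter.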

(2)Training scope detector

        ①Training set: \mathcal{D}_{\mathrm{train}}=\{(t_{1},q_{1}),\ldots,(t_{m},q_{m})\}

        ②BCE loss:

\mathcal{L}=-\log g(t_i,q_i)-\mathbb{E}_{q_n\sim P_n(q)}\left[\log(1-g(t_i,q_n))\right]

where P_n denotes the negative sampling distribution
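Numerically, the loss behaves as expected under this definition; a small sketch with made-up scores standing in for detector outputs g in [0, 1]:

```python
import math

def bce_loss(pos_score, neg_scores):
    """L = -log g(t_i,q_i) - E_{q_n ~ P_n}[log(1 - g(t_i,q_n))] (expectation as mean)."""
    neg_term = sum(math.log(1.0 - s) for s in neg_scores) / len(neg_scores)
    return -math.log(pos_score) - neg_term

loss = bce_loss(0.9, [0.1, 0.2])   # confident positive, well-separated negatives
worse = bce_loss(0.5, [0.6, 0.7])  # confused detector: higher loss
# 0 < loss < worse
```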

        ③M_{\mathrm{pre}} and M_{\mathrm{dis}} are trained separately

(3)Model selection

        ①The authors design Success Rate and Block Rate to guide early stopping

        ②Success Rate measures the accuracy of retrieving the correct edit statement t_i for a target question q_i from a set of candidates:

SR=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\left[\bigwedge_{(t,q)\in\mathcal{D}_{val}}(g(t_i,q_i)\geq g(t,q_i))\right]

where \mathbf{1}\left [ \cdot \right ] is the indicator function, N denotes the size of the validation set \mathcal{D}_{val}, and \wedge is logical AND

        ③Block Rate quantifies how well the detector blocks unrelated edit statements for a target question:

BR=\frac{1}{N}\sum_{i=1}^N\mathbf{1}\left[\bigwedge_{(t,q)\in\mathcal{D}_{val}^-}(g(t,q_i)<0.5)\right]

where \mathcal{D}_{val}^{-}=\mathcal{D}_{val}-\{(t_{i},q_{i})\}
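Both metrics can be computed directly from a score matrix; a toy sketch with made-up values, where g_scores[i][j] stands for g(t_j, q_i) and the diagonal holds the matched pairs (t_i, q_i):

```python
def success_rate(g_scores):
    """q_i succeeds if its own edit t_i scores at least as high as every candidate."""
    n = len(g_scores)
    return sum(all(g_scores[i][i] >= g_scores[i][j] for j in range(n))
               for i in range(n)) / n

def block_rate(g_scores):
    """q_i is handled correctly if every unrelated edit scores below 0.5."""
    n = len(g_scores)
    return sum(all(g_scores[i][j] < 0.5 for j in range(n) if j != i)
               for i in range(n)) / n

g = [[0.9, 0.2, 0.1],
     [0.3, 0.8, 0.6],
     [0.1, 0.2, 0.7]]
# success_rate(g) == 1.0 (every diagonal entry is its row maximum)
# block_rate(g) == 2/3  (row 1 fails: the unrelated score 0.6 >= 0.5)
```

Note the two metrics diverge on row 1: retrieval still succeeds (0.8 is the maximum), but the detector fails to block the unrelated edit scoring 0.6, which is exactly the kind of noise the disambiguator is meant to suppress.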

2.4.3. Knowledge Prompt Generator

        ①They introduce the knowledge prompt generator M_{\mathrm{gen}} (an ELQ model) to quickly link Q to entities in Wikidata

        ②Triplets (s,r,o) are stored in Wikidata

        ③2 basic membership properties \mathcal{R}=[r_1,r_2], where r_{1}=instance\ of and r_{2}=subclass\ of (this is where the "Messi, a human" in the top-left of the main figure comes from)

2.5. Experimental Setup

        ①Knowledge editing dataset: MQUAKE, which includes MQUAKE-CF-3K based on counterfactual edits, and MQUAKE-T with temporal knowledge updates

        ②Questions in the dataset have k\in \left \{ 2,3,4 \right \} hops

2.5.1. Evaluation Metrics

        ①Metrics: multi-hop accuracy and hop-wise answering accuracy (Hop-Acc)

2.5.2. Baseline Methods & Language Models

        ①Compared parameter updating methods: FT, ROME, MEMIT

        ②Compared memory-based method: MeLLo

        ③LLMs: LLaMa-2-7B, Vicuna-7B, GPT-3.5-turbo-instruct

2.5.3. Implementation Details

        ①g_\phi and g_\psi are fine-tuned from DistilBERT

        ②Sampling method: stratified sampling

2.6. Performance Analysis

2.6.1. Main Results

        ①Performance table:

        ②Different acc and hop-acc of LLaMa-2-7B on MQUAKE-CF-3K:

2.6.2. Ablation Study

        ①Module ablation in GPT-3.5-turbo-instruct on MQUAKE-CF-3K:

        ②Module ablation table:

2.7. Related Work

        ①Knowledge editing methods

2.8. Conclusion

        ①Limitations: a) retrieval accuracy, b) safety techniques are required

3. Background Notes

3.1. Knowledge editing

(1) Definition

Knowledge editing refers to the process of modifying, updating, correcting, or adding to the existing knowledge in a knowledge store (such as a knowledge graph, a knowledge base, or a model's knowledge representation). This goes beyond adding new facts: it also covers modifying, deleting, and correcting erroneous knowledge, or introducing new context and relations on top of what already exists.

In NLP and the knowledge graph field, the goal of knowledge editing is to make the knowledge in a knowledge base or model more accurate, consistent, and up to date. This is crucial for the performance of intelligent systems (such as question answering and reasoning systems), especially in environments where knowledge changes dynamically.

(2) Applications

        ①Error correction: if a fact in the knowledge base is wrong (e.g., a date, place, or person), knowledge editing can help fix it.

        ②Expanding and updating knowledge: as new information arrives, the knowledge base must be updated continuously, e.g., with new scientific discoveries or social events.

        ③Countering bias: biased or inaccurate information in the knowledge base (e.g., stereotypes or political bias) can also be corrected through knowledge editing to improve fairness and accuracy.

        ④Automated learning: some systems automatically extract new facts from large amounts of text and edit them into the existing knowledge base, continuously refining their knowledge.

(3) Challenges

        ①Complex reasoning: editing existing knowledge may require accounting for its relations to other facts, so that newly added knowledge does not break the consistency of what is already there.

        ②Knowledge verification: effective methods are needed to verify the correctness of edits; in knowledge graphs especially, confirming that a newly added fact does not conflict with existing knowledge is challenging.

        ③Handling knowledge from different sources: integrating information from multiple sources and editing it consistently is hard, especially when the sources disagree and a sensible editing decision must be made.

(4) Techniques

        ①Rule-based editing: update the knowledge base with a set of hand-written rules, e.g., manually checking and correcting facts such as dates and places.

        ②Learning-based editing: use machine learning methods (e.g., text classification, entity-relation extraction) to automatically detect errors or inconsistencies in the knowledge base and edit them automatically.

        ③Human-machine collaboration: combine manual review with automated editing to ensure quality and accuracy, e.g., an automated system flags potential errors while human experts make the final decision.

        ④Natural language generation (NLG) and reasoning: use language models (e.g., GPT, BERT) to automatically generate or infer new knowledge for effectively updating the knowledge graph, e.g., extracting new facts from text with large pretrained LMs and inserting them into the knowledge base as edited facts.

3.2. A BCE variant for contrastive learning and discriminative models

(1) Formula

\mathcal{L}=-\log g(t_i,q_i)-\mathbb{E}_{q_n\sim P_n(q)}\left[\log(1-g(t_i,q_n))\right]

(2) Explanation

This formula describes a contrastive objective with one positive pair and multiple negatives (averaged): the first term is the positive pair and the second the negative pairs. The authors define g(t,q):{\mathcal{T}}\times\mathcal{Q}\to[0,1], so minimizing the first term pushes g(t_i,q_i) toward 1, i.e., a high matching score, while the mismatched scores g(t_i,q_n) in the second term should be as small as possible so that it also approaches 0. Either way, this loss should always be positive, I believe.

4. Reference

Gu, H. et al. (2024) PokeMQA: Programmable knowledge editing for Multi-hop Question Answering, ACL.
