Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs (Relation Extraction Paper Summary)

This post summarizes how an edge-oriented graph neural model performs document-level relation extraction. The key idea is to use edges to represent the context that is unique to each entity pair, which improves the identification of cross-sentence relations. The model consists of sentence encoding, graph construction, inference, and classification layers, and experiments show that it performs well on both document-level and sentence-level relation extraction.


Relation Extraction (RE)

Relation Extraction (RE): The extraction of relations between named entities in text.

Relation Extraction is an important NLP task. Most existing work focuses on intra-sentence RE, yet in real-world scenarios many relations are expressed across sentences. The task of identifying these relations is named inter-sentence RE.

Typically, inter-sentence relations occur in textual snippets spanning several sentences, such as documents. In these snippets, each entity is usually repeated with the same phrases or aliases; these occurrences are named entity mentions and are regarded as instances of the entity.

The multiple mentions of the target entities in different sentences can be useful for the identification of inter-sentential relations, as these relations may depend on the interactions of their mentions with other entities in the same document. The figure in the paper gives a good example of identifying the relationship between “ethambutol”, “isoniazid” and “scotoma”, where they all interact with the green-colored entity (and its alias).

Document-level RE

  • In document-level RE, the input is considered to be an annotated document. The annotations include concept-level entities as well as multiple occurrences of each entity under the same phrase or alias, i.e., entity mentions.

  • Objective: given an annotated document, identify all the related concept-level entity pairs in that document.

Document-level RE is not common in the general domain, as the entity types of interest can often be found in the same sentence. In contrast, in the biomedical domain, document-level relations are particularly important given the numerous aliases that biomedical entities can have (as shown in the paper's figure).

Intuition

Graph-based neural approaches have proven useful in encoding long-distance, inter-sentential information. These models interpret words as nodes and connections between them as edges. They typically operate on the nodes, updating their representations during training.

This paper: however, a relation between two entities depends on the context of that particular pair. It could thus be better expressed with an edge connection that is unique to the pair. A straightforward way to address this is to create graph-based models that rely on edge representations rather than focusing on node representations, which are shared between multiple entity pairs.

Contribution

  • We propose a novel edge-oriented graph neural model for document-level relation extraction, which encodes information into edge representations rather than node representations.
  • Analysis indicates that the document-level graph can effectively encode document-level dependencies.
  • We show that inter-sentence associations can be beneficial for the detection of intra-sentence relations.

Overview of Proposed Model

We present a novel edge-oriented graph neural model (EoG) for document-level relation extraction using multi-instance learning. The proposed model constructs a document-level graph with heterogeneous types of nodes and edges, modelling intra- and inter-sentence pairs simultaneously with an iterative algorithm over the graph edges.

Here is an illustration of the abstract architecture of the proposed approach.

Proposed Model

The proposed model consists of four layers: sentence encoding, graph construction, inference and classification layers. The model receives a document (with identified concept-level entities and their textual mentions) and encodes each sentence separately. A document-level graph is constructed and fed into an iterative algorithm to generate edge representations between the target entity nodes.

Sentence Encoding Layer

We use a Bi-LSTM to encode each sentence and obtain contextualized word representations of the input sentence. The contextualized word representations from the encoder are then used to construct a document-level graph structure.
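
As a rough sketch of this layer (assuming a PyTorch-style implementation; the vocabulary size, embedding size, and hidden size below are illustrative rather than the paper's settings), each sentence is encoded separately by a bidirectional LSTM:

```python
import torch
import torch.nn as nn

# Minimal Bi-LSTM sentence encoder sketch (assumed PyTorch; dimensions are illustrative).
class SentenceEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional=True concatenates forward and backward states per word
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                 # token_ids: [batch, seq_len]
        emb = self.embed(token_ids)               # [batch, seq_len, emb_dim]
        states, _ = self.bilstm(emb)              # [batch, seq_len, 2 * hidden_dim]
        return states                             # contextualized word representations


# Each sentence of the document is encoded separately:
encoder = SentenceEncoder()
sentence = torch.randint(0, 10000, (1, 12))       # one sentence of 12 token ids
word_reps = encoder(sentence)                     # [1, 12, 256]
```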

Graph Construction Layer

Graph construction consists of Node Construction and Edge Construction.

Node Construction

They form three distinct types of nodes in the graph:

  • Mention nodes (M), $n_m$: mention nodes correspond to different mentions of entities in the input document. The representation of a mention node is formed as the average of the words ($w$) that the mention contains, i.e. $\operatorname{avg}_{w_i \in m}(\mathbf{w}_i)$.
  • Entity nodes (E), $n_e$: entity nodes represent unique entity concepts. The representation of an entity node is computed as the average of the representations of the mentions ($m$) associated with the entity, i.e. $\operatorname{avg}_{m_i \in e}(\mathbf{m}_i)$.
  • Sentence nodes (S), $n_s$: sentence nodes correspond to sentences. A sentence node is represented as the average of the word representations in the sentence, i.e. $\operatorname{avg}_{w_i \in s}(\mathbf{w}_i)$.

To distinguish different node types in the graph, they concatenate a node type ($t$) embedding to each node representation. The final node representations are then estimated as $\mathbf{n}_m = [\operatorname{avg}_{w_i \in m}(\mathbf{w}_i); \mathbf{t}_m]$, $\mathbf{n}_e = [\operatorname{avg}_{m_i \in e}(\mathbf{m}_i); \mathbf{t}_e]$, $\mathbf{n}_s = [\operatorname{avg}_{w_i \in s}(\mathbf{w}_i); \mathbf{t}_s]$.
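
A minimal sketch of node construction under the definitions above, assuming the contextualized word representations come from a Bi-LSTM encoder as in the previous layer; the mention spans, entity grouping, and type-embedding size are made-up examples for illustration only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed input: contextualized word representations for a 2-sentence document.
word_reps = [torch.randn(12, 256), torch.randn(9, 256)]   # one tensor per sentence

# Mentions as (sentence index, start, end) spans; two mentions of the same entity.
mentions = [(0, 2, 4), (1, 5, 7)]
entity_to_mentions = {0: [0, 1]}                           # entity id -> mention ids

type_emb = nn.Embedding(3, 16)                             # 0: mention, 1: entity, 2: sentence

# Mention nodes: average of the words inside the mention span.
mention_reps = [word_reps[s][a:b].mean(dim=0) for (s, a, b) in mentions]

# Entity nodes: average of the representations of their mentions.
entity_reps = [torch.stack([mention_reps[i] for i in ids]).mean(dim=0)
               for ids in entity_to_mentions.values()]

# Sentence nodes: average of all word representations in the sentence.
sentence_reps = [w.mean(dim=0) for w in word_reps]

def with_type(rep, t):
    # look up the node-type embedding and concatenate it to the node representation
    return torch.cat([rep, type_emb(torch.tensor([t]))[0]], dim=-1)

n_m = [with_type(r, 0) for r in mention_reps]    # each of size 256 + 16
n_e = [with_type(r, 1) for r in entity_reps]
n_s = [with_type(r, 2) for r in sentence_reps]
```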

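The edge construction and inference layers of EoG are not covered in detail in this summary. Purely as a hedged illustration of the general idea of iterating over edge representations, the sketch below refines an edge $(i, j)$ by aggregating two-hop paths through intermediate nodes $k$ and then scores a candidate entity pair; the `combine` function, the mixing coefficient, the number of iterations, and the target indices are assumptions, and the paper's actual edge initialization and update rule should be taken from the paper itself:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

n_nodes, edge_dim = 5, 64
edges = torch.randn(n_nodes, n_nodes, edge_dim)      # assumed initial edge representations

# One generic refinement step: edge (i, j) is updated from paths i -> k -> j.
combine = nn.Bilinear(edge_dim, edge_dim, edge_dim)  # illustrative way to merge two edges

def inference_step(edges):
    n = edges.size(0)
    updated = torch.zeros_like(edges)
    for i in range(n):
        for j in range(n):
            # Merge every two-hop path through an intermediate node k, then pool over k.
            paths = combine(edges[i], edges[:, j])    # [n, edge_dim] for all k at once
            updated[i, j] = torch.sigmoid(paths).mean(dim=0)
    # Keep part of the old edge so information accumulates across iterations.
    return 0.5 * edges + 0.5 * updated

for _ in range(2):                                    # a small, fixed number of iterations
    edges = inference_step(edges)

# Classification: score the edge between two target entity nodes (indices assumed).
classifier = nn.Linear(edge_dim, 2)                   # e.g. "no relation" vs "relation"
score = classifier(edges[0, 3])
```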