Coreference Resolution: Notes on "Improving Coreference Resolution by Learning Entity-Level Distributed Representations"

Paper: "Improving Coreference Resolution by Learning Entity-Level Distributed Representations"

Sections:

  2. System Architecture
  3. Building Representations
    3.1. Mention-Pair Encoder
    3.2. Cluster-Pair Encoder
  4. Mention-Ranking Model
  5. Cluster-Ranking Model
    5.1. Cluster-Ranking Policy Network
    5.2. Easy-First Cluster Ranking
    5.3. Deep Learning to Search
  6. Experiments and Results
    6.1. Mention-Ranking Model Experiments
    6.2. Cluster-Ranking Model Experiments

Abstract

A long-standing challenge in coreference resolution has been the incorporation of entity-level information – features defined over clusters of mentions instead of mention pairs. We present a neural network based coreference system that produces high-dimensional vector representations for pairs of coreference clusters. Using these representations, our system learns when combining clusters is desirable. We train the system with a learning-to-search algorithm that teaches it which local decisions (cluster merges) will lead to a high-scoring final coreference partition. The system substantially outperforms the current state-of-the-art on the English and Chinese portions of the CoNLL 2012 Shared Task dataset despite using few hand-engineered features.

In short: the long-standing challenge addressed here is incorporating entity-level information, i.e., features defined over clusters of mentions rather than over mention pairs. The authors build a neural coreference system that produces high-dimensional vector representations for pairs of coreference clusters and uses them to learn when merging two clusters is desirable. The system is trained with a learning-to-search algorithm that teaches it which local decisions (cluster merges) lead to a high-scoring final coreference partition, and it substantially outperforms the previous state of the art on the English and Chinese portions of the CoNLL 2012 Shared Task dataset despite using few hand-engineered features.

Coreference resolution, the task of identifying which mentions in a text refer to the same real-world entity, is fundamentally a clustering problem. However, many recent state-of-the-art coreference systems operate solely by linking pairs of mentions together (Durrett and Klein, 2013; Martschat and Strube, 2015; Wiseman et al., 2015).

That is, coreference resolution, identifying which mentions in a text refer to the same real-world entity, is fundamentally a clustering problem, yet many recent state-of-the-art systems operate only by linking pairs of mentions together.

An alternative approach is to use agglomerative clustering, treating each mention as a singleton cluster at the outset and then repeatedly merging clusters of mentions deemed to be referring to the same entity. Such systems can take advantage of entity-level information, i.e., features between clusters of mentions instead of between just two mentions. As an example of why this is useful, it is clear that the clusters {Bill Clinton} and {Clinton, she} are not referring to the same entity, but it is ambiguous whether the pair of mentions Bill Clinton and Clinton are coreferent.

In other words, agglomerative clustering starts with every mention in its own singleton cluster and repeatedly merges clusters believed to refer to the same entity. Such a system can exploit entity-level information: features between clusters of mentions rather than between just two mentions. For example, the clusters {Bill Clinton} and {Clinton, she} clearly do not refer to the same entity, while it is ambiguous whether the mention pair Bill Clinton and Clinton is coreferent.
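The merge loop described above can be sketched in a few lines. This is a toy illustration, not the paper's system: `score` is a hypothetical stand-in for the learned cluster-pair scorer (here just token overlap between clusters), and the merge threshold is an arbitrary assumption.

```python
def score(c1, c2):
    # Toy stand-in for a learned cluster-pair scorer (assumption):
    # Jaccard overlap between the token sets of the two clusters.
    t1 = {w for m in c1 for w in m.split()}
    t2 = {w for m in c2 for w in m.split()}
    return len(t1 & t2) / max(len(t1 | t2), 1)

def agglomerative_coref(mentions, threshold=0.2):
    # Every mention starts as its own singleton cluster.
    clusters = [[m] for m in mentions]
    while True:
        # Easy-first: find the single most confident cluster pair to merge
        # before committing to harder, more ambiguous decisions.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = score(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        if best is None or best[0] < threshold:
            break  # no pair is confident enough to merge
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerative_coref(["Bill Clinton", "Clinton", "she", "the president"]))
# → [['Bill Clinton', 'Clinton'], ['she'], ['the president']]
```

Note that a pairwise model linking "Clinton" and "she" in isolation could wrongly pull "she" into the cluster; scoring whole clusters lets the {Bill Clinton, Clinton} evidence veto that merge.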

Previous work has incorporated entity-level information through features that capture hard constraints like having gender or number agreement between clusters (Raghunathan et al., 2010; Durrett et al., 2013). In this work, we instead train a deep neural network to build distributed representations of pairs of coreference clusters. This captures entity-level information with a large number
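The excerpt above (cut off in the original post) states the paper's core idea: learned distributed representations of cluster pairs instead of hand-written hard constraints. A minimal sketch of the general scheme, under the assumption that a mention-pair encoder already exists (here it is replaced by a toy hashed-feature projection, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))  # toy projection; the real encoder is a deep net

def mention_pair_vec(m1, m2):
    # Hypothetical featurization: hashed bag-of-words over the mention pair,
    # projected to a small dense vector. Stands in for the mention-pair encoder.
    x = np.zeros(8)
    for w in (m1 + " " + m2).split():
        x[hash(w) % 8] += 1.0
    return np.tanh(W.T @ x)

def cluster_pair_rep(c1, c2):
    # Build one vector per cross-cluster mention pair, then pool element-wise
    # (max and mean) and concatenate, giving a fixed-size cluster-pair vector
    # regardless of how many mentions each cluster contains.
    R = np.stack([mention_pair_vec(m1, m2) for m1 in c1 for m2 in c2])
    return np.concatenate([R.max(axis=0), R.mean(axis=0)])

rep = cluster_pair_rep(["Bill Clinton", "Clinton"], ["Clinton", "she"])
print(rep.shape)  # (8,) — twice the mention-pair dimension
```

The pooling is what makes the representation entity-level: a downstream scorer sees evidence aggregated over every mention pair between the two clusters, not just one pair at a time.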
