A Relation-Oriented Clustering Method for Open Relation Extraction


License: CC 4.0 BY-SA

Tags: #database #NLP

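This article presents RoCORE, a relation-oriented clustering method. It uses labeled data of predefined relations to optimize a nonlinear mapping that transforms high-dimensional entity-pair representations into relation-oriented representations, which makes it possible to cluster and identify novel relations in unlabeled data. Experiments on two real-world datasets show that the method substantially reduces the error rate.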
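To make the idea of a "nonlinear mapping" concrete, here is a minimal sketch of a one-hidden-layer tanh network that maps a high-dimensional entity-pair vector to a lower-dimensional representation. The layer sizes, the tanh activation, and the names `tanh_mlp` and `init_layer` are illustrative assumptions, not RoCORE's actual architecture; the sketch only runs a forward pass and does no training.

```python
import math
import random

def tanh_mlp(x, w1, b1, w2, b2):
    """Hypothetical stand-in for the nonlinear mapping: one tanh hidden
    layer followed by a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

def init_layer(n_out, n_in, rng):
    # Small random weights and zero biases (illustrative initialization)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
in_dim, hid_dim, out_dim = 16, 8, 4  # assumed sizes, chosen only for the example
w1, b1 = init_layer(hid_dim, in_dim, rng)
w2, b2 = init_layer(out_dim, hid_dim, rng)

# A random vector standing in for an encoder's entity-pair representation
entity_pair = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
z = tanh_mlp(entity_pair, w1, b1, w2, b2)
print(len(z))  # 4
```

In the method described above, the parameters of such a mapping would be learned from the labeled data so that instances of the same relation land close together; here they are random.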

1. Abstract

Clustering-based unsupervised relation discovery has gradually become one of the important approaches to open relation extraction (OpenRE).

However, high-dimensional vectors can encode complex linguistic information, with the result that the derived clusters cannot explicitly align with the relational semantic classes.

In this work, we propose a relation-oriented clustering model and use it to identify novel relations in unlabeled data.

Specifically, to enable the model to learn to cluster relational data, our method leverages readily available labeled data of predefined relations to learn a relation-oriented representation.

We minimize the distance between instances with the same relation by pulling each instance toward its corresponding relation centroid, forming a cluster structure so that the learned representation is cluster-friendly.
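The centroid-pulling idea can be illustrated with a simple center-loss-style quantity: the average squared distance from each labeled instance to the mean embedding of its relation. This is a minimal sketch under the assumption that centroids are batch means; RoCORE's exact loss may differ, and the relation names and the helpers `relation_centroids` and `centroid_loss` are hypothetical.

```python
def relation_centroids(embeddings, labels):
    """Mean embedding for each relation label."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid_loss(embeddings, labels):
    """Average squared distance from each instance to its relation centroid.
    Driving this toward zero gathers same-relation instances together."""
    centroids = relation_centroids(embeddings, labels)
    total = sum(squared_distance(vec, centroids[lab])
                for vec, lab in zip(embeddings, labels))
    return total / len(embeddings)

# Toy 2-D embeddings for two hypothetical relations
embeddings = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
labels = ["founded_by", "founded_by", "born_in", "born_in"]
print(centroid_loss(embeddings, labels))  # 1.0
```

The `founded_by` centroid is `[1.0, 0.0]` and the `born_in` centroid is `[10.0, 11.0]`; every point is at squared distance 1 from its centroid, so the loss is 1.0. In training, the gradient of this quantity with respect to the mapping's parameters would move instances toward their centroids.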

To reduce the clustering bias toward predefined classes, we optimize the model by minimizing a joint objective on both labeled and unlabeled data.
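A joint objective over labeled and unlabeled data can be sketched as a weighted sum of two terms: a center loss over the predefined relations, and a center loss over k-means pseudo-clusters of the unlabeled data, where the pseudo-clusters stand in for novel relations. The use of k-means pseudo-labels, the weight `lam`, and the function names are assumptions made for illustration, not the paper's confirmed formulation. The sketch evaluates the objective but does not optimize it.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid
        assign = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    return centroids, assign

def center_loss(points, labels, centroids):
    return sum(squared_distance(p, centroids[l])
               for p, l in zip(points, labels)) / len(points)

def joint_objective(labeled, labels, unlabeled, k_novel, lam=1.0):
    # Labeled term: centroids of the predefined relations
    known = {c: mean([p for p, l in zip(labeled, labels) if l == c])
             for c in sorted(set(labels))}
    loss_labeled = center_loss(labeled, labels, known)
    # Unlabeled term: k-means pseudo-labels as a stand-in for novel relations
    novel, pseudo = kmeans(unlabeled, k_novel)
    loss_unlabeled = center_loss(unlabeled, pseudo, novel)
    return loss_labeled + lam * loss_unlabeled

labeled = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
labels = ["founded_by", "founded_by", "born_in", "born_in"]
unlabeled = [[20.0, 0.0], [22.0, 0.0], [0.0, 20.0], [0.0, 22.0]]
print(joint_objective(labeled, labels, unlabeled, k_novel=2, lam=0.5))
```

Including the unlabeled term means the representation is shaped by the structure of the unlabeled data as well, rather than only by the predefined classes, which is the kind of bias the joint objective is meant to reduce.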

Experimental results show that, compared with current state-of-the-art (SOTA) methods, our method reduces the error rate by 29.2% and 15.7% on the two datasets, respectively.

2. Introduction

Relation extraction (RE), a crucial fundamental task in information extraction, is of great practical interest to various fields, including web search (Xiong et al., 2017), knowledge base completion (Bordes et al., 2013), and question answering (Yu et al., 2017).

However, conventional RE paradigms such as supervision and distant supervision are generally designed for pre-defined rel