【Part two: Related Work】Relation Extraction with Distant Supervision (DS)


Mintz, M.; Bills, S.; Snow, R.; and Jurafsky, D. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of ACL, 1003–1011.

1) This was the first application of the DS method to the RE task.

2) They proposed the 'all sentences' assumption: every sentence that mentions a pair of related entities is assumed to express their relation.

3) Their task is to extract the relations defined in Freebase. They tackle it by using the existing relations in Freebase as training data: for each related pair of entities they collect all sentences that mention both entities as the input observation x, and use the pair's relation type in Freebase as the label y. Together with a set of unrelated entity pairs as negative instances, they train a classifier to predict relations.
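A minimal sketch of this training-data construction; the function and argument names (kb_relations, corpus_sentences) are illustrative, not from the paper:

```python
from collections import defaultdict
from itertools import permutations

def build_ds_training_set(kb_relations, corpus_sentences):
    """kb_relations: {(e1, e2): relation} pairs taken from the KB (e.g. Freebase).
    corpus_sentences: iterable of (sentence_text, entities_in_sentence).
    Returns (sentences, label) training examples, one per entity pair."""
    mentions = defaultdict(list)
    for text, entities in corpus_sentences:
        # Collect every sentence that mentions both entities of a pair (x).
        for e1, e2 in permutations(entities, 2):
            mentions[(e1, e2)].append(text)
    # Label each pair with its KB relation (y); pairs absent from the KB
    # serve as negative instances labeled NA.
    return [(sents, kb_relations.get(pair, "NA"))
            for pair, sents in mentions.items()]
```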

 

Riedel, S.; Yao, L.; and McCallum, A. 2010. Modeling relations and their mentions without labeled text. In Proceedings of ECML PKDD, 148–163.

1) They argue that Mintz's distant supervision ('all sentences') assumption is too strong and needs to be relaxed, so they employ the expressed-at-least-once assumption: if two entities participate in a relation, at least one sentence that mentions these two entities might express that relation.

2) In their work, they jointly model two tasks: (a) whether two entities are related, and (b) whether this relation is mentioned in a given sentence.

3) For a pair of entities that appears together in at least one sentence, a relation variable Y denotes the relation between them, or NA if there is no such relation. For each relation mention candidate i, they define a binary relation mention variable Z_i that is true if and only if mention i indeed expresses the relation Y between the two entities. They use Z to denote the state of all mention candidates, and $\|z\|$ to represent the number of active relation mentions for a given assignment z of Z. Their truth functions are as follows:

Original distant supervision: every mention candidate must express the relation,

$$y \neq \text{NA} \;\Rightarrow\; z_i = \text{true} \text{ for all } i.$$

Expressed-at-least-once supervision: at least one mention candidate must express the relation,

$$y \neq \text{NA} \;\Rightarrow\; \|z\| \geq 1.$$
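The same two conditions as a small executable predicate (a sketch; the function and argument names are mine):

```python
def bag_is_consistent(y, z, at_least_once=True):
    """y: the relation (None stands for NA); z: list of booleans z_i."""
    if y is None:               # NA: no mention may express a relation
        return not any(z)
    if at_least_once:           # Riedel et al.'s relaxed assumption
        return any(z)           # ||z|| >= 1
    return all(z)               # original DS: every mention expresses y
```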

4) Their model with the expressed-at-least-once assumption reaches 91% precision on their top 1000 predictions. Compared to 87% precision for a model based on the distant supervision assumption (Mintz), this amounts to a 31% error reduction (the error rate drops from 13% to 9%, and 4/13 ≈ 31%).

 

Hoffmann, R.; Zhang, C.; Ling, X.; Zettlemoyer, L.; and Weld, D. S. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of ACL, 541–550.

1) Hoffmann et al. observe that Riedel's sentence-level variables are binary and that there is only a single aggregate variable taking values r ∈ R (the target relation set) ∪ {NA}, thereby ruling out overlapping relations. Riedel et al. also did not report sentence-level performance.

2) They proposed the MULTIR system, which can extract overlapping relations, for example Founded(Jobs, Apple) and CEO-of(Jobs, Apple) holding simultaneously for the same entity pair.

3) To measure the impact of modeling overlapping relations, instead of labeling each entity pair with the set of all true Freebase facts, they created a dataset where each true relation was used to create a separate training example. Training MULTIR on this data simulates the conflicting supervision that arises when overlaps are not modeled; a sketch of the two labelings follows.
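A toy sketch contrasting the two labeling schemes (names are illustrative):

```python
def expand_overlapping_facts(pair_to_facts):
    """pair_to_facts: dict mapping an entity pair to the set of all true
    KB relations for it, e.g. {("Jobs", "Apple"): {"Founded", "CEO-of"}}.
    Returns (multi_label, single_label) training views: MULTIR keeps the
    full set per pair; the ablation emits one example per true relation,
    which simulates conflicting supervision when overlaps are not modeled."""
    multi_label = [(pair, facts) for pair, facts in pair_to_facts.items()]
    single_label = [(pair, {r}) for pair, facts in pair_to_facts.items()
                    for r in facts]
    return multi_label, single_label
```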

4) MULTIR achieves significantly higher recall with a consistently high level of precision. At the highest recall point, MULTIR reaches 72.4% precision and 51.9% recall, for an F1 score of 60.5%.

 

 

Surdeanu, M.; Tibshirani, J.; Nallapati, R.; and Manning, C. D. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of EMNLP-CoNLL, 455–465.

1) They propose a novel multi-instance multi-label (MIML) approach to RE. MIML is the first RE approach that jointly models both multiple instances (by modeling the latent labels assigned to instances) and multiple labels (by providing a simple method to capture dependencies between labels).

 

2) Their work is closest to Hoffmann et al.; the differences are as follows:

| | Hoffmann | MIML |
| --- | --- | --- |
| Method | Uses a deterministic model that aggregates latent instance labels into a set of labels for the corresponding tuple by OR-ing the classification results. | Uses an object-level classifier that is trained jointly with the classifier that assigns latent labels to instances, and can capture dependencies between labels. |
| Training | Uses a perceptron-style additive parameter update approach. | Trains in a Bayesian framework. |

3) The MIML model plate diagram; in the figure:

n is the number of distinct entity tuples in D;

M_i is the set of mentions for the i-th entity pair;

x is a sentence and z is the latent relation classification for that sentence;

w_z is the weight vector for the multi-class mention-level classifier;

k is the number of known relation labels in L;

y_j is the top-level classification decision for the entity pair as to whether the j-th relation holds;

w_j is the weight vector for the binary top-level classifier for the j-th relation.
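Read together, these variables describe a two-layer classifier: a mention-level multi-class classifier followed by k binary top-level classifiers. A rough inference sketch under assumed shapes and names (aggregating latent labels into a histogram is my simplification):

```python
import numpy as np

def miml_predict(mention_feats, Wz, Wy):
    """mention_feats: (m, d) feature vectors for one entity pair's mentions.
    Wz: (k+1, d) weights of the mention-level classifier (k relations + NA).
    Wy: (k, k+1) weights of the k binary top-level classifiers, which read
    the distribution of latent mention labels.
    Returns the set of relation indices predicted to hold for the pair."""
    # Latent label z for each mention (multi-class classification).
    z = (mention_feats @ Wz.T).argmax(axis=1)                 # (m,)
    # Aggregate latent labels into a normalized histogram.
    hist = np.bincount(z, minlength=Wz.shape[0]) / len(z)     # (k+1,)
    # Each binary top-level classifier decides whether relation j holds,
    # so overlapping relations and label dependencies can be captured.
    return {j for j in range(Wy.shape[0]) if Wy[j] @ hist > 0}
```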

4) MIML-RE generally outperforms the then state of the art. On the Riedel dataset, MIML-RE has higher overall recall than the Riedel et al. model, and, at the same recall point, MIML-RE's precision is between 2 and 15 points higher.

                  


Zeng, D.; Liu, K.; Chen, Y.; and Zhao, J. 2015. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. In Proceedings of EMNLP, 1753–1762. (PCNN)

 

1. They point out two problems in DS-based RE and propose a solution for each; their main contribution is the PCNN model for RE.

| Problem | Solution |
| --- | --- |
| Wrong labels produced by distant supervision | Treat DS-RE as a multi-instance learning problem, handling the uncertainty of instance labels at the bag level |
| Noise from handcrafted feature extraction | Adopt a convolutional architecture with piecewise max pooling to learn relevant features automatically |


2. Their PCNN model, following the picture from left to right:

1) Vector representation: word embeddings concatenated with position embeddings.

2) The convolutional layer is the same as in a traditional CNN; it outputs feature maps.

3) PCNN changes the pooling layer. They argue that the traditional single max pooling operation is insufficient for RE, because it reduces the size of the hidden layers too rapidly and is too coarse to capture fine-grained features for relation extraction. So they propose a piecewise max pooling procedure that returns the maximum value in each segment instead of a single maximum value: the feature map is divided into three segments by the positions of the two entities, in order to better capture the structural information between them (see the sketch after this list).

4) Finally, a softmax layer performs the classification.
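A minimal sketch of the piecewise max pooling step in PyTorch, assuming both entities sit strictly inside the sentence so no segment is empty (function and argument names are mine):

```python
import torch

def piecewise_max_pool(feature_map, e1_pos, e2_pos):
    """feature_map: (num_filters, seq_len) convolution output for one
    sentence; requires e1_pos < e2_pos < seq_len - 1.
    Returns a (num_filters * 3,) vector: one max per segment and filter."""
    segments = (
        feature_map[:, : e1_pos + 1],              # before / including e1
        feature_map[:, e1_pos + 1 : e2_pos + 1],   # between the entities
        feature_map[:, e2_pos + 1 :],              # after e2
    )
    # Max within each segment instead of one global max, keeping coarse
    # structural information about the two entities' positions.
    return torch.cat([seg.max(dim=1).values for seg in segments])
```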


3. Their comparison of the proposed method with traditional approaches:

1) ROC

2) Precision values for the top 100, top 200, and top 500 extracted relation instances upon manual evaluation.

                                   

Lin, Y.; Shen, S.; Liu, Z.; Luan, H.; and Sun, M. 2016. Neural Relation Extraction with Selective Attention over Instances. In Proceedings of ACL, 2124–2133. (APCNN)

1. They argue that Zeng's at-least-one assumption, under which only one sentence is considered active for each entity pair, loses the rich information contained in all the neglected sentences. So they propose a sentence-level attention-based model for relation extraction, which is expected to dynamically reduce the weights of noisy instances.

2. Their model APCNN’s architecture:

 

m_i is the original sentence for an entity pair.

r_i is the representation of each sentence.

The CNN in this figure is Zeng's PCNN.

a_i is the weight of each sentence vector.

r is the representation of the relation r.

a_i is further defined as a softmax over how well each sentence matches the relation:

$$a_i = \frac{\exp(e_i)}{\sum_k \exp(e_k)}, \qquad e_i = r_i A r,$$

where A is a weighted diagonal matrix.
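A minimal sketch of this selective attention in PyTorch, assuming the diagonal of A and the relation query vector are given (names are mine):

```python
import torch
import torch.nn.functional as F

def selective_attention(sent_reprs, rel_repr, A_diag):
    """sent_reprs: (n_sent, d) sentence vectors r_i from the PCNN encoder.
    rel_repr: (d,) query vector r for the candidate relation.
    A_diag: (d,) diagonal of the weighted diagonal matrix A.
    Returns the bag representation: a weighted sum of sentence vectors."""
    # Match score e_i = r_i A r; with A diagonal this is a weighted dot product.
    scores = sent_reprs @ (A_diag * rel_repr)   # (n_sent,)
    alphas = F.softmax(scores, dim=0)           # attention weights a_i
    return alphas @ sent_reprs                  # (d,) bag vector
```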

3. The comparison of the two APCNN settings and Zeng's PCNN:

| Setting | Description |
| --- | --- |
| PCNN+max (one) | Select the sentence with the maximum probability of expressing the relation r to tag the bag. Zeng's PCNN can be regarded as a special case of APCNN+att in which the weight of the most probable sentence is set to 1 and all others to 0. |
| APCNN+ave (weight) | Assume that all sentences in the bag contribute equally, and use the average of all sentence vectors to tag the bag. |
| APCNN+att (weight) | Use selective attention to de-emphasize noisy sentences. |

4. Their APCNN experimental results:


Ji, G.; Liu, K.; He, S.; and Zhao, J. 2017. Distant Supervision for Relation Extraction with Sentence-Level Attention and Entity Descriptions. In Proceedings of AAAI, 3060–3066. (APCNN+D)

 

1. This paper argues that existing approaches have flaws in selecting valid instances and lack background knowledge about the entities; entity descriptions can help recognize whether a sentence expresses the relation or not.

2. Their model is based on Lin’s APCNN.

In addition, they use another CNN to extract feature vectors from entity descriptions, and encourage the entity vectors (e1, e2) to be close to their description vectors by adding constraints to the APCNN objective function; hence the model is called APCNN+D. They extract entity descriptions from Freebase and Wikipedia pages.

It is worth mentioning that they use v_relation = e1 − e2 to represent the relation r. If an instance expresses the relation r, its feature vector should have higher similarity with v_relation; otherwise, lower similarity.
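A minimal sketch of these two ideas, assuming an L2 form for the description constraint (the paper's exact penalty may differ):

```python
import torch

def relation_query(e1, e2):
    """v_relation = e1 - e2; instance features expressing r should score
    high similarity against this vector, and low similarity otherwise."""
    return e1 - e2

def description_constraint(entity_embs, desc_vecs):
    """Hypothetical L2 penalty pushing each entity embedding toward the
    CNN-encoded vector of its description; added to the APCNN objective.
    entity_embs, desc_vecs: (n, d) tensors, row-aligned by entity."""
    return ((entity_embs - desc_vecs) ** 2).sum(dim=1).mean()
```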

The APCNN model in their paper:

 

3. Their results:

1) ROC

2) Precision values for the top 100, top 200, and top 500 extracted relation instances upon manual evaluation.

 


Liu, T.; Wang, K.; Chang, B.; and Sui, Z. 2017. A Soft-label Method for Noise-tolerant Distantly Supervised Relation Extraction. In Proceedings of EMNLP, 1791–1796. (soft-label)


1. They observe that all previous work uses hard labels, which are determined by distant supervision and immutable during training. So they introduce an entity-pair-level (not sentence-level) denoising method that exploits semantic information from correctly labeled entity pairs to correct wrong labels dynamically during training. This means the same bag may receive different labels in different training epochs.

2. Their model is based on APCNN with soft labels added during training. The soft label is computed by a joint score function that combines the relational scores based on the entity-pair representation with the confidence of the hard label; a hypothetical sketch follows.
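A hypothetical sketch of the dynamic labeling step, with an assumed confidence weight (the paper's exact score function may differ):

```python
import torch
import torch.nn.functional as F

def soft_label(bag_scores, hard_label, confidence=0.9):
    """bag_scores: (k,) relation scores from the entity-pair representation.
    hard_label: int, the immutable DS label of the bag.
    confidence: weight of the hard label in the joint score (assumed value).
    Returns the label actually used for this training step."""
    onehot = F.one_hot(torch.tensor(hard_label), bag_scores.numel()).float()
    # Joint score: model evidence plus the (down-weighted) hard label.
    joint = F.softmax(bag_scores, dim=0) + confidence * onehot
    return int(joint.argmax())  # may differ from hard_label across epochs
```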

3. The improvement that soft labels bring to PCNN and APCNN:

Zeng, X.; He, S.; Liu, K.; and Zhao, J. 2018. Large Scaled Relation Extraction with Reinforcement Learning. In Proceedings of AAAI. (Reinforcement+RE)

1. To solve the wrong label problem, they propose a novel model with reinforcement learning.

2. Their CNN-based model is called PECNN (position-enhanced CNN), because the position embeddings are used not only to compose the entity representations but are also concatenated with the output of the pooling layer. The structure of PECNN:

 

3. PECNN is trained with reinforcement learning: to learn the relation extractor without direct sentence-level guidance, they introduce the policy gradient method. The correspondence between training and RL concepts:

| Training | Reinforcement learning |
| --- | --- |
| A bag | An episode |
| Sentences | States |
| The relations | Actions |
| The relation extractor | The RL agent |
 

1) The advantage of state s_i is calculated from the discounted future rewards:

$$R(s_i) = \sum_{j=i}^{n} \gamma^{\,j-i}\, r_j .$$

Before all sentences in the bag have been processed, the rewards are set to 0 (that is, r_i = 0 for i = 1, ..., n−1), since we do not yet know whether the episode is good or not. So R(s_i) can be simplified to

$$R(s_i) = \gamma^{\,n-i}\, r_n .$$

The order of sentences in a bag should not influence the predicted result, so we set γ = 1 and obtain

$$R(s_i) = r_n .$$

In the experiment, r_n is set to +1 or −1: if the gold relation of the bag is the same as the predicted relation, the episode reward is set to +1.

2) Their objective function for optimizing the policy follows Williams' REINFORCE algorithm (1992).
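A minimal REINFORCE sketch for one bag-episode under the mapping above (the policy interface, reward check, and all names are my assumptions, not the authors' code):

```python
import torch

def reinforce_episode(policy, optimizer, bag_sentences, gold_relation):
    """One policy-gradient update over a bag treated as an episode.

    policy: module mapping a sentence tensor to relation logits
    bag_sentences: list of sentence tensors (the states)
    gold_relation: the bag's distantly supervised relation id
    """
    log_probs, actions = [], []
    for sent in bag_sentences:                     # each sentence is a state
        dist = torch.distributions.Categorical(logits=policy(sent))
        action = dist.sample()                     # action: a predicted relation
        log_probs.append(dist.log_prob(action))
        actions.append(int(action))
    # Episode reward r_n (one plausible reading of the scheme above):
    # +1 if the bag's relation was predicted during the episode, else -1.
    # With gamma = 1, every state shares the same return R(s_i) = r_n.
    reward = 1.0 if gold_relation in actions else -1.0
    loss = -reward * torch.stack(log_probs).sum()  # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```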

4. Their results:

 

