Week5-2 PP attachment 1

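This post describes how to cast the PP attachment problem as binary classification: the input is a prepositional phrase and its context, and the output is a high or low attachment label. In practice, only four words are used as features: the preposition, the verb and the noun before it, and the noun after it. These features carry almost all of the information needed and keep the approach consistent and scalable. The model is trained with supervised learning, evaluated by accuracy, and compared against a baseline method.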

PP attachment

  • High (verbal, attached to VP)
  • Low (nominal, attached to NP)

The PP "with the net" attaches to the verb "caught" (high attachment); it is not associated with "butterfly".

We can formulate PP attachment as a binary classification problem.

  • Input: a PP (prepositional phrase) and possibly the surrounding context
  • Output: a binary label: 0 (low) or 1 (high)
  • In practice, the context only consists of 4 words:
    • the preposition
    • the verb before the preposition
    • the noun before the preposition
    • the noun after the preposition
    • Example: join board as director (verb = join, noun before = board, preposition = as, noun after = director)
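
To make the four-word representation concrete, here is a minimal Python sketch. The names `PPTuple`, `parse_tuple`, `LOW`, and `HIGH` are illustrative assumptions, not part of the original notes, and the field order (verb, noun, preposition, noun) simply follows the example "join board as director".

```python
from typing import NamedTuple

# Assumed label encoding: 0 = low (NP) attachment, 1 = high (VP) attachment.
LOW, HIGH = 0, 1


class PPTuple(NamedTuple):
    """Hypothetical container for the four context words."""
    verb: str   # the verb before the preposition
    noun1: str  # the noun before the preposition
    prep: str   # the preposition itself
    noun2: str  # the noun after the preposition


def parse_tuple(text: str) -> PPTuple:
    """Split a whitespace-separated 'verb noun prep noun' string into a PPTuple."""
    parts = text.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 words, got {len(parts)}: {text!r}")
    return PPTuple(*parts)


example = parse_tuple("join board as director")
print(example.prep)  # the preposition of the example tuple
```

A classifier then only has to map each such tuple to `LOW` or `HIGH`, regardless of how long the original sentence was.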

Why only 4 words?
- Almost all the information needed to classify a prepositional phrase's attachment is contained in these 4 features
- Using tuples of 4 features allows for a consistent and scalable approach

Sample tuples

[Figure: sample tuples]

Supervised learning: evaluation

  • Manually label sets of sentences
  • Split the labeled data into training and testing sets
  • Use training data to find patterns
  • Apply these patterns on the testing data
  • For evaluation: use Accuracy (the percentage of correct labels that a given algorithm has assigned on the testing data)
  • Compare with a simple baseline method
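
The split and evaluation steps above can be sketched in a few lines of Python. This is only an illustration: the function names `split_data` and `accuracy`, the 80/20 split, and the fixed random seed are assumptions, not something the notes specify.

```python
import random


def split_data(examples, test_fraction=0.2, seed=0):
    """Shuffle labeled examples and split them into (train, test) lists.

    The 80/20 default and the seed are arbitrary choices for this sketch.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(gold, predicted):
    """Fraction of test instances whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)
```

Multiplying the returned fraction by 100 gives the percentage described in the list.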

The simplest baseline method is to find the most common class (label) in the training data and assign it to all instances of the test data.
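
As a rough sketch, the majority-class baseline might look like this in Python. The helper names and the toy label lists are made up for illustration (0 = low, 1 = high is an assumed encoding); they are not data from the course.

```python
from collections import Counter


def train_majority_baseline(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(majority_label, n_instances):
    """Assign the same label to every test instance."""
    return [majority_label] * n_instances


# Toy labels, invented for this example only.
train_labels = [1, 1, 0, 1, 0]
test_labels = [0, 1, 1, 0]

majority = train_majority_baseline(train_labels)
predictions = predict_majority(majority, len(test_labels))
baseline_accuracy = sum(
    g == p for g, p in zip(test_labels, predictions)
) / len(test_labels)
print(majority, baseline_accuracy)
```

Any learned classifier should beat this number on the same test set to be considered useful.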
