1. Evolution of NER
- Early NER: required a lot of human effort to specify rules and features
(1) What is NER
There are two views on what counts as a named entity (NE):
- serving as a name for something or someone
- proper names and natural kind terms like biological species and substances
Either way, NEs are now mainly divided into general NEs and domain-specific NEs.
NER tasks by granularity:
- coarse-grained NER: only a few categories, and each NE has exactly one entity type
- fine-grained NER: many categories, and an NE may have several entity types
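(2) NER in DL
- Requires minimal feature engineering
- References 17-21 in the original paper already describe some SOTA DL models
- References 22-26 in the original paper are surveys on NER
The paper systematically divides DL NER into three parts: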
- distributed representations for input
- context encoder (for capturing contextual dependencies for tag decoder)
- tag decoder (for predicting labels of words in the given sequence)
(3) NER datasets and tools
See the original paper.
(4) NER evaluation
A predicted entity counts as correct only when both its boundary and its type are correct.
- Three categories:
  - True Positive (TP): entities that are recognized by NER and match ground truth.
  - False Positive (FP): entities that are recognized by NER but do not match ground truth.
  - False Negative (FN): entities annotated in the ground truth that are not recognized by NER.
Two evaluation metrics: precision and recall. F_beta combines the two.
precision = TP / (TP + FP)
recall = TP / (TP + FN)
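The exact-match counting and the metrics above can be sketched in a few lines of Python. This is only an illustration: the `(start, end, type)` span encoding, the function names, and the toy data are assumptions made for this note, not something taken from the paper. F_beta is written in its standard form, (1 + beta^2) * P * R / (beta^2 * P + R).

```python
from typing import Set, Tuple

# Hypothetical encoding for illustration: an entity is (start, end, type).
Entity = Tuple[int, int, str]


def exact_match_counts(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[int, int, int]:
    """Count TP, FP, FN under exact match (boundary AND type must agree)."""
    tp = len(predicted & gold)   # recognized and matching ground truth
    fp = len(predicted - gold)   # recognized but not matching ground truth
    fn = len(gold - predicted)   # annotated in ground truth but not recognized
    return tp, fp, fn


def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta); empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Toy data: the second prediction has the right boundary but the wrong type,
# so it counts as one FP and leaves one FN behind.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")}
pred = {(0, 2, "PER"), (5, 6, "ORG"), (12, 13, "LOC")}

tp, fp, fn = exact_match_counts(pred, gold)
p, r, f1 = precision_recall_f(tp, fp, fn)
print(tp, fp, fn, round(p, 3), round(r, 3), round(f1, 3))
```

Note that a span with the correct boundary but the wrong type is penalized twice under exact match: once as an FP and once as an FN.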
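- macro-averaged F-score: compute the F-score for each class separately, then take the average
- micro-averaged F-score: pool the counts of all classes together and compute a single F-score (one big stew!)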
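To see how the two averages can diverge, here is a small sketch. The per-type counts are made up for illustration, and the helper names are hypothetical. It uses F1 (beta = 1) in the equivalent form 2TP / (2TP + FP + FN).

```python
from typing import Dict, Tuple

# Hypothetical per-type counts: entity type -> (TP, FP, FN).
Counts = Dict[str, Tuple[int, int, int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 written as 2TP / (2TP + FP + FN); 0.0 when nothing was counted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Counts) -> float:
    """Compute F1 for each entity type separately, then average."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts: Counts) -> float:
    """Pool TP/FP/FN over all entity types, then compute one F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# A frequent, well-recognized type and a rare, poorly recognized one.
counts = {"PER": (90, 10, 10), "LOC": (1, 4, 4)}

print(round(macro_f1(counts), 3), round(micro_f1(counts), 3))
```

With these made-up counts the macro average is pulled down by the rare type, while the micro average is dominated by the frequent one. This is why the two scores are usually reported separately.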
2. Traditional NER methods
(1) Hand-crafted rules
These methods rely on hand-crafted rules, which may be based on