Paper Notes: Neural Architectures for Named Entity Recognition

Originally published 2021-08-06 14:33:42

License: CC 4.0 BY-SA

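This post examines the role that the score matrix produced by a bidirectional LSTM plays in a CRF (conditional random field) model. It shows how word-level scores and the probability of a predicted tag sequence are computed, and how the correct tag sequence is optimized through maximum likelihood estimation. The focus is on sequence prediction and probability computation using the transition score matrix.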

Neural Architectures for Named Entity Recognition

2.3 CRF Tagging Models

Input sentence: $\mathbf{X}=(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n)$
Matrix of scores (output by the BiLSTM): $\mathbf{P}\in\mathbb{R}^{n\times k}$

- Score of the $j^{th}$ tag for the $i^{th}$ word in the sentence: $P_{i,j}$
- Number of distinct tags: $k$

Sequence of predictions: $\mathbf{y}=(y_1,y_2,\ldots,y_n)$

Score of $\mathbf{y}$: $s(\mathbf{X},\mathbf{y})=\sum\limits_{i=0}^{n}A_{y_i,y_{i+1}}+\sum\limits_{i=1}^{n}P_{i,y_i}$

Matrix of transition scores: $\mathbf{A}\in\mathbb{R}^{(k+2)\times(k+2)}$
- Score of a transition from tag $i$ to tag $j$: $A_{i,j}$
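- Start and end tags: $y_0$ and $y_{n+1}$ (the paper writes $y_n$, but the transition sum runs up to $A_{y_n,y_{n+1}}$); adding these two tags to the tag set is why $\mathbf{A}$ has $k+2$ rows and columns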
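
The score defined above can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not in the paper: words are indexed from 0 instead of 1, the start and end tags are stored at indices `k` and `k+1` of `A`, and the function name `sequence_score` and the toy numbers are made up for the example.

```python
def sequence_score(P, A, y, start, end):
    """Score s(X, y): transition scores plus per-word tag scores.

    P: n x k nested list, P[i][j] = score of tag j for word i (0-based here).
    A: (k+2) x (k+2) nested list, A[a][b] = score of moving from tag a to tag b.
    y: list of n tag indices in range(k).
    start, end: indices of the added start/end tags in A (assumed k and k+1).
    """
    padded = [start] + list(y) + [end]
    # n+1 transitions: start -> y_1, y_1 -> y_2, ..., y_n -> end
    transition = sum(A[a][b] for a, b in zip(padded, padded[1:]))
    # n per-word scores, one per word
    emission = sum(P[i][tag] for i, tag in enumerate(y))
    return transition + emission


# Toy example (hypothetical numbers): n = 2 words, k = 2 tags.
P = [[1.0, 0.5],
     [0.2, 2.0]]
A = [[0.1, 0.4, 0.0, 0.3],
     [0.2, 0.6, 0.0, 0.5],
     [0.7, 0.1, 0.0, 0.0],   # row 2: transitions out of the start tag
     [0.0, 0.0, 0.0, 0.0]]   # row 3: the end tag has no outgoing transitions
print(sequence_score(P, A, [0, 1], start=2, end=3))  # approximately 4.6
```

For `y = [0, 1]` the three transitions contribute 0.7 + 0.4 + 0.5 and the two word scores contribute 1.0 + 2.0.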

Probability of the sequence $\mathbf{y}$, given by a softmax over all possible tag sequences $\mathbf{Y_x}$ for the sentence $\mathbf{X}$:

$p(\mathbf{y}\mid\mathbf{X})=\frac{e^{s(\mathbf{X},\mathbf{y})}}{\sum_{\tilde{\mathbf{y}}\in\mathbf{Y_x}}e^{s(\mathbf{X},\tilde{\mathbf{y}})}}$
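
To make the normalization concrete, the sketch below computes $p(\mathbf{y}\mid\mathbf{X})$ by brute force, enumerating all $k^n$ tag sequences. This is feasible only for tiny inputs and is meant purely as an illustration. The function names, the 0-based word indexing, and the placement of the start and end tags at indices `k` and `k+1` are assumptions of this example, not something the paper prescribes.

```python
import itertools
import math


def sequence_score(P, A, y, start, end):
    """s(X, y) = sum of transition scores + sum of per-word tag scores."""
    padded = [start] + list(y) + [end]
    transition = sum(A[a][b] for a, b in zip(padded, padded[1:]))
    emission = sum(P[i][tag] for i, tag in enumerate(y))
    return transition + emission


def sequence_probability(P, A, y, start, end):
    """p(y | X) as a softmax over every possible tag sequence (brute force)."""
    n, k = len(P), len(P[0])
    scores = {cand: sequence_score(P, A, cand, start, end)
              for cand in itertools.product(range(k), repeat=n)}
    top = max(scores.values())  # subtract the max before exp() for stability
    partition = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[tuple(y)] - top) / partition


# Toy example (hypothetical numbers): n = 2 words, k = 2 tags.
P = [[1.0, 0.5],
     [0.2, 2.0]]
A = [[0.1, 0.4, 0.0, 0.3],
     [0.2, 0.6, 0.0, 0.5],
     [0.7, 0.1, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
for cand in itertools.product(range(2), repeat=2):
    print(cand, sequence_probability(P, A, cand, start=2, end=3))
```

The four printed probabilities sum to 1, which is exactly what the denominator guarantees.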

During training, maximize the log-probability of the correct tag sequence:

$\log(p(\mathbf{y}\mid\mathbf{X}))=s(\mathbf{X},\mathbf{y})-\log\left(\sum\limits_{\tilde{\mathbf{y}}\in\mathbf{Y_x}}e^{s(\mathbf{X},\tilde{\mathbf{y}})}\right)=s(\mathbf{X},\mathbf{y})-\operatorname{logadd}\limits_{\tilde{\mathbf{y}}\in\mathbf{Y_x}}s(\mathbf{X},\tilde{\mathbf{y}})$
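
Because the score only contains transitions between adjacent tags, the logadd term does not need an explicit sum over all $k^n$ sequences; it can be computed with dynamic programming. The sketch below uses the standard forward recursion in log space. It is a minimal illustration and not the authors' implementation: the function names, the 0-based word indexing, and the placement of the start and end tags at indices `k` and `k+1` are assumptions of this example.

```python
import math


def logadd(values):
    """log(sum(exp(v) for v in values)), computed stably by factoring out the max."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_partition(P, A, start, end):
    """logadd over all tag sequences of s(X, y~), via the forward recursion."""
    n, k = len(P), len(P[0])
    # alpha[j] = logadd of the scores of all partial sequences ending in tag j
    alpha = [A[start][j] + P[0][j] for j in range(k)]
    for i in range(1, n):
        alpha = [logadd([alpha[a] + A[a][j] for a in range(k)]) + P[i][j]
                 for j in range(k)]
    # close every path with the transition into the end tag
    return logadd([alpha[j] + A[j][end] for j in range(k)])


def log_likelihood(P, A, y, start, end):
    """log p(y | X) = s(X, y) - logadd over all sequences of s(X, y~)."""
    padded = [start] + list(y) + [end]
    score = sum(A[a][b] for a, b in zip(padded, padded[1:]))
    score += sum(P[i][tag] for i, tag in enumerate(y))
    return score - log_partition(P, A, start, end)


# Toy example (hypothetical numbers): n = 2 words, k = 2 tags.
P = [[1.0, 0.5],
     [0.2, 2.0]]
A = [[0.1, 0.4, 0.0, 0.3],
     [0.2, 0.6, 0.0, 0.5],
     [0.7, 0.1, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
print(log_likelihood(P, A, [0, 1], start=2, end=3))
```

The recursion costs on the order of $n k^2$ operations instead of $k^n$, and the negated log-likelihood can serve directly as a training loss.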