2.5 所有路径的总得分
上节中,我们介绍了如何计算一个路径的标签得分eSie^{S_i}eSi,那么,还有一个问题需要解决,即怎么计算所有路径的总得分:Ptotal=P1+P2+…+PN=eS1+eS2+…+eSNP_{total} = P_1 + P_2 + … + P_N = e^{S_1} + e^{S_2} + … + e^{S_N}Ptotal=P1+P2+…+PN=eS1+eS2+…+eSN
计算总得分最简单的方法就是:枚举所有可能的路径,然后将每条路径的得分加起来,这样虽然能实现目标,但是该方式是非常低效的,训练可能需要很久很久…
在学习下边的内容前,强烈建议你拿出草稿纸和笔,跟着示例步骤,一步一步往下走,用这种方式肯定能帮助你更好地理解算法细节,从而,你可以使用你喜欢的编程语言进行实现。
Step 1:回顾CRF损失函数
在2.3节(上一片博客)中,CRF损失函数定义如下:
LossFunction=PRealPathP1+P2+...+PNLossFunction=\frac{P_{RealPath}}{P_1+P_2+...+P_N}LossFunction=P1+P2+...+PNPRealPath
为了便于计算,我们将该损失函数改写为log损失函数,如下:
LogLossFunction=logPRealPathP1+P2+...+PNLogLossFunction=log\frac{P_{RealPath}}{P_1+P_2+...+P_N}LogLossFunction=logP1+P2+...+PNPRealPath
因为在训练模型时,我们通常是最小化损失函数,因此我们在上式乘以-1:
LogLossFunctionLogLossFunctionLogLossFunction
=−logPRealPathP1+P2+...+PN=-log\frac{P_{RealPath}}{P_1+P_2+...+P_N}=−logP1+P2+...+PNPRealPath
=−logeSRealPatheS1+eS2+...+eSN=-log\frac{e^{S_{RealPath}}}{e^{S_1}+e^{S_2}+...+e^{S_N}}=−logeS1+eS2+...+eSNeSRealPath
=−(log(eSRealPath)−log(eS1+eS2+...+eSN)=-(log(e^{S_{RealPath}})-log(e^{S_1}+e^{S_2}+...+e^{S_N})=−(log(eSRealPath)−log(eS1+eS2+...+eSN)
=−(SRealPath−log(eS1+eS2+...+eSN)=-(S_{RealPath}-log(e^{S_1}+e^{S_2}+...+e^{S_N})=−(SRealPath−log(eS1+eS2+...+eSN)
=−(∑i=1Nxiyi+∑i=1N−1tyiyi+1−log(eS1+eS2+...+eSN)=-(\sum_{i=1}^{N}x_{iy_i}+\sum_{i=1}^{N-1}t_{y_iy_{i+1}}-log(e^{S_1}+e^{S_2}+...+e^{S_N})=−(∑i=1Nxiyi+∑i=1N−1tyiyi+1−log(eS1+eS2+...+eSN)
在之前的章节中,我们已经知道了如何计算真实路径得分,现在我们需要找到一种有效的方法来计算log(eS1+eS2+...+eSN)log(e^{S_1}+e^{S_2}+...+e^{S_N})log(eS1+eS2+...+eSN)。
Step2:回顾发射和转移得分
假设我们正在基于一个长度为3的句子训练模型:
x=[w0,w1,w2]\mathbf{x} = [w_0, w_1, w_2]x=[w0,w1,w2]
训练数据中,只有两个标签:
LabelSet={l1,l2}LabelSet = \{l_1,l_2\}LabelSet={l1,l2}
我们从BiLSTM层获取的发射得分结果如下表所示:
l1\mathbf{l_1}l1 | l2\mathbf{l_2}l2 | |
---|---|---|
w0\mathbf{w_0}w0 | x01x_{01}x01 | x02x_{02}x02 |
w1\mathbf{w_1}w1 | x11x_{11}x11 | x12x_{12}x12 |
w2\mathbf{w_2}w2 | x21x_{21}x21 | x22x_{22}x22 |
xijx_{ij}xij是wiw_iwi被标记为ljl_jlj的得分。
CRF层的转移得分如下表所示:
l1\mathbf{l_1}l1 | l2\mathbf{l_2}l2 | |
---|---|---|
l1\mathbf{l_1}l1 | t11t_{11}t11 | t12t_{12}t12 |
l2\mathbf{l_2}l2 | t21t_{21}t21 | t22t_{22}t22 |
tijt_{ij}tij是标签从iii到jjj的转移得分。
Step3:计算(备好纸和笔 again)
提示: 我们的目标是计算 log(eS1+eS2+…+eSN)\log(e^{S_1} + e^{S_2} + … + e^{S_N})log(eS1+eS2+…+eSN)
上式的计算是一个累加过程,其思想和动态编程相似(即使你不了解动态编程,你也可以继续,我会一步一步解释示例的,但是还是强烈推荐您去学习一下动态编程算法),w0w_0w0所有路径的总得分计算后,要计算w0→w1w_0 → w_1w0→w1的总得分,然后我们利用上述总得分计算 w0→w1→w2w_0 → w_1 → w_2w0→w1→w2的得分,该得分就是我们所需要的最终得分。
在下述步骤中,你将会看到两个变量:obs和previous,previous是先前步骤的最终结果,obs是当前单词的信息。
w0w_0w0:
obs=[x01,x02]obs = [x_{01}, x_{02}]obs=[x01,x02]
previous=Noneprevious = Noneprevious=None
如果该语句中仅有一个单词w0w_0w0,我们没有之前的词的结果,所以 previous 为 None。此外,我们也只能获取到第一个单词,其信息 obs=[x01,x02]obs = [x_{01}, x_{02}]obs=[x01,x02],其中x01x_{01}x01 和 x02x_{02}x02即上述所说的发射得分。
那么 w0w_0w0所有可能路径的总得分即为:
TotalScore(w0)=log(ex01+ex02)TotalScore(w_0)=\log (e^{x_{01}} + e^{x_{02}})TotalScore(w0)=log(ex01+ex02)
w0w_0w0 → w1w_1w1:
obs=[x11,x12]obs = [x_{11}, x_{12}]obs=[x11,x12]
previous=[x01,x02]previous = [x_{01}, x_{02}]previous=[x01,x02]
1)将 previous扩展为:
previous=(x01x01x02x02)previous=(\begin{matrix} x_{01} & x_{01} \\ x_{02} & x_{02}\end{matrix})previous=(x01x02x01x02)
2)将 obs 扩展为:
obs=(x11x12x11x12)obs=(\begin{matrix} x_{11} & x_{12} \\ x_{11} & x_{12}\end{matrix})obs=(x11x11x12x12)
我们之所以这样做的原始是:矩阵可以使得所有得分计算的效率更高。
3)将 previous obs和转移得分进行相加:
scores=(x01x01x02x02)+(x11x12x11x12)+(t11t12t21t22)scores=(\begin{matrix} x_{01} & x_{01} \\ x_{02} & x_{02}\end{matrix})+(\begin{matrix} x_{11} & x_{12} \\ x_{11} & x_{12}\end{matrix})+(\begin{matrix} t_{11} & t_{12} \\ t_{21} & t_{22}\end{matrix})scores=(x01x02x01x02)+(x11x11x12x12)+(t11t21t12t22)
得到
scores=(x01+x11+t11x01+x12+t12x02+x11+t21x02+x12+t22)scores=(\begin{matrix} x_{01}+ x_{11}+t_{11} & x_{01} +x_{12}+t_{12}\\ x_{02}+x_{11}+ t_{21} & x_{02}+ x_{12}+ t_{22}\end{matrix})scores=(x01+x11+t11x02+x11+t21x01+x12+t12x02+x12+t22)
更新 previous:
previous=[log(ex01+x11+t11+ex02+x11+t21),log(ex01+x12+t12+ex02+x12+t22)]previous=[log(e^{x_{01}+x_{11}+t_{11}}+e^{x_{02}+x_{11}+t_{21}}),log(e^{x_{01}+x_{12}+t_{12}}+e^{x_{02}+x_{12}+t_{22}})]previous=[log(ex01+x11+t11+ex02+x11+t21),log(ex01+x12+t12+ex02+x12+t22)]
这样,第二轮迭代就完成了,从 w0→w1w_0 → w_1w0→w1的所有路径: (label1label_1label1 → label1label_1label1, label1label_1label1 → label2label_2label2, label2label_2label2 → label1label_1label1, label2label_2label2 → label2label_2label2),其总得分可以用下式计算:
TotalScore(w0→w1)TotalScore(w_0 → w_1)TotalScore(w0→w1)
=log(eprevious[0]+eprevious[1])=\log (e^{previous[0]} + e^{previous[1]})=log(eprevious[0]+eprevious[1])
=log(elog(ex01+x11+t11+ex02+x11+t21)+elog(ex01+x12+t12+ex02+x12+t22))=\log (e^{\log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})}+
e^{\log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})}
)=log(elog(ex01+x11+t11+ex02+x11+t21)+elog(ex01+x12+t12+ex02+x12+t22))
=log(ex01+x11+t11+ex02+x11+t21+ex01+x12+t12+ex02+x12+t22)=\log(e^{x_{01}+x_{11}+t_{11}}+e^{x_{02}+x_{11}+t_{21}}+e^{x_{01}+x_{12}+t_{12}}+e^{x_{02}+x_{12}+t_{22}})=log(ex01+x11+t11+ex02+x11+t21+ex01+x12+t12+ex02+x12+t22)
这就是我们的目标log(eS1+eS2+…+eSN)\log(e^{S_1} + e^{S_2} + … + e^{S_N})log(eS1+eS2+…+eSN)。
从上式中我们可以发现
- S1=x01+x11+t11S_1 = x_{01}+x_{11}+t_{11}S1=x01+x11+t11 (label1label_1label1 → label1label_1label1)
- S2=x02+x11+t21S_2 = x_{02}+x_{11}+t_{21}S2=x02+x11+t21 (label2label_2label2 → label1label_1label1)
- S3=x01+x12+t12S_3 = x_{01}+x_{12}+t_{12}S3=x01+x12+t12 (label1label_1label1 → label2label_2label2)
- S4=x02+x12+t22S_4 = x_{02}+x_{12}+t_{22}S4=x02+x12+t22 (label2label_2label2 → label2label_2label2)
w0w_0w0 → w1w_1w1 → w2w_2w2:
当你读到这里的时候,其实你已经学会了,在这次迭代中,我们知道按照上述的方式进行相同操作即可
obs=[x21,x22]obs = [x_{21}, x_{22}]obs=[x21,x22]
previous=[log(ex01+x11+t11+ex02+x11+t21),log(ex01+x12+t12+ex02+x12+t22)]previous=[\log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}), \log (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})]previous=[log(ex01+x11+t11+ex02+x11+t21),log(ex01+x12+t12+ex02+x12+t22)]
1)将previous扩展为:
previous=(log(ex01+x11+t11+ex02+x11+t21)log(ex01+x11+t11+ex02+x11+t21)log(ex01+x12+t12+ex02+x12+t22)log(ex01+x12+t12+ex02+x12+t22))previous=\left( \begin{matrix}\log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}) & \log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})\\\log (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})&\log (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})\end{matrix}\right)previous=(log(ex01+x11+t11+ex02+x11+t21)log(ex01+x12+t12+ex02+x12+t22)log(ex01+x11+t11+ex02+x11+t21)log(ex01+x12+t12+ex02+x12+t22))
2)将obs扩展为:
obs=(x21x22x21x22)obs = \left( \begin{matrix} x_{21} & x_{22}\\
x_{21} & x_{22} \end{matrix}\right)obs=(x21x21x22x22)
3)将 previous,obs和转移得分加起来:
scores=(log(ex01+x11+t11+ex02+x11+t21)log(ex01+x11+t11+ex02+x11+t21)log(ex01+x12+t12+ex02+x12+t22)log(ex01+x12+t12+ex02+x12+t22))+(x21x22x21x22)+(t11t12t21t22)scores=\left( \begin{matrix}\log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}) & \log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})\\\log (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})&\log (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})\end{matrix}\right)+\left( \begin{matrix} x_{21} & x_{22}\\
x_{21} & x_{22} \end{matrix}\right)+(\begin{matrix} t_{11} & t_{12} \\ t_{21} & t_{22}\end{matrix})scores=(log(ex01+x11+t11+ex02+x11+t21)log(ex01+x12+t12+ex02+x12+t22)log(ex01+x11+t11+ex02+x11+t21)log(ex01+x12+t12+ex02+x12+t22))+(x21x21x22x22)+(t11t21t12t22)
然后:
scores=(log(ex01+x11+t11+ex02+x11+t21)+x21+t11log(ex01+x11+t11+ex02+x11+t21)+x22+t12log(ex01+x12+t12+ex02+x12+t22)+x21+t21log(ex01+x12+t12+ex02+x12+t22)+x22+t22)scores=\left( \begin{matrix}\log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})+x_{21}+ t_{11} & \log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})+x_{22}+t_{12}\\\log (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})+x_{21}+t_{21}&\log (e^{x_{01}+x_{12}+t_{12}} +e^{x_{02}+x_{12}+t_{22}})+ x_{22}+ t_{22}\end{matrix}\right)scores=(log(ex01+x11+t11+ex02+x11+t21)+x21+t11log(ex01+x12+t12+ex02+x12+t22)+x21+t21log(ex01+x11+t11+ex02+x11+t21)+x22+t12log(ex01+x12+t12+ex02+x12+t22)+x22+t22)
更新previous:
previous=[previous =[previous=[
logelog(ex01+x11+t11+ex02+x11+t21)+x21+t11+elog(ex01+x11+t11+ex02+x11+t21)+x22+t12),log(elog(ex01+x12+t12+ex02+x12+t22)+x21+t21+elog(ex01+x12+t12+ex02+x12+t22)+x22+t22)\begin{matrix} \log e^{\log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})+x_{21}+ t_{11}}+e^{\log (e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})+x_{22}+t_{12}}),\\\log(e^{\log (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})+x_{21}+t_{21}}+e^{\log (e^{x_{01}+x_{12}+t_{12}} +e^{x_{02}+x_{12}+t_{22}})+ x_{22}+ t_{22}})\end{matrix}logelog(ex01+x11+t11+ex02+x11+t21)+x21+t11+elog(ex01+x11+t11+ex02+x11+t21)+x22+t12),log(elog(ex01+x12+t12+ex02+x12+t22)+x21+t21+elog(ex01+x12+t12+ex02+x12+t22)+x22+t22)
]]]
=[log((ex01+x11+t11+ex02+x11+t21)ex21+t11+(ex01+x12+t12+ex02+x12+t22)ex21+t21),=[\log(
(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})e^{x_{21} + t_{11}}
+
(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})e^{x_{21} + t_{21}}
),=[log((ex01+x11+t11+ex02+x11+t21)ex21+t11+(ex01+x12+t12+ex02+x12+t22)ex21+t21),
log((ex01+x11+t11+ex02+x11+t21)ex22+t12+(ex01+x12+t12+ex02+x12+t22)ex22+t22)]\log(
(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})e^{x_{22} + t_{12}}
+
(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})e^{x_{22} + t_{22}})]log((ex01+x11+t11+ex02+x11+t21)ex22+t12+(ex01+x12+t12+ex02+x12+t22)ex22+t22)]
如上所述,我们使用新更新的previous来计算总得分:
TotalScore(w0→w1→w2)TotalScore(w_0 → w_1 → w_2)TotalScore(w0→w1→w2)
=log(eprevious[0]+eprevious[1])=\log (e^{previous[0]} + e^{previous[1]})=log(eprevious[0]+eprevious[1])
=log(elog((ex01+x11+t11+ex02+x11+t21)ex21+t11+(ex01+x12+t12+ex02+x12+t22)ex21+t21)=\log (e^{\log(
(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})e^{x_{21} + t_{11}}
+
(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})e^{x_{21} + t_{21}}
)}=log(elog((ex01+x11+t11+ex02+x11+t21)ex21+t11+(ex01+x12+t12+ex02+x12+t22)ex21+t21)
+elog((ex01+x11+t11+ex02+x11+t21)ex22+t12+(ex01+x12+t12+ex02+x12+t22)ex22+t22))+e^{\log(
(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})e^{x_{22} + t_{12}}
+
(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})e^{x_{22} + t_{22}})}
)+elog((ex01+x11+t11+ex02+x11+t21)ex22+t12+(ex01+x12+t12+ex02+x12+t22)ex22+t22))
=log(ex01+x11+t11+x21+t11+ex02+x11+t21+x21+t11=\log (e^{x_{01}+x_{11}+t_{11}+x_{21}+t_{11}}+e^{x_{02}+x_{11}+t_{21}+x_{21}+t_{11}}=log(ex01+x11+t11+x21+t11+ex02+x11+t21+x21+t11
+ex01+x12+t12+x21+t21+ex02+x12+t22+x21+t21+e^{x_{01}+x_{12}+t_{12}+x_{21}+t_{21}}+e^{x_{02}+x_{12}+t_{22}+x_{21}+t_{21}}+ex01+x12+t12+x21+t21+ex02+x12+t22+x21+t21
+ex01+x11+t11+x22+t12+ex02+x11+t21+x22+t12+e^{x_{01}+x_{11}+t_{11}+x_{22}+t_{12}}+e^{x_{02}+x_{11}+t_{21}+x_{22}+t_{12}}+ex01+x11+t11+x22+t12+ex02+x11+t21+x22+t12
+ex01+x12+t12+x22+t22+ex02+x12+t22+x22+t22)+e^{x_{01}+x_{12}+t_{12}+x_{22}+t_{22}}+e^{x_{02}+x_{12}+t_{22}+x_{22}+t_{22}})+ex01+x12+t12+x22+t22+ex02+x12+t22+x22+t22)
撒花
我们已经得到了目标:log(eS1+eS2+…+eSN)\log(e^{S_1} + e^{S_2} + … + e^{S_N})log(eS1+eS2+…+eSN),因为该示例中,这一句话中有3个词,有两个标签,因此,总共有8个可能的路径。
虽然该过程看起来比较复杂,但是用编程语言实现该算法是很简单的,因为计算机比较擅长处理重复性工作。
下一篇博文
下篇博客将介绍如何预测一句话的标签。