Add-one Smoothing
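$$P_{Add-1}(W_i \mid W_{i-1}) = \frac{C(W_{i-1}, W_i) + 1}{C(W_{i-1}) + V}$$

Here $C(\cdot)$ is a count in the training corpus and $V$ is the vocabulary size. The denominator uses the count of the preceding word $W_{i-1}$, so that the probabilities over all $W_i$ sum to 1.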
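A minimal sketch of add-one smoothing for bigrams, assuming a whitespace-tokenized toy corpus. The function name `add_one_bigram_prob` and the example sentence are illustrative only. The estimate is normalized by the count of the preceding word (the standard form), counted over positions that actually start a bigram.

```python
from collections import Counter

def add_one_bigram_prob(tokens, prev_word, word):
    """Add-one (Laplace) estimate of P(word | prev_word) from a token list."""
    vocab_size = len(set(tokens))                      # V: number of distinct words
    context_counts = Counter(tokens[:-1])              # C(W_{i-1}): times a word starts a bigram
    bigram_counts = Counter(zip(tokens, tokens[1:]))   # C(W_{i-1}, W_i)
    return (bigram_counts[(prev_word, word)] + 1) / (context_counts[prev_word] + vocab_size)

# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
p_seen = add_one_bigram_prob(tokens, "the", "cat")     # bigram occurs once
p_unseen = add_one_bigram_prob(tokens, "the", "sat")   # bigram never occurs, still non-zero
print(p_seen, p_unseen)
```

With this corpus, $V = 5$ and "the" starts two bigrams, so the seen bigram gets $(1+1)/(2+5)$ and the unseen one gets $(0+1)/(2+5)$.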
Add-K Smoothing
$$P_{Add-K}(W_i \mid W_{i-1}) = \frac{C(W_{i-1}, W_i) + K}{C(W_{i-1}) + KV}$$

As with add-one smoothing, the denominator uses the count of the preceding word $W_{i-1}$. Setting $K = 1$ recovers add-one smoothing.
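To see how $K$ changes the estimate, the sketch below evaluates add-K smoothing for a few values of `k` on a toy corpus. The function name, its parameters, and the chosen `k` values are assumptions for illustration; in practice $K$ is usually tuned on held-out data.

```python
from collections import Counter

def add_k_bigram_prob(bigram_counts, context_counts, vocab_size, prev_word, word, k=0.5):
    """Add-k estimate of P(word | prev_word); k=1 reduces to add-one smoothing."""
    numerator = bigram_counts[(prev_word, word)] + k
    denominator = context_counts[prev_word] + k * vocab_size
    return numerator / denominator

# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))   # C(W_{i-1}, W_i)
context_counts = Counter(tokens[:-1])              # C(W_{i-1})
vocab_size = len(set(tokens))                      # V

for k in (1.0, 0.5, 0.01):
    seen = add_k_bigram_prob(bigram_counts, context_counts, vocab_size, "the", "cat", k)
    unseen = add_k_bigram_prob(bigram_counts, context_counts, vocab_size, "the", "sat", k)
    print(f"k={k}: P(cat|the)={seen:.3f}, P(sat|the)={unseen:.3f}")
```

A smaller `k` moves less probability mass from seen bigrams to unseen ones.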
Interpolation
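Core idea: an n-gram that does not occur in the current corpus may still occur in the future. Therefore, when estimating a trigram probability, the unigram and bigram estimates are taken into account alongside the trigram estimate.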
$$P_{Interpolation}(W_n \mid W_{n-1}, W_{n-2}) = \lambda_1 P(W_n \mid W_{n-1}, W_{n-2}) + \lambda_2 P(W_n \mid W_{n-1}) + \lambda_3 P(W_n)$$

The weights satisfy $\lambda_1 + \lambda_2 + \lambda_3 = 1$, so the result is still a probability distribution.
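The interpolation formula can be sketched as follows, assuming plain maximum-likelihood estimates for each component. The function name and the weights `(0.6, 0.3, 0.1)` are hypothetical placeholders; the weights are normally tuned on held-out data.

```python
from collections import Counter

def interpolated_trigram_prob(tokens, w2, w1, w, lambdas=(0.6, 0.3, 0.1)):
    """Linear interpolation of MLE trigram, bigram, and unigram estimates.

    w2 and w1 are the two preceding words (W_{n-2}, W_{n-1}); w is W_n.
    """
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9, "weights must sum to 1"

    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))

    def ratio(num, den):
        # Treat an unseen context as probability 0 instead of dividing by zero.
        return num / den if den else 0.0

    p_tri = ratio(trigrams[(w2, w1, w)], bigrams[(w2, w1)])   # P(W_n | W_{n-1}, W_{n-2})
    p_bi = ratio(bigrams[(w1, w)], unigrams[w1])              # P(W_n | W_{n-1})
    p_uni = ratio(unigrams[w], len(tokens))                   # P(W_n)
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
p_seen = interpolated_trigram_prob(tokens, "on", "the", "mat")   # trigram occurs
p_unseen = interpolated_trigram_prob(tokens, "on", "the", "cat") # trigram never occurs
print(p_seen, p_unseen)
```

The unseen trigram "on the cat" still receives non-zero probability, because the bigram "the cat" and the unigram "cat" both occur.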
Good-Turing Smoothing
For words that have not appeared in the corpus:
$$P_{MLE} = 0$$

$$P_{GT} = \frac{N_1}{N}$$
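Here $N_1$ is the number of distinct words that occur exactly once, and $N$ is the total number of word tokens in the corpus.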
For words that have appeared in the corpus:
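$$P_{MLE} = \frac{c}{N}$$

$$P_{GT} = \frac{(c+1)\,N_{c+1}}{N_c \cdot N}$$

Here $c$ is the number of times the word occurs, $N_c$ is the number of distinct words that occur exactly $c$ times, and $N$ is the total number of word tokens.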
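Both Good-Turing cases can be sketched together, assuming unigram counts from a small toy corpus. The function name and the corpus are illustrative. Two caveats are assumptions of this sketch rather than part of the formulas: $N_1 / N$ is treated as the total probability mass shared by all unseen words, and when $N_{c+1} = 0$ the code falls back to the MLE estimate, whereas practical implementations usually smooth the $N_c$ values instead.

```python
from collections import Counter

def good_turing_probs(tokens):
    """Return (per-word probabilities for seen words, total mass for unseen words)."""
    counts = Counter(tokens)                  # c for each seen word
    total = len(tokens)                       # N: total number of tokens
    freq_of_freq = Counter(counts.values())   # N_c: number of words seen exactly c times

    unseen_mass = freq_of_freq[1] / total     # N_1 / N

    seen_probs = {}
    for word, c in counts.items():
        n_c, n_c_plus_1 = freq_of_freq[c], freq_of_freq[c + 1]
        if n_c_plus_1 == 0:
            # Assumption of this sketch: fall back to MLE when N_{c+1} is zero.
            seen_probs[word] = c / total
        else:
            seen_probs[word] = (c + 1) * n_c_plus_1 / (n_c * total)
    return seen_probs, unseen_mass

# Hypothetical toy corpus: 18 tokens, three of the words occur exactly once.
tokens = ["carp"] * 10 + ["perch"] * 3 + ["whitefish"] * 2 + ["trout", "salmon", "eel"]
seen_probs, unseen_mass = good_turing_probs(tokens)
print(unseen_mass, seen_probs["trout"])
```

In this corpus $N = 18$, $N_1 = 3$, and $N_2 = 1$, so unseen words receive $3/18$ in total, and "trout" (seen once) is discounted from $1/18$ to $(2 \cdot 1)/(3 \cdot 18)$.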