An Analysis of N-gram Language Models - CSDN Blog

N-gram Language Models

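P(W) = P(W1, W2, W3, ..., Wn)
A related task is predicting the next word given the words before it, e.g. P(W5 | W1, W2, W3, W4).
The goal is to compute the probability of a given sentence, which mainly reflects how well its words fit together.
This computation relies on the chain rule of probability,
which is built on conditional probability.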

P(A|B) = P(A,B)/P(B)

P(A,B) = P(B)P(A|B) = P(A)P(B|A)

P(A,B,C,D)=P(A)P(B|A)P(C|A,B)P(D|A,B,C)

More generally:

P(X1, X2, X3, X4, ..., Xn) = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X1,X2,X3) ... P(Xn|X1,X2,...,Xn-1)

P(W1, W2, ..., Wn) = ∏_i P(Wi | W1, W2, ..., Wi-1)
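
To make the chain rule concrete, here is a minimal sketch that only spells out the factors for a sentence as strings; it does not estimate any probabilities. The function name `chain_rule_factors` is made up for illustration.

```python
def chain_rule_factors(words):
    """Return the chain-rule factors of P(w1, ..., wn) as readable strings."""
    factors = []
    for i, word in enumerate(words):
        history = words[:i]  # every word that precedes position i
        if history:
            factors.append(f"P({word} | {' '.join(history)})")
        else:
            factors.append(f"P({word})")  # the first word has no history
    return factors


factors = chain_rule_factors(["its", "water", "is", "so", "transparent"])
print(" * ".join(factors))
```

Note that each factor conditions on the whole prefix, so the history keeps growing with the position in the sentence.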

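P(the | its water is so transparent that) =
Count(its water is so transparent that the) / Count(its water is so transparent that)
This cannot be computed directly in practice: there are far too many possible sentences, and no corpus contains enough of them to give reliable counts.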
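
A rough sketch of the count-based estimate follows, assuming a tiny made-up corpus and the hypothetical helpers `count_sequence` and `estimate_by_counting`. It shows why this approach breaks down: as soon as the history has never been seen, the ratio is undefined.

```python
def count_sequence(tokens, sequence):
    """Count how many times `sequence` occurs as a contiguous run in `tokens`."""
    n = len(sequence)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == sequence)


def estimate_by_counting(tokens, history, word):
    """Estimate P(word | history) as Count(history + word) / Count(history)."""
    history_count = count_sequence(tokens, history)
    if history_count == 0:
        return None  # the history never occurs, so the ratio is undefined
    return count_sequence(tokens, history + [word]) / history_count


# Toy corpus, made up for illustration only.
corpus = "its water is so transparent that the fish are visible".split()

seen = estimate_by_counting(corpus, "its water is so transparent that".split(), "the")
unseen = estimate_by_counting(corpus, "its lake is so transparent that".split(), "the")
print(seen, unseen)
```

Changing a single word of the history ("water" to "lake") is enough to leave the estimator with no data at all.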

So we apply the Markov assumption:
P(the | its water is so transparent that) ≈ P(the | that)   -- bigram model
or ≈ P(the | transparent that)   -- trigram model

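More generally, each conditional is approximated by conditioning on only the previous k words:
P(Wi | W1, W2, ..., Wi-1) ≈ P(Wi | Wi-k, ..., Wi-1)

P(W1, W2, ..., Wn) ≈ ∏_i P(Wi | Wi-k, ..., Wi-1)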
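
The truncation of the history to the last k words can be sketched as below. As before, this only lists the factors as strings; `markov_factors` is a hypothetical name, and k = 1 corresponds to a bigram model, k = 2 to a trigram model.

```python
def markov_factors(words, k):
    """List the factors of P(w1, ..., wn) when each word depends on at most k previous words."""
    factors = []
    for i, word in enumerate(words):
        context = words[max(0, i - k):i]  # keep only the last k words of the history
        if context:
            factors.append(f"P({word} | {' '.join(context)})")
        else:
            factors.append(f"P({word})")  # no context available (or k == 0)
    return factors


sentence = "its water is so transparent that the".split()
bigram_factors = markov_factors(sentence, 1)
trigram_factors = markov_factors(sentence, 2)
print(bigram_factors[-1])
print(trigram_factors[-1])
```

With k = 0 every factor reduces to P(Wi), i.e. the words are treated as independent.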

The simplest case is the unigram model:
P(W1, W2, ..., Wn) ≈ ∏_i P(Wi)
Bigram model:
P(Wi | W1, W2, ..., Wi-1) ≈ P(Wi | Wi-1)
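
As an illustration of the bigram model, the sketch below estimates P(Wi | Wi-1) as Count(Wi-1, Wi) / Count(Wi-1) on a three-sentence toy corpus and multiplies the factors to score a sentence. The corpus, the `<s>` / `</s>` sentence-boundary markers, and all function names are assumptions made for this example; they do not appear in the notes above.

```python
from collections import Counter
from fractions import Fraction


def train_bigram_counts(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigram_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigram_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    return unigram_counts, bigram_counts


def bigram_probability(unigram_counts, bigram_counts, previous, word):
    """Count-based estimate: Count(previous, word) / Count(previous)."""
    if unigram_counts[previous] == 0:
        return Fraction(0)  # unseen context: treated as probability 0 here
    return Fraction(bigram_counts[(previous, word)], unigram_counts[previous])


def sentence_probability(unigram_counts, bigram_counts, sentence):
    """Multiply the bigram factors P(Wi | Wi-1) over the padded sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = Fraction(1)
    for previous, word in zip(tokens, tokens[1:]):
        probability *= bigram_probability(unigram_counts, bigram_counts, previous, word)
    return probability


# Toy corpus, assumed for illustration only.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigram_counts, bigram_counts = train_bigram_counts(corpus)

p_am_given_i = bigram_probability(unigram_counts, bigram_counts, "I", "am")
p_sentence = sentence_probability(unigram_counts, bigram_counts, "I am Sam")
print(p_am_given_i, p_sentence)  # prints: 2/3 1/9
```

`Fraction` is used so the ratios stay exact and easy to check by hand: 2/3 * 2/3 * 1/2 * 1/2 = 1/9. Any sentence containing a bigram that never occurs in the corpus gets probability 0 under this plain count-based estimate.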

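Sentences can contain long-distance dependencies between words, but in practice trigram or 4-gram models usually handle them well enough.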