Stanford NLP Course Notes 5: Language Models

cherrygirl1989

Article link: https://blog.youkuaiyun.com/snowswallowhe/article/details/46472097

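A language model computes the probability of a sentence and is used in machine translation, spelling correction, speech recognition, and other fields. The N-gram model makes the joint probability tractable, but it cannot capture long-distance dependencies. Add-one smoothing handles word sequences that never occur in the training set, but it can shift too much probability mass onto them. Good-Turing smoothing uses the events seen exactly once to estimate the probability of unseen events, which gives a more reasonable estimate.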

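The goal of a language model is to assign a probability to a sentence. Why would we want that? Sentence probabilities are useful in several fields, for example: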

In machine translation, they help decide which of several translations is better, e.g. P(high winds tonite) > P(large winds tonite).

In spelling correction, they help fix misspellings. In "the office is about fifteen minuets from my house", P(about fifteen minutes) > P(about fifteen minuets), so "minuets" is most likely a misspelling of "minutes".

In speech recognition, P(I saw a van) >> P(eyes awe of an), so for audio that sounds like either, the former is far more likely.

How do we compute the probability of a sentence, i.e. the quantity below?

$P(W)=P(w_1,w_2,w_3,\ldots,w_n)$

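A model that computes either the joint probability P(W) or the conditional probability of the next word, $P(w_n \mid w_1,w_2,\ldots,w_{n-1})$, is called a language model.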

How do we compute a joint probability such as

P(its water is so transparent that)

This relies on the chain rule of probability:

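$P(A \mid B) = \frac{P(A,B)}{P(B)}$

$P(A,B) = P(A \mid B)\,P(B)$

$P(A,B,C) = P(A)\,P(B \mid A)\,P(C \mid A,B)$

$P(A,B,C,D) = P(A)\,P(B \mid A)\,P(C \mid A,B)\,P(D \mid A,B,C)$

$P(x_1,x_2,x_3,\ldots,x_n) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1,x_2)\cdots P(x_n \mid x_1,x_2,\ldots,x_{n-1})$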
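As a quick numerical sanity check of the three-variable case, here is a minimal sketch. The joint distribution is made up purely for illustration (hypothetical weights 1 to 8), and exact fractions are used so the comparison is not disturbed by floating-point rounding. Each conditional is computed as a ratio of marginals, so the product telescopes back to the joint.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary variables A, B, C.
# The weights 1..8 are made up for illustration; they sum to 36.
outcomes = list(product([0, 1], repeat=3))
joint = {abc: Fraction(w, 36) for abc, w in zip(outcomes, range(1, 9))}

def marginal(prefix: tuple) -> Fraction:
    """Probability that the first len(prefix) variables take the values in prefix."""
    return sum((p for abc, p in joint.items() if abc[:len(prefix)] == prefix), Fraction(0))

def chain_rule(a: int, b: int, c: int) -> Fraction:
    """P(A) * P(B|A) * P(C|A,B), each conditional computed as a ratio of marginals."""
    p_a = marginal((a,))
    p_b_given_a = marginal((a, b)) / p_a
    p_c_given_ab = marginal((a, b, c)) / marginal((a, b))
    return p_a * p_b_given_a * p_c_given_ab

# The chain-rule product reproduces every entry of the joint exactly.
print(all(chain_rule(*abc) == joint[abc] for abc in outcomes))  # True
```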

For the words of a sentence, this reads

$P(w_1,w_2,\ldots,w_n)=\prod_{i=1}^{n} P(w_i \mid w_1,w_2,\ldots,w_{i-1})$

P(its water is so transparent) = P(its)*P(water|its)*P(is|its water)*P(so|its water is)*P(transparent|its water is so)
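As a small illustration (a sketch; the helper name `chain_rule_factors` is our own), the exact chain-rule factors of a sentence can be listed mechanically. Nothing is approximated yet: every factor still conditions on the entire history.

```python
def chain_rule_factors(sentence: str) -> list[str]:
    """List the factors P(w_i | w_1 ... w_{i-1}) of the exact chain-rule decomposition."""
    words = sentence.split()
    factors = []
    for i, w in enumerate(words):
        history = " ".join(words[:i])
        # The first word has an empty history, so its factor is just P(w_1).
        factors.append(f"P({w}|{history})" if history else f"P({w})")
    return factors

print(" * ".join(chain_rule_factors("its water is so transparent")))
# P(its) * P(water|its) * P(is|its water) * P(so|its water is) * P(transparent|its water is so)
```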

How do we estimate these probabilities? Just by counting?

P(the| its water is so transparent that) = count(its water is so transparent that the) / count(its water is so transparent that)

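This does not work. English can produce an enormous number of sentences, and most long histories such as "its water is so transparent that" never occur even in a large corpus, so counts from a finite corpus cannot give reliable estimates of these probabilities.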
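To see the sparsity problem concretely, here is a minimal sketch of the naive full-history estimate on a tiny, made-up corpus. The corpus and the function names `count_ngram` and `full_history_mle` are hypothetical, chosen only for illustration. A short history like "water" is seen and gets an estimate, while the six-word history never occurs, so the ratio is 0/0 and undefined.

```python
from typing import Optional

def count_ngram(corpus: list[list[str]], ngram: tuple[str, ...]) -> int:
    """Count how many times the word sequence `ngram` occurs in the corpus."""
    n = len(ngram)
    return sum(
        1
        for sent in corpus
        for i in range(len(sent) - n + 1)
        if tuple(sent[i:i + n]) == ngram
    )

def full_history_mle(corpus, history: tuple[str, ...], word: str) -> Optional[float]:
    """count(history + word) / count(history); None when the history never occurs."""
    denom = count_ngram(corpus, history)
    if denom == 0:
        return None  # 0/0: the estimate is undefined
    return count_ngram(corpus, history + (word,)) / denom

# Hypothetical toy corpus, already tokenized.
corpus = [
    "its water is clear".split(),
    "the water is so cold".split(),
]
history = tuple("its water is so transparent that".split())
print(full_history_mle(corpus, history, "the"))       # None
print(full_history_mle(corpus, ("water",), "is"))     # 1.0
```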

The fix is a simplifying (Markov) assumption: a word depends only on the word immediately before it, i.e. P(the | its water is so transparent that) ≈ P(the | that)

or only on the two words before it, i.e. P(the | its water is so transparent that) ≈ P(the | transparent that)

In general, if each word depends only on the previous k words:

$P(w_1,w_2,\ldots,w_n)\approx \prod_{i} P(w_i \mid w_{i-k},\ldots,w_{i-1})$

that is, each conditional probability is approximated as

$P(w_i \mid w_1,\ldots,w_{i-1})\approx P(w_i \mid w_{i-k},\ldots,w_{i-1})$
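A minimal sketch of the truncation step (`markov_history` is a hypothetical helper name): keep only the k words before position i. With k = 1 this gives the history of P(the | that), with k = 2 the history of P(the | transparent that), and k = 0 gives the empty history used by the unigram model.

```python
def markov_history(words: list[str], i: int, k: int) -> tuple[str, ...]:
    """Return w_{i-k} ... w_{i-1}, or fewer words near the start of the sentence."""
    return tuple(words[max(0, i - k):i])

words = "its water is so transparent that the".split()
i = words.index("the")
print(markov_history(words, i, 1))  # ('that',)
print(markov_history(words, i, 2))  # ('transparent', 'that')
```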

The simplest model is the unigram model, in which each word depends on none of the words before it:

$P(w_1,w_2,\ldots,w_n)\approx \prod_{i} P(w_i)$
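A minimal unigram sketch, estimated by relative frequency on a made-up toy text (the text and the function names are illustrative assumptions, and there is no smoothing). It also exposes a weakness of the unigram model: since each factor ignores context, reordering the words does not change the probability.

```python
from collections import Counter

def train_unigram(tokens: list[str]) -> dict[str, float]:
    """Relative-frequency unigram model: P(w) = count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}

def unigram_sentence_prob(model: dict[str, float], sentence: str) -> float:
    """Product of P(w_i) over the words; word order plays no role."""
    prob = 1.0
    for w in sentence.split():
        prob *= model.get(w, 0.0)  # unseen word -> probability 0 (no smoothing here)
    return prob

# Hypothetical toy training text.
model = train_unigram("the cat saw the dog".split())
print(unigram_sentence_prob(model, "the cat"))
print(unigram_sentence_prob(model, "cat the"))  # same value: order is ignored
```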

In the bigram model, each word depends on the one word before it:

$P(w_1,w_2,\ldots,w_n)\approx \prod_{i} P(w_i \mid w_{i-1})$

This extends naturally to trigram, 4-gram, and in general N-gram models.
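Whatever N is, the model's parameters are indexed by windows of N consecutive words. A small sketch of extracting them (`extract_ngrams` is a hypothetical helper name); with n = 2 the windows are exactly the word pairs counted in a bigram table.

```python
def extract_ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous window of n tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I want to eat chinese food".split()
print(extract_ngrams(tokens, 2))  # bigrams
print(extract_ngrams(tokens, 3))  # trigrams
```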

However, N-gram models have a limitation: they cannot capture long-distance dependencies, for example:

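the computer which I had just put into the machine room on the fifth floor crashed

Whether "crashed" fits depends on the subject "computer", which lies far outside the history of any small N.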
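To make the problem concrete, this sketch prints what a bigram or trigram model actually conditions on when predicting "crashed", and how large N would have to be for the history window to reach back to "computer".

```python
sentence = ("the computer which I had just put into the machine room "
            "on the fifth floor crashed").split()

verb = sentence.index("crashed")
subject = sentence.index("computer")

for n in (2, 3):
    # An n-gram model conditions "crashed" only on the n-1 preceding words.
    print(n, sentence[verb - (n - 1):verb])
# 2 ['floor']
# 3 ['fifth', 'floor']

# Smallest N whose history window would still include "computer".
print(verb - subject + 1)  # 15
```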

How do we estimate the parameters of an N-gram model?

Maximum likelihood estimation (MLE)

For a bigram model:

$P(w_i \mid w_{i-1}) = \frac{count(w_{i-1},w_i)}{count(w_{i-1})}=\frac{c(w_{i-1},w_i)}{c(w_{i-1})}$

For example, given the three-sentence corpus

<s> I am Sam </s>

<s> Sam I am </s>

<s> I do not like green eggs and ham </s>

we get

P(I|<s>) = c(<s>,I)/c(<s>) = 2/3

P(Sam|<s>) = c(<s>,Sam)/c(<s>) = 1/3

P(am|I) = c(I,am)/c(I) = 2/3

P(</s>|Sam) = c(Sam,</s>)/c(Sam)=1/2

P(Sam|am)=c(am,Sam)/c(am)=1/2

P(do|I)=c(I,do)/c(I)=1/3
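The worked numbers above can be reproduced with a short Python sketch of the MLE formula. It assumes the standard three-sentence corpus from the course slides ("I am Sam", "Sam I am", "I do not like green eggs and ham", each wrapped in `<s> ... </s>`), which is what c(<s>) = 3 and c(I) = 3 imply; the helper name `p_mle` is our own. Fractions keep the results exact, and an unseen previous word raises `ZeroDivisionError` because the estimate is undefined.

```python
from collections import Counter
from fractions import Fraction

# Assumed corpus: the three sentences from the course slides.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for line in corpus:
    tokens = line.split()
    unigram_counts.update(tokens)                  # c(w)
    bigram_counts.update(zip(tokens, tokens[1:]))  # c(w_{i-1}, w_i)

def p_mle(word: str, prev: str) -> Fraction:
    """P(word | prev) = c(prev, word) / c(prev), as an exact fraction."""
    return Fraction(bigram_counts[(prev, word)], unigram_counts[prev])

print(p_mle("I", "<s>"))   # 2/3
print(p_mle("Sam", "am"))  # 1/2
```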

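Using the Berkeley Restaurant Project sentences as the corpus (9,222 sentences in total), the bigram counts are shown in the table below; each row is the previous word $w_{i-1}$ and each column the following word $w_i$.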

	I	want	to	eat	chinese	food	lunch	spend
I	5	827	0	9	0	0	0	2
want	2	0	608	1	6	6	5	1
to	2	0	4	686	2	0	6	211
eat	0	0	2	0	16	2	42
