- ARPA format language model [slides, manpage, blog, htkbook]
- Note: log probabilities in an ARPA file are base 10.
- http://www.statmt.org/book/slides/07-language-models.pdf
- 4-6 - Back-off and Interpolation - Stanford NLP - Dan Jurafsky & Chris Manning
- Lecture 7: Finite State Transducers, Language Modeling, and Speech Recognition Search
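As a rough illustration of the format, here is a minimal Python sketch that reads a tiny hand-written ARPA file and looks up a probability. The file contents and the names `ARPA_TEXT`, `parse_arpa`, and `log10_prob` are made up for this example and are not taken from any toolkit. Each entry line is `log10(prob)  n-gram  [log10(back-off weight)]`; a missing back-off weight is treated as 0 (that is, a weight of 1).

```python
import math

# Tiny hand-written ARPA file; all numbers are invented for illustration.
ARPA_TEXT = """\
\\data\\
ngram 1=4
ngram 2=2

\\1-grams:
-1.0 <s> -0.30103
-0.60206 a -0.30103
-0.60206 b
-0.60206 </s>

\\2-grams:
-0.30103 <s> a
-0.30103 a b

\\end\\
"""


def parse_arpa(text):
    """Return {ngram_tuple: (log10_prob, log10_backoff_weight)}."""
    table = {}
    order = 0  # set when an "\N-grams:" section header is seen
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue  # blank lines and the count header
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        fields = line.split()
        logp = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The back-off weight column is optional.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[words] = (logp, backoff)
    return table


def log10_prob(table, history, word):
    """Back-off lookup: use the n-gram if listed, else add the history's
    back-off weight and retry with a shorter history."""
    history = tuple(history)
    ngram = history + (word,)
    if ngram in table:
        return table[ngram][0]
    if not history:
        raise KeyError(word)  # out-of-vocabulary word
    backoff = table.get(history, (0.0, 0.0))[1]
    return backoff + log10_prob(table, history[1:], word)


table = parse_arpa(ARPA_TEXT)
lp10 = log10_prob(table, ("a",), "</s>")  # bigram unseen, so back off
ln_p = lp10 * math.log(10)                # convert base 10 to natural log
```

Since the stored values are base 10, multiplying by `math.log(10)` is needed before mixing them with natural-log scores from other components.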
Back-off model:
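$$
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
\dfrac{C(w_{i-k+1}^{i})}{C(w_{i-k+1}^{i-1})} & \text{if } C(w_{i-k+1}^{i}) > 0 \\[1ex]
\alpha \cdot S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}
$$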
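A minimal Python sketch of this kind of back-off score, computed from raw counts: use the relative frequency when the full n-gram has been seen, otherwise multiply by α and retry with a shorter context. The function names, the toy corpus, the default `alpha=0.4`, and the unigram base case `C(w) / N` are assumptions made for this example; the notes above do not specify them.

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(counts, total, context, word, alpha=0.4):
    """Score `word` given `context` (a tuple of preceding words).

    total is the number of tokens, used for the unigram base case
    (an assumption of this sketch). The result is a score, not a
    normalized probability.
    """
    context = tuple(context)
    ngram = context + (word,)
    if not context:
        return counts[ngram] / total            # base case: C(w) / N
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]  # seen n-gram: relative frequency
    # Unseen n-gram: drop the oldest context word and discount by alpha.
    return alpha * backoff_score(counts, total, context[1:], word, alpha)


tokens = "a b a b a c".split()
counts = count_ngrams(tokens, max_order=2)
seen = backoff_score(counts, len(tokens), ("a",), "b")    # C(a b) / C(a)
unseen = backoff_score(counts, len(tokens), ("b",), "c")  # alpha * C(c) / N
```

Here `seen` uses the bigram count directly, while `unseen` falls back to the unigram frequency of `c` scaled by `alpha`.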
Toolkits:
- SRILM
- IRSTLM
Libraries:
- NGram: create and manipulate n-gram language models encoded as weighted FSTs.
- Thrax: compile regular expressions and context-dependent rewrite grammars into weighted FSTs.
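This post introduces the ARPA-format language model, covering its basic concepts, the mathematical expression of the back-off model, and commonly used toolkits such as SRILM and IRSTLM. It also mentions how weighted finite-state transducers (FSTs) can be used to create and manipulate n-gram language models.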