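n-gram Feature Extraction and Language Model Construction
1. n-gram Basics
1.1 n-gram Generation Example
Taking n = 4 and the sample text "After, there were several follow-up questions. The New York Times asked when the bill would be signed" as an example, the generated four-grams are listed below. Each sentence is processed separately and padded with n - 1 = 3 start markers (<s>) and 3 end markers (</s>); punctuation marks are kept as their own tokens, and the hyphenated word "follow-up" is split into 'follow' and 'up'.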
('<s>', '<s>', '<s>', 'After')
('<s>', '<s>', 'After', ',')
('<s>', 'After', ',', 'there')
('After', ',', 'there', 'were')
(',', 'there', 'were', 'several')
('there', 'were', 'several', 'follow')
('were', 'several', 'follow', 'up')
('several', 'follow', 'up', 'questions')
('follow', 'up', 'questions', '.')
('up', 'questions', '.', '</s>')
('questions', '.', '</s>', '</s>')
('.', '</s>', '</s>', '</s>')
('<s>', '<s>', '<s>', ...
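
The listing above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the original post does not show which tokenizer or library it uses, so the helper names (tokenize, padded_ngrams, text_to_ngrams) and both regular expressions are assumptions chosen to match the tokens shown in the listing.

```python
import re

def tokenize(sentence):
    # Assumed tokenizer: words/numbers and basic punctuation become separate
    # tokens; hyphens are dropped, so "follow-up" yields 'follow', 'up'.
    return re.findall(r"[A-Za-z0-9]+|[.,!?;:]", sentence)

def padded_ngrams(tokens, n):
    # Pad with n - 1 start and end markers, then slide a window of size n.
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"] * (n - 1)
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def text_to_ngrams(text, n):
    # Assumed sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    grams = []
    for sentence in sentences:
        grams.extend(padded_ngrams(tokenize(sentence), n))
    return grams

text = ("After, there were several follow-up questions. "
        "The New York Times asked when the bill would be signed")

for gram in text_to_ngrams(text, 4):
    print(gram)
```

Because padding is applied per sentence, the first sentence contributes 12 four-grams (9 tokens plus 6 padding markers, minus 3), and the next sentence starts again from three <s> markers.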