http://tieba.baidu.com/p/6070002023
from nltk.corpus import brown
brown_tagged_sents=brown.tagged_sents(categories=‘news’)
brown_sents = brown.sents(categories=‘news’)
import nltk
nltk.download(‘brown’)
nltk.download(‘universal_tagset’)
import nltk.tag.brill
from nltk.corpus import brown
brown_tagged_sents = brown.tagged_sents(categories=‘news’, tagset=‘universal’)
brown_sents = brown.sents(categories=‘news’)
size = int(len(brown_tagged_sents) * 0.9)
train_sents = brown_tagged_sents[:size]
#set up first stage of tagging
print(size)
[nltk_data] Downloading package brown to
[nltk_data] C:\Users\Lenovo\AppData\Roaming\nltk_data…
[nltk_data] Package brown is already up-to-date!
[nltk_data] Downloading package universal_tagset to
[nltk_data] C:\Users\Lenovo\AppData\Roaming\nltk_data…
[nltk_data] Package universal_tagset is already up-to-date!
4160
import re
patterns=[

本文演示了如何使用NLTK库中的Brill标注器对英文语料库Brown进行标注。首先,从Brown语料库中提取新闻类别,并进行预处理。接着,通过正则表达式设置初步标注器,然后利用BrillTaggerTrainer训练 Brill 标注器,最终评估标注器的性能,展示了一些规则和准确率。
最低0.47元/天 解锁文章
926

被折叠的 条评论
为什么被折叠?



