Data Management and Processing in Natural Language Processing
1. Querying and Processing Lexical Data
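Querying and processing lexical data is a basic operation in natural language processing. For example, suppose we have the following lexical data, where each row holds a lexeme, its pronunciation, a part-of-speech label, and a definition: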
"sleep","sli:p","v.i","a condition of body and mind ..."
"walk","wo:k","v.intr","progress by lifting and setting down each foot ..."
"wake","weik","intrans","cease to sleep"
Assuming this data is saved as dict.csv, it can be queried and processed with Python code:
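import csv

# Read every row of dict.csv; the with-block closes the file afterwards
with open('dict.csv', newline='', encoding='utf-8') as f:
    lexicon = list(csv.reader(f))
# Keep only the first column (lexeme) and the last column (definition)
pairs = [(lexeme, defn) for (lexeme, _, _, defn) in lexicon]
lexemes, defns = zip(*pairs)
# Collect every whitespace-separated word used in any definition
defn_words = set(w for defn in defns for w in defn.split())
# Definition words that have no entry of their own in the lexicon
print(sorted(defn_words.difference(lexemes)))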
Running the code above produces the words that occur in the definitions but are not themselves lexemes:
['...', 'a', 'and', 'body', 'by', 'cease', 'condition', 'down', 'each',
'foot', 'lifting', 'mind', 'of', 'progress', 'setting', 'to']
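The query above depends on a dict.csv file on disk. As a minimal sketch, assuming the three sample rows shown earlier are the entire file, the same steps can be tried on an in-memory string instead. The names SAMPLE and undefined_words are hypothetical and chosen only for illustration.

```python
import csv
import io

# Sample rows copied from the data above (assumption: this is the whole file).
SAMPLE = '''"sleep","sli:p","v.i","a condition of body and mind ..."
"walk","wo:k","v.intr","progress by lifting and setting down each foot ..."
"wake","weik","intrans","cease to sleep"
'''

def undefined_words(rows):
    """Return the sorted definition words that are not themselves lexemes.

    Hypothetical helper: expects rows of (lexeme, pronunciation, pos, definition).
    """
    pairs = [(lexeme, defn) for (lexeme, _, _, defn) in rows]
    lexemes, defns = zip(*pairs)
    # Split each definition on whitespace and pool the words into one set
    defn_words = {w for defn in defns for w in defn.split()}
    return sorted(defn_words.difference(lexemes))

# io.StringIO lets csv.reader treat the string like an open file
rows = list(csv.reader(io.StringIO(SAMPLE)))
result = undefined_words(rows)
print(result)
```

Here 'sleep' is dropped because it has its own entry, while the literal '...' token survives because the definitions are split on whitespace only.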