Third-party modules for word segmentation
This post introduces two Python word-segmentation modules I have used, jieba and snownlp. Straight to the examples:
1. jieba example
from jieba import posseg as pseg

words = "我爱自然语言处理"  # hypothetical sample sentence
# Segment with jieba, returning (word, POS-tag) pairs
cur_tuple_words = pseg.lcut(words)
for word, flag in cur_tuple_words:
    print(word)
    print(flag)
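Since pseg.lcut returns (word, flag) pairs, the output is easy to filter by part of speech. A minimal sketch that keeps only words whose tag starts with 'n' (the noun-like tags in jieba's tagset), using a hypothetical sample sentence:

from jieba import posseg as pseg

words = "我爱自然语言处理"  # hypothetical sample sentence
# keep only words whose POS tag starts with 'n' (noun-like tags)
nouns = [w for w, flag in pseg.lcut(words) if flag.startswith('n')]
print(nouns)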
2. snownlp example
from snownlp import SnowNLP

text = "我爱自然语言处理"  # hypothetical sample sentence
s = SnowNLP(text)
# s.tags yields (word, POS-tag) pairs
fenciList = s.tags
for word, flag in fenciList:
    print(word)
    print(flag)
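One small caveat: depending on the snownlp version, s.tags is built with zip(), which in Python 3 is a one-shot iterator, so looping over it a second time yields nothing. Materializing it with list() makes the pairs reusable; a minimal sketch:

from snownlp import SnowNLP

text = "我爱自然语言处理"  # hypothetical sample sentence
s = SnowNLP(text)
pairs = list(s.tags)  # materialize the iterator so it can be reused
print([w for w, _ in pairs])  # just the words
print([t for _, t in pairs])  # just the POS tags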
While using them, I noticed one obvious difference: jieba takes considerably longer to import than snownlp. The timing code below illustrates this:
- import snownlp takes roughly three to four seconds:
import time

start_time = time.time()
from snownlp import SnowNLP  # time how long the import itself takes
end_time = time.time()
print(end_time - start_time)
Output: 3.98864293098
- import jieba takes around 10 seconds:
import time

start_time = time.time()
from jieba import posseg as pseg  # time how long the import itself takes
end_time = time.time()
print(end_time - start_time)
Output: 10.1280369759
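If the slow startup matters in practice, jieba also exposes jieba.initialize(), which loads the dictionary eagerly at a moment you choose rather than on first use. A minimal timing sketch (numbers will vary by machine and jieba version):

import time
import jieba

start_time = time.time()
jieba.initialize()  # force the dictionary load now, not on the first cut
end_time = time.time()
print(end_time - start_time)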
I only used these for a business requirement and have not dug into them deeply; corrections are welcome if anything here is wrong.
Related link: an introduction to the POS tags in jieba (结巴) segmentation