1 Simple problems
Reading a text file
# encoding=utf-8
file = 'test.txt'
# The with-block closes the file automatically, even if reading fails
with open(file, "r", encoding="utf-8") as fn:
    print(fn.read())
Stopping Scrapy from printing debug information to the console
scrapy crawl spider_name -s LOG_FILE=all.log
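The same redirection can be kept in the project settings instead of being passed with -s on every run. A minimal sketch of a settings.py fragment, using the standard Scrapy settings LOG_FILE and LOG_LEVEL. The INFO level is an assumption added here and is not part of the original command:

```python
# settings.py (fragment) of the Scrapy project
# Send log output to a file instead of the console,
# the same setting that -s LOG_FILE=all.log passes on the command line.
LOG_FILE = "all.log"

# Assumption: also drop DEBUG-level messages from the log file itself.
# Remove this line to keep the full debug log in all.log.
LOG_LEVEL = "INFO"
```

A value passed with -s on the command line still overrides what is written in settings.py.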
2 Word segmentation
jieba segmentation:
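import jieba.posseg as pseg
# Segment the sentence ("He changed China") and tag each word with its part of speech
words = pseg.cut("他改变了中国")
for word, flag in words:
    print("{0} {1}".format(word, flag))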
Custom dictionary / removing stop words:
https://blog.youkuaiyun.com/qq_30262201/article/details/80128076
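A rough sketch of both steps with jieba, not taken from the linked article. jieba.add_word and jieba.lcut are jieba's own calls (jieba.load_userdict(path) is the file-based alternative for a custom dictionary). The helper names load_stopwords, remove_stopwords and segment are hypothetical, and the stop-word list is assumed to hold one word per line:

```python
def load_stopwords(lines):
    """Build a stop-word set from an iterable of lines (one word per line)."""
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Drop stop words and whitespace-only tokens, keeping the original order."""
    return [t for t in tokens if t.strip() and t not in stopwords]


def segment(text, extra_words=()):
    """Segment text with jieba after registering extra dictionary words."""
    import jieba  # imported lazily so the helpers above work without jieba

    for w in extra_words:
        jieba.add_word(w)  # register the word in jieba's dictionary at runtime
    return jieba.lcut(text)  # lcut returns the segments as a list
```

Typical use would be remove_stopwords(segment(text), load_stopwords(open("stopwords.txt", encoding="utf-8"))), where stopwords.txt is a hypothetical file name.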
pyltp segmentation:
https://blog.youkuaiyun.com/sinat_33731745/article/details/79406878
https://www.jianshu.com/p/f78453f5d1ca
pyltp segmentation, official documentation:
https://pyltp.readthedocs.io/zh_CN/latest/api.html#id19
Tsinghua THULAC:
http://thulac.thunlp.org/#编译和安装
Trying out the major word segmentation sites:
https://blog.youkuaiyun.com/sinat_26917383/article/details/77067515