
Python
Average article quality score: 76
jiahui_zhu
Python Ajax crawler
Crawls Bilibili through its Ajax requests, with pagination. Excerpt (truncated in the listing):
    '''Created on 2015-10-9'''
    #encoding=utf-8
    from lxml import html
    from time import sleep
    import requests
    #xpath
    title_xpath = "//div[@class='l-item']/a[@class='title']/text()"
    play_xpat…
Original · 2015-12-03 20:17:06 · 1278 views · 0 comments
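Chinese word segmentation in Python with jieba
Uses jieba in full mode to segment Chinese text, builds an inverted word index, and implements general multi-word queries and phrase queries. Excerpt (truncated in the listing):
    # -*- coding: utf-8 -*-
    import jieba
    '''Created on 2015-11-23'''
    def word_split(text):
        """
        Split a text in words. Returns a list of tuple that con…
Original · 2015-12-03 20:24:40 · 6776 views · 1 comment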
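The inverted index and the two query types named in this entry can be sketched as follows. This is a minimal illustration, not the post's code: the function names `build_index`, `multi_word_query`, and `phrase_query` are hypothetical, and the tokenizer defaults to whitespace splitting so the example runs without jieba. With jieba installed, something like `lambda t: list(jieba.cut(t, cut_all=True))` could be passed as `tokenize` instead. Full mode emits overlapping tokens, so the consecutive-position test for phrases is only an approximation in that case.

```python
from collections import defaultdict


def build_index(docs, tokenize=str.split):
    """Map each word to {doc_id: [token positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def multi_word_query(index, words):
    """Return ids of documents that contain every query word, in any order."""
    if not words:
        return set()
    doc_sets = [set(index.get(word, {})) for word in words]
    return set.intersection(*doc_sets)


def phrase_query(index, words):
    """Return ids of documents where the words occur consecutively, in order."""
    hits = set()
    for doc_id in multi_word_query(index, words):
        # Try every occurrence of the first word as the start of the phrase.
        for start in index[words[0]][doc_id]:
            if all(start + offset in index[word][doc_id]
                   for offset, word in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample documents, already split on whitespace.
docs = {
    1: "python inverted index",
    2: "index python inverted",
    3: "python crawler",
}
index = build_index(docs)
print(multi_word_query(index, ["inverted", "index"]))
print(phrase_query(index, ["inverted", "index"]))
```

Storing token positions alongside document ids is what makes the phrase query possible; an index that records only document ids can answer the multi-word query but not the phrase query.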
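English word tokenization in Python
English tokenization in Python, with an inverted word index and general multi-word queries. Excerpt (truncated in the listing):
    '''Created on 2015-11-18'''
    #encoding=utf-8
    # List Of English Stop Words
    # http://armandbrahaj.blog.al/2009/04/14/list-of-english-stop-words/
    _WORD_MIN_LENGTH = 3
    _STOP_WO…
Original · 2015-12-03 20:19:14 · 14340 views · 0 comments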
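The tokenization step this entry describes (a minimum word length plus a stop-word filter) might look roughly like the sketch below. The minimum length of 3 comes from the excerpt; everything else is an assumption: `tokenize` is a hypothetical name, and `STOP_WORDS` is a tiny illustrative sample, not the list the post links to.

```python
import re

WORD_MIN_LENGTH = 3  # value shown in the excerpt
# Illustrative sample only; the post's actual stop-word list is not reproduced here.
STOP_WORDS = frozenset({"the", "and", "for", "with", "that", "this"})


def tokenize(text):
    """Return (position, word) pairs, dropping short words and stop words.

    Positions are the token indices before filtering, so distances
    between the surviving words still reflect the original text.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [
        (position, word)
        for position, word in enumerate(words)
        if len(word) >= WORD_MIN_LENGTH and word not in STOP_WORDS
    ]


print(tokenize("The quick fox is in the Box"))
```

Keeping the pre-filter positions is a design choice: it lets a positional index built on these pairs still reason about word order, at the cost of gaps where filtered words used to be.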
A simple crawler in Python
Crawls static pages on Mtime (时光网), with pagination. Excerpt (truncated in the listing):
    '''Created on 2015-9-28'''
    from lxml import html
    from time import sleep
    #the name of Male star
    names_xpath = "//strong[@class='px14']/a/text()"
    #Introduction
    introductions_xpath…
Original · 2015-12-03 20:13:49 · 621 views · 0 comments