
爬到你心上
Nunu Bookshop (kanunu8) novel scraper
import requests
from bs4 import BeautifulSoup
import re
url = 'https://www.kanunu8.com/book3/7562/150394.html'
res = requests.get(url)
html = (res.content).decode('gbk')
soup = BeautifulSoup(html, 'lxm...
Original post · 2019-04-04 00:40:14 · 959 views · 0 comments
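A fuller sketch of the decode-and-parse step the preview above truncates. kanunu8 serves GBK-encoded pages, so the bytes must be decoded before parsing; the sample HTML below is a made-up stand-in for `res.content`, not taken from the real site.

```python
from bs4 import BeautifulSoup

def extract_chapter_text(raw: bytes) -> str:
    # kanunu8 pages are GBK-encoded; decode before handing to BeautifulSoup
    html = raw.decode('gbk', errors='replace')
    soup = BeautifulSoup(html, 'html.parser')
    return soup.get_text(strip=True)

# illustrative stand-in for res.content from requests.get(url)
sample = '<html><body><p>第一回 测试章节</p></body></html>'.encode('gbk')
print(extract_chapter_text(sample))
```

In the real script the extracted text would then be written to a file per chapter.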
Multithreaded Baidu domain collection
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import time
import threading
def spider(num, keyword):
    for sum in range(0, 800, 100):
        urlSearch = 'https://www.baidu.com/...
Original post · 2019-04-02 08:50:21 · 360 views · 0 comments
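The truncated loop above builds one search URL per 100-result page. An offline sketch of that pagination, plus the host extraction the post's title implies; the `wd`/`pn` query-string layout is an assumption about Baidu's search URLs, since the preview cuts off before the full URL.

```python
from urllib.parse import urlparse, urlencode

def build_search_urls(keyword, pages=8, step=100):
    # Baidu paginates with pn = 0, 100, 200, ... as in range(0, 800, 100)
    return ['https://www.baidu.com/s?' + urlencode({'wd': keyword, 'pn': pn})
            for pn in range(0, pages * step, step)]

def domain_of(link):
    # reduce a result link to its host, for collecting unique domains
    return urlparse(link).netloc

urls = build_search_urls('python', pages=3)
print(urls)
print(domain_of('https://www.example.com/page?id=1'))
```

In the multithreaded version, each keyword's page loop would run in its own `threading.Thread`, as the `spider(num, keyword)` signature suggests.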
acg456 comic site scraper
import requests
import json
from urllib import request
import os
import time
for pn in range(1, 182):  # chapters 1-181 in total
    pn = '%03d' % pn  # zero-pad to three digits
    ...
Original post · 2019-04-04 01:32:29 · 86294 views · 0 comments
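The one visible trick in the preview is zero-padding the chapter number with `'%03d'`. A small sketch of how that padding would feed a per-chapter URL; the URL template here is a hypothetical placeholder, not acg456's real path.

```python
def chapter_code(pn: int) -> str:
    # '%03d' pads to three digits: 1 -> '001', 45 -> '045'
    return '%03d' % pn

def chapter_url(pn: int) -> str:
    # hypothetical URL template, for illustration only
    return 'https://example.com/comic/%s.html' % chapter_code(pn)

print([chapter_code(n) for n in (1, 45, 181)])
print(chapter_url(7))
```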
Python: live-updating news titles
import requests
from bs4 import BeautifulSoup
import re
file = open('titles.txt', 'r', encoding='utf8')  # titles.txt is the file that holds the saved title list
title_list = file.read()  # the old stored titles
# find the new titles below
url = 'https://tw.app...
Original post · 2019-04-04 01:28:03 · 384 views · 0 comments
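The idea in the snippet is a diff against titles.txt: fetch the current headlines, then keep only those not yet recorded. The file-free core of that comparison can be sketched as:

```python
def find_new_titles(saved_text: str, fetched_titles: list) -> list:
    # keep only headlines that are not already in the saved titles.txt text
    return [t for t in fetched_titles if t not in saved_text]

saved = '旧闻标题一\n旧闻标题二\n'
fetched = ['旧闻标题二', '新闻标题三']
print(find_new_titles(saved, fetched))
```

In the real script, the new titles would then be appended to titles.txt so the next run sees them as old.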
Amap (Gaode) and Baidu Maps scrapers
# the line below must be included, otherwise it errors out
# coding=utf-8
import requests, json, time
def baidu_map(keyword):
    # city_code is the national city code, 37-373
    for city_code in range(265, 266):
        # page count capped at 20 pages
        for pn in range(0, 30):
            ...
Original post · 2019-04-04 01:22:44 · 778 views · 0 comments
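The loop structure above amounts to one request per (city_code, page) pair. An offline sketch of generating those request parameters; the parameter names are assumptions, since the preview cuts off before the actual query dict.

```python
def poi_requests(keyword, city_codes=range(265, 266), max_pages=20):
    # one parameter dict per page, mirroring the nested loops in the snippet
    return [{'keyword': keyword, 'city_code': c, 'page': pn}
            for c in city_codes
            for pn in range(max_pages)]

reqs = poi_requests('餐厅')
print(len(reqs), reqs[0])
```

Widening `city_codes` to the full 37-373 range mentioned in the comment multiplies the request count accordingly, which is why the original limits it to one city at a time.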
Mapbar scraper
import requests
import time, json
file = open('mapbar.txt', 'w', encoding='utf-8')
def mapbar(keyword):
    time_now = int(time.time() * 1000)
    for pn in range(1, 30):
        parameter = {
            ...
Original post · 2019-04-04 01:20:10 · 239 views · 0 comments
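The one concrete detail visible above is the millisecond timestamp, `int(time.time() * 1000)`, which map endpoints of this kind commonly expect as a request parameter. A sketch:

```python
import time

def millis_now() -> int:
    # millisecond timestamp, as computed in the snippet
    return int(time.time() * 1000)

ts = millis_now()
print(ts, len(str(ts)))
```

In the real script, `ts` would go into the `parameter` dict alongside the keyword and page number before each request.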
City classifieds (go007) scraper
import requests, time
from bs4 import BeautifulSoup
file = open('go007.txt', 'w', encoding='utf-8')
header = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0....
Original post · 2019-04-04 01:02:49 · 285 views · 0 comments
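Besides the request header, the script writes each listing to go007.txt in UTF-8. A minimal sketch of that persistence step, using a temp directory so it runs anywhere; the sample rows are made up.

```python
import os
import tempfile

def save_listings(path: str, rows: list) -> None:
    # one classified listing per line; utf-8 keeps Chinese text intact
    with open(path, 'w', encoding='utf-8') as f:
        for row in rows:
            f.write(row + '\n')

path = os.path.join(tempfile.gettempdir(), 'go007_demo.txt')
save_listings(path, ['二手冰箱 300元', '自行车 150元'])
print(open(path, encoding='utf-8').read().splitlines())
```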
Python: looking up an IP's city
import geoip2.database
reader = geoip2.database.Reader(r'C:\Users\name\PycharmProjects\test\GeoLite2-City.mmdb')
response = reader.city('103.235.46.39')
print(response.country.iso_code)
# GeoLite2-City...
Original post · 2019-04-04 00:59:05 · 746 views · 0 comments
Querying transaction records on the 8591 game marketplace
# note: crawl too frequently and your IP gets banned for several days
# some pages are missing because trading for that game was halted, so the id range skips many numbers
import requests, time
from bs4 import BeautifulSoup
file = open("8591.txt", 'a+')
header = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0...
Original post · 2019-04-04 00:43:07 · 832 views · 0 comments
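The two comments above are the interesting part: the id range has gaps (delisted games), and aggressive crawling gets the IP banned for days. A sketch of a loop that throttles and skips missing pages; the fetcher here is a fake dict lookup so it runs offline.

```python
import time

def crawl_ids(fetch, ids, delay=0.0):
    # fetch(i) returns None for gaps in the id range; sleep between requests
    results = []
    for i in ids:
        page = fetch(i)
        if page is not None:
            results.append((i, page))
        time.sleep(delay)  # raise this in real use: 8591 bans fast crawlers
    return results

fake_pages = {2: '成交 100 元', 4: '成交 250 元'}
print(crawl_ids(fake_pages.get, range(1, 6)))
```

With a real fetcher, a request that 404s (or returns a "trading halted" page) would map to `None` and be skipped the same way.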
Python: Weibo content extraction
import requests
import re
import json
from bs4 import BeautifulSoup
# Weibo requires logging in with cookies
# one knowledge point: the content inside <script> tags is pulled out with a regex, then processed
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,im...
Original post · 2019-04-13 22:17:44 · 2003 views · 2 comments
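The knowledge point in the comment is that Weibo pages embed their data as JSON inside a `<script>` tag, so the trick is a regex extraction followed by `json.loads`. A sketch on a made-up page fragment; the `$render_data` variable name is an assumption about the page layout.

```python
import re
import json

def extract_embedded_json(html: str):
    # pull the JSON object assigned inside a <script> tag, then parse it
    m = re.search(r'var \$render_data = (\{.*\})\s*;', html, re.S)
    return json.loads(m.group(1)) if m else None

sample = '<script>var $render_data = {"status": {"text": "hello weibo"}};</script>'
data = extract_embedded_json(sample)
print(data['status']['text'])
```

With the real page, the `requests.get` call would also need the login cookies mentioned in the comment, passed via the `cookies=` or `headers=` argument.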