原创 Extracting Weibo content with Python
import requests
import re
import json
from bs4 import BeautifulSoup
# Weibo requires logging in with cookies
# One takeaway: content inside <script> tags is pulled out with a regex and then processed
headers = { 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,im...
2019-04-13 22:17:44
1994
2
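The "takeaway" the post mentions, pulling content out of a `<script>` block with a regex before parsing it, can be sketched roughly as follows; the `FM.view` payload below is a made-up stand-in for Weibo's real markup:

```python
import json
import re

# Hypothetical page fragment: Weibo-style pages embed JSON inside <script> tags
html = '<script>FM.view({"ns": "pl.content", "html": "<div>post text</div>"})</script>'

# Grab the JSON argument with a regex, then hand it to json.loads for processing
match = re.search(r'FM\.view\((\{.*\})\)', html)
data = json.loads(match.group(1))
print(data["html"])
```

The same two-step pattern (regex first, structured parser second) applies whenever the page builds its content from script-embedded JSON rather than plain HTML.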
原创 Crawler for the acg456 comic site
import requests
import json
from urllib import request
import os
import time
for pn in range(1,182):  # chapters 1-100 in total
    pn = '%03d' % pn  # zero-pad to three digits
    ...
2019-04-04 01:32:29
86268
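The `'%03d' % pn` step in the snippet left-pads each chapter number with zeros so it matches the site's three-digit file names; a quick illustration:

```python
# Zero-pad chapter numbers to three digits, as the comic crawler does
chapters = ['%03d' % pn for pn in range(1, 4)]
print(chapters)  # ['001', '002', '003']
```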
原创 Live-updating news headlines with Python
import requests
from bs4 import BeautifulSoup
import re
file = open('titles.txt','r',encoding='utf8')  # titles.txt is the file where the headline list is kept
title_list = file.read()  # the previously stored headlines
# Find the new headlines below
url = 'https://tw.app...
2019-04-04 01:28:03
379
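The update check the post describes, comparing a saved headline file against freshly scraped titles, boils down to reporting only the titles not already stored; a minimal sketch with invented headlines:

```python
# Previously saved headlines (in the post these come from titles.txt)
old_titles = 'old headline A\nold headline B'

# Freshly scraped headlines, newest first
new_titles = ['breaking headline C', 'old headline A']

# Keep only titles that are not in the stored list
fresh = [t for t in new_titles if t not in old_titles.splitlines()]
print(fresh)
```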
原创 Gaode (Amap) and Baidu crawlers
# The line below must be included, otherwise an error is raised
#coding=utf-8
import requests,json,time
def baidu_map(keyword):
    # city_code is the national city code, 37-373
    for city_code in range(265,266):
        # set the page count to at most 20 pages
        for pn in range(0,30):...
2019-04-04 01:22:44
774
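The nested loops in the snippet walk city codes and result pages; the structure can be sketched without the actual API call (the parameter names here are assumptions, not the real Baidu API fields):

```python
# Enumerate (city_code, page) request parameters the way the crawler does
requests_params = []
for city_code in range(265, 266):   # national city codes; the post notes 37-373
    for pn in range(0, 30):         # page counter, capped per the post's comment
        requests_params.append({'city_code': city_code, 'page_num': pn})
print(len(requests_params))
```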
原创 Mapbar crawler
import requests
import time,json
file = open('mapbar.txt','w',encoding='utf-8')
def mapbar(keyword):
    time_now = int(time.time() * 1000)
    for pn in range(1,30):
        parameter = { ...
2019-04-04 01:20:10
235
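The `int(time.time() * 1000)` in the snippet produces the millisecond timestamp that many map endpoints expect as a request parameter:

```python
import time

# Current time in milliseconds, as used for the request parameter
time_now = int(time.time() * 1000)
print(time_now)
```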
原创 City classifieds (go007) crawler
import requests,time
from bs4 import BeautifulSoup
file = open('go007.txt','w',encoding='utf-8')
header = {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0....
2019-04-04 01:02:49
278
原创 Looking up an IP's city with Python
import geoip2.database
reader = geoip2.database.Reader(r'C:\Users\name\PycharmProjects\test\GeoLite2-City.mmdb')
response = reader.city('103.235.46.39')
print(response.country.iso_code)
#GeoLite2-City...
2019-04-04 00:59:05
736
原创 Querying completed-sale records on the 8591 game site
# Note: crawling too frequently gets your IP blocked for several days
# Some pages are missing because the game stopped trading, so the range skips many IDs
import requests,time
from bs4 import BeautifulSoup
file = open("8591.txt",'a+')
header = {'Accept':'text/html,application/xhtml+xml,application/xml;q=0...
2019-04-04 00:43:07
810
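The post warns that the ID range "skips many numbers" because delisted games have no page, so the crawler has to tolerate gaps rather than abort. A sketch with a stand-in lookup in place of real HTTP requests:

```python
# Stand-in for the site: some listing IDs simply do not exist
listings = {100: 'sold record A', 103: 'sold record B'}

records = []
for listing_id in range(100, 105):
    page = listings.get(listing_id)
    if page is None:
        continue  # skip gaps in the ID range instead of failing
    records.append(page)
print(records)
```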
原创 Kanunu (努努书坊) novel crawler
import requests
from bs4 import BeautifulSoup
import re
url = 'https://www.kanunu8.com/book3/7562/150394.html'
res = requests.get(url)
html = (res.content).decode('gbk')
soup = BeautifulSoup(html,'lxm...
2019-04-04 00:40:14
950
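The `res.content.decode('gbk')` step matters because kanunu8 serves GBK rather than UTF-8; decoding with the wrong codec garbles the Chinese text. A self-contained illustration using made-up bytes in place of a live response:

```python
# Simulate a GBK-encoded response body (requests exposes raw bytes as res.content)
raw = '小说正文'.encode('gbk')

# Decode with the site's actual charset before handing the text to BeautifulSoup
text = raw.decode('gbk')
print(text)
```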
原创 Multithreaded Baidu domain collector
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import time
import threading
def spider(num,keyword):
    for sum in range(0, 800, 100):
        urlSearch = 'https://www.baidu.com/...
2019-04-02 08:50:21
356
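The snippet combines `urlparse` for reducing result links to domains with `threading` for parallel work; a minimal runnable sketch of both, with stand-in URLs instead of live Baidu results:

```python
import threading
from urllib.parse import urlparse

def spider(urls, out):
    # Reduce each result link to its domain, as the collector does
    for u in urls:
        out.append(urlparse(u).netloc)

domains = []
worker = threading.Thread(
    target=spider,
    args=(['https://example.com/page', 'https://example.org/x'], domains),
)
worker.start()
worker.join()
print(domains)
```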
原创 Downloading a single video from the Xiaoya (小鸭) site
import requests
import m3u8
import os
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
url = 'http://youku1...
2019-04-02 08:48:10
777
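HLS downloads like this one fetch an m3u8 playlist, then each `.ts` segment in order. The segment-listing step can be sketched without the `m3u8` package; the playlist text below is invented:

```python
# A minimal HLS playlist; the real one comes from the video page
playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10,
seg0.ts
#EXTINF:10,
seg1.ts
#EXT-X-ENDLIST"""

# Segment URIs are the non-comment lines; download these in order and concatenate
segments = [line for line in playlist.splitlines() if not line.startswith('#')]
print(segments)
```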
原创 Logging into a specific Facebook group with Selenium
from selenium import webdriver
import time
chrome_path = "C:\selenium_driver_chrome\chromedriver.exe"  # path where the chromedriver.exe executable lives
username = 'your Facebook account'
pwd = 'your Facebook password'
options = webdriver.ChromeOptions()...
2019-04-02 08:44:03
272
How can I get part of a tag's attribute with Python Selenium?
2019-03-11