
Web Scraping
名字1001
Scrapy

pip install Scrapy

The layout below is what scrapy startproject myproject generates:

scrapy.cfg
myproject/
    __init__.py
    items.py
    middlewares.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
        spider1.py
        spider2.py
        ...
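A sketch of the command-line workflow that produces and runs such a project; the project and spider names (myproject, example, example.com) are placeholders, not taken from the original post:

scrapy startproject myproject        # creates the directory tree shown above
cd myproject
scrapy genspider example example.com # scaffolds spiders/example.py
scrapy crawl example                 # runs the new spider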
Python: using regular expressions

import re
import requests
from fake_useragent import UserAgent

url = 'https://www.baidu.com'
m = re.match(r'\w+', url)   # matches 'https' at the start of the string
print(m.group())

url2 = 'http://www.97xs.org/11/11389/3303898.html'
# The original snippet is cut off at "headers = {"; a random Chrome UA from
# fake_useragent (already imported above) is a plausible completion.
headers = {'User-Agent': UserAgent().chrome}
response = requests.get(url2, headers=headers)
# Likely continuation: apply a regex to the fetched page (pattern assumed).
chapter_title = re.findall(r'<title>(.*?)</title>', response.text)
print(chapter_title)
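Since re.match() only anchors at the start of the string, a quick side-by-side with re.search() and re.findall() may help; the sample string is made up:

import re

s = 'id=42; name=foo'
print(re.match(r'\d+', s))           # None: match() only looks at the very start
print(re.search(r'\d+', s).group())  # '42': search() scans the whole string
print(re.findall(r'\w+=\w+', s))     # ['id=42', 'name=foo']: every match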
Crawler with a proxy: a simple example

from urllib.request import Request, build_opener, ProxyHandler
from fake_useragent import UserAgent

url = 'https://www.qidian.com'
# The hard-coded UA string is truncated in the original; a random UA from
# fake_useragent stands in for it here.
headers = {'User-Agent': UserAgent().chrome}
request = Request(url, headers=headers)
# The proxy address is a placeholder; substitute a live proxy of your own.
handler = ProxyHandler({'http': 'http://127.0.0.1:8888'})
opener = build_opener(handler)
response = opener.open(request)
print(response.read().decode())
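One way to check that traffic really goes through the proxy is to ask httpbin.org to echo the origin IP back; this assumes the placeholder proxy above is live:

from urllib.request import build_opener, ProxyHandler

opener = build_opener(ProxyHandler({'http': 'http://127.0.0.1:8888'}))
print(opener.open('http://httpbin.org/ip').read().decode())  # shows the proxy's IP, not yours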
Python crawler: a cookie example

from urllib.request import Request, build_opener, HTTPCookieProcessor
from urllib.parse import urlencode
from fake_useragent import UserAgent
from http.cookiejar import MozillaCookieJar

# The body of get_cookie() is truncated in the original; the usual pattern
# for this kind of tutorial is: log in once, then save the cookies to a file.
# The login URL and credentials below are placeholders.
def get_cookie():
    login_url = 'https://www.example.com/login'
    form_data = urlencode({'username': 'user', 'password': 'pass'}).encode()
    headers = {'User-Agent': UserAgent().chrome}
    request = Request(login_url, data=form_data, headers=headers)
    cookie_jar = MozillaCookieJar()
    opener = build_opener(HTTPCookieProcessor(cookie_jar))
    opener.open(request)
    cookie_jar.save('cookie.txt', ignore_discard=True, ignore_expires=True)

get_cookie()
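The counterpart step, loading the saved cookies for later authenticated requests, is sketched below; the file name matches the save above, and the profile URL is a placeholder:

from urllib.request import build_opener, HTTPCookieProcessor
from http.cookiejar import MozillaCookieJar

jar = MozillaCookieJar()
jar.load('cookie.txt', ignore_discard=True, ignore_expires=True)
opener = build_opener(HTTPCookieProcessor(jar))
print(opener.open('https://www.example.com/profile').read().decode())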
Crawler: setting the User-Agent request header

from urllib.request import urlopen, Request

url = 'https://www.hao123.com'
# The UA string was cut off in the original; completed with a standard Chrome UA.
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
request = Request(url, headers=header)
response = urlopen(request)
print(response.read().decode())
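An alternative that the later posts in this series use: let fake_useragent pick a User-Agent instead of hard-coding one.

from urllib.request import urlopen, Request
from fake_useragent import UserAgent

request = Request('https://www.hao123.com', headers={'User-Agent': UserAgent().random})
print(urlopen(request).getcode())  # 200 if the request succeeded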
Crawler: GET request with parameters

from urllib.request import urlopen, Request
from urllib.parse import quote

arg = '编程语言'
# quote() percent-encodes the Chinese keyword so it is URL-safe
url = 'https://www.baidu.com/s?wd={}'.format(quote(arg))
# url = 'https://www.baidu.com/s?wd=python'
# The UA string was truncated in the original; completed with a standard Chrome UA.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
response = urlopen(Request(url, headers=headers))
print(response.read().decode())
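With more than one query parameter, urlencode() is handier than quoting each value by hand; wd and pn here mirror Baidu's keyword and result-offset parameters:

from urllib.parse import urlencode

params = {'wd': '编程语言', 'pn': 10}
url = 'https://www.baidu.com/s?' + urlencode(params)
print(url)  # https://www.baidu.com/s?wd=%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80&pn=10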
Crawler: POST request

from urllib.request import Request, urlopen
from urllib.parse import urlencode

url = 'https://www.zhihu.com/signin?next=%2F'
args = {
    'username': '16633970705',
    'password': '16633970705',
}
# The snippet is cut off at "f_dat"; the obvious continuation is encoding
# the form and sending it as the request body.
f_data = urlencode(args).encode()    # POST bodies must be bytes
request = Request(url, data=f_data)  # supplying data turns this into a POST
response = urlopen(request)
print(response.read().decode())
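For comparison, the same request shape with the requests library; note that a real Zhihu login also involves tokens and captchas, so this only illustrates the POST mechanics:

import requests

resp = requests.post('https://www.zhihu.com/signin?next=%2F',
                     data={'username': '16633970705', 'password': '16633970705'})
print(resp.status_code)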
Crawler: multiple pages

from urllib.request import Request, urlopen

for i in range(3):
    # p1, p2, p3 ... the page number is the only part of the URL that changes
    url = 'http://hao123.zongheng.com/store/c0/w0/s0/p{0}/all.html'.format(i + 1)
    r = Request(url)
    html = urlopen(r).read().decode()
    # Truncated after "decode" in the original; printing a summary is assumed.
    print(url, len(html))
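When looping over pages it is usually worth pausing between requests; a sketch of the same loop with a fixed one-second delay:

import time
from urllib.request import Request, urlopen

for page in range(1, 4):
    url = 'http://hao123.zongheng.com/store/c0/w0/s0/p{}/all.html'.format(page)
    html = urlopen(Request(url)).read().decode()
    print(page, len(html))
    time.sleep(1)  # be gentle with the server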
Crawler: ignoring the SSL certificate

from urllib.request import Request, urlopen
import ssl

url = 'https://www.hao123.com'
r = Request(url)
# Skip certificate verification (testing only: this disables HTTPS safety checks)
context = ssl._create_unverified_context()
response = urlopen(r, context=context)
# Truncated after "print(re" in the original; printing the body is assumed.
print(response.read().decode())
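The same effect can be had more explicitly by starting from a default context and switching verification off, which makes the intent visible in code review:

import ssl
from urllib.request import urlopen

ctx = ssl.create_default_context()
ctx.check_hostname = False          # must be disabled before changing verify_mode
ctx.verify_mode = ssl.CERT_NONE
print(urlopen('https://www.hao123.com', context=ctx).getcode())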
Learning Scrapy: a simple Qidian novel crawler

# -*- coding: utf-8 -*-
import scrapy


class QidianSpider(scrapy.Spider):
    name = 'qidian'
    allowed_domains = ['qidian.com']
    # The chapter URL is truncated in the original post; substitute a full one.
    start_urls = ['https://read.qidian.com/chapter/ZOJ_pWRoTg9wKI0S...']

    def parse(self, response):
        # The original parse() is cut off; a minimal placeholder that just
        # yields the page title is sketched here.
        yield {'title': response.css('title::text').get()}
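The spider runs from inside the Scrapy project; the crawl name matches the name attribute above, and the output file is an arbitrary choice:

# From the shell: scrapy crawl qidian -o chapters.json
# Or, equivalently, from a Python script at the project root:
from scrapy import cmdline

cmdline.execute('scrapy crawl qidian -o chapters.json'.split())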
Baidu OCR API: image text recognition in Python

from urllib.request import Request, urlopen
import json

# client_id is the API Key (AK) from the Baidu console,
# client_secret is the Secret Key (SK).
# The URL is truncated in the original after "&clien"; the standard
# client_secret parameter completes it.
url = ('https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials'
       '&client_id=【API Key】&client_secret=【Secret Key】')
response = urlopen(Request(url))
token = json.loads(response.read().decode())['access_token']
print(token)
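With the token in hand, the recognition call goes to Baidu's general_basic OCR endpoint; the image file name is a placeholder, and the endpoint and field names below follow Baidu's published API rather than the truncated post:

import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

access_token = '【access_token from the previous step】'
ocr_url = ('https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic'
           '?access_token=' + access_token)
with open('test.png', 'rb') as f:             # placeholder image file
    img = base64.b64encode(f.read()).decode() # API expects base64-encoded image data
body = urlencode({'image': img}).encode()
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
result = json.loads(urlopen(Request(ocr_url, data=body, headers=headers)).read().decode())
for line in result.get('words_result', []):   # one entry per recognized text line
    print(line['words'])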