
Crawlers
YWF331
Crawler - downloading a web page
from urllib.request import urlopen
from urllib.error import URLError,HTTPError
url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E8%A5%BF%E5%AE%89&kw=python&sm=0&p=1'
#url = 'http:/...
Original · 2018-03-21 01:25:32 · 1006 views · 0 comments
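The truncated preview above can be expanded into a small download helper. A minimal sketch, assuming a simple retry policy of my own (the retry count and 5xx-only rule are not from the original post):

```python
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

def download(url, num_retries=2):
    """Fetch a page, retrying a couple of times on 5xx server errors."""
    try:
        return urlopen(url).read()
    except (URLError, HTTPError) as e:
        # HTTPError carries a status code; retry only on server-side (5xx) failures.
        # Plain URLError (DNS failure, refused connection) has no .code, so we give up.
        if num_retries > 0 and hasattr(e, 'code') and 500 <= e.code < 600:
            return download(url, num_retries - 1)
        return None
```

Returning None on failure lets the caller distinguish "page unreachable" from an empty page.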
Crawler - final version
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Sel...
Original · 2018-07-12 15:49:54 · 408 views · 0 comments
Crawler - optimizing the previous version
Goal: reduce the number of clicks.
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui ...
Original · 2018-07-12 13:14:53 · 448 views · 0 comments
Crawler - tender information ()
This one was comfortable to write.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import requests
import re
import json
import random
from time import sleep
from datetime import date
from functools import reduce
class Prov...
Original · 2018-07-17 15:31:49 · 3838 views · 0 comments
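The import list (requests, re, json) suggests pulling JSON out of server responses. A small illustration of that common pattern with made-up data (the callback wrapper and field names here are hypothetical, not the site's real schema):

```python
import re
import json

# Many listing endpoints return JSON wrapped in a JavaScript callback;
# strip the wrapper with a regex, then parse the payload with json.
raw = 'jsonpCallback({"total": 2, "items": [{"title": "Project A"}, {"title": "Project B"}]})'

payload = json.loads(re.search(r'\((.*)\)', raw, re.S).group(1))
titles = [item['title'] for item in payload['items']]
# titles == ['Project A', 'Project B']
```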
Crawler -
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from datetime import date,timedelta
from re import findall
today = str(date.today())
yesterday = str(date.today() - tim...
Original · 2018-07-11 09:36:36 · 264 views · 0 comments
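The date handling in this preview is plain stdlib code; spelled out (using a single base date, which also avoids a midnight race between the two calls):

```python
from datetime import date, timedelta

base = date.today()
today = str(base)                         # ISO format, e.g. '2018-07-11'
yesterday = str(base - timedelta(days=1))  # the previous day, same format
```

ISO-formatted date strings sort chronologically, so they can be compared directly.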
Crawler - requests, WeChat official-account push
Removed time and bs4; they weren't working well.
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import re,sys,time,json,requests
import urllib
from datetime import date,timedelta
# Get page link info
class PageHelperInfo(object):
    def pag...
Original · 2018-07-09 13:43:12 · 789 views · 0 comments
Crawler - rate limiting
from urllib.request import urlopen
from urllib.parse import urlparse  # in Python 3, urlparse lives in urllib.parse, not urllib.request
from urllib.error import URLError,HTTPError
import re
import time
from datetime import datetime
#url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E...
Original · 2018-03-25 19:36:55 · 1006 views · 0 comments
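The imports here (urlparse, time, datetime) fit the classic per-domain throttle. A sketch of that idea, assuming my own class and method names:

```python
import time
from datetime import datetime
from urllib.parse import urlparse

class Throttle:
    """Pause between successive requests to the same domain."""
    def __init__(self, delay):
        self.delay = delay  # seconds to wait between hits on one domain
        self.domains = {}   # domain -> datetime of last access

    def wait(self, url):
        domain = urlparse(url).netloc
        last_accessed = self.domains.get(domain)
        if self.delay > 0 and last_accessed is not None:
            # Sleep only for however much of the delay window remains
            sleep_secs = self.delay - (datetime.now() - last_accessed).total_seconds()
            if sleep_secs > 0:
                time.sleep(sleep_secs)
        self.domains[domain] = datetime.now()
```

Calling `throttle.wait(url)` before each download slows same-domain requests without penalizing requests to other domains.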
Crawler - link depth
from urllib.request import urlopen
from urllib.error import URLError,HTTPError
import re
import time
#url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E8%A5%BF%E5%AE%89&kw=python&sm=0...
Original · 2018-03-25 17:59:05 · 480 views · 0 comments
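The idea behind a link-depth limit: record how many hops from the seed each URL was discovered at, and stop expanding pages past the cutoff. A sketch with the page fetcher injectable so the traversal logic stands on its own (function and parameter names are mine, not the post's):

```python
import re
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

def default_fetch(url):
    """Fetch a page as text; treat any urllib error as an empty page."""
    try:
        return urlopen(url).read().decode('utf-8', 'ignore')
    except (URLError, HTTPError):
        return ''

def link_crawler(seed_url, link_regex, max_depth=2, fetch=default_fetch):
    """Breadth-first crawl; record but do not expand links beyond max_depth."""
    queue = [seed_url]
    seen = {seed_url: 0}  # url -> depth at which it was discovered
    while queue:
        url = queue.pop(0)
        depth = seen[url]
        if depth >= max_depth:
            continue  # too deep: keep it in seen, but don't follow its links
        for link in re.findall(link_regex, fetch(url)):
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

The `seen` dict doubles as the visited set, so pages linked from multiple places are fetched only once.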
Crawler - IP proxies
from urllib.request import urlopen,Request,ProxyHandler,build_opener,install_opener
url = 'http://ip.chinaz.com/getip.aspx'
# Local (per-opener) proxy
# handler = ProxyHandler({'http':'115.223.215.51:9000'})
# opener = ...
Original · 2018-03-21 14:00:31 · 447 views · 0 comments
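The commented-out lines correspond to urllib's two proxy styles: a per-opener ("local") proxy versus installing the opener globally. A sketch of both (the proxy address is the stale placeholder from the post, so the actual requests stay commented out):

```python
from urllib.request import ProxyHandler, build_opener, install_opener, urlopen

proxy = ProxyHandler({'http': '115.223.215.51:9000'})  # placeholder, almost certainly dead
opener = build_opener(proxy)

# Local: only requests made through this opener use the proxy
# html = opener.open('http://ip.chinaz.com/getip.aspx').read()

# Global: after install_opener, plain urlopen() also goes through the proxy
# install_opener(opener)
# html = urlopen('http://ip.chinaz.com/getip.aspx').read()
```

The local form is safer in larger programs, since it doesn't change the behavior of every other `urlopen` call in the process.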
Crawler - user agents
from urllib.request import urlopen,Request
from urllib.error import URLError,HTTPError
pcUserAgent = {"safari 5.1 – MAC":"User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWeb...
Original · 2018-03-21 13:22:51 · 1199 views · 0 comments
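Note that the dict values in the preview embed the text "User-Agent:" inside the string; when building a Request, the header name belongs in the dict key and the value is just the UA string itself. A minimal sketch of setting a custom user agent (the UA string is one plausible Safari example, not necessarily the post's):

```python
from urllib.request import Request, urlopen

url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E8%A5%BF%E5%AE%89&kw=python&sm=0&p=1'

# Header name goes in the key; the value is only the UA string
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) '
                         'AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'}
req = Request(url, headers=headers)
# html = urlopen(req).read()
```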
Beautiful Soup usage
A very thorough guide to Python's Beautiful Soup library:
https://cuiqingcai.com/1319.html
https://blog.youkuaiyun.com/love666666shen/article/details/77512353
Reposted · 2018-11-21 10:47:19 · 308 views · 0 comments
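For quick reference alongside those links, the basic Beautiful Soup workflow looks like this (requires `pip install beautifulsoup4`; the sample HTML is made up):

```python
from bs4 import BeautifulSoup

html = '<html><body><p class="title">Hello</p><a href="/jobs">jobs</a></body></html>'

# 'html.parser' is the stdlib parser; 'lxml' also works if installed
soup = BeautifulSoup(html, 'html.parser')

title = soup.find('p', class_='title').get_text()  # text of the first matching <p>
links = [a['href'] for a in soup.find_all('a')]    # href of every <a> tag
```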