
xpath
Arthur54271
Life is short, I use Python
Articles in this column
Python3 crawler: selenium / PhantomJS / Douban Music example
from selenium import webdriver import os,time from lxml import etree # Douban Music root_dir='douban' if not os.path.exists(root_dir): os.mkdir(root_dir) # open the page driver=webdriver.PhantomJS() base_url='https:...
Original · 2018-05-18 10:39:27 · 302 reads · 0 comments
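A minimal sketch of the selenium + PhantomJS + lxml pattern this post uses: render the page in a headless browser, then hand the HTML to lxml for XPath extraction. The chart URL and the XPath expression below are assumptions rather than the article's own values, and PhantomJS is only available in older Selenium releases; a headless Chrome or Firefox driver works the same way.

```python
# Sketch: render with PhantomJS, parse the rendered HTML with lxml XPath.
import os
import time

from lxml import etree
from selenium import webdriver

root_dir = 'douban'
if not os.path.exists(root_dir):
    os.mkdir(root_dir)

driver = webdriver.PhantomJS()                 # deprecated/removed in Selenium 4
driver.get('https://music.douban.com/chart')   # assumed target page
time.sleep(2)                                  # crude wait for the page to render

# Hand the rendered source to lxml so the usual XPath workflow applies
html = etree.HTML(driver.page_source)
titles = html.xpath('//a[contains(@href, "/subject/")]/text()')  # assumed XPath
for t in titles:
    print(t.strip())

driver.quit()
```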
Python3 crawler: selenium / PhantomJS / handling the captcha during Douban login
# Douban login from selenium import webdriver from selenium.webdriver.common.action_chains import ActionChains import os,time driver=webdriver.PhantomJS() driver.get('https://www.douban.com/') # allow time for the network request time.sle...
Original · 2018-05-18 12:57:14 · 617 reads · 0 comments
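The usual way to get past an image captcha in this kind of login flow is to screenshot the page, let a human read the code, and type it back into the form. The sketch below follows that idea; the form field names, captcha element id, and submit-button class are all assumptions, and the old find_element_by_* API matches the Selenium versions that still shipped PhantomJS support.

```python
# Sketch: fill the login form, fall back to manual captcha entry when one appears.
import time
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get('https://www.douban.com/')
time.sleep(3)                                                   # allow time for the network request

driver.find_element_by_name('form_email').send_keys('user@example.com')  # assumed field name
driver.find_element_by_name('form_password').send_keys('password')       # assumed field name

# If a captcha image is present, save a screenshot and ask for manual input
captchas = driver.find_elements_by_id('captcha_image')                    # assumed element id
if captchas:
    driver.save_screenshot('login_page.png')                              # captcha is visible in this shot
    code = input('Open login_page.png, read the captcha, and type it here: ')
    driver.find_element_by_id('captcha_field').send_keys(code)            # assumed field id

driver.find_element_by_class_name('btn-submit').click()                   # assumed button class
time.sleep(2)
print(driver.title)
driver.quit()
```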
Python3: xpath
from lxml import etree from urllib import request import ssl ssl._create_default_https_context=ssl._create_unverified_context html=''' <bookstore> <title>新华书店</title> <bo...
Original · 2018-05-13 11:46:54 · 773 reads · 0 comments
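For reference, a self-contained version of the lxml + XPath basics this post walks through; the bookstore document below is a made-up stand-in for the snippet in the article.

```python
# Sketch: parse a small XML document with lxml and query it with XPath.
from lxml import etree

xml = '''
<bookstore>
    <title>新华书店</title>
    <book category="python">
        <name>Fluent Python</name>
        <price>89.00</price>
    </book>
    <book category="web">
        <name>HTTP权威指南</name>
        <price>61.40</price>
    </book>
</bookstore>
'''

root = etree.XML(xml)

# Select every book name, regardless of depth
print(root.xpath('//book/name/text()'))

# Attribute predicate: only books in the "python" category
print(root.xpath('//book[@category="python"]/name/text()'))

# Numeric predicate: books cheaper than 80
print(root.xpath('//book[price<80]/name/text()'))
```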
Python3: an xpath-based Qiushibaike (糗事百科) spider
from urllib import request from lxml import etree import re import ssl import json ssl._create_default_https_context=ssl._create_unverified_context def spider(page): base_url='https://www.qiushi...
Original · 2018-05-14 14:07:12 · 308 reads · 0 comments
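A condensed sketch of the urllib + lxml spider pattern the post applies to Qiushibaike. The page URL layout and the XPath expressions are assumptions (the site's markup has changed and the pages may no longer be reachable), but the request, parse, dump-to-JSON flow is the same.

```python
# Sketch: fetch a listing page with urllib, extract posts with XPath, save JSON.
import json
import ssl
from urllib import request

from lxml import etree

ssl._create_default_https_context = ssl._create_unverified_context  # skip cert verification, as in the post

def spider(page):
    base_url = 'https://www.qiushibaike.com/text/page/{}/'.format(page)   # assumed URL layout
    req = request.Request(base_url, headers={'User-Agent': 'Mozilla/5.0'})
    html = request.urlopen(req).read().decode('utf-8')

    tree = etree.HTML(html)
    items = []
    for div in tree.xpath('//div[contains(@class, "article")]'):          # assumed container class
        author = div.xpath('.//h2/text()')
        content = div.xpath('.//div[@class="content"]//span/text()')      # assumed content node
        items.append({
            'author': author[0].strip() if author else '',
            'content': ''.join(content).strip(),
        })
    return items

if __name__ == '__main__':
    data = spider(1)
    with open('qiushibaike.json', 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
```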
Python3: a scrapy project that crawls the current page and the next page
# -*- coding: utf-8 -*- import scrapy from urllib import request from Py06_2018_3_16.items import TencentItem class tencentNextPageSpider(scrapy.Spider): name = 'tencent_next_page' allowed_do...
Original · 2018-05-30 18:59:59 · 10488 reads · 0 comments
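The core of a "current page plus next page" Scrapy spider is a parse callback that first yields the items on the page and then yields a new Request for the next page pointing back at itself. Below is a stripped-down sketch with the start URL, row selector, and next-page link id all assumed rather than copied from the project; the real spider fills a TencentItem, while plain dicts are used here to keep it self-contained.

```python
# Sketch: yield items from the current page, then queue the next page.
import scrapy


class TencentNextPageSpider(scrapy.Spider):
    name = 'tencent_next_page'
    allowed_domains = ['hr.tencent.com']                           # assumed domain
    start_urls = ['https://hr.tencent.com/position.php?start=0']   # assumed start URL

    def parse(self, response):
        # 1) yield every job row on the current page
        for row in response.xpath('//tr[@class="even" or @class="odd"]'):  # assumed row selector
            yield {
                'title': row.xpath('./td[1]/a/text()').extract_first(),
                'category': row.xpath('./td[2]/text()').extract_first(),
            }

        # 2) follow the "next page" link, re-entering this same callback
        next_href = response.xpath('//a[@id="next"]/@href').extract_first()  # assumed link id
        if next_href and 'javascript' not in next_href:
            yield scrapy.Request(response.urljoin(next_href), callback=self.parse)
```

In Scrapy 1.4 and later, `response.follow(next_href, callback=self.parse)` does the same join-and-request in one call.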
Python3: a scrapy project that downloads images from web pages
# -*- coding: utf-8 -*- import scrapy,re,os from PY_2018_03_17.items import TuKuItem from urllib import request class TukuSpider(scrapy.Spider): name = 'tuku' allowed_domains = ['lanrentuku.c...
Original · 2018-05-31 14:36:47 · 644 reads · 0 comments
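Mirroring the imports visible in the preview (scrapy plus urllib), here is a minimal sketch that collects image URLs from each response and downloads them with urlretrieve. The start URL and the image XPath are assumptions.

```python
# Sketch: extract <img> sources from crawled pages and save them to disk.
import os
from urllib import request

import scrapy


class TukuSpider(scrapy.Spider):
    name = 'tuku'
    allowed_domains = ['lanrentuku.com']
    start_urls = ['https://www.lanrentuku.com/']               # assumed entry page

    def parse(self, response):
        img_dir = 'images'
        os.makedirs(img_dir, exist_ok=True)

        for src in response.xpath('//img/@src').extract():     # assumed: every <img> on the page
            img_url = response.urljoin(src)
            filename = os.path.join(img_dir, os.path.basename(img_url.split('?')[0]))
            try:
                request.urlretrieve(img_url, filename)          # blocking download, simple but slow
                self.logger.info('saved %s', filename)
            except Exception as exc:
                self.logger.warning('failed %s: %s', img_url, exc)
```

For anything beyond a toy crawl, Scrapy's built-in ImagesPipeline is the more idiomatic route, since urlretrieve blocks while each file downloads.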