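1. What is selenium? It is a browser automation testing tool: you write code that controls a browser and has it carry out tasks automatically.
How selenium drives Google Chrome
Install selenium: pip install selenium
How it works: selenium does not operate Chrome directly. It talks to the Chrome driver (chromedriver), and the driver in turn controls the browser.
Chrome driver download links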
http://chromedriver.storage.googleapis.com/index.html
http://npm.taobao.org/mirrors/chromedriver/
Mapping between Chrome versions and driver versions
http://blog.youkuaiyun.com/huilan_same/article/details/51896672
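Steps:
See the code later in these notes.
phantomjs
What is it? A browser with no user interface (a headless browser). Crawlers use it a lot.
It helps with pages that can only be crawled after logging in, and with dynamically loaded data. The data you want can be in one of three places:
(1) In the HTML itself.
(2) Not in the HTML but behind an API endpoint: capture the request and analyze the endpoint.
The response is usually JSON rather than HTML.
(3) In a JS file: parse it, typically with regular expressions.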
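Cases (2) and (3) can be sketched with the standard library alone. The snippet below is a minimal illustration and is not taken from the original notes: `api_body` and `js_source` are made-up sample strings standing in for a captured endpoint response and a downloaded JS file, and the JS variable name `movieList` is hypothetical.

```python
import json
import re

# Case (2): hypothetical body of a captured API response (JSON, not HTML).
api_body = '{"movies": [{"title": "A", "score": 9.1}, {"title": "B", "score": 8.7}]}'
api_titles = [m["title"] for m in json.loads(api_body)["movies"]]

# Case (3): hypothetical JS file that embeds the data in a variable.
js_source = 'var page = 1;\nvar movieList = [{"title": "C"}, {"title": "D"}];\n'

# Grab the array literal assigned to movieList, then parse it as JSON.
match = re.search(r'var\s+movieList\s*=\s*(\[.*?\]);', js_source, re.S)
js_titles = [m["title"] for m in json.loads(match.group(1))] if match else []

print(api_titles)
print(js_titles)
```

Real JS files are rarely this tidy, so the regular expression usually has to be adapted to the file at hand, and the captured text is valid JSON only when the script happens to use JSON-compatible literals.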
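What if the situation is more complicated than that?
Fall back on the heavy option: let a real browser render the page. Use it sparingly, because it is slow.
How selenium drives phantomjs
See the code later in these notes.
What is the goal? phantomjs returns the page source as it stands after the JS has executed.
2. Headless Chrome
phantomjs is a headless browser.
Chrome has a headless mode of its own: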
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
Driving Firefox with selenium
Download the Firefox driver (geckodriver)
https://github.com/mozilla/geckodriver/releases
Version mapping
https://blog.youkuaiyun.com/yinshuilan/article/details/79730239
firefox_options = webdriver.FirefoxOptions()
firefox_options.set_headless()
firefox_options.add_argument('--disable-gpu')
Driving Chrome
from selenium import webdriver
import time
# Create a browser object
path = r'C:\Users\ZBLi\Desktop\1805\day06\ziliao\chromedriver.exe'
driver = webdriver.Chrome(executable_path=path)
# Open Baidu in the browser
url = 'http://www.baidu.com/'
driver.get(url)
# time.sleep(5)
driver.implicitly_wait(10)
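'''
Each step below depends on the response to the previous one, so pause after every slow operation.
(1) Fixed wait
time.sleep(10)
always blocks for the full 10 s
(2) Implicit wait
driver.implicitly_wait(10)
waits up to 10 s when looking up elements and continues as soon as they are found
Dynamic loading
1. The first request returns HTML with no data in it
2. An ajax request is then sent and returns JSON data
3. The page's JS runs and inserts HTML through DOM operations
'''
# Find the search box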
my_input = driver.find_element_by_id('kw')
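'''
Available locator methods (Selenium 3 names; Selenium 4 replaced them with
find_element(By.ID, ...) / find_elements(By.XPATH, ...)):
find_element_by_id
find_element_by_xpath
find_elements_by_xpath
find_element_by_class_name
find_element_by_css_selector
find_element_by_link_text
find_elements_by_class_name
find_elements_by_css_selector
find_elements_by_link_text
'''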
# Type the query into the box
my_input.send_keys('清纯美女')
time.sleep(3)
# Find the "百度一下" (search) button
button = driver.find_element_by_id('su')
button.click()
time.sleep(5)
# Find a specific link by its text
a_href = driver.find_elements_by_link_text('清纯美女_海量精选高清图片_百度图片')[0]
a_href.click()
time.sleep(10)
# Quit the browser
driver.quit()
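The script above pauses with `time.sleep` and `driver.implicitly_wait`. The difference between "always wait N seconds" and "wait at most N seconds" can be sketched without a browser. `wait_until` below is a hypothetical helper written for illustration, not a selenium API; selenium ships its own polling wait (`WebDriverWait`) that works on the same principle.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)

# Simulated page: the element "appears" on the third check.
checks = {"count": 0}

def element_ready():
    checks["count"] += 1
    return "element" if checks["count"] >= 3 else None

start = time.monotonic()
found = wait_until(element_ready, timeout=10, poll=0.01)
elapsed = time.monotonic() - start
print(found, checks["count"])
```

The call returns as soon as the condition holds, so the 10 s timeout is only an upper bound, whereas `time.sleep(10)` would always cost the full 10 s.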
Driving phantomjs
from selenium import webdriver
import time
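# The path below is a raw string (r'...') so that sequences such as \n, \r, \t
# in a Windows path are not interpreted as escape characters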
path = r'C:\Users\ZBLi\Desktop\1805\day06\ziliao\phantomjs-2.1.1-windows\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path)
url = 'http://www.baidu.com/'
driver.get(url)
time.sleep(5)
# Take screenshots to record how far the script has got
driver.save_screenshot('./png/baidu.png')
driver.find_element_by_id('kw').send_keys('气质美女')
time.sleep(2)
driver.find_element_by_id('su').click()
time.sleep(5)
driver.save_screenshot('./png/qizhi.png')
driver.quit()
Getting the page source after the JS has run
from selenium import webdriver
import time
path = r'C:\Users\ZBLi\Desktop\1805\day06\ziliao\phantomjs-2.1.1-windows\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path)
url = 'https://movie.douban.com/typerank?type_name=%E7%88%B1%E6%83%85&type=13&interval_id=100:90&action='
driver.get(url)
# Loading usually takes 7-15 s
time.sleep(7)
# driver.implicitly_wait(20)
# Save the page content to a file
# with open('douban1.html', 'w', encoding='utf8') as fp:
#     # page_source is the rendered page as a string
#     fp.write(driver.page_source)
# Parse the content
# from lxml import etree
# tree = etree.HTML(driver.page_source)
driver.save_screenshot('./png/douban.png')
driver.quit()
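Once the JS has run, `driver.page_source` is an ordinary HTML string and can be parsed like any other. A minimal sketch follows, assuming a made-up `page_source` string and a hypothetical `movie-name` class. It uses the standard-library `html.parser` rather than `lxml` so that it runs without extra packages.

```python
from html.parser import HTMLParser

# Stand-in for driver.page_source (hypothetical markup, not the real site).
page_source = """
<div class="movie-list">
  <span class="movie-name">Movie A</span>
  <span class="rating">9.0</span>
  <span class="movie-name">Movie B</span>
</div>
"""

class NameCollector(HTMLParser):
    """Collect the text of every <span class="movie-name"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "span" and ("class", "movie-name") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.names.append(data.strip())

collector = NameCollector()
collector.feed(page_source)
print(collector.names)
```

With a real page, the selector has to come from inspecting the rendered HTML, for example the file saved from `driver.page_source`.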
Scrolling to the bottom of the page
from selenium import webdriver
import time
path = r'C:\Users\ZBLi\Desktop\1805\day06\ziliao\phantomjs-2.1.1-windows\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path)
url = 'https://movie.douban.com/typerank?type_name=%E7%88%B1%E6%83%85&type=13&interval_id=100:90&action='
driver.get(url)
# Loading usually takes 7-15 s
time.sleep(7)
driver.save_screenshot('./png/douban1.png')
# Scroll to the bottom; if document.body.scrollTop has no effect, use document.documentElement.scrollTop
js = 'document.body.scrollTop=10000'
driver.execute_script(js)
time.sleep(5)
driver.save_screenshot('./png/douban2.png')
driver.quit()
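A single jump to the bottom typically triggers at most one more batch of content. For pages that keep loading as you scroll, a common pattern is to repeat the scroll until the page height stops growing. The sketch below assumes only that the driver exposes `execute_script`; `scroll_to_bottom` and `FakeDriver` are hypothetical names, and the fake driver stands in for a real browser so the loop can be run anywhere.

```python
import time

def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down repeatedly until the document height stops changing."""
    last_height = driver.execute_script("return document.documentElement.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight)")
        time.sleep(pause)  # give the page time to load more content
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded in this round
        last_height = new_height
    return max_rounds

class FakeDriver:
    """Stand-in for a browser: the page grows twice, then stops."""

    def __init__(self):
        self.height = 1000
        self.loads_left = 2

    def execute_script(self, script):
        if script.startswith("return"):
            return self.height
        if self.loads_left > 0:
            self.height += 1000
            self.loads_left -= 1
        return None

fake = FakeDriver()
rounds = scroll_to_bottom(fake, pause=0)
print(rounds, fake.height)
```

With a real driver, `pause` should be long enough for the ajax request to finish, and `max_rounds` keeps the loop from running forever on endless feeds.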
Headless Chrome
from selenium import webdriver
import time
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
path = r'C:\Users\ZBLi\Desktop\1805\day06\ziliao\chromedriver.exe'
driver = webdriver.Chrome(executable_path=path, chrome_options=chrome_options)
driver.get('http://www.baidu.com/')
time.sleep(3)
driver.save_screenshot('./png/guge.png')
driver.quit()
Driving Firefox
from selenium import webdriver
import time
firefox_options = webdriver.FirefoxOptions()
firefox_options.set_headless()
firefox_options.add_argument('--disable-gpu')
path = r'C:\Users\ZBLi\Desktop\1805\day06\ziliao\geckodriver.exe'
driver = webdriver.Firefox(executable_path=path, firefox_options=firefox_options)
driver.get('http://www.baidu.com/')
time.sleep(5)
driver.save_screenshot('./png/huohu.png')
driver.quit()