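Chrome has supported headless mode for a while now (since Chrome 59), and it is steadily replacing PhantomJS as the favorite browser of scraper-writing code monkeys.
Below is a code sample for fellow code monkeys to use as a reference.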
1. References
2. Environment
- MacOS == 10.12.6 (16G29)
- Chrome == 61.0.3163.100 (Official Build) (64-bit)
- selenium == 3.6.0
- Python == 2.7.14
- ChromeDriver == 2.33.506106
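Headless mode shipped in Chrome 59 on macOS and Linux (Chrome 60 on Windows), so the Chrome 61 listed above is new enough. If you want a script to check this before launching anything, here is a minimal sketch. The helper names chrome_major_version and supports_headless are made up for illustration, and the check only looks at the leading major-version number of a string like the one shown above.

```python
# Minimal sketch: decide whether a Chrome version string is new enough for
# headless mode. Assumption: headless needs Chrome 59+ on macOS/Linux and
# Chrome 60+ on Windows. Helper names are hypothetical.

def chrome_major_version(version):
    """Return the major version from a string like '61.0.3163.100'."""
    return int(version.strip().split('.')[0])


def supports_headless(version, windows=False):
    """True if this Chrome version should support --headless."""
    minimum = 60 if windows else 59
    return chrome_major_version(version) >= minimum


if __name__ == '__main__':
    print(supports_headless('61.0.3163.100'))  # prints True
```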
3. Steps
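3.1 Start chromedriver
Running chromedriver by hand is a quick way to confirm that it is installed and on your PATH. The script in 3.2 does not actually use this instance: webdriver.Chrome launches its own chromedriver process on a free port.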
$ chromedriver
Starting ChromeDriver 2.33.506106 (8a06c39c4582fbfbab6966dbb1c38a9173bfb1a2) on port 9515
Only local connections are allowed.
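Before creating a driver you can check that something is actually listening on the port ChromeDriver reported (9515 above). The sketch below is a plain-socket check, not a Selenium API; port_is_open is a hypothetical helper name, and it only tells you that the TCP port accepts connections, not that the listener is really ChromeDriver.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except socket.error:  # refused, unreachable or timed out
        return False
    sock.close()
    return True


if __name__ == '__main__':
    # 9515 is the port ChromeDriver printed when started by hand
    print(port_is_open('127.0.0.1', 9515))
```

If you would rather drive this manually started instance than let webdriver.Chrome spawn its own, Selenium 3 can attach to it with webdriver.Remote(command_executor='http://127.0.0.1:9515', desired_capabilities=options.to_capabilities()), where options is the ChromeOptions object built in 3.2.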
3.2 Code
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType
options = webdriver.ChromeOptions()
# tell selenium to use the dev channel version of chrome
# NOTE: only do this if you have a good reason to
# options.binary_location = '/usr/bin/google-chrome-unstable' # path to google Chrome bin
options.add_argument('--headless')
# set the window size (Chrome expects width,height separated by a comma)
options.add_argument('--window-size=1200,600')
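# with proxy: replace the 'ip:port' placeholder with your proxy's address
proxy_url = 'ip:port'
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': proxy_url,
    'sslProxy': proxy_url  # if the proxy intercepts HTTPS, Chrome must trust its CA certificate
})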
desired_capabilities = options.to_capabilities()
proxy.add_to_capabilities(desired_capabilities)
# initialize the driver
# without a proxy: driver = webdriver.Chrome(chrome_options=options)
driver = webdriver.Chrome(chrome_options=options, desired_capabilities=desired_capabilities)
# find_element* calls will keep retrying for up to 10 seconds before giving up
driver.implicitly_wait(10)
driver.get('https://www.baidu.com')
driver.get_screenshot_as_file('baidu_index.png')
# use CSS selectors to grab the search box (#kw) and the search button (#su)
text = driver.find_element_by_css_selector('#kw')
search = driver.find_element_by_css_selector('#su')
text.send_keys('headless chrome')
driver.get_screenshot_as_file('baidu_main-page.png')
# search
search.click()
# find_elements* also honours the implicit wait, so this waits for the results to appear
results = driver.find_elements_by_css_selector('div.result.c-container')
driver.get_screenshot_as_file('search-result.png')
for result in results:
    res = result.find_element_by_css_selector('a')
    title = res.text
    link = res.get_attribute('href')
    print('Title: %s \nLink: %s\n' % (title, link))
# shut down headless Chrome and the chromedriver process
driver.quit()
Output:
Title: Headless Chrome入门 - 简书
Link: http://www.baidu.com/link?url=VxjEiEVtl5fZX-AhWqc-AuoRP9Xy_uXIG1cqs43UbiSacUTqH0j7lDYsnYUpOXrC
Title: 技能树升级——Chrome Headless模式 - 全栈客栈 - SegmentFault
Link: http://www.baidu.com/link?url=CDylpWK8vIuZ8p60MUi_3KlThi-zxPw3bSr5AGPg2QsmTfoathDvfZGnEV2IZejOjw0cF5N4o0exxX1cqf9R-q
Title: 使用Headless Chrome 进行页面渲染 - 知乎专栏
Link: http://www.baidu.com/link?url=IyI0z_PmzMzH6mrw0-YndTwp7WiKmhVF-_ZuXMuPnfyF2MEaBB0BCit0BXpcrfsX
Title: 初探Headless Chrome - WEB前端 - 伯乐在线
Link: http://www.baidu.com/link?url=sw2qqcurzmwTu9n0orvk_LKIvMmiaWlCxlPtvuyOgsKzzxaV3Car6zbRRdpZumDX
Title: 初探Headless Chrome - 知乎专栏
Link: http://www.baidu.com/link?url=6nOyOVHD5AoBjugMoJTxDXhw5EBSYpF9fQMQfbu8WgCf0E_Wbalq6Hbj-KqBGwgm
Title: 通过Headless Chrome执行Selenium脚本 - CSDN博客
Link: http://www.baidu.com/link?url=WSKRO7xRvGfbRIUKKnULwE0FeYNvyjLnEtiHWj108kxsQ7MUd1zPNXLph7WSkYXkiRLh8B3DBYSW8GNdI8wGBq
Title: Web自动化之Headless Chrome开发工具库-图灵社区
Link: http://www.baidu.com/link?url=jZletPMcLn7z_liopLphjzknRWshmbsrCUr0K25MY7pbk5smOObJahHbvUrHz_2qnZdEUzcEm8IK0QriythwZa
Title: 在headless模式下运行selenium - 曾经的自己 - SegmentFault
Link: http://www.baidu.com/link?url=jbe9GNh-2nDbd1KiMkh64EwQD6JvBXdQ_ndtkl-z_Hy2mn8GGnftg2BDnMn3x1rUMwkdwkwuo7dqMZMnVAtHGq
Title: linux 安装 Headless Chrome - bambooleaf - CSDN博客
Link: http://www.baidu.com/link?url=jruVom6bFUCrLluHA4aN8ITgq3HlBlR3rYNYC36VlqIBjuFRocIewfKVvw6pleX3v1l2joOaO3-f9NxrVGjUdq
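The script assumes every page loads in time and that nothing fails midway. Two small helpers can make it sturdier; both are hypothetical names, sketched here without a browser so they can be tried in isolation. wait_until is a polling loop that mimics the idea of Selenium's explicit waits (WebDriverWait) but is not Selenium's implementation, and it raises RuntimeError where WebDriverWait raises TimeoutException. quitting is a context manager that calls driver.quit() even if an exception is raised, so a crash does not leave a headless Chrome process behind.

```python
import time
from contextlib import contextmanager


def wait_until(condition, timeout=10.0, interval=0.5, ignored=()):
    """Poll condition() until it returns something truthy, then return it.

    Hypothetical helper modelled on the idea of WebDriverWait.until.
    Exceptions whose types are listed in ignored count as "not ready yet".
    Raises RuntimeError if the timeout expires first.
    """
    deadline = time.time() + timeout
    while True:
        try:
            value = condition()
        except ignored:  # an empty tuple catches nothing
            value = None
        if value:
            return value
        if time.time() >= deadline:
            raise RuntimeError('condition not met within %.1f seconds' % timeout)
        time.sleep(interval)


@contextmanager
def quitting(driver):
    """Yield driver and always call driver.quit() on the way out.

    quit() ends the WebDriver session and stops the chromedriver process
    that webdriver.Chrome started; close() would only close the window.
    """
    try:
        yield driver
    finally:
        driver.quit()
```

In the script, the driver could be created as with quitting(webdriver.Chrome(chrome_options=options, desired_capabilities=desired_capabilities)) as driver: with the rest of the code indented under it. With Selenium 3 itself, the usual explicit wait is WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.result.c-container'))), importing WebDriverWait from selenium.webdriver.support.ui, expected_conditions as EC from selenium.webdriver.support, and By from selenium.webdriver.common.by.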