Requirement
When scraping with Selenium you sometimes need to route traffic through an IP proxy, and the proxy uses username/password authentication. In that case the usual way of adding a proxy fails to load the requested page:
chrome_options.add_argument("--proxy-server=http://ip:port")
Instead, use the code below and supply the proxy in the format user:password:IP_ADDRESS:port.
import zipfile

from selenium import webdriver

# Manifest for a tiny Chrome extension that configures the proxy and
# answers authentication challenges (manifest v2; note that recent
# Chrome releases are phasing out v2 extensions).
manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version": "22.0.0"
}
"""

def get_background_js(proxy_user, proxy_pass, proxy_host, proxy_port):
    # Background script: force all traffic through the given proxy and
    # hand Chrome the credentials whenever onAuthRequired fires.
    return """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt("%s")
        },
        bypassList: ["localhost"]
    }
};

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ["blocking"]
);
""" % (proxy_host, proxy_port, proxy_user, proxy_pass)

def get_chromedriver(proxy=None, user_agent=None):
    chrome_options = webdriver.ChromeOptions()
    if proxy:
        # proxy is "user:password:host:port"; this assumes neither the
        # user name nor the password contains a ":".
        pluginfile = 'proxy_auth_plugin.zip'
        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", get_background_js(*proxy.split(":")))
        chrome_options.add_extension(pluginfile)
    if user_agent:
        chrome_options.add_argument('--user-agent=%s' % user_agent)
    # Selenium 4 takes the options via the `options` keyword
    # (the old `chrome_options` keyword is deprecated).
    driver = webdriver.Chrome(options=chrome_options)
    return driver
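A minimal usage sketch, assuming a hypothetical authenticated proxy at 1.2.3.4:8080 with credentials user/pass and httpbin.org as a test target (all of these are placeholders; substitute your own):
# All values below are illustrative placeholders.
driver = get_chromedriver(proxy="user:pass:1.2.3.4:8080")
try:
    driver.get("https://httpbin.org/ip")  # the response should report the proxy's IP
    print(driver.page_source)
finally:
    driver.quit()
One caveat: Chrome's classic --headless mode does not load extensions, so this technique needs a headed browser (or, on newer Chrome versions, --headless=new, which does support extensions).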
This post showed how to handle a username/password-authenticated IP proxy when scraping with Selenium: package the proxy settings and credentials into a small Chrome extension, supply the proxy as user:password:IP_ADDRESS:port, and the driver can then load pages through the proxy. The extension is needed because Chrome offers no command-line flag for proxy credentials, and Selenium cannot interact with the browser's native authentication popup.