How to use proxy IPs in a Python crawler

Using proxy IPs with Python requests and selenium

Our crawler's IP often gets banned, and that is when we need a proxy IP.

Using a proxy IP with requests

Assume the proxy's username, password, IP and port are:
proxyUser = "4858555769507840"
proxyPass = "X7nEeMi"
proxyHost = "122.239.176.108"
proxyPort = "5021"
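The requests code:

import requests

proxy_url = f"http://{proxyUser}:{proxyPass}@{proxyHost}:{proxyPort}"
proxies = {"http": proxy_url, "https": proxy_url}  # used for both http:// and https:// URLs
# url, collect_data, headers and cookies come from your own crawler code
requests.post(url, data=collect_data, headers=headers, cookies=cookies,
              proxies=proxies, verify=False)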
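A minimal sketch of building the `proxies` dict from the four variables instead of pasting the credentials in by hand. `build_proxies` is a hypothetical helper, not part of requests. Percent-encoding the username and password with `urllib.parse.quote` is an added assumption that matters only if they contain characters such as `@` or `:`. Both keys map to an `http://` proxy URL: the key is the scheme of the target URL, while the value is how requests reaches the proxy.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port):
    """Hypothetical helper: return a requests-style proxies dict.

    The dict key is the scheme of the *target* URL; the value is the
    proxy URL. Credentials are percent-encoded so characters such as
    '@' or ':' do not break the URL.
    """
    proxy_url = "http://{}:{}@{}:{}".format(
        quote(str(user), safe=""),
        quote(str(password), safe=""),
        host,
        port,
    )
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    proxies = build_proxies("4858555769507840", "X7nEeMi", "122.239.176.108", "5021")
    print(proxies["https"])
    # With requests installed, pass it exactly as in the code above, e.g.:
    # requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10)
```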

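Using a proxy IP with selenium

import string
import zipfile

from selenium import webdriver


def create_proxy_auth_extension(proxy_host, proxy_port,
                                proxy_username, proxy_password,
                                scheme='http', plugin_path=None):
    """Pack a small Chrome extension (manifest.json + background.js) into a
    zip file. The extension points Chrome at the proxy and answers the
    proxy's authentication challenge with the given username/password."""
    if plugin_path is None:
        plugin_path = r'./proxy_auth_plugin.zip'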

    manifest_json = """
        {
            "version": "1.0.0",
            "manifest_version": 2,
            "name": "Chrome Proxy",
            "permissions": [
                "proxy",
                "tabs",
                "unlimitedStorage",
                "storage",
                "<all_urls>",
                "webRequest",
                "webRequestBlocking"
            ],
            "background": {
                "scripts": ["background.js"]
            },
            "minimum_chrome_version":"22.0.0"
        }
        """

    background_js = string.Template(
        """
        var config = {
            mode: "fixed_servers",
            rules: {
                singleProxy: {
                    scheme: "${scheme}",
                    host: "${host}",
                    port: parseInt(${port})
                },
                bypassList: ["foobar.com"]
            }
          };

        chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

        function callbackFn(details) {
            return {
                authCredentials: {
                    username: "${username}",
                    password: "${password}"
                }
            };
        }

        chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
        );

        chrome.webRequest.onBeforeSendHeaders.addListener(
            function (details) {
                // Add a "connection: close" header to every outgoing request
                details.requestHeaders.push({name: "connection", value: "close"});
                return {requestHeaders: details.requestHeaders};
            },
            {urls: ["<all_urls>"]},
            ['blocking']
        );
        """
    ).substitute(
        host=proxy_host,
        port=proxy_port,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )

    with zipfile.ZipFile(plugin_path, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)

    return plugin_path


chrome_options = webdriver.ChromeOptions()
proxy_auth_plugin_path = create_proxy_auth_extension(
    proxy_host=proxyHost,
    proxy_port=proxyPort,
    proxy_username=proxyUser,
    proxy_password=proxyPass)
chrome_options.add_extension(proxy_auth_plugin_path)
# Selenium 4 expects options=; the old chrome_options= keyword was deprecated and later removed
driver = webdriver.Chrome(options=chrome_options)
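The `string.Template` substitution in `create_proxy_auth_extension` pastes the credentials directly into JavaScript string literals. A password containing `"` or `\` would therefore produce a broken `background.js`. One alternative is to let `json.dumps` generate the quoted literals. This is an assumption of ours, not the original author's code. The sketch renders only the proxy and credential parts, and `render_proxy_config` is a hypothetical helper name.

```python
import json
import string

# Hypothetical, trimmed-down template: placeholders receive JSON-encoded
# values, which are also valid JavaScript literals (quotes included).
TEMPLATE = string.Template(
    """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {scheme: ${scheme}, host: ${host}, port: ${port}}
    }
};
var credentials = {username: ${username}, password: ${password}};
"""
)


def render_proxy_config(host, port, username, password, scheme="http"):
    """Return a JS snippet with every string value quoted and escaped."""
    return TEMPLATE.substitute(
        scheme=json.dumps(scheme),
        host=json.dumps(host),
        port=int(port),  # emitted as a bare JS number, so no parseInt is needed
        username=json.dumps(str(username)),
        password=json.dumps(str(password)),
    )


if __name__ == "__main__":
    print(render_proxy_config("122.239.176.108", "5021",
                              "4858555769507840", 'pa"ss\\word'))
```

Applying this to the original function would mean passing `json.dumps(...)` values into `substitute` and removing the hand-written quotes around `${username}`, `${password}`, `${host}` and `${scheme}` in `background.js`.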

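Note: when selenium uses a proxy that requires a username and password, the extension approach above cannot be used in headless mode (Chrome's classic headless mode does not load extensions).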
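If the proxy authorizes clients by an IP whitelist rather than by username and password, the extension is unnecessary. Chrome's `--proxy-server` switch is enough, and it also works in headless mode. This switch cannot carry a username and password, which is why the extension exists for authenticated proxies. The sketch assumes such a whitelisted proxy. `chrome_proxy_arguments` is a hypothetical helper that only builds the switch strings.

```python
def chrome_proxy_arguments(host, port, scheme="http", headless=False):
    """Hypothetical helper: Chrome command-line switches for a proxy that
    needs no username/password (e.g. an IP-whitelisted proxy)."""
    args = ["--proxy-server={}://{}:{}".format(scheme, host, port)]
    if headless:
        # "--headless=new" selects Chrome's newer headless mode
        args.append("--headless=new")
    return args


if __name__ == "__main__":
    for arg in chrome_proxy_arguments("122.239.176.108", 5021, headless=True):
        print(arg)
    # With selenium installed:
    # from selenium import webdriver
    # options = webdriver.ChromeOptions()
    # for arg in chrome_proxy_arguments("122.239.176.108", 5021, headless=True):
    #     options.add_argument(arg)
    # driver = webdriver.Chrome(options=options)
```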

Selenium being detected by the browser

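Adding the following arguments to the code, before the driver is created, makes window.navigator.webdriver (the variable the browser checks) evaluate to undefined: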

chrome_options.add_argument("--disable-blink-features")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
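The sketch below gathers these switches in one place. It also adds chromedriver's `excludeSwitches` experimental option with `"enable-automation"`, which is a common companion setting rather than part of the original. `apply_anti_detection` is a hypothetical helper. It works on any object with ChromeOptions' `add_argument` and `add_experimental_option` methods, and it must run before `webdriver.Chrome(...)` is called.

```python
def apply_anti_detection(options):
    """Hypothetical helper: add anti-detection settings to a
    ChromeOptions-like object. Call it before the driver is created."""
    # Disables the Blink feature that sets navigator.webdriver to true
    options.add_argument("--disable-blink-features=AutomationControlled")
    # Drops the "enable-automation" switch chromedriver normally passes,
    # which also removes the "controlled by automated test software" bar
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return options


# Usage with selenium installed:
# from selenium import webdriver
# options = apply_anti_detection(webdriver.ChromeOptions())
# driver = webdriver.Chrome(options=options)
# print(driver.execute_script("return navigator.webdriver"))
```

Printing `navigator.webdriver` through `execute_script` after launch is a quick way to confirm what the page actually sees.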