Web Scraping Basics 2: Scraping Proxy IP Addresses

License: CC 4.0 BY-SA

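This post shows one way to scrape IP addresses from the Xici proxy site (西刺代理) with Python. The script sends an HTTP request with a browser-like User-Agent header so the request looks like it comes from a normal browser, reads the page source, and extracts the IP addresses with a regular expression. It uses only Python's urllib library and the re module.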
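The request-building step can be checked offline before fetching anything. The sketch below uses a hypothetical helper, `build_request`, that is not part of the original script. It attaches the same User-Agent string and reads it back from the `Request` object. It also prints the header urllib's default opener would send if no User-Agent were set, which identifies the client as `Python-urllib/<version>`. That default is the identifier the spoofed header replaces.

```python
import urllib.request

# Browser-like User-Agent string, copied from the script in this post
UA = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36')


def build_request(url, user_agent=UA):
    """Hypothetical helper: build a Request carrying a custom User-Agent (nothing is sent)."""
    req = urllib.request.Request(url)
    req.add_header('User-Agent', user_agent)
    return req


req = build_request('http://www.xicidaili.com/')
# add_header() normalises the header name with str.capitalize(), so look it up as 'User-agent'
print(req.get_header('User-agent'))

# Without add_header(), urllib's default opener sends its own identifier instead
print(urllib.request.build_opener().addheaders)
```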

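import re
import urllib.request

def url_open(url):
    # Send the request with a browser User-Agent instead of urllib's default one
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36')
    # Close the connection when done; give up if the server does not answer within 10 s
    with urllib.request.urlopen(req, timeout=10) as page:
        html = page.read().decode('utf-8')
    return html

def get_ip(html):
    # Four dot-separated groups of 1 to 3 digits, e.g. 121.31.102.5
    p = r'(?:(?:\d\d\d|\d\d|\d)\.){3}(?:\d\d\d|\d\d|\d)'
    iplist = re.findall(p, html)

    for each in iplist:
        print(each)
    return iplist

if __name__ == '__main__':
    url = 'http://www.xicidaili.com/'
    html = url_open(url)
    iplist = get_ip(html)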
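The regular expression only checks the shape of an address. It also matches strings such as `999.1.1.1`, and it returns the same IP again each time it appears on the page. The sketch below is one way to tighten the result using the standard `ipaddress` module. It runs the same pattern over a hypothetical HTML snippet, because the real page layout is not shown in this post. The helper name `extract_valid_ips` is also hypothetical. Invalid octets are dropped, and duplicates are removed while the original order is kept.

```python
import ipaddress
import re

# Same pattern as get_ip(): four dot-separated groups of 1-3 digits
IP_PATTERN = r'(?:(?:\d\d\d|\d\d|\d)\.){3}(?:\d\d\d|\d\d|\d)'


def extract_valid_ips(html):
    """Hypothetical helper: regex matches that are real IPv4 addresses, de-duplicated."""
    result = []
    for candidate in re.findall(IP_PATTERN, html):
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. 999.1.1.1 fits the pattern but has an octet > 255
        if candidate not in result:
            result.append(candidate)
    return result


# Hypothetical table fragment, not taken from the real site
sample = '<td>121.31.102.5</td><td>8123</td><td>999.1.1.1</td><td>121.31.102.5</td>'
print(extract_valid_ips(sample))  # ['121.31.102.5']
```

Note that a usable proxy entry also needs its port (the `8123` cell in the sample). The pattern above, like the one in the original script, captures only the IP part.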