获取代理IP地址(BeautifulSoup)

最新推荐文章于 2024-06-11 23:14:14 发布

转载最新推荐文章于 2024-06-11 23:14:14 发布 · 284 阅读

0 ·

CC 4.0 BY-SA版权

原文链接：http://www.cnblogs.com/tmyyss/p/4207556.html

本文介绍了一种使用Python的BeautifulSoup库从网站抓取代理IP数据的方法。通过设置请求头并发送GET请求到指定URL，利用BeautifulSoup解析网页内容，最后筛选出所需的代理IP信息。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

前天用正则的方式获取网站的代理IP数据，今天为了学习BeautifulSoup,用BeautifulSoup实现了一下。

 1 #!/usr/bin/python
 2 
 3 import requests
 4 from bs4 import BeautifulSoup
 5 
 6 
 7 headers={'Host':"www.ip-adress.com",
 8         'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
 9         'Accept':"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
10         'Accept-Language':"zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
11         'Accept-Encoding':"gzip, deflate",
12         'Referer':"http://www.ip-adress.com/Proxy_Checker/",
13         'Connection':'keep-alive'
14 }
15 
16 url="http://www.ip-adress.com/proxy_list/"
17 req=requests.get(url,headers=headers)
18 soup=BeautifulSoup(req.text) //BeautifulSoup(str)
19 rsp=soup.find_all('tr',{'class':'odd'})
20 rsp1=soup.find_all('tr',{'class':'even'})
21 for eliment in rsp:
22         print eliment.td.text //the first one
23 
24 for eliment1 in rsp1:
25         print eliment1.td.text