The Python Challenge: Solutions


License: CC 4.0 BY-SA


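This article describes a way to fetch web page data with Python's urllib and BeautifulSoup libraries. By repeatedly requesting a specific URL and parsing the returned HTML, the needed data can be extracted. The process shows how to parse an HTML document, extract its text, and do some simple data processing.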

4. Uses urllib and BeautifulSoup

import urllib.request
from bs4 import BeautifulSoup

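nothing = '12345'
for x in range(400):
    # Request the page for the current value of nothing
    f = urllib.request.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + nothing)
    # Read the page's HTML content
    html_doc2 = f.read()
    # Parse the HTML into a BeautifulSoup object
    soup2 = BeautifulSoup(html_doc2, "html.parser")
    # .get_text() returns the page text as a string; .split() breaks it into a list of words
    words = soup2.get_text().split()
    # The last word is the next value of nothing; repeat the request up to 400 times
    nothing = words[-1]
    print(nothing)
    # Stop if the last word is not a number, since it cannot be used as the next value
    if not nothing.isdigit():
        break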
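
The loop above mixes fetching and parsing, which makes it hard to try out without hitting the site. Below is a minimal sketch that separates the two, so the chain-following logic can be exercised offline. The names `next_nothing`, `follow_chain` and `fake_pages` are hypothetical and not part of the original script, and the sample page texts are made up for illustration. Note one deliberate difference: it picks the last run of digits with a regular expression instead of taking the last whitespace-separated word, so it stops only when a page contains no number at all.

```python
import re
from typing import Callable, Optional, Tuple


def next_nothing(text: str) -> Optional[str]:
    """Return the last run of digits in the page text, or None if there is none."""
    numbers = re.findall(r"\d+", text)
    return numbers[-1] if numbers else None


def follow_chain(fetch: Callable[[str], str], start: str,
                 max_steps: int = 400) -> Tuple[str, str]:
    """Follow the chain until a page has no number or max_steps is reached.

    `fetch` maps a `nothing` value to the text of the corresponding page.
    Returns (last value found, text of the last page fetched).
    """
    value, text = start, ""
    for _ in range(max_steps):
        text = fetch(value)
        following = next_nothing(text)
        if following is None:
            break  # nothing to follow; leave the page text for manual inspection
        value = following
    return value, text


# Offline demonstration with made-up pages (hypothetical data, not real responses)
fake_pages = {
    "1": "and the next nothing is 2",
    "2": "and the next nothing is 3",
    "3": "no number here",
}
last_value, last_text = follow_chain(fake_pages.__getitem__, "1")
print(last_value, "|", last_text)
```

To run it against the real site, `fetch` could be a small function that calls `urllib.request.urlopen` on the `linkedlist.php?nothing=` URL and returns `BeautifulSoup(html, "html.parser").get_text()`, as in the script above.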