Python Study Notes: Chapter 11

Latest recommended article published on 2021-12-19 10:14:41

正心全栈编程

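This article shows how to download files from the Web with Python's requests module. It demonstrates how to check the HTTP response status code, handle the exception raised for a page that does not exist, parse HTML content, and scrape web page content with BeautifulSoup.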

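# Download a file from the Web with the requests module
import requests

# Send an HTTP GET request
res = requests.get('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')
# Raise an exception if the request failed (4xx or 5xx status)
res.raise_for_status()
# Check the status code
print(res.status_code == requests.codes.ok)
print(len(res.text))
# Print the first 250 characters of the text
print(res.text[:250])
# Open the output file in binary write mode; the with block closes it automatically
with open('RomeoAndJuliet.txt', 'wb') as playFile:
    # iter_content() returns one chunk of the content on each loop iteration.
    # Each chunk is a bytes object; the argument sets the maximum chunk size in bytes.
    for chunk in res.iter_content(100000):
        # Write the chunk to the file
        playFile.write(chunk)
# requests fetches the raw content at the URL, so if the target is not a plain-text
# file, the HTML markup and other data are downloaded as well.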
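
The download loop above can also be wrapped in a reusable function. The following is a minimal sketch, not the original author's code: `save_chunks` and `download` are hypothetical helper names, and the `stream=True` and `timeout=30` arguments are assumptions added here so that the body is fetched chunk by chunk rather than loaded into memory first.

```python
import requests


def save_chunks(chunks, path):
    """Write an iterable of bytes chunks to path and return the total bytes written."""
    total = 0
    with open(path, 'wb') as out_file:
        for chunk in chunks:
            # write() returns the number of bytes written for this chunk
            total += out_file.write(chunk)
    return total


def download(url, path, chunk_size=100000):
    """Stream url to path in chunks of at most chunk_size bytes (hypothetical helper)."""
    # stream=True defers downloading the body until iter_content() is consumed
    with requests.get(url, stream=True, timeout=30) as res:
        res.raise_for_status()
        return save_chunks(res.iter_content(chunk_size), path)
```

Calling `download('http://www.gutenberg.org/cache/epub/1112/pg1112.txt', 'RomeoAndJuliet.txt')` would then save the same file as the script above, assuming the URL is still reachable.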

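import requests

# Request a page that does not exist
res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
try:
    # raise_for_status() raises requests.exceptions.HTTPError on a 4xx or 5xx response
    res.raise_for_status()
except requests.exceptions.HTTPError as exc:
    print('There was a problem: %s' % (exc))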
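
The same error check can be tried without a network connection. This is only a sketch under stated assumptions: `check_response` is a hypothetical helper, and the `Response` object is built by hand with a made-up URL to stand in for a real reply from a server.

```python
import requests


def check_response(res):
    """Return (ok, message) for a Response without letting HTTPError escape."""
    try:
        res.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return False, 'There was a problem: %s' % exc
    return True, 'OK'


# Hand-built Response used as a stand-in for a real 404 reply (assumption)
fake = requests.Response()
fake.status_code = 404
fake.url = 'http://example.com/missing'

ok, message = check_response(fake)
print(ok)
print(message)
```

Returning a flag and a message keeps the calling code free of try/except blocks, which may be convenient when many URLs are checked in a loop.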

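import requests, bs4

# Parse HTML downloaded from the Web (the "lxml" parser requires the lxml package)
res = requests.get('http://nostarch.com')
res.raise_for_status()
noStarchSoup = bs4.BeautifulSoup(res.text, "lxml")
# BeautifulSoup also accepts an open file object; example.html must exist locally
with open('example.html') as exampleFile:
    exampleSoup = bs4.BeautifulSoup(exampleFile, "lxml")
print(noStarchSoup)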

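import bs4

# Read the local HTML file and close it once it has been parsed
with open('example.html') as exampleFile:
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), "lxml")
# Find elements with the select() method, which takes a CSS selector
elems = exampleSoup.select('#author')
print(type(elems))          # select() returns a list-like collection of Tag objects
print(len(elems))
print(str(elems[0]))        # the matched tag as an HTML string
print(elems[0].attrs)       # the tag's attributes as a dictionary
pElems = exampleSoup.select('p')
print(str(pElems[0]))
print(pElems[0].getText())  # the tag's text with the markup removed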
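
Since the contents of example.html are not shown in these notes, the select() calls can be tried on an inline string instead. This is a sketch with assumptions: `SAMPLE_HTML` is a made-up document, and the built-in "html.parser" is swapped in for "lxml" so that no extra parser package is needed.

```python
import bs4

# Made-up HTML used as a stand-in for example.html (assumption)
SAMPLE_HTML = """
<html><body>
<p>Intro paragraph with a <strong>bold</strong> word.</p>
<p>By <span id="author">Jane Doe</span></p>
</body></html>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# '#author' matches the element whose id attribute is "author"
elems = soup.select('#author')
print(len(elems))            # 1
print(elems[0].getText())    # Jane Doe
print(elems[0].attrs)        # {'id': 'author'}

# 'p' matches every <p> element in document order
p_elems = soup.select('p')
print(len(p_elems))          # 2
print(p_elems[0].getText())  # Intro paragraph with a bold word.
```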