day01 Basic usage of urllib.request

Reposted article.

License: CC 4.0 BY-SA

Original link: http://www.cnblogs.com/mai1994/p/10719886.html

Tags:

#python #web-crawler

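This post shows a simple web crawler built with Python's urllib library. It fetches a web page and saves it as a local HTML file. The examples cover sending a GET request, percent-encoding (escaping) a URL that contains Chinese characters, and converting the fetched data between bytes and str so it can be written to disk.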

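Simple web crawler example

    import urllib.request

    # Request the Baidu home page
    url = "http://www.baidu.com/"
    # Send a GET request; the with-block closes the connection afterwards
    with urllib.request.urlopen(url) as response:
        # read() returns bytes; decode them into a str
        data = response.read().decode("utf-8")

    # Write the data to a file
    with open("baidu.html", "w", encoding="utf-8") as f:
        f.write(data)

    # Data fetched with Python is either str or bytes:
    # if it is bytes but you need a str to write (text mode "w"), call decode("utf-8");
    # if it is a str but you need bytes to write (binary mode "wb"), call encode("utf-8").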
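The comments above boil down to one rule: `response.read()` always returns `bytes`, while a file opened in text mode expects a `str`. The following is a minimal sketch, not the author's code, built around a hypothetical helper named `save_page`. It decodes the body using the charset declared in the response's `Content-Type` header and falls back to UTF-8 when none is declared. As an alternative, it can skip decoding entirely by writing the raw bytes in binary mode. To keep it runnable without network access, the usage part passes `urlopen` a `data:` URL instead of a real website. It assumes Python 3.4 or later, which handles `data:` URLs by default.

```python
import urllib.request


def save_page(url, path, binary=False):
    """Hypothetical helper: GET `url` and save the response body to `path`."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # read() always returns bytes
        # Charset declared in the Content-Type header; assume UTF-8 if absent.
        charset = response.headers.get_content_charset() or "utf-8"

    if binary:
        # Binary mode ("wb") accepts bytes as-is, so nothing is decoded.
        with open(path, "wb") as f:
            f.write(raw)
        return raw

    # Text mode ("w") needs a str, so decode the bytes first.
    text = raw.decode(charset)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


if __name__ == "__main__":
    # str -> bytes -> str round trip described in the comments above
    word = "百度"
    assert word.encode("utf-8").decode("utf-8") == word

    # A data: URL lets urlopen run offline; it carries "<h1>美</h1>" as UTF-8.
    demo_url = "data:text/html;charset=utf-8,%3Ch1%3E%E7%BE%8E%3C%2Fh1%3E"
    print(save_page(demo_url, "demo.html"))              # <h1>美</h1>
    print(save_page(demo_url, "demo.bin", binary=True))  # b'<h1>\xe7\xbe\x8e</h1>'
```

Writing the bytes with `"wb"` is the simplest choice when you only want an exact copy of the page. Decoding is only needed when you want to work with the text. Note that the text-mode path re-encodes the page as UTF-8. A page served in another charset, such as GBK, will therefore no longer match its own `<meta charset>` declaration.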

Escaping Chinese keywords in a URL

    import string
    import urllib.parse
    import urllib.request

    # Baidu search address
    url = "http://www.baidu.com/s?wd="

    # Append the (Chinese) search keyword
    name = "美女"
    final_url = url + name

    # Percent-encode the non-ASCII characters; safe=string.printable
    # leaves every printable ASCII character (: / ? = ...) untouched
    new_url = urllib.parse.quote(final_url, safe=string.printable)

    # Print the escaped URL
    print(new_url)

    # GET request
    with urllib.request.urlopen(new_url) as response:
        # read() returns bytes; decode() defaults to UTF-8
        data = response.read().decode()

    # Save the fetched data locally
    with open("baidu.html", "w", encoding="utf-8") as f:
        f.write(data)
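Passing `safe=string.printable` in the code above works because every printable ASCII character, including `:`, `/`, `?` and `=`, is left unchanged. Only the Chinese characters are percent-encoded, as their UTF-8 bytes. A common alternative is to encode just the query value, either with `urllib.parse.quote` on the keyword alone or with `urllib.parse.urlencode` on a dict of parameters. The sketch below compares the approaches without sending any request. The helper name `build_search_url` is hypothetical. The `wd` parameter is taken from the article's URL.

```python
import string
import urllib.parse

BASE = "http://www.baidu.com/s"


def build_search_url(keyword):
    """Hypothetical helper: build a Baidu-style search URL with `wd` encoded."""
    # urlencode percent-encodes each value as UTF-8 and joins key=value pairs.
    return BASE + "?" + urllib.parse.urlencode({"wd": keyword})


if __name__ == "__main__":
    name = "美女"
    # Approach from the article: escape the whole URL, keep printable ASCII.
    whole = urllib.parse.quote(BASE + "?wd=" + name, safe=string.printable)
    # Escape only the keyword, then concatenate.
    value_only = BASE + "?wd=" + urllib.parse.quote(name)

    print(whole)                                          # http://www.baidu.com/s?wd=%E7%BE%8E%E5%A5%B3
    print(whole == value_only == build_search_url(name))  # True
    # unquote reverses the escaping.
    print(urllib.parse.unquote(whole))                    # http://www.baidu.com/s?wd=美女
```

One difference shows up only with spaces. By default, `urlencode` uses `quote_plus`, so a space in the keyword becomes `+`, whereas `quote` turns it into `%20`. Both forms are normally accepted in a query string.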