【Python---Web Crawlers】3. Using Python's Basic Libraries - 优快云 Blog

Article link: https://blog.youkuaiyun.com/m0_37827925/article/details/102619712

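This article covers urllib, Python's built-in HTTP library, and its four core modules: request, error, parse, and robotparser. Worked examples show how to send GET and POST requests with urllib, parse URLs, handle exceptions, and set a timeout.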


I. Using urllib

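urllib is Python's built-in HTTP request library, so it can be used directly without installing anything.
It consists of the following four modules:

request: the most basic HTTP request module, used to send requests.
error: the exception handling module, defining the exceptions raised by request.
parse: a utility module for handling URLs (splitting, parsing, joining, and encoding them).
robotparser: a module for parsing a site's robots.txt file to decide which pages a crawler may fetch.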
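robotparser is the least obvious of the four modules. A minimal sketch of what it does, assuming a made-up robots.txt that blocks /private/ for every crawler: RobotFileParser.parse() reads the rules from a list of lines, so no network access is needed, and can_fetch() answers whether a given user agent may fetch a URL. The example.com URLs are placeholders.

```python
import urllib.robotparser

# Hypothetical robots.txt content; for a real site you would call
# set_url("https://example.com/robots.txt") followed by read() instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # load the rules without any network access

# can_fetch(useragent, url) applies the parsed rules to a URL
allowed_public = parser.can_fetch("*", "https://example.com/index.html")
allowed_private = parser.can_fetch("*", "https://example.com/private/data.html")
print(allowed_public, allowed_private)  # True False
```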

Sending requests

The request module of urllib makes it easy to send a request and receive the response.
1. urlopen()
Crawling the official Python website:

import urllib.request
# Fetch the page and decode the returned bytes as UTF-8
response = urllib.request.urlopen('http://www.python.org')
print(response.read().decode('utf-8'))

[Figure: the crawled result]

The output above is the HTML source code of the fetched page.
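The example above hard-codes 'utf-8' when decoding. A slightly more defensive sketch (an assumption, not the author's code) reads the charset the server declares in its Content-Type header and closes the response with a with statement. To stay runnable offline, it serves a hypothetical page from a local http.server instance instead of www.python.org; the ProxyHandler({}) line only ensures that any proxy configured in the environment is bypassed for this local request.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any environment proxy so the request reaches the local server.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({})))

PAGE = "<html><body>Hello, 世界</body></html>"  # hypothetical page content


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: returns PAGE as UTF-8 HTML for any GET."""

    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence the default request log
        pass


server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# The with statement closes the response (and its socket) when the block ends.
with urllib.request.urlopen(url) as response:
    charset = response.headers.get_content_charset("utf-8")  # fall back to UTF-8
    html = response.read().decode(charset)

server.shutdown()
server.server_close()
print(charset)  # utf-8
print(html)
```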

Next, check the type of the returned object by printing it with the type() function:

import urllib.request
response = urllib.request.urlopen('http://www.python.org')
print(type(response))  # print the class of the returned object

The output is as follows:

<class 'http.client.HTTPResponse'>

It is an HTTPResponse object, which provides methods such as read(), readinto(), getheader(name), getheaders(), and fileno(),
and attributes such as msg, version, status, reason, debuglevel, and closed.

Once this object is assigned to the response variable, all of the methods and attributes above can be used on it.
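The methods and attributes listed above can be checked against a local stand-in server. The sketch below assumes a hypothetical handler that returns the text hello; it confirms the object type, reads status, reason, a single header and the header list, and shows that leaving the with block sets closed to True.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any environment proxy so the request reaches the local server.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({})))


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers every GET with a short plain-text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

with urllib.request.urlopen(url) as response:
    is_http_response = isinstance(response, http.client.HTTPResponse)
    status, reason = response.status, response.reason
    content_type = response.getheader("Content-Type")  # one header by name
    header_names = [name for name, _ in response.getheaders()]  # (name, value) pairs
    body = response.read()
closed_after_with = response.closed  # leaving the with block closed the response

server.shutdown()
server.server_close()
print(is_http_response, status, reason)  # True 200 OK
print(content_type, body)                # text/plain b'hello'
print(header_names)
```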
The API of the urlopen() function:

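urllib.request.urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None)

The cafile, capath and cadefault parameters have been deprecated since Python 3.6 and were removed in Python 3.13; pass an ssl.SSLContext through context instead.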
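The timeout parameter is given in seconds. To see it fire without depending on a slow remote site, this sketch (an assumption, not from the original post) points urlopen() at a local socket that accepts the TCP connection but never replies. In CPython's implementation, a timeout while connecting is wrapped in urllib.error.URLError, whereas a timeout while waiting for the response propagates as socket.timeout, so both cases are caught.

```python
import socket
import urllib.error
import urllib.request

# Bypass any environment proxy so the request goes to the local socket.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({})))

# A listening socket that never calls accept() or sends anything: the kernel
# completes the TCP handshake, but no HTTP response ever arrives.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
url = "http://127.0.0.1:%d/" % silent.getsockname()[1]

try:
    urllib.request.urlopen(url, timeout=0.5)  # give up after 0.5 seconds
    timed_out = False
except urllib.error.URLError as err:
    # Timeouts while connecting or sending are wrapped in URLError.
    timed_out = isinstance(err.reason, socket.timeout)
except socket.timeout:
    # A timeout while waiting for the response is raised directly.
    timed_out = True
finally:
    silent.close()

print("timed out:", timed_out)
```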

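The data parameter
The data parameter is optional. If it is supplied, it must be a bytes object, so a str (such as the form data returned by urllib.parse.urlencode()) has to be converted first with bytes() or str.encode(). Passing data also changes the request method from GET to POST.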
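Two consequences of this rule can be checked without sending anything. The sketch builds urllib.request.Request objects to show that supplying data switches get_method() from GET to POST, and shows that passing a str instead of bytes to urlopen() is rejected with a TypeError while the request is being prepared, before any connection is opened (this reflects CPython's current behavior).

```python
import urllib.parse
import urllib.request

# urlencode() turns a dict into a query string (a str); bytes() encodes it for the body.
form = urllib.parse.urlencode({"word": "hello"})
data = bytes(form, encoding="utf8")  # equivalent to form.encode("utf8")

get_req = urllib.request.Request("http://httpbin.org/post")
post_req = urllib.request.Request("http://httpbin.org/post", data=data)
print(form)                   # word=hello
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST

# A str body is rejected during request preparation, so no connection is made.
try:
    urllib.request.urlopen("http://httpbin.org/post", data=form)
    rejected = False
except TypeError as err:
    rejected = True
    print("TypeError:", err)
```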

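import urllib.parse
import urllib.request
# Encode the form fields as a query string, then convert the str to bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
# Supplying data makes urlopen() send a POST request
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read())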
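httpbin.org echoes the submitted form back, which is how the example above confirms that the data was sent as a POST body. An offline sketch of the same round trip, assuming a hypothetical EchoHandler running on a local http.server in place of httpbin.org, returns the request method, the Content-Type header that urllib sent, and the parsed form fields as JSON.

```python
import json
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any environment proxy so the request reaches the local server.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({})))


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for httpbin.org/post: echoes method and form as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("utf-8")
        reply = json.dumps({
            "method": self.command,
            "content_type": self.headers.get("Content-Type"),
            "form": urllib.parse.parse_qs(raw),  # "word=hello" -> {"word": ["hello"]}
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/post"

data = bytes(urllib.parse.urlencode({"word": "hello"}), encoding="utf8")
with urllib.request.urlopen(url, data=data) as response:
    echoed = json.loads(response.read().decode("utf-8"))

server.shutdown()
server.server_close()
print(echoed)
```

Because no Content-Type was set explicitly, urllib sends application/x-www-form-urlencoded by default, which matches the encoding produced by urlencode().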
