Python Web Scraping | Using the urllib.request and urllib.parse Modules

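This article covers Python's urllib.request and urllib.parse modules: urlopen and Request from the request module, and the urlencode and quote functions from the parse module. Worked examples show how to request a web page, handle redirects and authentication, and encode and assemble URLs. Finally, an exercise that scrapes Baidu Tieba reinforces the material.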


1 Using urllib.request
1.1 Basic introduction
1.2 urllib.request.urlopen
1.3 urllib.request.Request
2 Using the urllib.parse module
2.1 urllib.parse.urlencode()
2.2 Using urllib.parse.quote()
2.3 Combined exercise
3 Exercise: scraping Baidu Tieba

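1 Using urllib.request

1.1 Basic introduction
(1) Overview: the request module of urllib sends a request and returns a response.
(2) Usage: urllib.request provides the basic tools for building HTTP requests, and it can also handle authentication, redirections, cookies, and similar concerns.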
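As a rough illustration of how those features fit together, the sketch below builds an opener with a cookie jar and a Basic-auth handler. The URL `https://example.com/` and the credentials are placeholders and not part of the original article. No request is sent here; the code only assembles the opener and lists its handlers.

```python
import http.cookiejar
import urllib.request

# Cookie jar that would hold cookies set by the server across requests.
cookie_jar = http.cookiejar.CookieJar()

# Password manager for HTTP Basic auth; URL and credentials are placeholders.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://example.com/", "user", "secret")

opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar),
    urllib.request.HTTPBasicAuthHandler(password_mgr),
)

# build_opener also installs default handlers, including HTTPRedirectHandler.
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

Calling `opener.open(url)` instead of `urllib.request.urlopen(url)` would then route the request through these handlers.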
1.2 urllib.request.urlopen

'''
urllib.request.urlopen: when called with a plain URL string, no custom
User-Agent can be set, so sites with anti-scraping checks may reject the request.
'''
import urllib.request
# response is the response object
response = urllib.request.urlopen('https://qq.yh31.com/zjbq/2920180.html')
# read() returns the response body as bytes
html = response.read()
# html = response.read().decode('utf8')
print(html)
print(type(html))
# encode(): str --> bytes
# decode(): bytes --> str
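The bytes-to-string step can be tried without a network connection. The sketch below is only an illustration: it uses a `data:` URL, which urlopen also accepts, and a hypothetical helper `read_text` that picks the charset from the Content-Type header and falls back to UTF-8.

```python
import urllib.request

def read_text(response, default_charset="utf-8"):
    """Read the body as bytes, then decode it with the declared charset."""
    raw = response.read()  # bytes
    charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)

# A data: URL lets us exercise urlopen offline.
with urllib.request.urlopen("data:text/plain;charset=utf-8,hello%20urllib") as resp:
    text = read_text(resp)

print(text)
print(type(text))
```

For a real page, the same helper could be applied to the response returned by `urllib.request.urlopen(url)`, assuming the server declares its encoding correctly.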

1.3 urllib.request.Request
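Key points in the code: (1) urllib.request.Request addresses the case where a page checks the User-Agent and does not return its full source otherwise; (2) reading the response object's content, its status code, and the actual URL.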

'''
urllib.request.Request: used when a site checks the User-Agent header.
Workflow:
1. Build a request object with Request()
2. Get the response object with urlopen()
3. Read the content with read().decode('utf-8') on the response object
'''
import urllib.request
url = 'https://www.baidu.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'
}
# 1. Create the request object
req = urllib.request.Request(url, headers=headers)
# 2. Get the response object
response = urllib.request.urlopen(req)
# 3. Read the response body: read().decode('utf-8')
html = response.read().decode('utf-8')
print(html)
'''
Response object (response):
Read the body: response.read()
Get the HTTP status code: response.getcode()
Get the URL that actually served the data (useful after redirects): response.geturl()
'''
print(response.getcode())  # HTTP status code
print(response.geturl())   # URL that actually returned the data
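Before sending anything, it can help to check that the header really is attached to the request object. The following is a minimal sketch that only builds the Request and inspects it, so it runs offline. Note that Request stores header names in capitalized form, so the key to look up is "User-agent".

```python
import urllib.request

url = "https://www.baidu.com/"
user_agent = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
)

req = urllib.request.Request(url, headers={"User-Agent": user_agent})

# Inspect the request object; nothing is sent over the network here.
print(req.get_method())              # request method chosen by urllib
print(req.full_url)                  # URL the request targets
print(req.has_header("User-agent"))  # header names are stored capitalized
print(req.get_header("User-agent"))
```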

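2 Using the urllib.parse module

2.1 urllib.parse.urlencode()
urllib.parse.urlencode() takes a dictionary and returns a percent-encoded query string (a str, not bytes), with non-ASCII characters encoded as UTF-8 by default.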

import urllib.parse
# Query parameters as a dictionary
name = {'wd': '海贼王'}
# Encode the dictionary as a query string
name = urllib.parse.urlencode(name)
print(name)  # wd=%E6%B5%B7%E8%B4%BC%E7%8E%8B
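The encoded string is typically appended to a base URL. The sketch below shows one way to do that and then decodes the query again with parse_qs to confirm the round trip. The base URL `https://www.baidu.com/s` is an assumed search endpoint used only for illustration.

```python
import urllib.parse

base_url = "https://www.baidu.com/s"  # assumed endpoint, for illustration only
params = {"wd": "海贼王"}

# Encode the parameters and join them to the base URL.
query = urllib.parse.urlencode(params)
full_url = base_url + "?" + query
print(full_url)

# parse_qs reverses the encoding and returns a dict mapping keys to lists.
decoded = urllib.parse.parse_qs(urllib.parse.urlsplit(full_url).query)
print(decoded)
```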

2.2 Using urllib.parse.quote()

# quote() takes a string
import urllib.parse
n