1. Scraping data with GET requests
Page 1: Binzhou Public Data Open Portal (滨州市公共数据开放网)
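import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, so the request looks like a normal page visit
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"}
url = 'http://bzdata.sd.gov.cn/binzhou/catalog/index?domainId=domain-4&page=1'
# Send the request to the server the way a browser would
response = requests.get(url=url, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
# Each <li> under .bottom-content is one entry in the catalog listing
lis = soup.select('.bottom-content>ul>li')
for li in lis:
    title = li.find('a').text
    cate_infor = li.select('.cata-information')[0].select('.information')
    fabu = cate_infor[2].select('span')[1].text     # "fabu" = published
    gengxin = cate_infor[3].select('span')[1].text  # "gengxin" = updated
    print(title, fabu, gengxin)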
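The URL above requests only `page=1`. A minimal sketch of how URLs for further pages could be generated, assuming that the `page` query parameter is the page index and that later pages use the same layout (neither is confirmed here); `page_url`, `page_urls`, and the page count are hypothetical names and values chosen for illustration:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_URL = 'http://bzdata.sd.gov.cn/binzhou/catalog/index?domainId=domain-4&page=1'


def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`."""
    parts = urlsplit(base_url)
    # Keep every existing query parameter and overwrite only `page`
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query['page'] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(base_url, last_page):
    """Return the URLs for pages 1..last_page (last_page is an assumption)."""
    return [page_url(base_url, n) for n in range(1, last_page + 1)]


if __name__ == '__main__':
    for u in page_urls(BASE_URL, 3):
        print(u)
```

Each generated URL can then be passed to `requests.get` and parsed exactly as in the loop above; pausing briefly between requests keeps the load on the server low.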
Page 2: Huai'an Municipal People's Government (淮安市人民政府)
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"}
url = 'https://opendata.huaian.gov.cn:18443/api/catalog?keyword=&resourceType=&categoryId=3&pageNum=1&pageSize=5&orderBy=publishTime&dir=desc'
response = requests.get(url=url, headers=headers)
# This endpoint returns JSON, so decode it directly instead of parsing HTML
result = response.json()
data_list = result['data']['data']
for data in data_list:
    catalogName = data['catalogName']
    sharingProperty = data['sharingProperty']
    print(catalogName, sharingProperty)
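The query string above can also be written as a dictionary, which makes the page number easy to change, and the field lookup can be made tolerant of missing keys. The following is a sketch under assumptions: `query_params` and `extract_catalogs` are hypothetical helpers, the sample payload is invented to mirror the `data` → `data` nesting used above, and the real response may contain other fields.

```python
def query_params(page_num, page_size=5):
    """Mirror the query string of the URL above, with a variable page number."""
    return {
        'keyword': '',
        'resourceType': '',
        'categoryId': 3,
        'pageNum': page_num,
        'pageSize': page_size,
        'orderBy': 'publishTime',
        'dir': 'desc',
    }


def extract_catalogs(payload):
    """Return (catalogName, sharingProperty) pairs from a decoded response.

    Missing or null levels yield an empty list instead of raising KeyError.
    """
    records = (payload.get('data') or {}).get('data') or []
    return [(r.get('catalogName'), r.get('sharingProperty')) for r in records]


if __name__ == '__main__':
    # Invented sample shaped like the response accessed above
    sample = {'data': {'data': [
        {'catalogName': 'example catalog', 'sharingProperty': 'example value'},
    ]}}
    print(query_params(2))
    print(extract_catalogs(sample))
```

With `requests`, the dictionary would be passed as `requests.get(base_url, params=query_params(2), headers=headers)`, where `base_url` is the URL above without its query string.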
2. Scraping data with POST requests
Page 1: Suining Public Data Open Platform (遂宁市公共数据开放平台)
import json
import requests

url = 'https://www.suining.gov.cn/exchangeopengateway/v1.0/mh/sjml/getMlxxList'
# Headers copied from the browser request. Content-Length is left out:
# requests computes it from the actual body.
headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'appCode': 'd7b9815286344caff8e9a5171dc10fac',
    'Cookie': '_appId=d7b9815286344caff8e9a5171dc10fac',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json;charset=UTF-8',
    'Origin': 'https://www.suining.gov.cn',
    'Referer': 'https://www.suining.gov.cn/data',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'token': 'undefined',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}
# Request body (sent as JSON, not as URL query parameters)
params = {
    'gxsjPx': 2,
    'llslPx': '',
    'mlmc': '',
    'pfPx': '',
    'sqslPx': '',
    'kflx': ['00'],
    'ssbmId': ['00'],
    'ssjclmId': ['00'],
    'ssqtlmId': ['00'],
    'ssztlmId': ['00'],
    'wjlx': ['00'],
    'zylx': ['01', '02'],
    'pageNo': 1,
    'pageSize': 10
}
# The server expects a JSON string, so serialize the dict before posting
data = json.dumps(params)
response = requests.post(url=url, headers=headers, data=data)
response.encoding = response.apparent_encoding
result = response.json()
data_list = result['data']['rows']
for row in data_list:
    title = row['mlmc']
    tags = row['mlms']
    print(title, tags)
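The request body above asks for `pageNo` 1 only. A minimal sketch of building the body for other pages and reading the rows defensively, assuming `pageNo` and `pageSize` control paging the way their names suggest (not verified against the server); `build_body` and `extract_rows` are hypothetical helpers:

```python
import json

# Same fields as the request body above
BASE_PAYLOAD = {
    'gxsjPx': 2,
    'llslPx': '',
    'mlmc': '',
    'pfPx': '',
    'sqslPx': '',
    'kflx': ['00'],
    'ssbmId': ['00'],
    'ssjclmId': ['00'],
    'ssqtlmId': ['00'],
    'ssztlmId': ['00'],
    'wjlx': ['00'],
    'zylx': ['01', '02'],
    'pageNo': 1,
    'pageSize': 10,
}


def build_body(page_no, page_size=10):
    """Return the JSON string to post for the given page."""
    # Copy the base payload so the shared template is never modified
    payload = dict(BASE_PAYLOAD, pageNo=page_no, pageSize=page_size)
    return json.dumps(payload)


def extract_rows(decoded):
    """Return (mlmc, mlms) pairs; missing levels yield an empty list."""
    rows = (decoded.get('data') or {}).get('rows') or []
    return [(row.get('mlmc'), row.get('mlms')) for row in rows]


if __name__ == '__main__':
    print(build_body(2))
```

The returned string can be passed as `data=build_body(n)` to `requests.post` with the same URL and headers; an empty result from `extract_rows` is one possible signal that the last page has been passed.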