Universal Web Scraper Code

1. Scraping data with GET requests

Site 1: Binzhou Public Data Open Portal (滨州市公共数据开放网)

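import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, so the request looks like it comes from a browser
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"}
url = 'http://bzdata.sd.gov.cn/binzhou/catalog/index?domainId=domain-4&page=1'

# Send the request and keep the HTML body as text
response = requests.get(url=url, headers=headers)
html = response.text

# Parse the HTML (the "lxml" parser requires the lxml package)
soup = BeautifulSoup(html, "lxml")
# Each <li> under .bottom-content > ul is one catalog entry
lis = soup.select('.bottom-content>ul>li')

for li in lis:
    title = li.find('a').text
    cate_infor = li.select('.cata-information')[0].select('.information')
    fabu = cate_infor[2].select('span')[1].text     # "fabu" (发布): published
    gengxin = cate_infor[3].select('span')[1].text  # "gengxin" (更新): updated
    print(title, fabu, gengxin)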
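
The URL above ends in `page=1`, which suggests that later pages can be reached by changing that query parameter. The following is a minimal sketch under that assumption; `page_urls` is a hypothetical helper that is not part of the original post, and the number of pages the site actually serves is not stated here. It uses only the standard library and builds the URLs without sending any requests.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def page_urls(base_url, first_page, last_page):
    """Yield copies of base_url with the `page` query parameter set to each page number."""
    parts = urlsplit(base_url)
    # Keep every existing query parameter (e.g. domainId) and overwrite only `page`
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    for page in range(first_page, last_page + 1):
        query['page'] = str(page)
        yield urlunsplit(parts._replace(query=urlencode(query)))


base = 'http://bzdata.sd.gov.cn/binzhou/catalog/index?domainId=domain-4&page=1'
urls = list(page_urls(base, 1, 3))  # the range 1..3 is an arbitrary example
for u in urls:
    print(u)
```

Each generated URL could then be passed to `requests.get` in place of the fixed `url`, ideally with a short pause between requests.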

Site 2: Huai'an Municipal People's Government (淮安市人民政府)

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"}
# This endpoint returns JSON; paging is controlled by pageNum and pageSize
url = 'https://opendata.huaian.gov.cn:18443/api/catalog?keyword=&resourceType=&categoryId=3&pageNum=1&pageSize=5&orderBy=publishTime&dir=desc'

response = requests.get(url=url, headers=headers)
# Decode the JSON body into a Python dict
result = response.json()
data_list = result['data']['data']
for data in data_list:
    catalogName = data['catalogName']
    sharingProperty = data['sharingProperty']
    print(catalogName, sharingProperty)
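
Indexing with `['data']['data']` raises a `KeyError` or `TypeError` if the server returns an error object or an entry lacks a field. A defensive variant might look like the sketch below. `extract_catalogs` is a hypothetical helper, and the sample dict is made up to mirror the shape used above; it is not real data from the site.

```python
def extract_catalogs(payload):
    """Return (catalogName, sharingProperty) pairs from a parsed response dict.

    Missing or null levels yield an empty list; missing fields yield None.
    """
    outer = payload.get('data') or {}
    rows = outer.get('data') or []
    return [(row.get('catalogName'), row.get('sharingProperty')) for row in rows]


# Made-up sample that mirrors the response['data']['data'] structure
sample = {'data': {'data': [
    {'catalogName': 'Example catalog A', 'sharingProperty': 'unconditional'},
    {'catalogName': 'Example catalog B'},  # sharingProperty deliberately missing
]}}
pairs = extract_catalogs(sample)
print(pairs)
```

In a real run, the argument would be the dict returned by `response.json()`.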

2. Scraping data with POST requests

Site 1: Suining Public Data Open Platform (遂宁市公共数据开放平台)

import json
import requests

url = 'https://www.suining.gov.cn/exchangeopengateway/v1.0/mh/sjml/getMlxxList'
# Headers copied from the browser's request.
# Content-Length is left out: requests computes it from the body.
headers = {
    'Accept': 'application/json, text/plain, */*',
    # Decoding 'br' responses requires the brotli package to be installed
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'appCode': 'd7b9815286344caff8e9a5171dc10fac',
    'Cookie': '_appId=d7b9815286344caff8e9a5171dc10fac',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json;charset=UTF-8',
    'Origin': 'https://www.suining.gov.cn',
    'Referer': 'https://www.suining.gov.cn/data',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'token': 'undefined',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

# Request body: sort and filter fields, plus paging (pageNo, pageSize)
params = {
    'gxsjPx': 2,
    'llslPx': '',
    'mlmc': '',
    'pfPx': '',
    'sqslPx': '',
    'kflx': ['00'],
    'ssbmId': ['00'],
    'ssjclmId': ['00'],
    'ssqtlmId': ['00'],
    'ssztlmId': ['00'],
    'wjlx': ['00'],
    'zylx': ['01', '02'],
    'pageNo': 1,
    'pageSize': 10
}
# The endpoint expects a JSON body, so serialize the dict before posting
data = json.dumps(params)
response = requests.post(url=url, headers=headers, data=data)
response.encoding = response.apparent_encoding
result = response.json()
data_list = result['data']['rows']

for row in data_list:
    title = row['mlmc']
    tags = row['mlms']
    print(title, tags)
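
The request body above carries `pageNo` and `pageSize`, so fetching further pages presumably only requires changing `pageNo`. The sketch below builds the serialized body for several pages; `build_payload` is a hypothetical helper, the other field values are copied from the body shown above, and the assumption that the server pages by `pageNo` is not confirmed by the original post. No requests are sent.

```python
import json


def build_payload(page_no, page_size=10):
    """Return the request-body dict for one page, reusing the filter values shown above."""
    return {
        'gxsjPx': 2,
        'llslPx': '', 'mlmc': '', 'pfPx': '', 'sqslPx': '',
        'kflx': ['00'], 'ssbmId': ['00'], 'ssjclmId': ['00'],
        'ssqtlmId': ['00'], 'ssztlmId': ['00'], 'wjlx': ['00'],
        'zylx': ['01', '02'],
        'pageNo': page_no,
        'pageSize': page_size,
    }


# Serialized bodies for pages 1 to 3 (the page range is an arbitrary example)
bodies = [json.dumps(build_payload(n)) for n in range(1, 4)]
for body in bodies:
    print(body)
```

Each string in `bodies` could be passed as `data=` to `requests.post`, together with the same `url` and `headers` as above.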
