Python Web Scraping (Part 6): A BeautifulSoup Project in Practice

Latest recommended article published on 2024-04-24 08:26:26

Rush006

Category: Python Web Scraping

Article link: https://blog.youkuaiyun.com/weixin_45480995/article/details/111656649

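This article walks through a Python project that uses the BeautifulSoup library to scrape second-hand housing listings for Shanghai from the Anjuke website. The fields collected are the listing name, price, price per square meter, address, layout, floor area, floor, year built, tags, and contact. The script sends HTTP requests, parses the returned HTML, and repeats this over several result pages.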

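Project practice: scraping Shanghai second-hand housing data with BeautifulSoup

The goal is to collect each listing's name, price, layout, floor area, floor, year built, contact, address, and tags.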

I. Website analysis

1: Request headers

URL: https://shanghai.anjuke.com/sale/p1/#filtersort (first page)
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.66
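
The script in section II builds later page URLs by replacing the page number after `/sale/p`. A minimal sketch of that pattern, using only the URL and User-Agent shown above, could look like this. The names `URL_TEMPLATE`, `HEADERS`, and `build_page_url` are illustrative and do not appear in the original script.

```python
# Sketch: build the URLs for the first five result pages.
# URL_TEMPLATE, HEADERS and build_page_url are hypothetical names chosen
# for this example; the URL pattern and User-Agent come from the article.
URL_TEMPLATE = "https://shanghai.anjuke.com/sale/p{page}/#filtersort"

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.66"
    )
}


def build_page_url(page: int) -> str:
    """Return the listing URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return URL_TEMPLATE.format(page=page)


page_urls = [build_page_url(page) for page in range(1, 6)]
for url in page_urls:
    print(url)
```

The `#filtersort` part is a URL fragment, which is handled by the browser and is not sent to the server, so keeping it in the request URL is harmless.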

2: Locating each element
Name: div.house-title
Price: div.pro-price
Price per square meter: span.unit-price
Address: span.comm-address
Layout: div.details-item
Floor area: div.details-item
Floor: div.details-item
Year built: div.details-item
Tags: span.item-tags.tag-others
Contact: span.broker-name.broker-text

How to find these element locations was covered in detail in the earlier posts of this series.
II. Writing the code
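
As an illustration of how such locations map onto BeautifulSoup calls, here is a small sketch that runs against a made-up HTML fragment rather than the live site. The fragment and its text are assumptions for demonstration only; the class names are the ones listed above. It also shows one subtlety: passing `class_='item-tags tag-others'` to `find_all` matches the exact string of the `class` attribute, while the CSS selector `span.item-tags.tag-others` matches any element carrying both classes, in either order.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that only mimics the class names from the article;
# the real page markup may differ.
SAMPLE_HTML = """
<li class="list-item">
  <div class="house-title"><a href="#"> Sample listing </a></div>
  <span class="comm-address">Sample community</span>
  <span class="item-tags tag-others">Near subway</span>
  <span class="item-tags tag-others">South-facing</span>
  <span class="tag-others item-tags">Reversed class order</span>
  <span class="broker-name broker-text">Agent A</span>
</li>
"""

# html.parser ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.find("li", class_="list-item")

# CSS selectors via select_one / select
name = item.select_one("div.house-title a").get_text(strip=True)
broker = item.select_one("span.broker-name.broker-text").get_text(strip=True)
tags_css = [t.get_text(strip=True) for t in item.select("span.item-tags.tag-others")]

# find_all with a multi-class string matches the exact attribute value only
tags_exact = [
    t.get_text(strip=True)
    for t in item.find_all("span", class_="item-tags tag-others")
]

print(name, broker)
print(tags_css)
print(tags_exact)
```

Either style works for a page whose class attributes are written consistently; the CSS form is simply less sensitive to the order of the class names.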

import requests
from bs4 import BeautifulSoup


link_1 = 'https://shanghai.anjuke.com/sale/p'  # Anjuke second-hand listings
link_2 = '/#filtersort'
f = 1  # running count of listings
for i in range(1, 6):  # pages 1 to 5

    link=link_1+str(i)+link_2
    headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
                      ' AppleWebKit/537.36 (KHTML, like Gecko)'
                      ' Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.66'}

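    # Send the request (the timeout keeps a stalled connection from hanging forever)
    r = requests.get(url=link, headers=headers, timeout=10)
    print(r.status_code)
    # Parse the page
    soup = BeautifulSoup(r.text, 'lxml')
    # Extract the content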

    house_item_list=soup.find_all('li',class_='list-item')
    for eachitem in house_item_list:
        print('Listing', f)
        # Name: text of the link inside div.house-title
        house_name_list = eachitem.find('div', class_='house-title')
        name = house_name_list.a.text.strip()
        print(name)
        # Total price: first span inside div.pro-price
        house_price_list = eachitem.find('div', class_='pro-price')
        price = house_price_list.span.text.strip()
        print('Price:', price)
        # Price per square meter
        house_areaPrice_list = eachitem.find('span', class_='unit-price')
        areaPrice = house_areaPrice_list.text.strip()
        print(areaPrice)
        # Address
        house_address_list = eachitem.find('span', class_='comm-address')
        address = house_address_list.text.strip()
        print('Address:', address)
        # Layout, floor area, floor and year built all live in div.details-item.
        # The first span is the layout; the other fields are read from the
        # div's child nodes at indices 3, 5 and 7, which depends on the exact
        # markup of the page.
        house_detail_list = eachitem.find('div', class_='details-item')
        detail = house_detail_list.span.text.strip()
        print('Layout:', detail)
        area = house_detail_list.contents[3].text
        print('Floor area:', area)
        floor = house_detail_list.contents[5].text
        print('Floor:', floor)
        year = house_detail_list.contents[7].text
        print(year)
        # Tags: there can be several per listing
        house_tag_list = eachitem.find_all('span', class_='item-tags tag-others')
        tag = [t.text for t in house_tag_list]
        print(tag)
        # Contact
        house_broker_list = eachitem.find('span', class_='broker-name broker-text')
        broker_name = house_broker_list.text.strip()
        print('Contact:', broker_name)
        print('\n')
        f += 1
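
One fragile point in the script above is that `find` returns `None` when an element is missing, so a single incomplete listing raises `AttributeError` and stops the run. The indices 3, 5 and 7 into `contents` also shift if the whitespace or separators inside `div.details-item` change. A hedged sketch of a more defensive variant follows. `text_of` and `parse_listing` are hypothetical helpers, the sample HTML is invented for the example, and the sketch assumes the four detail fields are the first four `span` elements inside `div.details-item`.

```python
from bs4 import BeautifulSoup


def text_of(parent, name, class_):
    """Return the stripped text of the first match, or None if absent."""
    node = parent.find(name, class_=class_)
    return node.get_text(strip=True) if node is not None else None


def parse_listing(item):
    """Turn one li.list-item element into a dict of fields."""
    # Assumption: layout, area, floor and year are the first four spans.
    details = item.find("div", class_="details-item")
    spans = details.find_all("span") if details is not None else []
    fields = [s.get_text(strip=True) for s in spans[:4]]
    fields += [None] * (4 - len(fields))  # pad missing fields with None
    layout, area, floor, year = fields

    # Total price: first span inside div.pro-price, as in the original script.
    price_box = item.find("div", class_="pro-price")
    price = None
    if price_box is not None and price_box.span is not None:
        price = price_box.span.get_text(strip=True)

    return {
        "name": text_of(item, "div", "house-title"),
        "price": price,
        "unit_price": text_of(item, "span", "unit-price"),
        "address": text_of(item, "span", "comm-address"),
        "layout": layout,
        "area": area,
        "floor": floor,
        "year": year,
        "tags": [
            t.get_text(strip=True)
            for t in item.find_all("span", class_="item-tags tag-others")
        ],
        "broker": text_of(item, "span", "broker-name broker-text"),
    }


# Invented markup for demonstration; this listing has no broker element.
SAMPLE_HTML = """
<li class="list-item">
  <div class="house-title"><a href="#"> Sample listing </a></div>
  <div class="details-item">
    <span>2室1厅</span><em>|</em><span>80m²</span><em>|</em>
    <span>Mid floor</span><em>|</em><span>Built 2005</span>
  </div>
  <span class="comm-address">Sample community</span>
  <span class="item-tags tag-others">Near subway</span>
  <div class="pro-price">
    <span class="price-det"><strong>350</strong>万</span>
    <span class="unit-price">43750元/m²</span>
  </div>
</li>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
listing = parse_listing(soup.find("li", class_="list-item"))
print(listing)
```

In the main loop, each `eachitem` could be passed to a function like this and the resulting dict printed or collected in a list, instead of printing field by field.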