总结:总的来说不是很难,只是提取的字段有些多。总共获取了120多条南京房租信息。
1 爬取的item
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy
class YoutxnanjinItem(scrapy.Item):
    """Container for one Nanjing rental listing scraped from youtx.com.

    Each attribute is a scrapy.Field() slot populated by the spider's
    parse() callback. Field names are part of the pipeline contract and
    must not be renamed (note the existing spellings ``homeDetai`` and
    ``homeThirth`` are kept as-is for compatibility).
    """

    # Title of the listing
    homeName = scrapy.Field()
    # URL of the listing page
    homeLine = scrapy.Field()
    # Nightly (unit) rental price
    homeSinglePrice = scrapy.Field()
    # Street address of the property
    homeAddress = scrapy.Field()
    # Recent booking/activity details for the listing
    homeDetai = scrapy.Field()
    # Discounted price for stays of 7+ days
    homeSeven = scrapy.Field()
    # Discounted price for stays of 30+ days
    homeThirth = scrapy.Field()
    # Host (landlord) name
    homePerson = scrapy.Field()
    # Host avatar image
    homePersonImg = scrapy.Field()
    # Link attached to the host avatar
    homePersonLink = scrapy.Field()
    # Main photo of the property
    homePicBg = scrapy.Field()
    # Link attached to the main photo
    homePicLink = scrapy.Field()
    # NOTE(review): two fields were sketched but never enabled in the
    # original (brand-shop info "homePinPai", star-host "homeStarrPerson");
    # add them here if the spider ever extracts those values.
我就问:是不是注释很详细?
2 spider里面的内容
#encoding=utf8
import scrapy
from youtxNanJin.items import YoutxnanjinItem
class NanJinDefault(scrapy.Spider):
name = 'youtx'
allowed_domains = ['youtx.com']
start_urls = ["http://www.youtx.com/nanjing/longrent1-page{}".format(n) for n in range(0,6)]
def parse(self, response):
# print(response.body)
node_list = response.xpath("//div[@class='duanzu houseList']/ul/li[@class='clearfix']")
# print(node_list)
for node in node_list:
item = YoutxnanjinItem()
homeName = node.xpath("./div[@class='houseInfo clearfix']/div[@class='house-tit clearfix']/h3/a/text()").extract()
homeLink = node.xpath("./div[@clas