Below is a technical implementation outline for crawling car data from Dongchedi (懂车帝) with Python and building a data analysis platform on the SSM stack (Spring + Spring MVC + MyBatis). The code examples cover the crawler itself and the approach to platform integration:
1. Core logic of the Python crawler (example)
1.1 Scraping static pages with Requests + BeautifulSoup
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

def get_car_list(page=1):
    # Note: the URL and CSS selectors below are illustrative and must be
    # adapted to the actual page structure of dongchedi.com.
    url = f"https://www.dongchedi.com/car_list?page={page}"
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        car_items = soup.select('.car-list-item')
        data_list = []
        for item in car_items:
            car_data = {
                "brand": item.select_one('.brand-name').get_text(strip=True),
                "model": item.select_one('.model-name').get_text(strip=True),
                "price": item.select_one('.price').get_text(strip=True),
                "engine": item.select_one('.engine-info').get_text(strip=True),
                "sales": item.select_one('.sales-num').get_text(strip=True),
            }
            data_list.append(car_data)
        return data_list
    else:
        print(f"Request failed: {response.status_code}")
        return []
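A quick way to sanity-check the selectors is to call the function directly and print a few rows (a sketch; the actual fields returned depend on the real page structure):

# Quick check against the first listing page.
cars = get_car_list(page=1)
for car in cars[:5]:
    print(car["brand"], car["model"], car["price"])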
1.2 Handling dynamically loaded data (Selenium recommended)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def get_dynamic_content():
    options = Options()
    options.add_argument("--headless")  # headless mode
    driver = webdriver.Chrome(options=options)
    driver.get("https://www.dongchedi.com/sales_rank")
    driver.implicitly_wait(10)
    # Parse the dynamically rendered content
    # (find_elements_by_css_selector was removed in Selenium 4; use find_elements with By)
    sales_data = driver.find_elements(By.CSS_SELECTOR, '.sales-rank-item')
    for item in sales_data:
        pass  # data-parsing logic goes here
    driver.quit()
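If the ranking list is rendered asynchronously, an explicit wait on the target elements tends to be more robust than implicitly_wait. A minimal sketch that could replace the wait-and-lookup lines inside get_dynamic_content (same '.sales-rank-item' selector assumed):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 s for the ranking items to appear before reading them.
items = WebDriverWait(driver, 15).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.sales-rank-item'))
)
for item in items:
    print(item.text)  # raw visible text of each ranking row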
2. Data storage (MySQL example)
import pymysql

def save_to_mysql(data):
    # data: an iterable of (brand, model, price, engine, sales_month) tuples
    conn = pymysql.connect(
        host='localhost',
        user='root',
        password='123456',
        database='car_analysis',
        charset='utf8mb4'
    )
    try:
        with conn.cursor() as cursor:
            sql = '''INSERT INTO car_info
                     (brand, model, price, engine, sales_month)
                     VALUES (%s, %s, %s, %s, %s)'''
            cursor.executemany(sql, data)
        conn.commit()
    finally:
        conn.close()
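get_car_list returns a list of dicts while save_to_mysql expects row tuples in column order, so a small conversion step sits between them. A minimal sketch of that glue (crawl_and_store is a hypothetical helper; mapping the scraped "sales" field onto the sales_month column is an assumption):

def crawl_and_store(pages=5):
    rows = []
    for page in range(1, pages + 1):
        for car in get_car_list(page):
            # Column order must match the INSERT statement above;
            # "sales" is assumed to correspond to sales_month.
            rows.append((car["brand"], car["model"], car["price"],
                         car["engine"], car["sales"]))
    if rows:
        save_to_mysql(rows)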
3. SSM framework integration (key Java code)
3.1 MyBatis mapper interface
public interface CarMapper {
    @Select("SELECT * FROM car_info WHERE brand = #{brand}")
    List<Car> selectByBrand(String brand);

    @Select("SELECT brand, AVG(price) AS avg_price FROM car_info GROUP BY brand")
    List<Map<String, Object>> getBrandPriceAnalysis();
}
3.2 Spring MVC controller
@RestController
@RequestMapping("/api/car")
public class CarController {

    @Autowired
    private CarService carService;

    @GetMapping("/analysis/brand")
    public ResponseEntity<List<CarAnalysisVO>> getBrandAnalysis() {
        return ResponseEntity.ok(carService.getBrandAnalysis());
    }
}
3.3 Scheduled task that invokes the Python crawler
@Component
public class CrawlerScheduler {

    @Scheduled(cron = "0 0 2 * * ?")  // runs every day at 02:00
    public void executeCarCrawler() {
        try {
            String[] cmd = {"python3", "/path/to/crawler.py"};
            Process process = Runtime.getRuntime().exec(cmd);
            process.waitFor();
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }
}
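One easy-to-miss detail: the script the scheduler invokes has to be directly runnable with python3. Assuming crawler.py bundles the functions from sections 1 and 2 together with the hypothetical crawl_and_store glue shown earlier, a standard entry-point guard is enough:

# At the bottom of crawler.py, so that "python3 crawler.py"
# (as run by the Spring scheduler above) actually triggers a crawl.
if __name__ == "__main__":
    crawl_and_store(pages=10)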
4. Data analysis examples
4.1 Price distribution analysis (Python)
import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

# Connection settings match the pymysql example in section 2.
engine = create_engine("mysql+pymysql://root:123456@localhost/car_analysis?charset=utf8mb4")
df = pd.read_sql("SELECT * FROM car_info", con=engine)
# Strip the "万" (10,000 CNY) unit; range strings like "10.58-15.98万" need the
# cleaning step from section 5 first.
df['price'] = df['price'].str.replace('万', '', regex=False).astype(float)
plt.figure(figsize=(10, 6))
df['brand'].value_counts().plot(kind='bar')
plt.title('Number of models per brand')
plt.savefig('brand_dist.png')
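The bar chart above counts models per brand; for the price distribution itself, a histogram over the cleaned numeric price column is more direct. A sketch reusing the df loaded above:

# Histogram of prices (assumes df['price'] is already numeric, in 10,000 CNY).
plt.figure(figsize=(10, 6))
df['price'].plot(kind='hist', bins=30)
plt.xlabel('Price (10,000 CNY)')
plt.title('Price distribution of listed models')
plt.savefig('price_dist.png')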
4.2 Returning analysis results from Spring Boot
@GetMapping("/salesTrend")
public ResponseEntity<SalesTrendDTO> getSalesTrend() {
// 调用分析服务获取数据
return ResponseEntity.ok(analysisService.getSalesTrendData());
}
5. Notes and caveats
- Anti-crawling countermeasures (see the first sketch after this list):
  - Use a rotating proxy IP pool
  - Add a random delay between requests (time.sleep(random.uniform(1, 3)))
  - Rotate the User-Agent header periodically
- Data cleaning (see the price-parsing sketch after this list):
  - Parse price-range strings such as "10.58-15.98万"
  - Normalize units (e.g. 公里/小时 → km/h)
- Platform architecture overview:
  Crawler system → MySQL → SSM service → Vue front end
  (Redis provides caching for the SSM service; ECharts handles the front-end visualization)
- Legal compliance:
  - Respect the site's robots.txt rules
  - Keep the request rate moderate
  - Use the data for learning and research only
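A minimal request helper combining the three anti-crawling countermeasures above; the proxy addresses and User-Agent strings are placeholders to be replaced with real values:

import random
import time
import requests

# Placeholder pools; swap in a real proxy service and a larger UA list.
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",
]

def polite_get(url):
    time.sleep(random.uniform(1, 3))  # random delay between requests
    proxy = random.choice(PROXIES)
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers,
                        proxies={"http": proxy, "https": proxy}, timeout=10)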
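For the price-range cleaning, one reasonable convention (an assumption, not the only option) is to take the midpoint of the range, in units of 10,000 CNY:

def parse_price(raw):
    # "10.58-15.98万" -> 13.28, "15.98万" -> 15.98; returns None if unparseable
    raw = raw.replace('万', '').strip()
    try:
        parts = [float(p) for p in raw.split('-')]
    except ValueError:
        return None
    return round(sum(parts) / len(parts), 2)

print(parse_price("10.58-15.98万"))  # 13.28
print(parse_price("15.98万"))        # 15.98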
If you need the complete project source code, let me know which modules you're interested in and I can provide a more detailed implementation.