A Simple Way to Scrape Website Data

Latest recommended article published 2025-06-27 15:17:59

License: CC 4.0 BY-SA

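This article shows how to use the Spring Framework's task scheduling to change a task's execution period dynamically. With scheduling enabled in the configuration file, combining the @EnableScheduling annotation with the SchedulingConfigurer interface lets you adjust a task's cron expression without restarting the server, so the execution period is updated at runtime.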

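1. First, open the site you want to scrape in Firefox with the developer tools enabled, and use the Console or Network panel to find the interface (the request URL) that returns the data.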
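Once the interface is known, it can be called directly from code instead of through the browser. The following is a minimal sketch, assuming Java 11 or later and an interface that answers a plain GET request; the class name InterfaceFetcher and the URL http://localhost:8080/api/data are hypothetical placeholders, not something taken from a real site. Real interfaces often also need cookies, a Referer header, or request parameters copied from the Network panel.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: fetches the raw response body of a data interface. */
public class InterfaceFetcher {
    // One shared client; it is thread-safe and reuses connections.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /** Sends a GET request and returns the body, failing on any non-200 status. */
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException(
                    "Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; replace it with the interface found in the Network panel.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/api/data";
        System.out.println(fetch(url));
    }
}
```

A call such as InterfaceFetcher.fetch(url) could then be placed in the body of the scheduled task described in the following steps.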

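2. To fetch the data on a schedule, use Spring's task scheduling. Since version 3.0 the Spring Framework has shipped with its own task scheduling support, which works like a lightweight Quartz: it is simple to use and needs no additional JARs. In keeping with Spring's usual style, scheduled tasks can be configured either with annotations or with XML.

First, here is a standard Spring scheduled-task configuration: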

<?xml version="1.0" encoding="UTF-8"?>  
<beans xmlns="http://www.springframework.org/schema/beans"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
    xmlns:task="http://www.springframework.org/schema/task"  
    xmlns:context="http://www.springframework.org/schema/context"  
    xsi:schemaLocation="  
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd   
        http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task.xsd   
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd ">  
      
    <context:component-scan base-package="com.pes_soft.task.demo" />  
      
    <!-- Configure scheduled tasks via Spring annotations -->  
    <task:executor id="executor" pool-size="3"/>  
    <task:scheduler id="scheduler" pool-size="3"/>  
    <task:annotation-driven executor="executor" scheduler="scheduler"/>  
</beans>  

Note: to configure Spring scheduled tasks, the root element of the Spring XML configuration file must declare xmlns:task="http://www.springframework.org/schema/task", and xsi:schemaLocation must include the pair http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task.xsd

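3. The implementation is as follows:

Add the @EnableScheduling annotation to the scheduled-task class and implement the SchedulingConfigurer interface. (Worth noting: with Spring 3.2.6 I could not get @EnableScheduling to work, and it worked after switching to 4.2.5. The annotation has been part of Spring since 3.1, so the cause may have been something other than the version number alone.)
Define a static variable cron that holds the task's cron expression.
Start a separate thread that simulates an external change to the execution period, as would happen in a real application (the period changes without restarting the server).
Register a trigger for the task. The trigger computes each next run time, so this is where the execution period can be changed.

The complete SpringDynamicCronTask.java code: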

package com.pes_soft.task.demo;  
  
import java.util.Date;  
  
import org.slf4j.Logger;  
import org.slf4j.LoggerFactory;  
import org.springframework.context.annotation.Lazy;  
import org.springframework.scheduling.Trigger;  
import org.springframework.scheduling.TriggerContext;  
import org.springframework.scheduling.annotation.EnableScheduling;  
import org.springframework.scheduling.annotation.SchedulingConfigurer;  
import org.springframework.scheduling.config.ScheduledTaskRegistrar;  
import org.springframework.scheduling.support.CronTrigger;  
import org.springframework.stereotype.Component;  
  
/** 
 * Spring scheduled task with a dynamic period<br> 
 * Changes the task's execution period without stopping the application 
 * @Author 许亮 
 * @Create 2016-11-10 16:31:29 
 */  
@Lazy(false)  
@Component  
@EnableScheduling  
public class SpringDynamicCronTask implements SchedulingConfigurer {  
    private static final Logger logger = LoggerFactory.getLogger(SpringDynamicCronTask.class);  
      
    // volatile: written by the simulator thread, read by the scheduler thread  
    private static volatile String cron;  
      
    public SpringDynamicCronTask() {  
        cron = "0/5 * * * * ?";  
          
        // Start a new thread that simulates an external change to the execution period  
        new Thread(new Runnable() {  
            @Override  
            public void run() {  
                try {  
                    Thread.sleep(15 * 1000);  
                } catch (InterruptedException e) {  
                    // Restore the interrupt flag instead of swallowing the interruption  
                    Thread.currentThread().interrupt();  
                }  
                  
                cron = "0/10 * * * * ?";  
                System.err.println("cron changed to: " + cron);  
            }  
        }).start();  
    }  
  
    @Override  
    public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {  
        taskRegistrar.addTriggerTask(new Runnable() {  
            @Override  
            public void run() {  
                // Task logic  
                logger.debug("dynamicCronTask is running...");  
            }  
        }, new Trigger() {  
            @Override  
            public Date nextExecutionTime(TriggerContext triggerContext) {  
                // Called to compute each next run; re-reading cron here lets the period change at runtime  
                CronTrigger trigger = new CronTrigger(cron);  
                Date nextExec = trigger.nextExecutionTime(triggerContext);  
                return nextExec;  
            }  
        });  
    }  
}
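
The mechanism that makes this work is that the next run time is recomputed after every execution from a value that may have changed in the meantime. That idea does not depend on Spring. The following is a rough sketch of the same pattern using only the JDK, under two stated assumptions: the period is a plain delay in milliseconds rather than a cron expression, and the class name DynamicPeriodScheduler and its methods are made up for illustration. It is not how Spring implements triggers internally.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical JDK-only scheduler whose period can be changed while it runs. */
public class DynamicPeriodScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    // Plays the role of the static cron field: shared, safely visible across threads.
    private final AtomicLong periodMillis;
    private final Runnable task;

    public DynamicPeriodScheduler(Runnable task, long initialPeriodMillis) {
        this.task = task;
        this.periodMillis = new AtomicLong(initialPeriodMillis);
    }

    /** Schedules the first run. */
    public void start() {
        scheduleNext();
    }

    /** Takes effect the next time a run is scheduled; no restart is needed. */
    public void setPeriodMillis(long newPeriodMillis) {
        periodMillis.set(newPeriodMillis);
    }

    public void stop() {
        executor.shutdownNow();
    }

    // Plays the role of Trigger.nextExecutionTime: the delay is re-read before every run.
    private void scheduleNext() {
        try {
            executor.schedule(() -> {
                try {
                    task.run();
                } finally {
                    scheduleNext();
                }
            }, periodMillis.get(), TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // The executor has been shut down; stop rescheduling.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DynamicPeriodScheduler scheduler = new DynamicPeriodScheduler(
                () -> System.out.println("task is running..."), 5_000);
        scheduler.start();
        Thread.sleep(15_000);
        scheduler.setPeriodMillis(10_000); // simulate an external change
        Thread.sleep(30_000);
        scheduler.stop();
    }
}
```

As with the Spring version, a change only applies from the next scheduling decision onward: a run that is already queued still fires after the old delay.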