A Disaster Caused by a 20-Minute Druid Connection Timeout

Latest recommended article published on 2025-09-26 01:41:46

Licensed under CC 4.0 BY-SA


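This article analyzes a hung-batch problem seen in production when a scheduled job processes large volumes of business data, examines how the Druid connection pool configuration contributes to it, and proposes two solutions.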

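1. Problem description

In production, batches occasionally hang when the data volume is large. (The scheduled job is triggered every 3 minutes by default, but sometimes the job starts and the business data is still not processed.) A hung batch leaves a large backlog of unprocessed business records, which then has to be cleared manually.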

2. Problem analysis

2.1 Working backwards from the bug

JobDetail#execute
/** First check whether the batch status is RUNNING. If it is, return immediately without calling the business logic. */
if (RUNNING.equals(taskInfo.getStatus())) {
    logger.info(taskInfo.getId().getTaskKey() + " is running...");
} else {
    // Business logic
    execute(context, task, taskInfo);
    logger.info("job exec end " + DataUtil.now());
    updateJobDb(task);
}
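One way to see how this guard could produce a hung batch: if a run never reaches the status update (for example because an exception is thrown first), the status stays RUNNING and every later trigger returns early. The sketch below simulates that with an in-memory status. `StuckStatusDemo`, `trigger`, the `IDLE` status and the simulated failure are all hypothetical and are not the project's real classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of the RUNNING guard; not the project's real JobDetail. */
public class StuckStatusDemo {
    static final String RUNNING = "RUNNING";
    static final String IDLE = "IDLE"; // assumed name for the non-running status

    private String status = IDLE;
    private final List<String> log = new ArrayList<>();

    /** One scheduler trigger. failBeforeUpdate simulates a run that dies before the status is reset. */
    public void trigger(boolean failBeforeUpdate) {
        if (RUNNING.equals(status)) {
            log.add("skipped");      // guard: the business logic is not called
            return;
        }
        status = RUNNING;
        try {
            if (failBeforeUpdate) {
                throw new IllegalStateException("simulated failure");
            }
            log.add("processed");
            status = IDLE;           // stands in for updateJobDb(task)
        } catch (IllegalStateException e) {
            log.add("failed");       // the status is left as RUNNING
        }
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        StuckStatusDemo job = new StuckStatusDemo();
        job.trigger(false);
        job.trigger(true);
        job.trigger(false);
        job.trigger(false);
        System.out.println(job.log()); // [processed, failed, skipped, skipped]
    }
}
```

After the simulated failure, every later trigger is skipped until something resets the status, which matches the symptom of the job starting without processing any data.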

2.2 Bug analysis

Following a hint from Qiang (强总), we checked the Druid configuration and found that the connection pool's removeAbandonedTimeout was set to 1200.

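Configuration notes: removeAbandoned controls whether a connection held past the time limit is reclaimed; removeAbandonedTimeout is that time limit, in seconds (1200 seconds = 20 minutes).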
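To make the unit explicit, here is a small sketch that expresses these settings as plain properties and converts the timeout to minutes. The keys are the property names Druid uses for these settings, to the best of my knowledge. The source only states the timeout value of 1200, so `removeAbandoned=true` and `logAbandoned=true` are assumptions, and `AbandonedConfigDemo` is a hypothetical helper.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that collects the abandoned-connection settings as plain properties. */
public class AbandonedConfigDemo {

    public static Properties abandonedSettings(int timeoutSeconds) {
        Properties props = new Properties();
        // Assumption: reclaiming is switched on; the timeout has no effect otherwise
        props.setProperty("removeAbandoned", "true");
        // The time limit, in seconds
        props.setProperty("removeAbandonedTimeout", String.valueOf(timeoutSeconds));
        // Assumption: log an error when an abandoned connection is closed
        props.setProperty("logAbandoned", "true");
        return props;
    }

    /** Converts the configured timeout from seconds to whole minutes. */
    public static long timeoutMinutes(Properties props) {
        long seconds = Long.parseLong(props.getProperty("removeAbandonedTimeout"));
        return TimeUnit.SECONDS.toMinutes(seconds);
    }

    public static void main(String[] args) {
        Properties props = abandonedSettings(1200);
        System.out.println(timeoutMinutes(props) + " minutes"); // 20 minutes
    }
}
```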

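Following Qiang's analysis, we then read the source code. In short, Druid keeps hold of the connections that have been borrowed from the pool and reclaims those held past the timeout.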

com.alibaba.druid.pool.DruidDataSource#getConnection
/** Get a connection, waiting at most maxWaitMillis milliseconds */
public DruidPooledConnection getConnection(long maxWaitMillis) throws SQLException {
    this.init();
    if (this.filters.size() > 0) {
        FilterChainImpl filterChain = new FilterChainImpl(this);
        return filterChain.dataSource_connect(this, maxWaitMillis);
    } else {
        return this.getConnectionDirect(maxWaitMillis);
    }
}
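/** Abandon connections once they time out. This is only part of the source; read the full method if you are interested. */
public int removeAbandoned() {
    int removeCount = 0;
    long currrentNanos = System.nanoTime();
    List<DruidPooledConnection> abandonedList = new ArrayList<DruidPooledConnection>();

    this.activeConnectionLock.lock();
    try {
        Iterator<DruidPooledConnection> iter = this.activeConnections.keySet().iterator();

        while (iter.hasNext()) {
            DruidPooledConnection pooledConnection = iter.next();
            // Connections that are currently executing a statement are skipped
            if (!pooledConnection.isRunning()) {
                // How long the connection has been checked out, in milliseconds
                long timeMillis = (currrentNanos - pooledConnection.getConnectedTimeNano()) / 1000000L;
                if (timeMillis >= this.removeAbandonedTimeoutMillis) {
                    iter.remove();
                    pooledConnection.setTraceEnable(false);
                    abandonedList.add(pooledConnection);
                }
            }
        }
    } finally {
        this.activeConnectionLock.unlock();
    }
    // ... (omitted: the connections in abandonedList are then closed and counted in removeCount)
    return removeCount;
}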
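The sweep logic above can be sketched without Druid as a map from connection id to borrow time. This is a simplified, hypothetical model (`AbandonedSweepDemo` and its methods are made up for illustration). It leaves out the `isRunning()` check and the locking, and it is not Druid's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of an abandoned-connection sweep; this is not Druid's code. */
public class AbandonedSweepDemo {
    /** Connection id mapped to the time it was borrowed, in milliseconds. */
    private final Map<String, Long> active = new LinkedHashMap<>();
    private final long timeoutMillis;

    public AbandonedSweepDemo(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public void borrow(String id, long nowMillis) {
        active.put(id, nowMillis);
    }

    /** Removes and returns every connection that has been held for at least timeoutMillis. */
    public List<String> sweep(long nowMillis) {
        List<String> abandoned = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> iter = active.entrySet().iterator();
        while (iter.hasNext()) {
            Map.Entry<String, Long> entry = iter.next();
            String id = entry.getKey();
            long heldMillis = nowMillis - entry.getValue();
            if (heldMillis >= timeoutMillis) {
                iter.remove();
                abandoned.add(id);
            }
        }
        return abandoned;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        AbandonedSweepDemo pool = new AbandonedSweepDemo(20 * minute); // 1200 seconds
        pool.borrow("a", 0L);
        pool.borrow("b", minute);
        System.out.println(pool.sweep(19 * minute)); // []
        System.out.println(pool.sweep(20 * minute)); // [a]
    }
}
```

In this model a connection held for 19 minutes survives the sweep, while one held for the full 20 minutes is taken away from its borrower.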

2.3 Reproducing the bug

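Based on the analysis above, we added a manual sleep of about 20 minutes to check whether this is what happens.
Sample (1): send 3 records, sleep 19 minutes
Sample (2): send 2 records, sleep 20 minutes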
Start time           End time             Serial no.
2020-01-09 14:18:40                       23
2020-01-09 14:18:30                       22
2020-01-09 14:13:10  2020-01-09 14:18:01  21
2020-01-09 13:57:50  2020-01-09 14:18:01  20
2020-01-09 13:57:40  2020-01-09 14:18:00  18
2020-01-09 13:51:25  2020-01-09 13:57:24  17
2020-01-09 13:51:20  2020-01-09 13:57:23  16
2020-01-09 13:50:40  2020-01-09 13:57:21  15
2020-01-09 13:31:01  2020-01-09 13:47:11  14
2020-01-09 13:28:20  2020-01-09 13:47:10  13
2020-01-09 13:13:09  2020-01-09 13:13:14  12
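When reading these timestamps, it can help to compute the elapsed time for each serial number and compare it with the 1200-second limit. The sketch below does that for a few of the rows that have an end time. `ElapsedDemo` and its methods are hypothetical helpers, and the comparison only does the arithmetic; it does not by itself show which rows were affected by the pool.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that computes elapsed time between a start and an end timestamp. */
public class ElapsedDemo {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final long TIMEOUT_SECONDS = 1200; // removeAbandonedTimeout from the configuration

    public static long elapsedSeconds(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, FMT);
        LocalDateTime to = LocalDateTime.parse(end, FMT);
        return Duration.between(from, to).getSeconds();
    }

    public static boolean reachesTimeout(String start, String end) {
        return elapsedSeconds(start, end) >= TIMEOUT_SECONDS;
    }

    public static void main(String[] args) {
        // {serial number, start time, end time}, copied from the table
        String[][] rows = {
            {"20", "2020-01-09 13:57:50", "2020-01-09 14:18:01"},
            {"18", "2020-01-09 13:57:40", "2020-01-09 14:18:00"},
            {"14", "2020-01-09 13:31:01", "2020-01-09 13:47:11"},
            {"13", "2020-01-09 13:28:20", "2020-01-09 13:47:10"},
        };
        for (String[] row : rows) {
            System.out.println(row[0] + ": " + elapsedSeconds(row[1], row[2])
                    + " s, reaches timeout: " + reachesTimeout(row[1], row[2]));
        }
    }
}
```

For these rows the elapsed times are 1211 s and 1220 s for serial numbers 20 and 18, and 970 s and 1130 s for serial numbers 14 and 13, so only the first two reach the 1200-second limit.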