如何规定时间内批量插入1000000条数据到数据库中

最新推荐文章于 2024-08-14 15:13:38 发布

原创最新推荐文章于 2024-08-14 15:13:38 发布 · 1k 阅读

5 ·

CC 4.0 BY-SA版权

文章标签：

#数据库

该文章描述了一个Java程序，它首先读取Excel文件，然后利用Hutool库解析数据成List<List<Object>>。数据被转换为Student对象并存储在ArrayList中，接着使用线程池和CountDownLatch进行并发处理，将数据分批插入数据库，以提高效率。程序使用ExecutorType.BATCH执行批量SQL操作，减少与数据库的交互次数，提升性能。

首先模拟1000000条数据到excel中，

后端需要用到一个工具类来获得到解析excel文件

//1、获取Excel文件（只是创建ExcelReader读取器，还没有开始真正的读）
ExcelReader reader = ExcelUtil.getReader("C:/Users/xiaopeng/Desktop/aa.xlsx");

有了解析excel的工具之后，我们要思考下，把文件解析成什么样子了呢？

那么我们使用idea看一下就好了

很显然返回结果是一个List<List<Object>>，有点蒙。别急慢慢分析下

excel中解析出来的肯定会是一个存放student的集合，但是为什么里面还有一个List呢？

这是因为hootel工具帮我们把excel中的每一行数据都解析成为一个list集合了，List<Object>里面每一项放的是每个字段的值。仔细想想，确实，hootel怎么会知道excel中存放的是什么对象啊，他只能保证把数据解析出来。剩下的需要我们程序员想办法把他包装成我们需要的对象集合，并插入数据库。

好的，思路大概清晰了，那么问题来了，怎么才能把List<List<Object>>变成我们需要的对象集合呢？这简单，提前创建好一个Student类集合，然后从每一项List<Object>的值赋给Student集合不就好了吗？

                students = new ArrayList<>(10000);
                 //   2、 在这里控制每次读取多少条数据
                 readList = reader.read(start, end, true);
                 for (List<Object> objects : readList) {
                     students.add(Student.builder()
                             .id(Long.valueOf(objects.get(0).toString()))
                             .name((String) objects.get(1))
                             .age(Integer.valueOf(objects.get(2).toString()))
                             .build());
                 }

好的现在student集合有了，存数据库吧。1000000万呢！直接存慢死了，能不能多开几个线程一块干呢？必须可以。那我们创建个线程池放十个线程。但是一下1000000万还是太多了，分成多几分吧，一次读一万条，这一万条分给五个不同线程去读，应该就快了。

 int start = 1;
             int end = 10000;

             while (end <= 1000000) {
                 //  1、 以下for循环，主要就是将读取到的数据，封装到这里
                 students = new ArrayList<>(10000);
                 //   2、 在这里控制每次读取多少条数据
                 readList = reader.read(start, end, true);
                 for (List<Object> objects : readList) {
                     students.add(Student.builder()
                             .id(Long.valueOf(objects.get(0).toString()))
                             .name((String) objects.get(1))
                             .age(Integer.valueOf(objects.get(2).toString()))
                             .build());
                 }
                 int i = 0;
                 map = new HashMap<>(5);

                 while (true) {
                     List<Student> list = students.stream().skip(i * 2000).limit(2000).parallel().collect(Collectors.toList());
                     if (list.isEmpty()) {
                         break;
                     }
                     map.put(i + 1, list);
                     i++;
                 }

                 CountDownLatch latch = new CountDownLatch(5);

                 for (List<Student> students1 : map.values()) {
                     executor.submit(() -> {
                         mapper.insertBatch(students1);
                         latch.countDown();
                     });
                 }

                 latch.await();
                 //每10000条提交1次
                 sqlSession.commit();
                 students.clear();


                 start += 10000;
                 end += 10000;
             }

但是肯定有小伙伴比较懵：

map = new HashMap<>(5);

while (true) {
List<Student> list = students.stream().skip(i * 2000).limit(2000).parallel().collect(Collectors.toList());
if (list.isEmpty()) {
break;
}
map.put(i + 1, list);
i++;
}

CountDownLatch latch = new CountDownLatch(5);

for (List<Student> students1 : map.values()) {
executor.submit(() -> {
mapper.insertBatch(students1);
latch.countDown();
});
}

latch.await();
//每10000条提交1次
sqlSession.commit();

这是干啥的？

别急，我给你细细道来。一万条数据分给五个线程的话，一个线程就是2000，怎么保证一个分到2000啊？

while (true) { List<Student>list=students.stream().skip(i*2000).limit(2000).parallel().collect(Collectors.toList());
if (list.isEmpty()) {
break;
}
map.put(i + 1, list);
i++;
}

在看这段代码，是不是明白了？（stream流不会的需要小补一下哈，我就不讲了）。

现在就是线程也分配到了，开始插数据吧。不行，因为这是五个线程呀，你执行一点插一点，这数据库压力有点大呀，这得连接多久。知识点来了：

2. ExecutorType.BATCH：

ExecutorType.BATCH会将一批SQL语句集中在一起批量执行，减少了与数据库的交互次数，提高性能。
多条SQL语句会一起提交到数据库执行，可以提升执行效率。
可以通过`sqlSession.flushStatements()`方法手动触发批量执行。
适用于需要执行大量SQL操作的场景，如批量插入、更新或删除多条记录。

综上所述，ExecutorType.SIMPLE适用于常规的单个或少量SQL操作，而ExecutorType.BATCH适用于需要批量执行大量SQL操作的场景，效率更高。选择适当的执行方式可根据业务需求和性能要求进行衡量。

CountDownLatch latch = new CountDownLatch(5);

for (List<Student> students1 : map.values()) {
    executor.submit(() -> {
        mapper.insertBatch(students1);
        latch.countDown();
    });
}

latch.await();
//每10000条提交1次
sqlSession.commit();
students.clear();

现在应该还是懵，insertBatch就insertBatch呗，你这 executor.submit和 latch.countDown();还有

CountDownLatch latch = new CountDownLatch(5);啥意思啊？

问得好，这里每个线程执行完我们还不能直接提交，我们要等五个线程全部完成后，咱们一块提交，这样数据库执行一次就行了，就高效了对吧？完事儿。下面欣赏全部service代码

 /*
  * Copyright (c) 2021, 2023, peng.cn All rights reserved.
  *
  */
 package com.xiaopeng.service;

 import cn.hutool.core.date.StopWatch;
 import cn.hutool.poi.excel.ExcelReader;
 import cn.hutool.poi.excel.ExcelUtil;
 import com.xiaopeng.dao.TestDao;
 import com.xiaopeng.model.Student;
 import org.apache.ibatis.session.ExecutorType;
 import org.apache.ibatis.session.SqlSession;
 import org.apache.ibatis.session.SqlSessionFactory;
 import org.springframework.stereotype.Service;

 import javax.annotation.Resource;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.concurrent.CountDownLatch;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.stream.Collectors;

 /**
  * <p>Project: spring-excell-test - TestService
  * <p>Powered by xiaopeng On 2023-07-26 15:37:38
  *
  * @author peng [3121843255@qq.com]
  * @version 1.0
  * @since 1.8
  */
 @Service
 public class TestService {

     @Resource
     private SqlSessionFactory sqlSessionFactory;

     public Integer insertBatch() {
         //1、获取Excel文件（只是创建ExcelReader读取器，还没有开始真正的读）
         ExcelReader reader = ExcelUtil.getReader("C:/Users/xiaopeng/Desktop/aa.xlsx");
         //设置读取的excell页
         reader.setSheet("Sheet1");
         //优化sql插入   ExecutorType.BATCH
         SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH);
         TestDao mapper = sqlSession.getMapper(TestDao.class);

         //存储读取的数据
         List<List<Object>> readList = null;
         List<Student> students = null;
         Map<Integer, List<Student>> map = null;
         ExecutorService executor = Executors.newFixedThreadPool(10);
         try {
             //计时器
             StopWatch stopWatch = new StopWatch();
             stopWatch.start();

             int start = 1;
             int end = 10000;

             while (end <= 1000000) {
                 //  1、 以下for循环，主要就是将读取到的数据，封装到这里
                 students = new ArrayList<>(10000);
                 //   2、 在这里控制每次读取多少条数据
                 readList = reader.read(start, end, true);
                 for (List<Object> objects : readList) {
                     students.add(Student.builder()
                             .id(Long.valueOf(objects.get(0).toString()))
                             .name((String) objects.get(1))
                             .age(Integer.valueOf(objects.get(2).toString()))
                             .build());
                 }
                 int i = 0;
                 map = new HashMap<>(5);

                 while (true) {
                     List<Student> list = students.stream().skip(i * 2000).limit(2000).parallel().collect(Collectors.toList());
                     if (list.isEmpty()) {
                         break;
                     }
                     map.put(i + 1, list);
                     i++;
                 }

                 CountDownLatch latch = new CountDownLatch(5);

                 for (List<Student> students1 : map.values()) {
                     executor.submit(() -> {
                         mapper.insertBatch(students1);
                         latch.countDown();
                     });
                 }

                 latch.await();
                 //每10000条提交1次
                 sqlSession.commit();
                 students.clear();


                 start += 10000;
                 end += 10000;
             }
             //停止计时
             stopWatch.stop();
             System.out.println("总共耗时" + stopWatch.getTotalTimeSeconds());
         } catch (Exception e) {
             sqlSession.rollback();
         } finally {
             sqlSession.close();
             // 关闭线程池
             executor.shutdown();
         }
         return 1;
     }
 }

controller层

 @RestController
 @RequestMapping
 public class TestController {
     @Autowired
     private TestService testService;

     //测试一
     @RequestMapping(value = "/api/test")
     public String helloTest() {
         testService.insertBatch();
         return "ok";
     }
 }

dao

@Mapper
public interface TestDao {

    Integer insert(Student student);

    Integer insertBatch(List<Student> students);

}

mapper.xml

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper
        PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-mapper.dtd">

<mapper namespace="com.xiaopeng.dao.TestDao">
    <insert id="insert">
        insert into student(id,name,age) values (#{id},#{name},#{age})
    </insert>

    <insert id="insertBatch">
        insert into student(id,name,age) values
        <foreach collection="students" item="student" open="" close="" separator=",">
            (#{student.id},#{student.name},#{student.age})
        </foreach>
    </insert>
</mapper>

最后注意配置文件还要加一个配置来支持insertBatch插入

server.port=8081

logging.level.root=error
logging.level.com.xiaopeng=debug

#数据库配置
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3308/excel?useUnicode=true&characterEncoding=utf-8&useSSL=true&serverTimezone=UTC&rewriteBatchedStatements=true
spring.datasource.username=root
spring.datasource.password=root
#连接池配置
spring.datasource.type=com.alibaba.druid.pool.DruidDataSource
spring.datasouce.initial-size=5     #初始化连接数
spring.datasouce.minIdle=5          #最小连接数
spring.datasouce.maxAxtive=20       #最大连接数
spring.datasouce.maxWait=6000       #最大等待时间



#mybatis配置
mybatis.mapper-locations=classpath:mapper/*.xml
mybatis.type-aliases-package=com.xiaopeng.model

加粗的就是注意的，其他没啥了。感谢观看！！！