I recently picked up a task at work to migrate a legacy system. Roughly, it takes Excel files uploaded by users, processes them according to a set of rules, and writes the processed data into a SQL Server database. The old system ran on .NET and a single run took eight to nine hours; my assignment was to move it to the Java platform with a 10x performance improvement. Hitting that target required a number of more efficient techniques, and this post records the approach I settled on for getting the data inserted into the database.
A single Excel file ultimately produces 4 to 5 million database rows. SQL Server limits a prepared statement to at most 2,100 parameters, so inserting the data with plain parameterized INSERT statements was clearly not going to work. After some research I learned that SQL Server provides a bulk copy API (exposed in the JDBC driver as SQLServerBulkCopy) precisely for high-volume inserts. Here is a short record of how I used it and the problems I ran into along the way.
Here is the final key code:
public <T extends BaseBulkCopyEntity> void bulkCopy(String tableName, List<T> records) {
    log.info("start sql bulk copy, table: {}, dataSet size: {}", tableName, records.size());
    Connection conn = SqlConnPool.getConnection();
    AtomicLong currentId = BulkCopyCache.getId(tableName);
    long rollbackId = currentId.get();
    try {
        ResultSet resultSet = BulkCopyCache.getResultSet(tableName);
        String[] columnNames = BulkCopyCache.getColumnNames(tableName);
        Class<?> clazz = records.get(0).getClass();
        Map<String, Method> columnGetMethodMap = getMethodMap(clazz, columnNames);
        // Build an in-memory row set from the cached ResultSet; the rows appended to it
        // are what SQLServerBulkCopy streams to the server.
        CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
        crs.populate(resultSet);
        for (T record : records) {
            // IDs come from the in-memory counter, so we keep identity values ourselves.
            long id = currentId.incrementAndGet();
            record.setID(id);
            crs.moveToInsertRow();
            populate(crs, record, columnNames, columnGetMethodMap);
            crs.insertRow();
            crs.moveToCurrentRow();
        }
        SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn);
        SQLServerBulkCopyOptions sqlServerBulkCopyOptions = new SQLServerBulkCopyOptions();
        sqlServerBulkCopyOptions.setKeepIdentity(true);
        sqlServerBulkCopyOptions.setBatchSize(records.size());
        bulkCopy.setBulkCopyOptions(sqlServerBulkCopyOptions);
        bulkCopy.setDestinationTableName(tableName);
        bulkCopy.writeToServer(crs);
        crs.close();
        bulkCopy.close();
    } catch (Exception e) {
        // Roll the in-memory ID counter back so a retry does not leave gaps.
        log.error("bulk copy error, rollback current-id: {}", rollbackId);
        currentId.set(rollbackId);
        log.error("sql bulk copy error", e);
        throw new RuntimeException("sql bulk copy failed", e);
    } finally {
        log.info("table [{}] cached id: {}", tableName, currentId.get());
        SqlConnPool.freeConn(conn);
    }
}
private <T> void populate(CachedRowSet crs, T record, String[] dbFieldNames, Map<String, Method> columnGetMethodMap) throws Exception {
    // Invoke the cached getter for every DB column and write the value into the insert row.
    for (String fieldName : dbFieldNames) {
        Method method = columnGetMethodMap.get(fieldName);
        Object value = method.invoke(record);
        updateCRS(crs, fieldName, value);
    }
}
private void updateCRS(CachedRowSet crs, String dbFieldName, Object value) throws SQLException {
    if (value instanceof String) {
        crs.updateString(dbFieldName, (String) value);
    } else if (value instanceof Integer) {
        crs.updateInt(dbFieldName, (int) value);
    } else if (value instanceof Double) {
        crs.updateDouble(dbFieldName, (double) value);
    } else if (value instanceof Long) {
        crs.updateLong(dbFieldName, (long) value);
    } else if (value instanceof Float) {
        crs.updateFloat(dbFieldName, (float) value);
    } else if (value instanceof Timestamp) {
        crs.updateTimestamp(dbFieldName, (Timestamp) value);
    } else if (value instanceof java.util.Date) {
        crs.updateDate(dbFieldName, new java.sql.Date(((java.util.Date) value).getTime()));
    } else if (value instanceof Boolean) {
        crs.updateByte(dbFieldName, (boolean) value ? (byte) 1 : (byte) 0);
    } else {
        // null (and any unmatched type) falls through to updateObject, so every column of the
        // insert row gets written explicitly -- see the note about skipping nulls below.
        crs.updateObject(dbFieldName, value);
    }
}
private Map<String, Method> getMethodMap(Class<?> clazz, String[] columnNames) throws NoSuchMethodException {
    // Map snake_case column names to the corresponding camelCase getters of the entity class.
    Map<String, Method> columnGetNameMap = new HashMap<>(columnNames.length);
    for (String columnName : columnNames) {
        StringBuilder getMethodName = new StringBuilder("get");
        if (columnName.contains("_")) {
            String[] singleWords = columnName.split("_");
            for (String singleWord : singleWords) {
                getMethodName.append(CommonUtils.upperFirstChar(singleWord));
            }
        } else {
            getMethodName.append(CommonUtils.upperFirstChar(columnName));
        }
        Method method = clazz.getMethod(getMethodName.toString());
        columnGetNameMap.put(columnName, method);
    }
    return columnGetNameMap;
}
Because I was not familiar with this API, I hit a pitfall here, specifically in how the CachedRowSet has to be filled. In my first version of updateCRS I checked the value and simply returned early when it was null; skipping columns like that ended up corrupting the inserted data. After tracking that down I changed it to the version above, where every column of the insert row is written explicitly.
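For reference, the problematic version looked roughly like this (a reconstruction from the description above, not the exact original code); because of the early return, some columns of the insert row were never set for that record:
private void updateCRS(CachedRowSet crs, String dbFieldName, Object value) throws SQLException {
    if (value == null) {
        return; // BAD: the column is silently skipped for this row
    }
    if (value instanceof String) {
        crs.updateString(dbFieldName, (String) value);
    }
    // ... remaining type branches as in the fixed version above
}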
The code above uses several helper classes of my own that I haven't shown, which might be confusing, so let me explain them in detail.
First, SqlConnPool.
This is where I keep my own pool of database connections. The project already uses a connection pool library, and I did try borrowing connections from it, which is convenient because you don't have to manage them afterwards. The problem is that when that pool reclaims a connection it closes every ResultSet associated with it, which breaks my code later on: within a single task run the bulk copy method is called many times, and I don't want to re-fetch the ResultSet from the database on every call. In the end I wrote a simple pool holder of my own. I won't paste the actual code; it just keeps the connections in a blocking queue, roughly as in the sketch below.
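A minimal sketch of that idea, assuming the connections are created once at startup (the URL, credentials, and pool size here are placeholders, not the real project configuration):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SqlConnPool {
    private static final BlockingQueue<Connection> POOL = new LinkedBlockingQueue<>();

    /** Fill the queue once at startup. */
    public static void init(String url, String user, String password, int size) throws SQLException {
        for (int i = 0; i < size; i++) {
            POOL.offer(DriverManager.getConnection(url, user, password));
        }
    }

    /** Blocks until a connection is available. */
    public static Connection getConnection() {
        try {
            return POOL.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while waiting for a connection", e);
        }
    }

    /** Returns the connection to the queue without closing it, so its ResultSets stay open. */
    public static void freeConn(Connection conn) {
        if (conn != null) {
            POOL.offer(conn);
        }
    }
}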
Next, BulkCopyCache.
As mentioned above, the ResultSet gets reused many times, and BulkCopyCache is the cache class that holds it. It caches more than the ResultSet, though: look at the code closely and you'll see that the ID counter used during bulk copy and each table's column-name array also come from here. So this class is really a cache for everything a bulk copy needs. Here is the code:
@Slf4j
public class BulkCopyCache {
    private static final Map<String, String[]> fieldNamesMap = new HashMap<>();
    private static final Map<String, ResultSet> resultSetMap = new HashMap<>();
    private static final Map<String, AtomicLong> rollingIdmMap = new HashMap<>();

    /**
     * Called once right after the application starts; it should not be called again afterwards.
     */
    public static void initCache(Map<String, BulkCopyInitCacheDas> bulkCopyServiceMap) {
        try {
            for (BulkCopyInitCacheDas service : bulkCopyServiceMap.values()) {
                List<String> tableNames = service.getTableNames();
                for (String tableName : tableNames) {
                    ResultSet resultSet = getResultSet0(tableName);
                    resultSetMap.put(tableName, resultSet);
                    String[] fieldNames = getColumnName(resultSet.getMetaData());
                    fieldNamesMap.put(tableName, fieldNames);
                    AtomicLong rollingId = getCurrentId(resultSet);
                    rollingIdmMap.put(tableName, rollingId);
                    log.info("init {} bulk copy cache success, currentId: {}", tableName, rollingId);
                }
            }
            log.info("init all table bulk copy cache success, nice job!");
        } catch (Exception e) {
            log.error("init bulk copy cache error", e);
        }
    }
    /**
     * Public accessor for the cached ResultSet of a table.
     * @param tableName tableName
     * @return the cached ResultSet, loading it on demand if it is missing
     */
    public static ResultSet getResultSet(String tableName) {
        ResultSet resultSet = resultSetMap.get(tableName);
        if (resultSet == null) {
            log.info("result set for table {} not exist in cache, populate it", tableName);
            try {
                resultSet = getResultSet0(tableName);
                resultSetMap.put(tableName, resultSet);
            } catch (Exception e) {
                log.error("get result set for table {} error, ", tableName, e);
                throw new RuntimeException(e);
            }
        }
        return resultSet;
    }
    public static String[] getColumnNames(String tableName) {
        String[] fieldNames = fieldNamesMap.get(tableName);
        if (fieldNames == null) {
            log.info("field names array for table {} not exist in cache, populate it", tableName);
            try {
                ResultSet resultSet = getResultSet(tableName);
                fieldNames = getColumnName(resultSet.getMetaData());
                fieldNamesMap.put(tableName, fieldNames);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return fieldNames;
    }
    /**
     * Returns the rolling ID counter for the given table.
     * @param tableName tableName
     * @return the cached AtomicLong holding the table's current max ID
     */
    public static AtomicLong getId(String tableName) {
        AtomicLong id = rollingIdmMap.get(tableName);
        if (id == null) {
            log.info("current ID for table {} not exist in cache, populate it", tableName);
            try {
                ResultSet resultSet = getResultSet(tableName);
                id = getCurrentId(resultSet);
                rollingIdmMap.put(tableName, id);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return id;
    }
    /**
     * Invalidates the cached ID and ResultSet for a table, forcing them to be reloaded from the database.
     */
    public static void invalidateId(String tableName) {
        rollingIdmMap.remove(tableName);
        resultSetMap.remove(tableName);
    }
    private static ResultSet getResultSet0(String tableName) throws Exception {
        ResultSet resultSet = resultSetMap.get(tableName);
        if (resultSet != null) return resultSet;
        Connection conn = SqlConnPool.getConnection();
        try {
            Statement statement = conn.createStatement();
            // One query serves two purposes: its metadata gives the column names,
            // and the single newest row gives the current max ID.
            String firstSql = "SELECT TOP 1 * FROM " + tableName + " ORDER BY ID DESC";
            resultSet = statement.executeQuery(firstSql);
            return resultSet;
        } catch (Exception e) {
            log.error("prepare result set cache error ", e);
            throw e;
        } finally {
            SqlConnPool.freeConn(conn);
        }
    }
    private static String[] getColumnName(ResultSetMetaData rsmd) throws SQLException {
        int columnCount = rsmd.getColumnCount();
        String[] result = new String[columnCount];
        for (int i = 1; i <= columnCount; i++) {
            result[i - 1] = rsmd.getColumnName(i);
        }
        return result;
    }
    private static AtomicLong getCurrentId(ResultSet resultSet) throws Exception {
        long currentId = resultSet.next() ? resultSet.getLong("ID") : 0L;
        return new AtomicLong(currentId);
    }
}
The init method is called right after the application starts, so the information for every table involved is cached in the program up front.
It works together with this interface (a sketch of how the two could be wired up follows the interface):
public interface BulkCopyInitCacheDas {
    List<String> getTableNames();
}
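As a rough illustration of the wiring (the map key and table name below are made up for this example, not from the real project), the startup code just collects the das implementations and hands the map to initCache once; since the interface has a single method, a lambda is enough for a quick sketch:
// Called once from an application-ready hook; in a Spring project the map would more
// likely be the injected beans of type BulkCopyInitCacheDas rather than built by hand.
Map<String, BulkCopyInitCacheDas> dasMap = new HashMap<>();
dasMap.put("orderDetailDas", () -> Collections.singletonList("T_ORDER_DETAIL"));
BulkCopyCache.initCache(dasMap);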
You may notice that the bulk copy method at the top takes no lock. That is because every table is accessed through its own das layer, and the locking happens there when the das calls bulk copy. Pushing the lock up to the das layer means each lock only guards a single table, so bulk copies on different tables never block each other, which is better for performance. A sketch of what that could look like is below.
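A minimal sketch of that idea, assuming one hypothetical das class per table; the das class, the entity type, the table name, and the BulkCopyService that owns the bulkCopy(...) method shown at the top are all illustrative names, not the real project's:
public class OrderDetailDas implements BulkCopyInitCacheDas {
    // One lock per das, i.e. per table, so bulk copies into other tables are never blocked by this one.
    private final Object bulkCopyLock = new Object();
    private final BulkCopyService bulkCopyService; // hypothetical holder of the bulkCopy(...) method above

    public OrderDetailDas(BulkCopyService bulkCopyService) {
        this.bulkCopyService = bulkCopyService;
    }

    @Override
    public List<String> getTableNames() {
        return Collections.singletonList("T_ORDER_DETAIL");
    }

    public void saveAll(List<OrderDetailEntity> records) {
        synchronized (bulkCopyLock) {
            bulkCopyService.bulkCopy("T_ORDER_DETAIL", records);
        }
    }
}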
That is pretty much all of the code. This way of inserting really is much faster than plain INSERTs; in my case each batch is at least 10,000 rows. It all came from the handful of blog posts I could find online plus a lot of trial and error, and I stepped into plenty of pits along the way, so I'm writing it down here in the hope that it helps someone who needs it later.