5.7 与 8.0 对相同文件的 LOAD DATA 语句结果不同
问题描述
某客户现场支持,由MySQL 5.7.21升级MySQL 8.0.25后,通过LOAD DATA
导入文件,当同一会话连续导入不同的编码(UTF8/GB18030)文件时会出现乱码。数据库版本未升级之前,相同的导入操作在MySQL 5.7.21未出现乱码。
问题分析
1)查看简化后的 LOAD DATA
语句
greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
2)查看表数据
+----------+------------------------------------------------------+
| AUTO_INC | D_NAME |
+----------+------------------------------------------------------+
| 1 | xxx社会保险xxx |
| 2 | xxx市路桥区xxx |
| 4 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績 |
| 5 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績 |
+----------+------------------------------------------------------+
4 rows in set (0.00 sec)
3)检查业务表的字符集与校验集,发现字符集为 utf8mb4 、校验集为 utf8mb4_bin
4)检查数据库的字符集与校验集
greatsql> SHOW GLOBAL VARIABLES LIKE '%char%';
+--------------------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------------------+--------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /opt/mysql3301/share/charsets/ |
| validate_password_special_char_count | 1 |
+--------------------------------------+--------------------------------+
9 rows in set (0.01 sec)
greatsql> SHOW GLOBAL VARIABLES LIKE '%coll%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_bin |
| collation_database | utf8mb4_bin |
| collation_server | utf8mb4_bin |
| default_collation_for_utf8mb4 | utf8mb4