GreatSQL5.7 与 8.0 对 DATE 非法值处理方式不同
一、问题描述
1. 问题现象
当分别通过LOAD DATA LOCAL INFILE
和INSERT
导入非法的 DATE 字段数据时,在5.7.21和 8.0.25使用LOAD DATA LOCAL
会报一个Warning,数据异常但可以插入成功,而且实际插入的数据跟用户计划插入的不同,具体是0000-00-00;而使用LOAD DATA INFILE
方式导入和INSERT
非法值则直接报错,表里不会插入数据。
二、问题分析
1、LOAD DATA INFILE的使用模式
有普通模式和 LOCAL 模式。普通模式要求数据文件放在数据库服务器上,基于行内数据库容器化的情况,目前统一使用 LOCAL 模式。
2、LOAD DATA LOCAL INFILE方式导入的处理机制
LOAD DATA LOCAL INFILE
的特殊处理机制,相当于自动置上IGNORE
,官方文档有明确说明。
LOCAL also affects error handling.
With LOAD DATA INFILE,Data interpretation and duplicate-key errors terminate the operation.
With LOAD DATA LOCAL INFILE,Data interpretation errors and duplicate-key errors become warnings and the operation continues because the server has no way to stop transmission in the middle of the operation. For duplicate-key errors, this is same as if IGNORE is specified. IGNORE is explained further later in this section.
而在源码LOCAL INFILE
里加上IGNORE
标识的代码:
opt_duplicate INTO TABLE_SYM table_ident opt_use_partition{LEX *lex=Lex;/* Fix lock for LOAD DATA CONCURRENT REPLACE */if (lex->duplicates == DUP_REPLACE && $4 == TL_WRITE_CONCURRENT_INSERT)$4= TL_WRITE_DEFAULT;if (!Select->add_table_to_list(YYTHD, $12, NULL, TL_OPTION_UPDATING,$4, $4 == TL_WRITE_LOW_PRIORITY ?MDL_SHARED_WRITE_LOW_PRIO :MDL_SHARED_WRITE, NULL, $13))MYSQL_YYABORT;lex->load_field_list.empty();lex->load_update_list.empty();lex->load_value_list.empty();/* We can't give an error in the middle when using LOCAL files */if (lex->local_file && lex->duplicates == DUP_ERROR)lex->set_ignore(true);}/* We can't give an error in the middle when using LOCAL files */if (m_cmd.m_is_local_file && lex->duplicates == DUP_ERROR)lex->set_ignore(true);
因此,上述代码里针对LOAD DATA LOACL INFILE
加 IGNORE 属于数据库的正常行为,
但官方文档未给出对 LOCAL 加上IGNORE
未有进一步的说明,只是描述了IGNORE
对语句执行的影响。
当 IGNORE
关键字和严格 SQL 模式同时生效时,IGNORE
关键字优先。
The following table presents a summary comparison of statement behavior when the default is to produce an error versus a warning. An example of when the default is to produce an error is inserting a NULL into a NOT NULL column. An example of when the default is to produce a warning is inserting a value of the wrong data type into a column (such as inserting the string 'abc' into an integer column).
Operational Mode | When Statement Default is Error | When Statement Default is Warning |
---|---|---|
Without IGNORE or strict SQL mode |
Error | Warning |
With IGNORE |
Warning | Warning (same as without IGNORE or strict SQL mode) |
With strict SQL mode | Error (same as without IGNORE or strict SQL mode) |
Error |
With IGNORE and strict SQL mode |
Warning | Warning |
One conclusion to draw from the table is that when the IGNORE keyword and strict SQL mode are both in effect, IGNORE takes precedence. This means that, although IGNORE and strict SQL mode can be considered to have opposite effects on error handling, they do not cancel when used together.
在严格模式下是不允许使用零部分的日期,并且插入会产生错误,除非使用IGNORE
,会将将零部分的日期插入为 '0000-00-00'
(这在数据库中被视为有效IGNORE
)并产生警告。
lf strict mode is enabled, dates with zero parts are not permitted and inserts produce an error, unless IGNORE is given as WelL. For INSERT IGNORE and UPDATE IGNORE, dates with zero parts are inserted as '0000-00-00' (which is consideredvalid with IGNORE) and produce a warning.
3、问题重现
3.1 基线设置
5.7.21和8.0.25的SQL_MODE:
greatsql>SHOW GLOBAL VARIABLES LIKE 'SQL_MODE';
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| sql_mode | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
3.2 测试情况
LOAD DATA INFILE测试情况
在存在非法数值的情况下,导入命令报错"ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'c_date' at row 1
"。
greatsql> TRUNCATE TABLE testdb.t2;
Query OK, 0 rows affected (0.04 sec)greatsql> LOAD DATA INFILE '/data/greate/t1_00.csv' INTO TABLE testdb.t2 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY "\r\n";
ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'c_date' at row 1
greatsql>
LOAD DATA LOCAL INFILE测试情况
在存在非法数值的情况下,导入命令不报错,但是会输出"warning:Data truncated for column 'a' at row 1
",表明GreatSQL识别到了非法字符串,并将其自动进行TRUNCATE转换成了数据类型的默认值。
$ echo 0002024-06 > test.txtgreatsql> CREATE TABLE t1 ( a date primary key);
Query OK, 0 rows affected (0.15 sec)greatsql> LOAD DATA LOCAL INFILE '/root/test.txt' INTO TABLE t1;
Query OK, 1 row affected, 1 warning (0.00 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 1greatsql> SHOW WARNINGS;
+---------+------+----------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'a' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)greatsql> SELECT * FROM t1;
+------------+
| a |
+------------+
| 0000-00-00 |
+------------+
1 row in set (0.00 sec)
INSERT 测试情况
在存在非法数值的情况下,执行INSERT
命令报错"ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'c_date' at row 1
"。
greatsql> SHOW CREAET TABLE t2\G
*************************** 1. row ***************************Table: t2
Create Table: CREATE TABLE `t2` (`c1` int NOT NULL,`c2` int DEFAULT NULL,`c_date` date DEFAULT '2000-01-01',`c_ts` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,`c_dt` datetime DEFAULT '2000-01-01 00:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)greatsql> INSERT INTO testdb.t2 VALUES (1,1,"0000-00-00",now(),"2000-01-01 00:00:00");
ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'c_date' at row 1
greatsql>
INSERT + IGNORE 测试情况
在存在非法数值的情况下,导入命令不报错,但是会输出"Warning:Out of range value for column 'c_date' at row 1
",表明GreatSQL识别到了非法字符串,并也自动进行TRUNCATE转换成数据类型的默认值。
greatsql> INSERT IGNORE INTO testdb.t2 VALUES (1,1,"0000-00-00",now(),"2000-01-01 00:00:00");
Query OK, 1 row affected, 1 warning (0.01 sec)greatsql> SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------+
| Warning | 1264 | Out of range value for column 'c_date' at row 1 |
+---------+------+-------------------------------------------------+
1 row in set (0.00 sec)greatsql> INSERT IGNORE INTO testdb.t2 VALUES (1,1,"002000-00-00",now(),"2000-01-01 00:00:00");
Query OK, 1 row affected, 1 warning (0.01 sec)greatsql> SELECT * FROM testdb.t2;
+----+------+------------+---------------------+---------------------+
| c1 | c2 | c_date | c_ts | c_dt |
+----+------+------------+---------------------+---------------------+
| 1 | 1 | 0000-00-00 | 2024-06-05 13:57:29 | 2000-01-01 00:00:00 |
| 1 | 1 | 0000-00-00 | 2024-06-05 13:59:55 | 2000-01-01 00:00:00 |
+----+------+------------+---------------------+---------------------+
2 rows in set (0.00 sec)
其它日期类型存在类似的ERROR和WARNING
除了"date
"类型,其他时间类型存在同样的问题,例如"datetime
、timestamp
"等。
greatsql> CREATE TABLE t3 ( a datetime );
greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/test.txt' INTO TABLE t3;
Query OK, 1 row affected, 1 warning (0.01 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 1
greatsql> SHOW WARNINGS;
+---------+------+----------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'a' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)
greatsql> SELECT * FROM t3;
+---------------------+
| a |
+---------------------+
| 0000-00-00 00:00:00 |
+---------------------+
1 row in set (0.00 sec)
greatsql> CREATE TABLE t4 ( a timestamp);
greatsql> LOAD DATA LOCAL INFILE '/home/greate/test.txt' INTO TABLE t4;
Query OK, 1 row affected, 1 warning (0.01 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 1greatsql> SHOW WARNINGS;
+---------+------+----------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'a' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)greatsql> SELECT * FROM t4;
+---------------------+
| a |
+---------------------+
| 0000-00-00 00:00:00 |
+---------------------+
1 row in set (0.00 sec)
默认值测试
当数据存在问题时,无论是LOAD DATA
还是INSERT
,GreatSQL内部都会自动进行TRUNCATE
转换成数据类型的默认值。只有当字段缺省时,才会使用表中字段的default
缺省值。
greatsql> LOAD DATA LOCAL INFILE '/tmp/t1.csv' INTO TABLE TESTDB.T2 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY "\r\n";
Query OK, 1 rows affected, 1 warning (0.01 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 1greatsql> INSERT IGNORE INTO testdb.t2 VALUES (1,1,"002000-00-00",now(),"2000-01-01 00:00:00");
Query OK, 1 row affected, 1 warning (0.01 sec)greatsql> SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------+
| Warning | 1264 | Out of range value for column 'c_date' at row 1 |
+---------+------+-------------------------------------------------+
1 row in set (0.00 sec)greatsql> INSERT IGNORE INTO testdb.t2 VALUES (3,3,NULL,now(),"2000-01-01 00:00:00");
Query OK, 1 row affected (0.01 sec)greatsql> INSERT IGNORE INTO testdb.t2 VALUES (4,4,"",now(),"2000-01-01 00:00:00");
Query OK, 1 row affected, 1 warning (0.01 sec)greatsql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'c_date' at row 1 |
+---------+------+---------------------------------------------+
1 row in set (0.00 sec)greatsql> INSERT INTO testdb.t2(c1,c2) VALUES (5,5);
Query OK, 1 row affected (0.01 sec)greatsql> SELECT * FROM testdb.t2;
+----+------+------------+---------------------+---------------------+
| c1 | c2 | c_date | c_ts | c_dt |
+----+------+------------+---------------------+---------------------+
| 1 | 1 | 0000-00-00 | 2024-06-05 13:57:29 | 2000-01-01 00:00:00 |
| 1 | 1 | 0000-00-00 | 2024-06-05 13:59:55 | 2000-01-01 00:00:00 |
| 3 | 3 | NULL | 2024-06-05 14:03:40 | 2000-01-01 00:00:00 |
| 4 | 4 | 0000-00-00 | 2024-06-05 14:04:07 | 2000-01-01 00:00:00 |
| 5 | 5 | 2000-01-01 | 2024-06-05 14:11:48 | 2000-01-01 00:00:00 |
+----+------+------------+---------------------+---------------------+
5 rows in set (0.00 sec)
主键冲突场景测试
主键冲突场景下的ERROR,也会被LOAD DATA LOCAL INFILE
当作WARNING处理,该行不进行插入。
greatsql> LOAD DATA LOCAL INFILE '/tmp/t1.csv' INTO TABLE testdb.t2 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY "\r\n";
Query OK, 1 row affected, 2 warnings (0.01 sec)
Records: 2 Deleted: 0 Skipped: 1 Warnings: 2greatsql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'c_date' at row 2 |
| Warning | 1062 | Duplicate entry '1' for key 't2.PRIMARY' |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)greatsql> SELECT * FROM t2;
+----+------+--------+---------------------+------+
| c1 | c2 | c_date | c_ts | c_dt |
+----+------+--------+---------------------+------+
| 1 | 1 | NULL | 2024-06-05 09:26:58 | NULL |
+----+------+--------+---------------------+------+
1 row in set (0.00 sec)greatsql> SYSTEM CAT /tmp/t1.csv
1,1,\N,"2024-06-05 09:26:58",\N
1,2,null,"2024-06-05 09:26:58",\Ngreatsql> SHOW CREATE TABLE t2\G
*************************** 1. row ***************************Table: t2
Create Table: CREATE TABLE `t2` (`c1` int NOT NULL,`c2` int DEFAULT NULL,`c_date` date DEFAULT '2000-01-01',`c_ts` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,`c_dt` datetime DEFAULT '2000-01-01 00:00:00',PRIMARY KEY (`c1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)
三、问题总结
1、 问题原因
特性如此,属于正常情况。
2、 解决方案建议
1、数据预处理
在导入数据前,使用脚本或程序对数据进行预处理,确保所有日期值都是合法的。例如,可以使用正则表达式或日期解析库来验证和格式化日期。
2、导入验证
将含有日期非法值的数据文件通过导入程序的LOAD DATA LOCAL INFILE
方式导入时,会产生 Warning | 1265 | Data truncated for column 'a' at row 1
。建议导入程序可以在导入后通过SHOW WARINGS
语句检查下导入过程,如果出现类似warning
,则说明存在非法数据转换。
注意LOAD DATA LOCAL INFILE
无法指定数据异常时的处理机制,也无法通过设置字段默认值来指定,由数据库内部统一按照数据类型的默认值处理。
3、建议应用使用数据库以外的工具进行数据导入。
4、修改sql_mode,添加ALLOW_INVALID_DATES,允许不合法的 date 数据导入。
Enjoy GreatSQL 😃
关于 GreatSQL
GreatSQL是适用于金融级应用的国内自主开源数据库,具备高性能、高可靠、高易用性、高安全等多个核心特性,可以作为MySQL或Percona Server的可选替换,用于线上生产环境,且完全免费并兼容MySQL或Percona Server。
相关链接: GreatSQL社区 Gitee GitHub Bilibili
GreatSQL社区:
社区博客有奖征稿详情:https://greatsql.cn/thread-100-1-1.html
技术交流群:
微信:扫码添加
GreatSQL社区助手
微信好友,发送验证信息加群
。