发现一个主从复制的BUG,从库会自动跳过对主库来说不存在的列,看例子。
主库有一个sam表,其中id是自增主键:
mysql> desc sam;
+-------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| col1 | int(11) | YES | MUL | NULL | |
| col2 | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
从库也有一个sam表,但是我通过set sql_log_bin=0将主键列删除了,其余两列和主库保持一致,这种情况在生产环境也可能存在的,并不是纯粹的实验:
mysql> desc sam;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| col1 | int(11) | YES | | NULL | |
| col2 | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)
此时主从两个库都没有数据,那么我在主库用以下方式插入两条数据:
mysql> insert into sam(col1,col2) values (2,3),(3,4);
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * from sam;
+----+------+------+
| id | col1 | col2 |
+----+------+------+
| 1 | 2 | 3 |
| 2 | 3 | 4 |
+----+------+------+
2 rows in set (0.00 sec)
这时看从库,竟然复制正常,而且sam表的数据看起来非常诡异:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 2706
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 2919
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2706
Relay_Log_Space: 3120
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6:1-8
Executed_Gtid_Set: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6:1-8
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql> select * from sam;
+------+------+
| col1 | col2 |
+------+------+
| 1 | 2 |
| 2 | 3 |
+------+------+
2 rows in set (0.00 sec)
发现什么规律?从库col1和col2的数据对应的是主库id和col1的数据,也就是从库将主库col2的数据主动忽略掉了。
这无疑是造成数据不一致的风险点,而且从库复制也没有报错,会让人无法察觉。
为什么会这样呢?在relay log可以看到主库传过来的binlog是下面这样的:
# at 2575
#200103 16:09:30 server id 1 end_log_pos 2427 CRC32 0x6d4d2e2e GTID last_committed=7 sequence_number=8 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '0bf4e28a-1d6f-11ea-9a02-080027e07ac6:8'/*!*/;
# at 2640
#200103 16:09:30 server id 1 end_log_pos 2498 CRC32 0x1d72f9b2 Query thread_id=15 exec_time=0 error_code=0
SET TIMESTAMP=1578038970/*!*/;
BEGIN
/*!*/;
# at 2711
#200103 16:09:30 server id 1 end_log_pos 2567 CRC32 0x7bbaa2e7 Rows_query
# insert into sam(col1,col2) values (2,3),(3,4)
# at 2780
#200103 16:09:30 server id 1 end_log_pos 2614 CRC32 0x10cd76fc Table_map: `sam`.`sam` mapped to number 111
# at 2827
#200103 16:09:30 server id 1 end_log_pos 2675 CRC32 0x3a06bc1f Write_rows: table id 111 flags: STMT_END_F
BINLOG '
uvYOXh0BAAAARQAAAAcKAACAAC1pbnNlcnQgaW50byBzYW0oY29sMSxjb2wyKSB2YWx1ZXMgKDIs
MyksKDMsNCnnorp7
uvYOXhMBAAAALwAAADYKAAAAAG8AAAAAAAEAA3NhbQADc2FtAAMDAwMABvx2zRA=
uvYOXh4BAAAAPQAAAHMKAAAAAG8AAAAAAAEAAgAD//gBAAAAAgAAAAMAAAD4AgAAAAMAAAAEAAAA
H7wGOg==
'/*!*/;
### INSERT INTO `sam`.`sam`
### SET
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2=2 /* INT meta=0 nullable=1 is_null=0 */
### @3=3 /* INT meta=0 nullable=1 is_null=0 */
### INSERT INTO `sam`.`sam`
### SET
### @1=2 /* INT meta=0 nullable=0 is_null=0 */
### @2=3 /* INT meta=0 nullable=1 is_null=0 */
### @3=4 /* INT meta=0 nullable=1 is_null=0 */
# at 2888
#200103 16:09:30 server id 1 end_log_pos 2706 CRC32 0x8816743b Xid = 123
COMMIT/*!*/;
也就是说,binlog里是用@1代表第一列,@2代表第二列,@3代表第三列,而不管列名是什么。
那么再看从库执行了relay log之后生成的binlog:
# at 2260
#200103 16:09:30 server id 1 end_log_pos 2325 CRC32 0x616ca58a GTID last_committed=7 sequence_number=8 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '0bf4e28a-1d6f-11ea-9a02-080027e07ac6:8'/*!*/;
# at 2325
#200103 16:09:30 server id 1 end_log_pos 2388 CRC32 0xf0d1f61d Query thread_id=15 exec_time=0 error_code=0
SET TIMESTAMP=1578038970/*!*/;
SET @@session.sql_mode=524288/*!*/;
BEGIN
/*!*/;
# at 2388
#200103 16:09:30 server id 1 end_log_pos 2457 CRC32 0x0b0b77ea Rows_query
# insert into sam(col1,col2) values (2,3),(3,4)
# at 2457
#200103 16:09:30 server id 1 end_log_pos 2503 CRC32 0xe7af7737 Table_map: `sam`.`sam` mapped to number 117
# at 2503
#200103 16:09:30 server id 1 end_log_pos 2556 CRC32 0xbb65d2dd Write_rows: table id 117 flags: STMT_END_F
BINLOG '
uvYOXh0BAAAARQAAAJkJAACAAC1pbnNlcnQgaW50byBzYW0oY29sMSxjb2wyKSB2YWx1ZXMgKDIs
MyksKDMsNCnqdwsL
uvYOXhMBAAAALgAAAMcJAAAAAHUAAAAAAAEAA3NhbQADc2FtAAIDAwADN3ev5w==
uvYOXh4BAAAANQAAAPwJAAAAAHUAAAAAAAEAAgAC//wBAAAAAgAAAPwCAAAAAwAAAN3SZbs=
'/*!*/;
### INSERT INTO `sam`.`sam`
### SET
### @1=1 /* INT meta=0 nullable=1 is_null=0 */
### @2=2 /* INT meta=0 nullable=1 is_null=0 */
### INSERT INTO `sam`.`sam`
### SET
### @1=2 /* INT meta=0 nullable=1 is_null=0 */
### @2=3 /* INT meta=0 nullable=1 is_null=0 */
# at 2556
#200103 16:09:30 server id 1 end_log_pos 2587 CRC32 0x65411b02 Xid = 173
COMMIT/*!*/;
可以看到从库的binlog里并没有@3,只是机械地将主库binlog的@1和@2往自己的第一列和第二列里插入,而不管列名是否对应。
那么这是否跟自增键有关系呢?如果id不是自增键会是怎样?我们也捏造一个测试实验。
主库一个sam2表,并没有显示的主键列和自增键:
mysql> desc sam2;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| col1 | int(11) | YES | | NULL | |
| col2 | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)
从库也是一个sam2表,但少了id列:
mysql> desc sam2;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| col1 | int(11) | YES | | NULL | |
| col2 | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)
我们用以下方式在主库插入sam2的一条数据:
mysql> insert into sam2 values(1,2,3);
Query OK, 1 row affected (0.01 sec)
mysql> select * from sam2;
+------+------+------+
| id | col1 | col2 |
+------+------+------+
| 1 | 2 | 3 |
+------+------+------+
1 row in set (0.00 sec)
看一下从库,依然没有报错,并且是主库sam2表前两列的数据:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 3203
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 3416
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 3203
Relay_Log_Space: 3617
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6:1-10
Executed_Gtid_Set: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6:1-10
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql> select * from sam2;
+------+------+
| col1 | col2 |
+------+------+
| 1 | 2 |
+------+------+
1 row in set (0.00 sec)
所以得知,从库复制insert操作,只是按照第一列第二列的顺序将主库的数据插入到从库的表,并不会依照列名一一对应,而且从库缺少了列,会直接忽略相关数据,而不报错。
我们也将从库的列改一下名字,看一下是否列名是否会影响复制:
mysql> alter table sam2 change col1 column1 int;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table sam2 change col2 column3 int;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> desc sam2;
+---------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+-------+
| column1 | int(11) | YES | | NULL | |
| column3 | int(11) | YES | | NULL | |
+---------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)
主库插入数据:
mysql> insert into sam2 values(2,3,4);
Query OK, 1 row affected (0.01 sec)
mysql> select * from sam2;
+------+------+------+
| id | col1 | col2 |
+------+------+------+
| 1 | 2 | 3 |
| 2 | 3 | 4 |
+------+------+------+
2 rows in set (0.00 sec)
从库复制同样没有报错:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 3520
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 3733
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 3520
Relay_Log_Space: 3934
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6:1-11
Executed_Gtid_Set: 0bf4e28a-1d6f-11ea-9a02-080027e07ac6:1-11
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql> select * from sam2;
+---------+---------+
| column1 | column3 |
+---------+---------+
| 1 | 2 |
| 2 | 3 |
+---------+---------+
2 rows in set (0.00 sec)
以上是insert的测试,那么对于update呢?
主库执行update:
mysql> update sam2 set id=3 where id=1 and col1=2 and col2=3;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from sam2;
+------+------+------+
| id | col1 | col2 |
+------+------+------+
| 3 | 2 | 3 |
| 2 | 3 | 4 |
+------+------+------+
2 rows in set (0.00 sec)
从库依然没有报错:
mysql> select * from sam2;
+---------+---------+
| column1 | column3 |
+---------+---------+
| 3 | 2 |
| 2 | 3 |
+---------+---------+
2 rows in set (0.00 sec)
那么对于update,我们就可以拓展一下了,如果主库的id和col1是相同的数据,而col2是不同的数据,也就是整行数据是不一样,那么对于update在从库会是怎么样呢?例如
主库第二行和第三行的前两列是一样的:
mysql> select * from sam2;
+------+------+------+
| id | col1 | col2 |
+------+------+------+
| 3 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 3 | 5 |
+------+------+------+
3 rows in set (0.00 sec)
自然从库的前两行也会是一样的:
+---------+---------+
| column1 | column3 |
+---------+---------+
| 3 | 2 |
| 2 | 3 |
| 2 | 3 |
+---------+---------+
3 rows in set (0.01 sec)
那么对于在主库执行update sam2 set col1=10 where id=2 and col1=3 and col2=4
这样的语句,主库是只更新一行,但是从库会不会第二和第三行都更新呢?
答案是不会的,主库执行以后数据是:
mysql> select * from sam2;
+------+------+------+
| id | col1 | col2 |
+------+------+------+
| 3 | 2 | 3 |
| 2 | 10 | 4 |
| 2 | 3 | 5 |
+------+------+------+
3 rows in set (0.00 sec)
从库复制后数据是:
mysql> select * from sam2;
+---------+---------+
| column1 | column3 |
+---------+---------+
| 3 | 2 |
| 2 | 10 |
| 2 | 3 |
+---------+---------+
3 rows in set (0.00 sec)
所以这时候从库复制还是遵从binlog的行格式,只更新一行数据。
经过测试,delete也不会报错。