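Author: Lin Jinghua (林靖华)
Member of the ActionTech (爱可生) service team, responsible for handling problems customers run into in day-to-day MySQL operations. Specializes in backup-related issues, has a strong interest in database technology, and enjoys digging into all kinds of problems.
Source of this article: original submission
*Produced by the ActionTech (爱可生) open source community. Original content may not be used without authorization; to repost, please contact the editor and credit the source.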
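1. Background
A production environment runs a MySQL cluster with one source and two replicas. One of the replicas is configured for delayed replication with a delay of 1 day.
During a routine instance inspection one day, we found that this delayed replica's lag had exceeded 1 day and kept growing. Monitoring showed the database status as normal, and the other two instances showed no problems.
Logging in to the database and running show slave status showed both the IO thread and the SQL thread as Yes, yet the SQL thread had in fact already hit an error. The output is as follows (some fields omitted):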
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 1205
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 11 failed executing transaction 'ANONYMOUS' at master log binlog.000338, end_log_pos 40433204. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Seconds_Behind_Master: 451836
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1205
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 11 failed executing transaction 'ANONYMOUS' at master log binlog.000338, end_log_pos 40433204. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
SQL_Delay: 86400
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for workers to exit
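Because Slave_SQL_Running still reads Yes here, a monitor that looks only at the two thread flags would miss this state. The following is a minimal Python sketch of a stricter check, working on the text output of show slave status. The function names parse_slave_status and find_problems and the slack_seconds tolerance are hypothetical choices made for illustration, not part of the original monitoring setup; error 1205 is MySQL's lock wait timeout error.

```python
# Sketch: flag a replica whose thread flags look healthy but whose
# error and lag fields say otherwise. Names and thresholds are
# illustrative assumptions, not the author's tooling.

SAMPLE = """\
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 451836
Last_SQL_Errno: 1205
SQL_Delay: 86400
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for workers to exit
"""


def parse_slave_status(text):
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key:
            status[key.strip()] = value.strip()
    return status


def _to_int(value):
    """Return an int, or None for missing / NULL / non-numeric fields."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def find_problems(status, slack_seconds=3600):
    """Return a list of human-readable findings (empty if nothing looks wrong).

    slack_seconds is an assumed tolerance: a delayed replica normally sits
    near SQL_Delay, so only lag clearly beyond it is reported.
    """
    problems = []
    sql_errno = _to_int(status.get("Last_SQL_Errno")) or 0
    if status.get("Slave_SQL_Running") == "Yes" and sql_errno != 0:
        problems.append(
            f"SQL thread reports Yes but Last_SQL_Errno is {sql_errno}"
        )
    lag = _to_int(status.get("Seconds_Behind_Master"))
    delay = _to_int(status.get("SQL_Delay")) or 0
    if lag is not None and lag > delay + slack_seconds:
        problems.append(
            f"lag of {lag}s exceeds SQL_Delay of {delay}s by more than {slack_seconds}s"
        )
    return problems


if __name__ == "__main__":
    for finding in find_problems(parse_slave_status(SAMPLE)):
        print(finding)
```

With the values from the output above, both checks fire: the error number is non-zero while the thread flag is Yes, and the lag of 451836 seconds (about 5.2 days) is far beyond the configured 86400-second delay.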
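At this point, running stop slave hangs the session, and killing the corresponding thread does not stop it either; the only way out is to kill -9 the MySQL server process.
The relevant parameter settings of the instance are as follows; the database version is 5.7.31.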
slave_parallel_type = LOGICAL_CLOCK
slave_parallel_workers = 16
slave_preserve_commit_order = on
binlog_transaction_dependency_tracking = WRITESET
transaction_isolation = READ-COMMITTED
2. Troubleshooting
The output of show full processlist is as follows:
mysql> show full processlist;
+----------+-------------+-----------------+------+---------+--------+---------------------------------------------+------+
| Id       | User        | Host            | db   | Command | Time   | State                                       | Info |
+----------+-------------+-----------------+------+---------+--------+---------------------------------------------+------+
| 75846950 | system user |                 | NULL | Connect |   3193 | Waiting for workers to exit                 | NULL |
| 75846951 | system user |                 | NULL | Connect |  29352 | Waiting for preceding transaction to commit | NULL |




