1,问题描述
上次在zabbix中使用了percona的mysql插件来监控mysql5.7.11,见:http://blog.youkuaiyun.com/mchdba/article/details/51447750
,在监控的时候发现从库延迟落后很多(Seconds_Behind_Master: 3089)的时候,zabbix没有报警,那么问题在哪里?
2,分析问题
从本源上着手,去查看监控slave报警的脚本:
if [ "$ITEM" = "running-slave" ]; then # Check for running slave #RES=`HOME=~zabbix mysql -e 'SHOW SLAVE STATUS\G' | egrep '(Slave_IO_Running|Slave_SQL_Running):' | awk -F: '{print $2}' | tr '\n' ','` RES=`/usr/local/mysql/bin/mysql -S /usr/local/mysql/mysql.sock -e 'SHOW SLAVE STATUS\G' | egrep '(Slave_IO_Running|Slave_SQL_Running):' | awk -F: '{print $2}' | tr '\n' ','` if [ "$RES" = " Yes, Yes," ]; then echo 1 else echo 0 fi exit elif [ -e $CACHEFILE ]; then # Check and run the script #TIMEFLM=`stat -c %Y /tmp/zabbix/$HOST-mysql_cacti_stats.txt:3317` TIMEFLM=`stat -c %Y /tmp/$HOST-mysql_cacti_stats.txt:3317` TIMENOW=`date +%s` if [ `expr $TIMENOW - $TIMEFLM` -gt 300 ]; then #rm -f $CACHEFILE $CMD 2>&1 > /dev/null fi else $CMD 2>&1 > /dev/null fi |
看到它监控slave状态的实现方式是Slave_IO_Running|Slave_SQL_Running取双YES判断返回为1否则返回为0,而没有对Seconds_Behind_Master进行判断,这就是问题所在。
3,解决问题
需要在这个脚本的基础至少加上Seconds_Behind_Master的判断,修改脚本如下:
if [ "$ITEM" = "running-slave" ]; then # Check for running slave #RES=`HOME=~zabbix mysql -e 'SHOW SLAVE STATUS\G' | egrep '(Slave_IO_Running|Slave_SQL_Running):' | awk -F: '{print $2}' | tr '\n' ','` RES=`/usr/local/mysql/bin/mysql -e 'SHOW SLAVE STATUS\G' | egrep '(Slave_IO_Running|Slave_SQL_Running):' | awk -F: '{print $2}' | tr '\n' ','` # 添加对Seconds_Behind_Master的判断 SBM=`/usr/local/mysql/bin/mysql -S /usr/local/mysql/mysql.sock -e 'SHOW SLAVE STATUS\G' |grep -E "Seconds_Behind_Master"|awk '{print $2}'|grep 0 |wc |awk '{print $2}'` SBM_BEHIND=`/usr/local/mysql/bin/mysql -S /usr/local/mysql/mysql.sock -e 'SHOW SLAVE STATUS\G' |grep -E "Seconds_Behind_Master"|awk '{print $2}'`
if [ "$RES" = " Yes, Yes," ] && [ $SBM -lt 1 ]; then #if [ "$RES" = " Yes, Yes," ]; then echo 1 elif [ "$RES" = " Yes, Yes," ] && [ $SBM_BEHIND -lt 5 ] && [ $SBM -gt 0 ]; then #if [ "$RES" = " Yes, Yes," ]; then echo 1 else echo 0 fi exit elif [ -e $CACHEFILE ]; then # Check and run the script #TIMEFLM=`stat -c %Y /tmp/zabbix/$HOST-mysql_cacti_stats.txt:3317` TIMEFLM=`stat -c %Y /tmp/$HOST-mysql_cacti_stats.txt:3317` TIMENOW=`date +%s` if [ `expr $TIMENOW - $TIMEFLM` -gt 300 ]; then #rm -f $CACHEFILE $CMD 2>&1 > /dev/null fi else $CMD 2>&1 > /dev/null fi |
然后重启mysql的agentd客户端生效,以后如果Seconds_Behind_Master延迟太大就会报警出来了。