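Recently, to revive production and the economy, the government has been encouraging people to spend more and has rolled out a series of subsidy policies. In short:
- buy, buy, buy
- travel, travel, travel
- eat, eat, eat
- play, play, play
But the topic of this post is: how do you use a script to detect whether ActiveMQ is dead or alive, and restart it if it is dead?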
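Defining "dead" and "alive"
- Completely dead: the process (pid) is gone.
- The process is still there but cannot be used normally, for example it cannot publish messages. This can have many causes, such as running out of heap memory: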
2020-06-01 23:37:42,480 | INFO | Ignoring no space left exception, java.io.IOException: Java heap space | org.apache.activemq.util.DefaultIOExceptionHandler | ActiveMQ Journal Checkpoint Worker
java.io.IOException: Java heap space
at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:40)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.store.kahadb.MessageDatabase$CheckpointRunner.run(MessageDatabase.java:451)[activemq-kahadb-store-5.15.9.jar:5.15.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_65]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)[:1.8.0_65]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)[:1.8.0_65]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)[:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_65]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_65]
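The first case, where the process itself is gone, can be ruled out cheaply before anything else. A minimal sketch, assuming the broker's pid can be read from a pid file; `pid_is_alive` is a hypothetical helper, and the path in the usage comment is a guess that depends on the installation:

```shell
#!/bin/bash
# Returns 0 if a process with the given pid exists, non-zero otherwise.
# kill -0 sends no signal; it only checks whether the pid can be signalled.
pid_is_alive() {
    local pid="$1"
    # Reject empty or non-numeric input
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical usage (the pid file location is an assumption):
# pid_is_alive "$(cat data/activemq.pid 2>/dev/null)" || echo "ActiveMQ process is gone"
```

Note that `kill -0` also fails when the process belongs to another user, so the check should run as the broker's user or as root. It says nothing about the second, hung case.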
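How to detect this state
In practice, ActiveMQ falls into a "hung" (half-dead) state quite easily. Below are the approaches I tried.
Using ActiveMQ's built-in health check endpoint
This uses JMX through the HTTP interface provided by Jolokia.
Many people online suggest this approach as well: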
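curl -u username:password -s "http://localhost:8161/api/jolokia/exec/org.apache.activemq:type=Broker,brokerName=localhost,service=Health/healthStatus" | jq -r '.value'
However, in my scenario clients were getting "Session is closed" errors when connecting, while this endpoint still reported a normal status.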
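To use that endpoint from a script, the returned string has to be turned into an exit code. A minimal sketch, assuming the Health MBean reports the literal string `Good` when it considers the broker healthy (verify this against your broker version); `health_is_good` is a hypothetical helper name:

```shell
#!/bin/bash
# Returns 0 only when the status string is exactly "Good".
# Assumption: "Good" is what the Health MBean reports for a healthy broker.
health_is_good() {
    [ "$1" = "Good" ]
}

# Hypothetical usage with the curl command shown earlier:
# status=$(curl -u username:password -s --max-time 5 "http://localhost:8161/api/jolokia/exec/org.apache.activemq:type=Broker,brokerName=localhost,service=Health/healthStatus" | jq -r '.value')
# health_is_good "$status" || echo "broker reports: $status"
```

An empty status (for example when curl fails or times out) is treated as unhealthy too. As the case above shows, a healthy answer here does not prove that clients can actually send messages.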
So I switched to a brute-force check.
Checking by actually sending messages
ActiveMQ ships with a command-line tool, so it can be used directly:
bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2
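This sends 2 test messages to the check_test_queue queue. If the broker is healthy, the command succeeds and exits; otherwise it produces an error like the following: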
INFO | Connecting to URL: failover://tcp://localhost:61616 as user: xxx
INFO | Producing messages to check_test_queue
INFO | Using persistent messages
INFO | Sleeping between sends 0 ms
INFO | Running 1 parallel threads
INFO | Successfully connected to tcp://localhost:61616
WARN | Transport (tcp://localhost:61616) failed , attempting to automatically reconnect: {}
java.io.IOException: Wire format negotiation timeout: peer did not send his wire format.
at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:99)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:668)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:86)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1392)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent(ActiveMQConnection.java:1486)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.ActiveMQConnection.start(ActiveMQConnection.java:527)[activemq-client-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.ProducerCommand.runTask(ProducerCommand.java:61)[activemq-console-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.AbstractCommand.execute(AbstractCommand.java:63)[activemq-console-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.ShellCommand.runTask(ShellCommand.java:154)[activemq-console-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.AbstractCommand.execute(AbstractCommand.java:63)[activemq-console-5.15.9.jar:5.15.9]
at org.apache.activemq.console.command.ShellCommand.main(ShellCommand.java:104)[activemq-console-5.15.9.jar:5.15.9]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.8.0_65]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)[:1.8.0_65]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.8.0_65]
at java.lang.reflect.Method.invoke(Method.java:497)[:1.8.0_65]
at org.apache.activemq.console.Main.runTaskClass(Main.java:262)[activemq.jar:5.15.9]
at org.apache.activemq.console.Main.main(Main.java:115)[activemq.jar:5.15.9]
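Because the command hangs in this case (the failover transport keeps trying to reconnect), add a timeout to force it to exit:
timeout 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2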
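With `timeout` in front, the exit code distinguishes a hang from an ordinary failure: GNU `timeout` exits with 124 when it had to kill the command, and otherwise passes the command's own exit code through. A minimal sketch; `probe` is a hypothetical helper, not part of ActiveMQ:

```shell
#!/bin/bash
# Usage: probe SECONDS COMMAND [ARGS...]
# Prints "ok", "timeout" or "failed" depending on how COMMAND ended.
probe() {
    local secs="$1"
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    case "$?" in
        0)   echo "ok" ;;
        124) echo "timeout" ;;   # GNU timeout had to kill the command
        *)   echo "failed" ;;
    esac
}

# Hypothetical usage:
# probe 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2
```

For a restart decision, both "timeout" and "failed" count as unhealthy. Logging which one occurred helps later when working out why the broker hung.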
That is the main content.
Appendix
A simple check script:
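#!/bin/bash
# Run from the directory containing this script (the ActiveMQ install directory),
# so that the relative bin/activemq path also works when started by cron
cd "$(dirname "$0")" || exit 1
# Send messages with the command-line producer; if that has not succeeded within 10 seconds, treat the broker as down
timeout 10 bin/activemq producer --user xxx --password xxx --destination check_test_queue --messageCount 2
code=$?
d=$(date)
if [[ "$code" -ne 0 ]]; then
    echo "$d: MQ status check failed"
    bin/activemq restart
else
    echo "$d: MQ is healthy"
fi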
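A single failed probe can be a transient blip, and restarting the broker is disruptive. One possible refinement, which is not part of the original script, is to restart only after several consecutive failures. A minimal sketch; `check_with_retries` and its parameters are hypothetical:

```shell
#!/bin/bash
# Usage: check_with_retries ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
check_with_retries() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Wait before the next attempt, but not after the last one
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

In the script above, the producer line could then become `check_with_retries 3 5 timeout 10 bin/activemq producer ...`, keeping the rest of the logic unchanged.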
Then use crontab to run the check every 5 minutes:
*/5 * * * * /apache-activemq-5.15.9/check-mq.sh >> /apache-activemq-5.15.9/check.log 2>&1 &

This post described a way to check the health of an ActiveMQ message broker by sending test messages, along with a script that restarts it automatically.