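I. Symptom
After running for some time in the production environment, the mqq server's CPU usage reaches 100%. This causes the virtualization platform to raise alarms and makes the service unavailable.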
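II. Background
1. Environment:
Management system: Red Hat 5.7, 8 cores, 16 GB memory, 300 GB disk, 2 servers
Interface subsystem: Red Hat 5.7, 8 cores, 16 GB memory, 300 GB disk, 2 servers
Access system: Red Hat 5.7, 8 cores, 16 GB memory, 300 GB disk, 2 servers
Database server: Red Hat 5.7, 16 cores, 32 GB memory, 4 TB disk, 2 servers
Cache server: Red Hat 5.7, 16 cores, 16 GB memory, 300 GB disk, 2 servers
2. Configuration:
Management system configuration parameters: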
JAVA_OPTS='-Xms4096m -Xmx8192m -XX:PermSize=256m -XX:MaxNewSize=1024m -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -Djava.awt.headless=true'
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
maxThreads="500" minSpareThreads="20" maxIdleTime="60000"/>
<Connector executor="tomcatThreadPool" port="9005" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443"
enableLookups="false"
URIEncoding="UTF-8"
acceptCount="1000"/>
Interface subsystem configuration parameters:
JAVA_OPTS='-Xms2048m -Xmx4096m -XX:PermSize=256m -XX:MaxNewSize=1024m -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -Djava.awt.headless=true'
Access system configuration parameters:
java -Xms4096m -Xmx4096m -jar ./mqttServer.jar /usr/local/mqtt/properties/system.properties &
III. Resolution process
1. Capture the thread stacks with jstack -F pid
2. Log analysis
Thread 12829: (state = IN_JAVA)
- java.util.HashMap.getEntry(java.lang.Object) @bci=81, line=465 (Compiled frame; information may be imprecise)
- java.util.HashMap.containsKey(java.lang.Object) @bci=2, line=449 (Compiled frame)
- com.avit.mqtt.MQTTServerHandler.userEventTriggered(io.netty.channel.ChannelHandlerContext, java.lang.Object) @bci=98, line=199 (Compiled frame)
- io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(java.lang.Object) @bci=16, line=315 (Compiled frame)
- io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(io.netty.channel.AbstractChannelHandlerContext, java.lang.Object) @bci=23, line=301 (Compiled frame)
- io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(java.lang.Object) @bci=5, line=293 (Compiled frame)
- io.netty.handler.timeout.IdleStateHandler.channelIdle(io.netty.channel.ChannelHandlerContext, io.netty.handler.timeout.IdleStateEvent) @bci=2, line=343 (Compiled frame)
- io.netty.handler.timeout.IdleStateHandler$AllIdleTimeoutTask.run() @bci=144, line=468 (Compiled frame)
- io.netty.util.concurrent.PromiseTask$RunnableAdapter.call() @bci=4, line=38 (Compiled frame)
- io.netty.util.concurrent.ScheduledFutureTask.run() @bci=46, line=120 (Compiled frame)
- io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(long) @bci=26, line=339 (Compiled frame)
- io.netty.channel.nio.NioEventLoop.run() @bci=137, line=393 (Compiled frame)
- io.netty.util.concurrent.SingleThreadEventExecutor$5.run() @bci=44, line=742 (Interpreted frame)
- io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() @bci=4, line=145 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
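When a dump contains many threads, it can help to count which method sits at the top of each running thread's stack. The following is only a minimal sketch, assuming the dump uses the "Thread N: (state = ...)" and "- frame" layout shown above; the class name JstackTopFrames and the method countTopFrames are hypothetical helpers, not part of jstack or the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: summarises a "jstack -F" dump by top-of-stack method.
class JstackTopFrames {

    // Returns, for threads in state IN_JAVA only, how many of them are
    // currently executing each top frame (method name without arguments).
    static Map<String, Integer> countTopFrames(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        boolean expectTopFrame = false;
        for (String raw : dump.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("Thread ") && line.contains("(state = ")) {
                // A new thread header: only IN_JAVA threads are burning CPU in Java code.
                expectTopFrame = line.contains("(state = IN_JAVA)");
            } else if (expectTopFrame && line.startsWith("- ")) {
                // First frame after the header is the top of the stack.
                int paren = line.indexOf('(');
                String frame = paren > 2 ? line.substring(2, paren) : line.substring(2);
                Integer old = counts.get(frame);
                counts.put(frame, old == null ? 1 : old + 1);
                expectTopFrame = false;
            }
        }
        return counts;
    }
}
```

If several IN_JAVA threads all report the same top frame across repeated dumps, that method is a likely candidate for the code that is spinning.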
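3. Conclusion
HashMap is not thread-safe and must not be shared between threads without synchronization. The dump shows a Netty event loop thread stuck in state IN_JAVA inside HashMap.getEntry, called from MQTTServerHandler.userEventTriggered. This is consistent with the well-known failure mode on older JDKs in which concurrent modification corrupts a bucket's entry chain into a cycle, so a lookup never terminates and the thread spins at 100% CPU. Replacing the HashMap with a ConcurrentHashMap resolved the problem.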
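The fix can be sketched as below. This is an illustration only, not the actual MQTTServerHandler code: the class SessionRegistry, its methods, and the client ID keys are hypothetical stand-ins for whatever map the handler shares between Netty event loop threads. The only point it makes is that the shared map is declared as a ConcurrentHashMap, so lookups and updates from several threads are safe.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical registry shared by several I/O threads.
class SessionRegistry {
    // ConcurrentHashMap instead of HashMap: safe for concurrent reads and writes.
    private final ConcurrentMap<String, Long> sessions =
            new ConcurrentHashMap<String, Long>();

    void register(String clientId) {
        // putIfAbsent is atomic, unlike a containsKey check followed by put.
        sessions.putIfAbsent(clientId, Long.valueOf(System.currentTimeMillis()));
    }

    boolean isRegistered(String clientId) {
        return sessions.containsKey(clientId);
    }

    int size() {
        return sessions.size();
    }

    // Small stress run: several threads register distinct keys at the same time
    // while also querying the map. Returns the final number of entries.
    static int stress(int threads, final int perThread) {
        final SessionRegistry registry = new SessionRegistry();
        final CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        String key = "client-" + id + "-" + i;
                        if (!registry.isRegistered(key)) {
                            registry.register(key);
                        }
                    }
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return registry.size();
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same stress run may lose entries or, on older JDKs, hang in a lookup; with ConcurrentHashMap every key is retained and every thread finishes.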