Troubleshooting a Production Alert for an Excessive Thread Pool Thread Count

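This post records a production alert triggered by an excessive number of thread pool threads. Analysis with jstack showed a large number of http-nio threads waiting. A merchant promotion had caused a surge in calls to the coupon API, pushing the thread count above 1000, far beyond the normal level. The fix is to tune the Tomcat thread pool for high traffic and to consider adding machines. As a temporary measure, the problem machine was removed from the nginx load balancer, after which its thread count returned to normal.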

 

Figure: the production issue

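Comparing jstack dumps from a healthy machine and the problem machine showed a large number of http-nio-1601-exec- threads waiting on the problem machine. Production monitoring showed that calls to the coupon API had spiked: at noon, a merchant was running a coupon-grabbing promotion. The thread names indicate that the Tomcat thread pool was saturated. Monitoring showed more than 1000 threads on the problem machine, about 500 more than on a healthy machine. Together with the figure above, this largely confirms the cause.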
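Comparing two dumps by eye is tedious, so a small helper that counts threads per pool and per state can make the difference between machines obvious. The following is a minimal sketch, not the tooling used in this incident: the class name `JstackSummary` and the method `summarize` are hypothetical, and it assumes the HotSpot text format shown in the dump in this post (a quoted thread name on the header line, followed by a `java.lang.Thread.State:` line).

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes a jstack text dump by thread pool and state.
final class JstackSummary {
    // A thread header line starts with the quoted thread name.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"");
    // The state line looks like "java.lang.Thread.State: TIMED_WAITING (parking)".
    private static final Pattern STATE = Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    /** Returns counts keyed by "<pool prefix> <state>", sorted by key. */
    static Map<String, Integer> summarize(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        String pool = null;
        for (String line : dump.split("\\R")) {
            Matcher header = HEADER.matcher(line);
            if (header.find()) {
                // Drop the trailing "-<number>" so all workers of one pool share a key.
                pool = header.group(1).replaceAll("-\\d+$", "");
                continue;
            }
            Matcher state = STATE.matcher(line);
            if (pool != null && state.find()) {
                counts.merge(pool + " " + state.group(1), 1, Integer::sum);
                pool = null; // count each thread once
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java JstackSummary <dump-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        summarize(dump).forEach((key, n) -> System.out.println(n + "\t" + key));
    }
}
```

One way to use it: capture a dump on each machine with `jstack <pid> > dump.txt`, run the helper on both files, and compare the `http-nio-1601-exec` rows.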

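The next step is to tune the Tomcat thread pool to cope with the high-traffic requests that arrive every Monday, and possibly to add machines. While the machine stays attached to nginx, requests keep flowing in, so the thread count never comes down. (I checked the production Tomcat configuration; the maximum thread count appears to be 1000.)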
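For the tuning and scale-out decision, a rough capacity estimate can help. The sketch below applies Little's law (mean requests in flight = arrival rate × mean latency), assuming one blocking worker thread per in-flight request. The class `ThreadSizing`, its methods, and every number in `main` are hypothetical placeholders; the real request rate and latency of the coupon API are not given in this post and would have to come from monitoring.

```java
// Hypothetical back-of-the-envelope sizing helper based on Little's law.
final class ThreadSizing {

    /** Mean number of busy worker threads for a given load. */
    static int busyThreads(double requestsPerSecond, double meanLatencyMillis) {
        return (int) Math.ceil(requestsPerSecond * meanLatencyMillis / 1000.0);
    }

    /**
     * Number of instances needed so that each one keeps its busy threads
     * below maxThreadsPerInstance * targetUtilization.
     */
    static int instancesNeeded(double requestsPerSecond, double meanLatencyMillis,
                               int maxThreadsPerInstance, double targetUtilization) {
        double budgetPerInstance = maxThreadsPerInstance * targetUtilization;
        return (int) Math.ceil(busyThreads(requestsPerSecond, meanLatencyMillis) / budgetPerInstance);
    }

    public static void main(String[] args) {
        // Placeholder inputs, NOT measurements from this incident.
        double peakRps = 8000;
        double latencyMs = 250;
        int maxThreads = 1000;      // the value the post believes is configured
        double utilization = 0.5;   // keep half of the pool as headroom

        System.out.println("busy threads at peak: " + busyThreads(peakRps, latencyMs));
        System.out.println("instances needed: "
                + instancesNeeded(peakRps, latencyMs, maxThreads, utilization));
    }
}
```

If the estimate for a single machine approaches the configured maximum, raising the limit alone is unlikely to be enough, and adding machines or reducing the latency of the coupon API becomes the more relevant lever.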

Temporary fix: after the problem machine was taken out of nginx, its thread count quickly dropped back to normal.
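Why the count drops once traffic stops can be reproduced with the JDK's own `ThreadPoolExecutor`, which Tomcat's worker pool builds on. In the dump in this post, the http-nio workers are parked in `ThreadPoolExecutor.getTask` under a timed `TaskQueue.poll`, which is how a pool worker waits for its next task; a worker above the core size whose timed wait expires without a task is retired. The sketch below is a simplification under stated assumptions: the class `PoolShrinkDemo` is hypothetical, a `SynchronousQueue` stands in for Tomcat's `TaskQueue` so that the pool grows toward its maximum under load, and the sizes and the short idle timeout are arbitrary demo values (Tomcat's executor documents a default idle timeout of 60 seconds via `maxIdleTime`).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: a pool grows under a burst and shrinks once it goes idle.
final class PoolShrinkDemo {

    /** A pool that adds threads up to max while all existing workers are busy. */
    static ThreadPoolExecutor newPool(int core, int max, long keepAliveMillis) {
        return new ThreadPoolExecutor(core, max, keepAliveMillis,
                TimeUnit.MILLISECONDS, new SynchronousQueue<>());
    }

    /** Submits blocking "requests" (at most max of them) and returns the pool size at the peak. */
    static int burst(ThreadPoolExecutor pool, int requests, CountDownLatch release) {
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate a request that is still in progress
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        return pool.getPoolSize();
    }

    /** Polls until the pool has at most `target` threads or the timeout expires. */
    static boolean awaitPoolSize(ThreadPoolExecutor pool, int target, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (pool.getPoolSize() > target && System.nanoTime() < deadline) {
                Thread.sleep(20);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getPoolSize() <= target;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool(2, 10, 200);
        CountDownLatch release = new CountDownLatch(1);
        System.out.println("threads during burst: " + burst(pool, 8, release));
        release.countDown(); // traffic stops: every in-flight request completes
        awaitPoolSize(pool, 2, 5000);
        System.out.println("threads after idle timeout: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

As long as new tasks keep arriving, the idle timeout of each worker keeps being reset and the pool stays large. Once the machine is out of the load balancer, the extra workers should time out and the pool should fall back to its core size, which matches the behavior described above.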

2020-01-06 14:14:54
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode):

"Attach Listener" #36860 daemon prio=9 os_prio=0 tid=0x00007fb39c002000 nid=0x43bc7 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"ForkJoinPool.commonPool-worker-21" #36809 daemon prio=5 os_prio=0 tid=0x00007fb004f4e000 nid=0x40eb4 waiting on condition [0x00007fafee581000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000074423ef68> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1756)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1696)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-19" #36700 daemon prio=5 os_prio=0 tid=0x00007fb004635800 nid=0x3b0c8 waiting on condition [0x00007faffbd92000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000074423ef68> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1756)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1696)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-6" #36697 daemon prio=5 os_prio=0 tid=0x00007fb006892000 nid=0x3b002 waiting on condition [0x00007fafc2eef000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000074423ef68> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1756)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1696)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-28" #36698 daemon prio=5 os_prio=0 tid=0x00007fb006890800 nid=0x3b001 waiting on condition [0x00007fafc2ff0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000074423ef68> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1756)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1696)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"http-nio-1601-exec-570" #36696 daemon prio=5 os_prio=0 tid=0x00007fb00688c000 nid=0x3affa waiting on condition [0x00007fafc30f1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-569" #36695 daemon prio=5 os_prio=0 tid=0x00007fb006888000 nid=0x3aff9 waiting on condition [0x00007fafc31f2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-568" #36694 daemon prio=5 os_prio=0 tid=0x00007fb006883800 nid=0x3aff8 waiting on condition [0x00007fafc32f3000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)


"http-nio-1601-exec-484" #36578 daemon prio=5 os_prio=0 tid=0x00007fb00649e800 nid=0x3aec3 waiting on condition [0x00007fafd13d3000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-483" #36577 daemon prio=5 os_prio=0 tid=0x00007fb00462d800 nid=0x3aec2 waiting on condition [0x00007fafff6c3000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-482" #36576 daemon prio=5 os_prio=0 tid=0x00007fb005a75000 nid=0x3aec1 waiting on condition [0x00007fb2fa0eb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"ForkJoinPool.commonPool-worker-31" #36569 daemon prio=5 os_prio=0 tid=0x00007fb0056cf800 nid=0x3ac19 waiting on condition [0x00007faff06f0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000074423ef68> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1756)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1696)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-17" #36570 daemon prio=5 os_prio=0 tid=0x00007fb004911800 nid=0x3ac18 waiting on condition [0x00007fafe9349000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000074423ef68> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1756)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1696)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-18" #36568 daemon prio=5 os_prio=0 tid=0x00007fb0066dd800 nid=0x3aba6 waiting on condition [0x00007fafca464000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000074423ef68> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1756)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1696)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-25" #36567 daemon prio=5 os_prio=0 tid=0x00007fb0066db800 nid=0x3aba5 waiting on condition [0x00007fafca565000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000074423ef68> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1756)
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1696)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"http-nio-1601-exec-481" #36566 daemon prio=5 os_prio=0 tid=0x00007fb0066d7800 nid=0x3aba4 waiting on condition [0x00007fafca666000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-480" #36565 daemon prio=5 os_prio=0 tid=0x00007fb0066d3000 nid=0x3aba3 waiting on condition [0x00007fafca767000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-479" #36564 daemon prio=5 os_prio=0 tid=0x00007fb0066cf000 nid=0x3aba2 waiting on condition [0x00007fafca868000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-478" #36563 daemon prio=5 os_prio=0 tid=0x00007fb0066ca800 nid=0x3aba1 waiting on condition [0x00007fafca969000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-477" #36562 daemon prio=5 os_prio=0 tid=0x00007fb0066c6800 nid=0x3aba0 waiting on condition [0x00007fafcaa6a000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-476" #36561 daemon prio=5 os_prio=0 tid=0x00007fb0066c2800 nid=0x3ab9f waiting on condition [0x00007fafcab6b000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-475" #36560 daemon prio=5 os_prio=0 tid=0x00007fb0066be000 nid=0x3ab9e waiting on condition [0x00007fafcac6c000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-473" #36558 daemon prio=5 os_prio=0 tid=0x00007fb0066b3800 nid=0x3ab9d waiting on condition [0x00007fafcad6d000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000748ae6e78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-1601-exec-474" #36559 daem