当Java Web程序运行变慢,或者发生故障时,需要使用Thread Dumps. 如果你觉得ThreadDumps非常复杂,这篇文章很可能帮助你。将会分析Java中的线程,线程如何创建的,如何管理线程,怎么从运行中的程序中dump 线程,最后怎么分析他们得到阻塞和存在瓶颈的线程。本文是在应用程序调试下得到的结果。
Java和线程
一个web server使用几十到几百条线程去处理大量的并发用户。如果多条线程使用共享的资源,无法避免线程之间对数据的竞争,有时候还会发生死锁。
线程竞争是web程序上不同的线程去访问共享资源,一条线程等待另外线程释放锁。例如,在记录log的时候,线程记录log时,必须先获得锁,然后去再访问共享资源。
死锁是一种特殊的线程竞争,两个或多个线程要完成自己的任务,都要必须要等待其他的线程完成他们的任务。
线程竞争会带来各种不同的问题,为了分析这些问题,需要使用Thread Dump。Thread Dump记录了每个线程真正的状态。
Java线程的背景
线程同步
多条线程之间可以同时执行,为了确保多线程在使用共享资源上面的通用性,使用线程同步保证在同一时间只能有一条线程可以访问共享资源。
线程同步在Java中可以使用监视器。每个Java对象都有一个监视器,这个监视器只能被一个线程拥有。当一个线程要获得另外线程拥有的监视器时,需要进入等待队列直到线程释放监视器。
线程的状态
为了分析Thread Dump ,需要先了解线程的状态。线程的状态是在java.lang.Thread.State中。
NEW:线程被创建但是还没有被执行
RUNNABLE:线程正在占用cpu并且在执行任务
BLOCKED:线程为了获得监视器需要等待其他线程释放锁
WAITING:调用了wait,join,park方法使线程等待-无限期等待
TIMED_WAITING:调用了sleep,wait,join,park方法使线程等待--有限期等待
线程类型
java中线程可以分为两种:
1. 后台线程
2. 非后台线程
当没有其他的非后台线程运行时后台线程将会终止。即使你不创建线程,java应用默认也会创建很多线程。这些大多数都是后台线程,主要为了执行gc或者jmx等类型的任务
从 'static void main(String[] args)’方法中开启的线程叫做非后台线程,当这些线程停止时,其他的所有后台线程也会停止()
获得一个Thread Dump
将会介绍三种常用的方法。请注意还会有其他很多方法可以获取Thread Dump。一个Thread dump仅仅可以显示测量时的线程状态。所以为了查看线程状态的变化,建议5到10次,每次间隔5秒。
使用jstack获得Thread Dump
通过使用jps命令来获得当前正在运行的Java程序的PID
[user@linux ~]$ jps -v 25780 RemoteTestRunner -Dfile.encoding=UTF-8 25590 sub.rmi.registry.RegistryImpl 2999 -Dapplication.home=/home1/user/java/jdk.1.6.0_24 -Xms8m 26300 sun.tools.jps.Jps -mlvV -Dapplication.home=/home1/user/java/jdk.1.6.0_24 -Xms8m
使用PID作为jstack的参数获得Thread Dump
[user@linux ~]$ jstack -f 5824
在Linux终端中生成
通过使用ps -ef命令去获得当前正在运行的Java进程
[user@linux ~]$ ps - ef | grep java
user 2477 1 0 Dec23 ? 00:10:45 ...
user 25780 25361 0 15:02 pts/3 00:00:02 ./jstatd -J -Djava.security.policy=jstatd.all.policy -p 2999
user 26335 25361 0 15:49 pts/3 00:00:00 grep java
Use the extracted pid as the parameter of kill –SIGQUIT(3) to obtain a thread dump.
Thread Information from the Thread Dump File
"pool-1-thread-13" prio=6 tid=0x000000000729a000 nid=0x2fb4 runnable [0x0000000007f0f000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158) - locked <0x0000000780b7e688> (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:167) at java.io.BufferedReader.fill(BufferedReader.java:136) at java.io.BufferedReader.readLine(BufferedReader.java:299) - locked <0x0000000780b7e688> (a java.io.InputStreamReader) at java.io.BufferedReader.readLine(BufferedReader.java:362)
Thread name:当使用 Java.lang.Thread类去生成一个线程,将被命名为Thre-(Number),然而当使用java.util.concurrent,ThreadFactory类,将会被命名为pool-(Number)-thread-(Number)
Priority:表示线程的优先级
Thread ID:代表线程的唯一id。(通过线程id可以获得一些有用的信息,包括cpu使用率,或者内存使用率)
Thread status:代表线程的状态
Thread callstack:代表线程调用的堆栈信息
Thread Dump模式的类型
当无法获得一个锁(阻塞)
当一个线程占领住锁而其他线程无法获得这个锁,而导致应用程序所有的性能都下降。在下面的例子中,
BLOCKED_TEST pool-1-thread-1 线程运行时获得<0x0000000780a000b0>锁, 同时BLOCKED_TEST pool-1-thread-2 和 BLOCKED_TEST pool-1-thread-3正在等待获得<0x0000000780a000b0>锁
"BLOCKED_TEST pool-1-thread-1" prio=6 tid=0x0000000006904800 nid=0x28f4 runnable [0x000000000785f000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) - locked <0x0000000780a31778> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:432) - locked <0x0000000780a04118> (a java.io.PrintStream) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:85) - locked <0x0000000780a040c0> (a java.io.OutputStreamWriter) at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:168) at java.io.PrintStream.newLine(PrintStream.java:496) - locked <0x0000000780a04118> (a java.io.PrintStream) at java.io.PrintStream.println(PrintStream.java:687) - locked <0x0000000780a04118> (a java.io.PrintStream) at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:44) - locked <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState) at com.nbp.theplatform.threaddump.ThreadBlockedState$1.run(ThreadBlockedState.java:7) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - <0x0000000780a31758> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) "BLOCKED_TEST pool-1-thread-2" prio=6 tid=0x0000000007673800 nid=0x260c waiting for monitor entry [0x0000000008abf000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:43) - waiting to lock <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState) at com.nbp.theplatform.threaddump.ThreadBlockedState$2.run(ThreadBlockedState.java:26) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - <0x0000000780b0c6a0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) "BLOCKED_TEST pool-1-thread-3" prio=6 tid=0x00000000074f5800 nid=0x1994 waiting for monitor entry [0x0000000008bbf000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:42) - waiting to lock <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState) at com.nbp.theplatform.threaddump.ThreadBlockedState$3.run(ThreadBlockedState.java:34) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - <0x0000000780b0e1b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
当是死锁的状态
线程A需要获得线程B的锁才能继续执行任务,同时线程B需要获得线程A的锁才能继续执行任务。在Thread Dump中,可以发现 DEADLOCK_TEST-1 线程拥有0x00000007d58f5e48锁,并且试着去获取0x00000007d58f5e60这把锁。另外 DEADLOCK_TEST-2 线程拥有0x00000007d58f5e60锁,并且尝试获取0x00000007d58f5e78锁。,DEADLOCK_TEST-3 线程拥有0x00000007d58f5e78锁,并且尝试获得0x00000007d58f5e48锁。可以看得出来,每个线程都在等待另外线程的锁,这种状态知道一个线程放弃锁之前都不会被改变。
"DEADLOCK_TEST-1" daemon prio=6 tid=0x000000000690f800 nid=0x1820 waiting for monitor entry [0x000000000805f000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197) - waiting to lock <0x00000007d58f5e60> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182) - locked <0x00000007d58f5e48> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135) Locked ownable synchronizers: - None "DEADLOCK_TEST-2" daemon prio=6 tid=0x0000000006858800 nid=0x17b8 waiting for monitor entry [0x000000000815f000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197) - waiting to lock <0x00000007d58f5e78> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182) - locked <0x00000007d58f5e60> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135) Locked ownable synchronizers: - None "DEADLOCK_TEST-3" daemon prio=6 tid=0x0000000006859000 nid=0x25dc waiting for monitor entry [0x000000000825f000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197) - waiting to lock <0x00000007d58f5e48> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182) - locked <0x00000007d58f5e78> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135) Locked ownable synchronizers: - None
持续等待来自远程服务的信息
线程看起来是正常的,因为它的状态一直都是RUNNABLE,然而当你将thread dump按时间有序的排列,你可以看出来socketReadThread线程一直在读socket
"socketReadThread" prio=6 tid=0x0000000006a0d800 nid=0x1b40 runnable [0x00000000089ef000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158) - locked <0x00000007d78a2230> (a java.io.InputStreamReader) at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:107) - locked <0x00000007d78a2230> (a java.io.InputStreamReader) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:93) at java.io.InputStreamReader.read(InputStreamReader.java:151) at com.nbp.theplatform.threaddump.ThreadSocketReadState$1.run(ThreadSocketReadState.java:27) at java.lang.Thread.run(Thread.java:662)
当等待
线程保持wait状态,在thread dump中,ioWaitThread线程等待从LinkedBlockingQueue中接收信息,如果LinkedBlockQueue一直没有信息,那么线程的状态将不会改变。
"IoWaitThread" prio=6 tid=0x0000000007334800 nid=0x2b3c waiting on condition [0x000000000893f000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000007d5c45850> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:440) at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:629) at com.nbp.theplatform.threaddump.ThreadIoWaitState$IoWaitHandler2.run(ThreadIoWaitState.java:89) at java.lang.Thread.run(Thread.java:662)
当线程资源不能正常组织
当线程的资源无法正常组织,不用的线程将会堆积起来。如果发生了,建议监听下线程组织的过程或者检查线程终止的条件
如何利用Thread Dump解决问题
例子1:当cpu使用率不正常的偏高
1 提取最高cpu使用率的线程
[user@linux ~]$ ps -mo pid.lwp.stime.time.cpu -C java PID LWP STIME TIME %CPU 10029 - Dec07 00:02:02 99.5 - 10039 Dec07 00:00:00 0.1 - 10040 Dec07 00:00:00 95.5
从应用中找出cpu使用率的线程
获得轻量级锁使用的cpu率,并将LWP对应的数字(10039) 转换成十六进制(0x2737)
2 获得thread dump后,检测线程的状态
提取出应用中thread dump中pid为10029 ,再找出nid为0x2737的线程
"NioProcessor-2" prio=10 tid=0x0a8d2800 nid=0x2737 runnable [0x49aa5000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0x74c52678> (a sun.nio.ch.Util$1) - locked <0x74c52668> (a java.util.Collections$UnmodifiableSet) - locked <0x74c501b0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at external.org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65) at external.org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:708) at external.org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662)
每隔一小时提取几次,然后分析线程的状态以确定问题。
例2:当性能不正常的下降
在获得thread dump后,查询下面一组线程的状态是BLOCKED
" DB-Processor-13" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f000] java.lang.Thread.State: BLOCKED (on object monitor) at beans.ConnectionPool.getConnection(ConnectionPool.java:102) - waiting to lock <0xe0375410> (a beans.ConnectionPool) at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111) at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43) "DB-Processor-14" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f020] java.lang.Thread.State: BLOCKED (on object monitor) at beans.ConnectionPool.getConnection(ConnectionPool.java:102) - waiting to lock <0xe0375410> (a beans.ConnectionPool) at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111) at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43) " DB-Processor-3" daemon prio=5 tid=0x00928248 nid=0x8b waiting for monitor entry [0x000000000825d080] java.lang.Thread.State: RUNNABLE at oracle.jdbc.driver.OracleConnection.isClosed(OracleConnection.java:570) - waiting to lock <0xe03ba2e0> (a oracle.jdbc.driver.OracleConnection) at beans.ConnectionPool.getConnection(ConnectionPool.java:112) - locked <0xe0386580> (a java.util.Vector) - locked <0xe0375410> (a beans.ConnectionPool) at beans.cus.Cue_1700c.GetNationList(Cue_1700c.java:66) at org.apache.jsp.cue_1700c_jsp._jspService(cue_1700c_jsp.java:120)
If the threads are BLOCKED, extract the threads related to the lock that the threads are trying to obtain.
通过thread dump,可以确认处在BLOCKED状态的线程,因为<0xe0375410>锁无法被获得,这种问题可以目前持有锁的堆栈跟踪中解决问题。
在使用dbms时,有两个原因会导致上面描述的问题频繁发生。第一个原因是:
不适当的配置:尽管这些线程仍然在工作,因为数据库连接池的不适当配置,使得无法显示出最好的性能。通过多次收集分析thread dumps,将会发现一些之前处于BLOCKED状态的线程会处于不同的状态。
第二个原因是:不正当的连接。如果和数据库的连接保持异常,那么线程会一直等待超时。在这个例子中,通过多次收集分析thread dumps,将会看到和数据库有关的线程会一直保持BLOCKED状态。通过适当的参数,比如超时时间,可以缩短问题发生的时间
简单的Thread Dump编码:
线程的命名
使用java.lang.Thread 类来创建一个线程对象,将被命名为Thread-(Number)。当使用java.util.concurrent.DefaultThreadFactory对象创建线程,将被命名为pool-(Number)-thread-(Number)。在分析应用中几十到几百的线程时,如果所有的线程都使用默认的名字,分析会变得非常困难,很难区别出线程。
因此,可以养成在创建线程时为线程命名的好习惯
通过java.lang.Thread创建线程,可以给线程构造参数中传入自定义的名
public Thread(Runnable target, String name); public Thread(ThreadGroup group, String name); public Thread(ThreadGroup group, Runnable target, String name); public Thread(ThreadGroup group, Runnable target, String name, long stackSize);
通过java.util.concurrent.ThreadFactory创建线程,可以取自己ThredFactory的名。如果不需要特殊的功能,可以使用下面的MyThreadFactory
import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ThreadFactory; import java.util.concurrent.atomic.AtomicInteger; public class MyThreadFactory implements ThreadFactory { private static final ConcurrentHashMap<String, AtomicInteger> POOL_NUMBER = new ConcurrentHashMap<String, AtomicInteger>(); private final ThreadGroup group; private final AtomicInteger threadNumber = new AtomicInteger(1); private final String namePrefix; public MyThreadFactory(String threadPoolName) { if (threadPoolName == null) { throw new NullPointerException("threadPoolName"); } POOL_NUMBER.putIfAbsent(threadPoolName, new AtomicInteger()); SecurityManager securityManager = System.getSecurityManager(); group = (securityManager != null) ? securityManager.getThreadGroup() : Thread.currentThread().getThreadGroup(); AtomicInteger poolCount = POOL_NUMBER.get(threadPoolName); if (poolCount == null) { namePrefix = threadPoolName + " pool-00-thread-"; } else { namePrefix = threadPoolName + " pool-" + poolCount.getAndIncrement() + "-thread-"; } } public Thread newThread(Runnable runnable) { Thread thread = new Thread(group, runnable, namePrefix + threadNumber.getAndIncrement(), 0); if (thread.isDaemon()) { thread.setDaemon(false); } if (thread.getPriority() != Thread.NORM_PRIORITY) { thread.setPriority(Thread.NORM_PRIORITY); } return thread; } }
通过MBean来获得更多的信息
可以通过MBean来获得更多的ThreadInfo的信息。同样使用ThreadInfo也可以获得更多的在thread dumps中难以获取的信息。
ThreadMXBean mxBean = ManagementFactory.getThreadMXBean(); long[] threadIds = mxBean.getAllThreadIds(); ThreadInfo[] threadInfos = mxBean.getThreadInfo(threadIds); for (ThreadInfo threadInfo : threadInfos) { System.out.println( threadInfo.getThreadName()); System.out.println( threadInfo.getBlockedCount()); System.out.println( threadInfo.getBlockedTime()); System.out.println( threadInfo.getWaitedCount()); System.out.println( threadInfo.getWaitedTime()); }
使用ThreaInfo中的信息可以获得等待线程或者阻塞线程所用的时间,也可以获得已经被停用了的异常线程的列表