Some Monitoring and Troubleshooting Tools on JDK

JDK中常用的监控和排错工具

从一个普通的Linux用户到JVM语言的开发者,这样的转变花费了不少的时间才完成。Java作为一门工业级编程语言,具有许多成熟的框架和工具。但是通常初学者并不能很好的利用这些工具,如何快速有效的定位程序的问题,一直是博主心中的痛点。结合最近的经历和学习,便写一篇笔记总结,方便以后查阅。
在JVM语言开发中,常常碰到以下问题:
* 程序假死,响应慢
* 线程死锁
* CPU占用率高
* OOM(OutOfMemoryException)
* 内存泄露
……

这些问题一般可以通过重启(程序)来解决,但是这终究是个“制标不治本”的方法。实际上通过定位、理解并解决这些问题才是“正道”,而JDK本身则提供了许多有用的工具来帮助开发人员。常用的工具有jps,jstat,jstack,jmap,jhat。

1.jps - Java Virtual Machine Process Status Tool

jps是一个最常用的指令,用于查看系统中各JVM进程的信息。

$ jps
29907 HeronInstance
30012 MetricsManager
30117 HeronInstance
30013 MetricsManager
16985 QuorumPeerMain
16943 Kafka

如果想要看进程更详细的信息,jps提供了一些选项:

-q
    Suppress the output of the class name, JAR file name, and arguments passed to the main method, producing only a list of local VM identifiers.
-m
    Output the arguments passed to the main method. The output may be null for embedded JVMs.
-l
    Output the full package name for the application's main class or the full path name to the application's JAR file.
-v
    Output the arguments passed to the JVM.
-V
    Output the arguments passed to the JVM through the flags file (the .hotspotrc file or the file specified by the -XX:Flags=<filename> argument).
-Joption
    Pass option to the java launcher called by jps. For example, -J-Xms48m sets the startup memory to 48 megabytes. It is a common convention for -J to pass options to the underlying VM executing applications written in Java. 

其中博主常用的有-m-l。-m可以查看传给程序中Main函数的参数(命令行或JVM参数);

$ jps -m
30012 MetricsManager metricsmgr-0 33191 ExclamationTopology ExclamationTopology760273e3-4dbf-4d26-9f64-4ed589b7c830 ./heron-conf/heron_internals.yaml ./heron-conf/metrics_sinks.yaml
30013 MetricsManager metricsmgr-1 42725 ExclamationTopology ExclamationTopology760273e3-4dbf-4d26-9f64-4ed589b7c830 ./heron-conf/heron_internals.yaml ./heron-conf/metrics_sinks.yaml
16943 Kafka /usr/lib/kafka/bin/../config/server.properties
4650 Jps -m
29907 HeronInstance ExclamationTopology ExclamationTopology760273e3-4dbf-4d26-9f64-4ed589b7c830 container_1_exclaim1_1 exclaim1 1 0 stmgr-1 39645 42725 ./heron-conf/heron_internals.yaml
30117 HeronInstance ExclamationTopology ExclamationTopology760273e3-4dbf-4d26-9f64-4ed589b7c830 container_1_word_2 word 2 0 stmgr-1 39645 42725 ./heron-conf/heron_internals.yaml
4495 ZooKeeperMain -server localhost:2181
16985 QuorumPeerMain /usr/lib/zookeeper/bin/../conf/zoo.cfg

-l则可以查看运行的程序的artifactory,来自于哪个Jar包

$ jps -l 
30012 com.twitter.heron.metricsmgr.MetricsManager
30013 com.twitter.heron.metricsmgr.MetricsManager
4719 sun.tools.jps.Jps
16943 kafka.Kafka
29907 com.twitter.heron.instance.HeronInstance
30117 com.twitter.heron.instance.HeronInstance
4495 org.apache.zookeeper.ZooKeeperMain
16985 org.apache.zookeeper.server.quorum.QuorumPeerMain

通过Jps,能快速找到程序的pid,方便进一步定位问题。

2.jstat - Java Virtual Machine Statistics Monitoring Tool

以每秒查看GC,共输出10条记录为例子:

$ jstat -gc 29907 100 10
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT   
512.0  512.0   0.0    0.0   998400.0   0.0    1998848.0   62155.4   83968.0 83968.0  11731   36.405 11713 1234.847 1271.252
512.0  512.0   0.0    0.0   998400.0   0.0    1998848.0   62155.2   83968.0 83968.0  11741   36.432 11723 1235.831 1272.263
512.0  512.0   0.0    0.0   998400.0   0.0    1998848.0   62155.5   83968.0 83968.0  11751   36.458 11733 1236.819 1273.277
512.0  512.0   32.0   0.0   998400.0   0.0    1998848.0   62155.4   83968.0 83968.0  11760   36.485 11742 1237.780 1274.265
512.0  512.0   32.0   0.0   998400.0   0.0    1998848.0   62155.4   83968.0 83968.0  11768   36.519 11750 1238.665 1275.184
512.0  512.0   0.0    0.0   998400.0   0.0    1998848.0   62155.4   83968.0 83968.0  11777   36.548 11759 1239.706 1276.254
512.0  512.0   0.0    0.0   998400.0   0.0    1998848.0   62155.4   83968.0 83968.0  11787   36.574 11768 1240.743 1277.317
512.0  512.0   0.0    0.0   998400.0   0.0    1998848.0   62155.4   83968.0 83968.0  11796   36.604 11778 1241.634 1278.238
512.0  512.0   0.0    0.0   998400.0   0.0    1998848.0   62155.2   83968.0 83968.0  11806   36.634 11788 1242.676 1279.310
512.0  512.0   0.0    0.0   998400.0   0.0    1998848.0   62155.3   83968.0 83968.0  11815   36.661 11797 1243.645 1280.306

S0C, S1C, S0U, S1U代表Survior 0和1区的capacity和usage
EC, EU, OC, OU, PC, PU代表Eden,Old,Permanent区的容量和使用量
YGC, YGCT, FGC, FGCT代表从JVM启动以来Young区GC和Full GC的次数和耗时(Time, 单位为秒)
GCT代表GC总耗时,是YGCT和FGCT之和

从上可以看出,该JVM程序卡住的问题很可能由于FGC次数多。如果不懂JVM中对内存的划分和管理,可以参考另一篇文。Troubleshooting Memory Issues in Java Applications

3.jstack - Stack Trace

调试程序时,一个重要的关注点就是stack trace,根据堆栈信息就可以找到具体的代码位置,以免盲目排查问题。jstack可以打印出指定进程、core dump文件或者远程JVM中线程的stack trace信息。

jstack -l 29907 
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
"Attach Listener" daemon prio=10 tid=0x00007f7c24001000 nid=0x171f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers:
        - None

"DestroyJavaVM" prio=10 tid=0x00007f7c5c00b800 nid=0x74d4 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   Locked ownable synchronizers:
        - None

"SlaveThread" prio=10 tid=0x00007f7c5c298800 nid=0x74e4 in Object.wait() [0x00007f7c45df8000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000ee1cd530> (a java.lang.Object)
        at com.twitter.heron.common.basics.SlaveLooper.doWait(SlaveLooper.java:47)
        - locked <0x00000000ee1cd530> (a java.lang.Object)
        at com.twitter.heron.common.basics.WakeableLooper.runOnce(WakeableLooper.java:72)
        at com.twitter.heron.common.basics.WakeableLooper.loop(WakeableLooper.java:64)
        at com.twitter.heron.instance.Slave.run(Slave.java:169)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
   Locked ownable synchronizers:
        - <0x00000000ee0407e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"GatewayThread" prio=10 tid=0x00007f7c5c293000 nid=0x74e3 runnable [0x00007f7c45ef9000]
   java.lang.Thread.State: RUNNABLE
……

其中关于线程的基础信息如下

"SlaveThread" prio=10 tid=0x00007f7c5c298800 nid=0x74e4 in Object.wait() [0x00007f7c45df8000]
      |            |      |                      |                |                + ???
      |            |      |                      |                + 说明在等待Object的锁
      |            |      |                      + 在OS中线程的id
      |            |      + 在JVM中的线程id,JVM的线程是基于OS的线程
      |            + 优先级
      + 线程类型

可以看出该线程处于等待状态:- waiting on <0x00000000ee1cd530> (a java.lang.Object)

接着查看进程里哪些线程的CPU占用率高

$ top -Hp 29907
Threads: 107 total,   4 running, 103 sleeping,   0 stopped,   0 zombie
%Cpu(s): 60.9 us,  1.2 sy,  0.1 ni, 37.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 11991800 total,  3699040 free,  6984008 used,  1308752 buff/cache
KiB Swap:  3963900 total,  3637072 free,   326828 used.  4296828 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                
 29916 root   20   0 5505572 709388  29300 S 54.0  5.9  61:20.00 java                                                                                                                   
 29914 root   20   0 5505572 709388  29300 R 42.3  5.9  45:09.51 java                                                                                                                   
 29913 root   20   0 5505572 709388  29300 R 41.7  5.9  45:08.21 java                                                                                                                   
 29915 root   20   0 5505572 709388  29300 R 41.3  5.9  45:07.39 java                                                                                                                   
 29912 root   20   0 5505572 709388  29300 R 39.3  5.9  45:08.47 java

其中有五条线程的CPU占用率高,而29916是最高的,那么先来看看它在做什么。

$ jstack 29907 | grep `printf "%x\n" 29916`  # 系统的pid 29916对应jstack中输出的nid,所以需要转换成十六进制
"VM Thread" prio=10 tid=0x00007fd8cc076000 nid=0xe98 runnable 

这是一条JVM自己的线程,没有提供太多有用的信息,那么来看看剩下四条线程在干什么

$ jstack 29907 | grep `printf "%x\n" 29914`
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fd8cc030000 nid=0xe96 runnable 
$ jstack 29907 | grep `printf "%x\n" 29915`
^[[A"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fd8cc031800 nid=0xe97 runnable 
$ jstack 29907 | grep `printf "%x\n" 29913`
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fd8cc02e000 nid=0xe95 runnable 
$ jstack 29907 | grep `printf "%x\n" 29912`
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fd8cc02c000 nid=0xe94 runnable

四条GC task线程,和上文jstat中的结论一致,说明GC占用了大量CPU资源,导致程序卡顿。

4.jmap - Memory Map and jhat - Java Heap Analysis Tool

jmap用来查看堆内存使用状况,一般结合jhat使用。

值得注意的是,使用jmap采集信息时,最好停止该JVM进程的操作,即停止与外部的交互。一方面,这样才能采集到准确的heap信息,另一方面jmap会使JVM进程或者内部线程停止一段时间,直到采集完成。

-histo 和-heap 工作原理是不同的:

关于”停止时间”,heap的大小会决定进程/线程停止多久。例如,假设dump一个1GB的heap文件,耗时肯定是以秒来记的。

以下是jmap的参数解释:

-dump:[live,]format=b,file=<filename>
    Dumps the Java heap in hprof binary format to filename. The live suboption is optional. If specified, only the live objects in the heap are dumped. To browse the heap dump, you can use jhat (Java Heap Analysis Tool) to read the generated file.
-finalizerinfo
    Prints information on objects awaiting finalization.
-heap
    Prints a heap summary. GC algorithm used, heap configuration and generation wise heap usage are printed.
-histo[:live]
    Prints a histogram of the heap. For each Java class, number of objects, memory size in bytes, and fully qualified class names are printed. VM internal class names are printed with '*' prefix. If the live suboption is specified, only live objects are counted.
-permstat
    Prints class loader wise statistics of permanent generation of Java heap. For each class loader, its name, liveness, address, parent class loader, and the number and size of classes it has loaded are printed. In addition, the number and size of interned Strings are printed.
-F
    Force. Use with jmap -dump or jmap -histo option if the pid does not respond. The live suboption is not supported in this mode.
-h
    Prints a help message.
-help
    Prints a help message.
-J<flag>
    Passes <flag> to the Java virtual machine on which jmap is run.
$ jmap -J-d64 -heap 29907 # -J-d64代表64位的JVM, -heap查看进程堆内存使用情况,包括使用的GC算法、堆配置参数和各代中堆内存使用情况
Attaching to process ID 29907, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.80-b11

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 0
   MaxHeapFreeRatio = 100
   MaxHeapSize      = 3070230528 (2928.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 1022361600 (975.0MB)
   used     = 0 (0.0MB)
   free     = 1022361600 (975.0MB)
   0.0% used
From Space:
   capacity = 524288 (0.5MB)
   used     = 0 (0.0MB)
   free     = 524288 (0.5MB)
   0.0% used
To Space:
   capacity = 524288 (0.5MB)
   used     = 0 (0.0MB)
   free     = 524288 (0.5MB)
   0.0% used
PS Old Generation
   capacity = 2046820352 (1952.0MB)
   used     = 63744656 (60.79164123535156MB)
   free     = 1983075696 (1891.2083587646484MB)
   3.114325882958584% used
PS Perm Generation
   capacity = 85983232 (82.0MB)
   used     = 85983216 (81.99998474121094MB)
   free     = 16 (1.52587890625E-5MB)
   99.99998139172065% used

21007 interned Strings occupying 3147464 bytes.
$ jmap -J-d64 -permstat 29907  # 统计进程permanent区的信息
Attaching to process ID 29907, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.80-b11
finding class loader instances ..done.
computing per loader stat ..done.
please wait.. computing liveness...............................liveness analysis may be inaccurate ...
class_loader    classes bytes   parent_loader   alive?  type

<bootstrap>     972     5868136   null          live    <internal>
0x00000000ee14a4f8      1       1888    0x00000000ee004388      dead    sun/reflect/DelegatingClassLoader@0x00000000f8050c70
0x00000000ee058700      1       1888    0x00000000ee004388      dead    sun/reflect/DelegatingClassLoader@0x00000000f8050c70
0x00000000ee004388      751     5438056 0x00000000ee01bfd8      live    sun/misc/Launcher$AppClassLoader@0x00000000f821b748
0x00000000ee01bfd8      0       0         null          live    sun/misc/Launcher$ExtClassLoader@0x00000000f81c0d78
0x00000000ee058740      1       3056    0x00000000ee004388      dead    sun/reflect/DelegatingClassLoader@0x00000000f8050c70
0x00000000ee058780      1       3056    0x00000000ee004388      dead    sun/reflect/DelegatingClassLoader@0x00000000f8050c70
0x00000000ee058840      1       3056    0x00000000ee004388      dead    sun/reflect/DelegatingClassLoader@0x00000000f8050c70
0x00000000ee0587c0      1       1888    0x00000000ee004388      dead    sun/reflect/DelegatingClassLoader@0x00000000f8050c70
0x00000000ee06e8c8      0       0       0x00000000ee004388      dead    java/util/ResourceBundle$RBClassLoader@0x00000000f849a5b8
0x00000000ee058800      1       1888    0x00000000ee004388      dead    sun/reflect/DelegatingClassLoader@0x00000000f8050c70
0x00000000ee0586c0      1       3056    0x00000000ee004388      dead    sun/reflect/DelegatingClassLoader@0x00000000f8050c70

total = 12      1731    11325968            N/A         alive=3, dead=9     N/A
$ jmap -J-d64 -histo:live 29907   # 统计堆中存活对象的数目
 num     #instances         #bytes  class name
----------------------------------------------
   1:        124715       44776120  [S
   2:        143040       19941440  <constMethodKlass>
   3:        143040       18319424  <methodKlass>
   4:         12237       16013592  <constantPoolKlass>
   5:         12237       11496200  <instanceKlassKlass>
   6:         10019        8319232  <constantPoolCacheKlass>
   7:         46771        6290840  [C
   8:         41115        6217536  [B
   9:          1553        4303592  <methodDataKlass>
  10:         13045        1518056  java.lang.Class
  11:            39        1278576  [Lscala.concurrent.forkjoin.ForkJoinTask;
  12:         23396        1206288  [[I
  13:         36613        1171616  java.util.Hashtable$Entry
  14:         45121        1082904  java.lang.String
  15:         31037         993184  java.util.HashMap$Entry
  16:         22930         733760  java.util.concurrent.ConcurrentHashMap$HashEntry

class name这一列,需要对其含义说明一下:

B  byte
C  char
D  double
F  float
I  int
J  long
Z  boolean
[  数组,如[I表示int[]
[L+类名 其他对象
< > Klass

当jmap的输出信息过多,不方便在命令行查看时,可以先dump成文件,在通过jhat在浏览器中详细查看。

$ jmap -dump:format=b,file=file.dump 29907
Dumping heap to /home/horatio/file.dump ...
Heap dump file created

$ jhat -J-Xmx256m -port 20000 ~/file.dump # -J-Xmx指定最大堆内存,用于加载较大的dump的文件
Reading from /home/horatio/file.dump...
Dump file created Sat Sep 17 17:26:21 HKT 2016
Snapshot read, resolving...
Resolving 483745 objects...
Chasing references, expect 96 dots................................................................................................
Eliminating duplicate references................................................................................................
Snapshot resolved.
Started HTTP server on port 20000
Server is ready.

接着在浏览器中访问http://localhost:20000, 点击对应的类,可以查看详细的信息
这里写图片描述

参考资料:

  1. JVM性能调优监控工具jps、jstack、jmap、jhat、jstat、hprof使用详解
  2. Orcale JDK Tools and Utilities
  3. How to analyze a java thread dump?
  4. jmap - Does histo & heap operation bring overhead to jvm?
  5. is JVM stopped while executing jmap?
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值