How Stop-the-World Pauses in the JVM Show Up in the Logs

This post looks at the various events that cause the JVM to pause all application threads (Stop-The-World, STW) – not just garbage collection – and at how to record these pauses with JVM flags so they can be analyzed.


The original article covers the possible causes of STW pauses in the JVM (not only GC) and how they show up in the logs – mainly an explanation of the "Total time for which application threads were stopped" line in gc.log.

链接:https://dzone.com/articles/logging-stop-world-pauses-jvm

The original article is attached below.

Logging Stop-the-World Pauses in JVM


Different events can cause the JVM to pause all the application threads. Such pauses are called Stop-The-World (STW) pauses. The most common cause for an STW pause to be triggered is garbage collection (example in github), but different JIT actions (example), biased lock revocation (example), certain JVMTI operations, and many more also require the application to be stopped.

The points at which the application threads may be safely stopped are called, surprise, safepoints. This term is also often used to refer to all the STW pauses.

It is more or less common that GC logs are enabled. However, this does not capture information on all the safepoints. To get it all, use these JVM options:

-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
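
A concrete launch command (a sketch of mine, assuming the example class below has been compiled to FullGc.class in the current directory) could look like this:

java -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime FullGc

These are the pre-Java-9 flags used throughout this article; on Java 9 and newer they were folded into unified logging, where -Xlog:safepoint provides roughly the same information.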

If you are wondering about the naming explicitly referring to GC, don’t be alarmed – turning on these options logs all of the safepoints, not just garbage collection pauses. If you run the following example (source in github) with the flags specified above,

import java.util.ArrayList;
import java.util.Collection;

public class FullGc {
  // strong references kept here so the old generation eventually fills up
  private static final Collection<Object> leak = new ArrayList<>();
  private static volatile Object sink;

  public static void main(String[] args) {
    while (true) {
      try {
        leak.add(new byte[1024 * 1024]);
        sink = new byte[1024 * 1024]; // short-lived allocation to keep GC busy
      } catch (OutOfMemoryError e) {
        leak.clear();
      }
    }
  }
}

you should see entries similar to the following in your standard output:

Application time: 0.3440086 seconds
Total time for which application threads were stopped: 0.0620105 seconds
Application time: 0.2100691 seconds
Total time for which application threads were stopped: 0.0890223 seconds

Reading it is easy (especially compared to GC logs) – from the above you can see that the application managed to do useful work for the first 344 milliseconds, then paused all threads for 62 milliseconds, followed by 210 ms of useful work and another 89 ms pause.
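
Put differently, over this window the JVM was paused for 62 + 89 = 151 ms out of roughly 344 + 62 + 210 + 89 = 705 ms of wall-clock time, i.e. the application threads were stopped for about 21% of the time.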

You can use these flags in combination with GC flags; for example, when running the same program again with -XX:+PrintGCDetails added, the output would look similar to the following:

[Full GC (Ergonomics) [PSYoungGen: 1375253K->0K(1387008K)] [ParOldGen: 2796146K->2049K(1784832K)] 4171400K->2049K(3171840K), [Metaspace: 3134K->3134K(1056768K)], 0.0571841 secs] [Times: user=0.02 sys=0.04, real=0.06 secs]
Total time for which application threads were stopped: 0.0572646 seconds, Stopping threads took: 0.0000088 seconds

Based on the above, we see that the application threads were forced to stop for 57 ms to do a garbage collection, of which 8 microseconds were spent waiting for all the application threads to arrive at a safepoint. But if we take another example (source in github) and run it with the same options, we may see the following output:

Total time for which application threads were stopped: 0.0001273 seconds, Stopping threads took: 0.0000196 seconds
Total time for which application threads were stopped: 0.0000648 seconds, Stopping threads took: 0.0000174 seconds

From this, we cannot actually infer what caused the pause, because there was no garbage collecting activity. If you wish to understand the additional safepoints in more detail, here are a couple of JVM arguments for you:

-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
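
(Note: these safepoint-statistics flags come from the Java 8 era; on newer JDKs they have been removed in favour of unified safepoint logging, so the exact spelling depends on your release.)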

Enabling them makes the JVM log additional information to the standard output, looking similar to the following:

5.141: RevokeBias  [  13  0  2  ]  [  0  0  0  0  0  ]  0
Total time for which application threads were stopped: 0.0000782 seconds, Stopping threads took: 0.0000269 seconds

The information about the safepoints is displayed according to the following sequence:

  • The timestamp in seconds since the start of the JVM (5.141 in the example above)
  • The name of the operation that triggered the pause (RevokeBias).
    • If you see a “no vm operation”, it means that this is a so-called “guaranteed safepoint”. By default, the JVM triggers a safepoint each second to process all queued-up operations that are not urgent. The flag GuaranteedSafepointInterval can be used to change this behavior – set it to 0 to disable these periodic safepoints altogether (see the example after this list).
  • The number of threads that were stopped at safepoint (13)
  • The number of threads running at the beginning of the safepoint (0)
  • The number of threads that were blocked when the vm operation began its execution (2)
  • Timings for various stages of arriving at safepoint and executing the operation (all zeroes)
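
For example, to turn the periodic “guaranteed” safepoints off entirely, something along these lines should work (a sketch; on most HotSpot builds GuaranteedSafepointInterval is a diagnostic flag, so it has to be unlocked first):

-XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=0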

So now we see that the use of biased locks may result in a lot of STW pauses happening, even though they only take tens of microseconds. It is not uncommon to disable them right away in these massively parallel days.
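
To make the RevokeBias entries above easier to reproduce, here is a minimal sketch (not from the original article; the class name and timings are mine) that should provoke a biased-lock revocation on a pre-Java-15 HotSpot where biased locking is still enabled: one thread acquires a monitor so the lock becomes biased towards it, then a second thread contends for the same monitor, forcing the bias to be revoked at a safepoint.

import java.util.concurrent.TimeUnit;

public class BiasedLockRevocation {
  private static final Object lock = new Object();

  public static void main(String[] args) throws InterruptedException {
    // Give biased locking time to kick in: older HotSpot builds only start
    // biasing locks a few seconds after JVM startup.
    TimeUnit.SECONDS.sleep(6);

    // First acquisition: the lock should now be biased towards the main thread.
    synchronized (lock) {
      System.out.println("lock acquired by main thread");
    }

    // A different thread locking the same object forces the bias to be
    // revoked, which is a VM operation executed at a safepoint and shows up
    // as RevokeBias in the safepoint statistics.
    Thread other = new Thread(() -> {
      synchronized (lock) {
        System.out.println("lock acquired by second thread");
      }
    });
    other.start();
    other.join();
  }
}

If these pauses turn out to matter, biased locking can simply be switched off with -XX:-UseBiasedLocking; newer JDKs (15 and later) disable it by default anyway.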

All in all, you can really save yourself some trouble by having this data recorded at all times. To do so, specify these JVM options:

-XX:+LogVMOutput -XX:LogFile=vm.log
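
On most HotSpot builds LogVMOutput and LogFile are diagnostic options, so they may need to be unlocked first. A full combination of everything discussed in this article could look roughly like this (a sketch, not a command from the original):

-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:LogFile=vm.log -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1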

And all the JVM’s output will be stored to the vm.log file. Interpreting and acting based on these logs, however, is a very broad topic quite outside the scope of this article, but I may write a post or two about it in the future, so if interested, stay tuned – either via our Twitter subscription or through the RSS feed.

If you want some further reading right away, there’s a great blog post by Nitsan Wakart and, of course, the OpenJDK source code.

