Optimizing Java: Reading Notes (Part 2)

License: CC 4.0 BY-SA

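This article looks in detail at Java's garbage collection: mark and sweep, the Ordinary Object Pointer in the HotSpot runtime, how the parallel and concurrent collectors work, and how to tune performance through GC flags. It also covers GC log analysis, monitoring tools, and tuning strategies, to help developers understand and manage the memory behavior of Java applications.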

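An opening gripe: Java fooled me back then. So much for automatic memory management; in the end you still have to learn all of it. I might as well have gone straight to C++.

Chapter 6: Understanding Garbage Collection

Mark and sweep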

for each object in allocatedObjectList:
    clear the mark bit
    // objects are aligned to 8 bytes, so the traversal can step through memory 8 bytes at a time

DFS starting from the GC roots:
    set the mark bit of every object reached

for each object in allocatedObjectList:
    if the mark bit is not set:
        remove it from allocatedObjectList
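
The pseudocode above can be turned into a small runnable sketch. The `Obj` class, its `marked` field, and the `collect` method are hypothetical names made up for illustration; a real collector works on raw heap memory, not on Java lists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class MarkSweepSketch {
    // Hypothetical heap object: a mark bit plus its outgoing references.
    static final class Obj {
        final String name;
        boolean marked;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Runs one mark-and-sweep cycle and returns how many objects were removed.
    static int collect(List<Obj> allocated, List<Obj> roots) {
        // Step 1: clear every mark bit.
        for (Obj o : allocated) {
            o.marked = false;
        }
        // Step 2: depth-first traversal from the GC roots with an explicit stack.
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            for (Obj r : o.refs) {
                if (!r.marked) stack.push(r);
            }
        }
        // Step 3: sweep, dropping every object that was never reached.
        int before = allocated.size();
        allocated.removeIf(o -> !o.marked);
        return before - allocated.size();
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);   // a -> b, reachable from the root a
        c.refs.add(d);   // c <-> d form a cycle that no root reaches
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        System.out.println("swept " + collect(heap, Arrays.asList(a)) + " objects");
    }
}
```

Note that the unreachable cycle `c <-> d` is still reclaimed, which is the main advantage of tracing over reference counting.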

[Figure: memory layout]

jmap -histo [pid]

 num     #instances         #bytes  class name
 ----------------------------------------------
   1:         20839       14983608  [B
   2:        118743       12370760  [C
   3:         14528        9385360  [I
   4:           282        6461584  [D
   5:        115231        3687392  java.util.HashMap$Node
   6:        102237        2453688  java.lang.String
   7:         68388        2188416  java.util.Hashtable$Entry
   8:          8708        1764328  [Ljava.util.HashMap$Node;
   9:         39047        1561880  jdk.nashorn.internal.runtime.CompiledFunction
  10:         23688        1516032  com.mysql.jdbc.ConnectionPropertiesImpl$BooleanConnectionProperty
  11:         24217        1356152  jdk.nashorn.internal.runtime.ScriptFunction
  12:         27344        1301896  [Ljava.lang.Object;
  13:         10040        1107896  java.lang.Class
  14:         44090        1058160  java.util.LinkedList$Node
  15:         29375         940000  java.util.LinkedList
  16:         25944         830208  jdk.nashorn.internal.runtime.FinalScriptFunctionData
  17:            20         655680  [Lscala.concurrent.forkjoin.ForkJoinTask;
  18:         19943         638176  java.util.concurrent.ConcurrentHashMap$Node
  19:           730         614744  [Ljava.util.Hashtable$Entry;
  20:         24022         578560  [Ljava.lang.Class;

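The HotSpot runtime

Ordinary Object Pointer (oop): the JVM's runtime representation of a Java object. Every object starts with a header of two machine words: the mark word points to instance-specific metadata (such as the hash code), and the klass word points to class-wide metadata (held in PermGen up to Java 7, and in Metaspace from Java 8).

-XX:+UseCompressedOops compresses object pointers to 32 bits on a 64-bit JVM, which shrinks the reference fields and the header. It is on by default from Java 7 onward (for heaps below roughly 32 GB).

KlassOops and Class objects

[Figure: klassOops and Class objects]

The oop inheritance hierarchy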

oop (abstract base)
 |-instanceOop (instance objects)
 |-methodOop (representations of methods)
 |-arrayOop (array abstract base)
 |-symbolOop (internal symbol / string class)
 |-klassOop (klass Header) (Java 7 and before only)
 |-markOop

GC Roots

Stack frames
JNI
Registers
Code roots (from the JVM code cache)
Globals
Class metadata from loaded classes

GC In HotSpot

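The Weak Generational Hypothesis observes that the great majority of objects are very short-lived, and only a small fraction live much longer.

HotSpot tracks the age of each object (the number of collections it has survived)
Objects are allocated in Eden first; those that survive a collection are moved to a Survivor space
A separate memory area, the old generation, holds long-lived objects

To speed up marking during a young collection, HotSpot maintains a data structure called the "card table" that records which old-generation objects may point to young-generation objects. Each entry covers 512 bytes of old-generation memory; when a reference field of an old object is updated, the corresponding card is marked dirty: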

cards[*instanceOop >> 9] = 0;
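
A minimal sketch of the same idea in Java, assuming a byte array as the card table and using object offsets in place of raw addresses. The class name, `recordWrite`, and the `CLEAN` value are assumptions for illustration; only the shift by 9 and the use of 0 for a dirty card come from the line above.

```java
import java.util.Arrays;

final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes of old gen per card
    static final byte DIRTY = 0;       // as in the one-liner above, 0 marks a dirty card
    static final byte CLEAN = -1;      // assumption: any non-zero value would do here

    private final byte[] cards;

    CardTableSketch(long oldGenBytes) {
        // One card per 512 bytes, rounding up.
        cards = new byte[(int) ((oldGenBytes + 511) >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    // Write barrier: called whenever a reference field of the old-gen object
    // located at the given offset is updated.
    void recordWrite(long offsetInOldGen) {
        cards[(int) (offsetInOldGen >>> CARD_SHIFT)] = DIRTY;
    }

    // A young collection only needs to scan old-gen memory behind dirty cards.
    int dirtyCount() {
        int n = 0;
        for (byte card : cards) {
            if (card == DIRTY) n++;
        }
        return n;
    }

    int size() {
        return cards.length;
    }
}
```

Two writes that fall inside the same 512-byte window dirty the same card, so the table trades precision for a very cheap barrier.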

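TLABs (thread-local allocation buffers): each thread allocates new objects from a buffer that belongs to it alone, so the common allocation path needs no synchronization.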
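
A rough sketch of why allocating from a thread-local buffer is cheap: it amounts to bumping a pointer. The class and method names are hypothetical, addresses are plain `long` values, and the 8-byte alignment is an assumption based on the alignment mentioned in the mark-and-sweep notes.

```java
final class TlabSketch {
    private final long end;   // first address past the buffer
    private long top;         // next free address in the buffer

    TlabSketch(long start, long sizeBytes) {
        this.top = start;
        this.end = start + sizeBytes;
    }

    // Returns the address of the new object, or -1 when the buffer is
    // exhausted (a real thread would then request a fresh TLAB).
    long allocate(long sizeBytes) {
        long aligned = (sizeBytes + 7) & ~7L;   // round up to a multiple of 8
        if (aligned > end - top) {
            return -1;
        }
        long address = top;
        top += aligned;                          // bump the pointer; no lock needed
        return address;
    }
}
```

Because only the owning thread touches `top`, no atomic operation is required until the buffer runs out.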

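The parallel collectors

In Java 8 and earlier, the default collectors are the parallel collectors, which are fully STW for both young collections (YGC) and full collections (FGC). They are designed for throughput: once the application is stopped, the collector uses all available cores to reclaim memory as quickly as possible.

ParallelGC: the simplest collector for the young generation
ParNew: differs only slightly from ParallelGC; it exists mainly to work with CMS
ParallelOld: the parallel collector for the old generation (including PermGen)

Young parallel collection: when an allocation in Eden fails, the JVM stops the application threads and runs a garbage collection.

Old parallel collection: unlike the young generation, the old generation backs the young generation with an allocation guarantee for promoted objects and consists of a single contiguous block of memory. It therefore has no spare space to copy survivors into, so ParallelOld uses a mark-compact algorithm.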
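
To make the compaction step concrete, here is a toy sliding compaction over an array, where `null` stands for a dead or free slot. This is only a sketch of the idea; it does not show how ParallelOld is implemented, and it leaves out the reference-fixup pass that a real compactor needs after moving objects.

```java
final class CompactionSketch {
    // Slides live entries toward index 0, preserving their order.
    // Returns the new allocation top: the index of the first free slot.
    static int compact(String[] heap) {
        int top = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                if (i != top) {
                    heap[top] = heap[i];   // move the live object down
                    heap[i] = null;        // its old slot becomes free
                }
                top++;
            }
        }
        return top;
    }
}
```

After compaction all free space is one contiguous block starting at the returned index, so later allocation can again be a simple pointer bump.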

Copying vs. compacting

[Figure: copying vs. compacting collection]

A JVM memory allocation example

Heap layout

Heap Area	Size
Overall	2G
Old Gen	1.5G
Young Gen	500M
Eden	400M
S1	50M
S2	50M

GC figures

Allocation Rate	100M/s
YGC time	2ms
FGC time	100ms
Object lifetime	200ms

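With an allocation rate of 100 MB/s, the 400 MB Eden fills up in 4 s, so a YGC happens roughly every 4 s. Objects live for 200 ms, so only the last 200 ms of allocations (20 MB) are still alive at each collection.

GC	Time	Data moved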
GC0	4s	20M Eden -> S1(20M)
GC1	8.002s	20M Eden -> S2(20M)
GC2	12.004s	20M Eden -> S1(20M)
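
The arithmetic behind this table can be written out as a small model. The class and method names are made up for illustration, and the model assumes a perfectly constant allocation rate and object lifetime, as the example does.

```java
final class YoungGcModel {
    // Seconds of allocation needed to fill Eden.
    static double secondsToFillEden(double edenMb, double allocMbPerSec) {
        return edenMb / allocMbPerSec;
    }

    // Data still alive at collection time: everything allocated within the
    // last `lifetimeMs` milliseconds.
    static double liveMbAtGc(double allocMbPerSec, double lifetimeMs) {
        return allocMbPerSec * lifetimeMs / 1000.0;
    }

    // Start time of the n-th young collection (n = 0, 1, 2, ...), counting the
    // pauses of the collections that came before it.
    static double gcTimeSeconds(int n, double fillSeconds, double pauseMs) {
        return (n + 1) * fillSeconds + n * pauseMs / 1000.0;
    }

    public static void main(String[] args) {
        double fill = secondsToFillEden(400, 100);
        System.out.printf("Eden fills in %.1f s, %.0f MB survive each YGC%n",
                fill, liveMbAtGc(100, 200));
        for (int n = 0; n < 3; n++) {
            System.out.printf("GC%d at %.3f s%n", n, gcTimeSeconds(n, fill, 2));
        }
    }
}
```

The 20 MB of survivors fit into a 50 MB Survivor space, which is why nothing is promoted to the old generation in this model.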

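import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Simulates an allocation workload: a stream of short-lived objects with a
// small fraction of long-lived ones, following the Weak Generational Hypothesis.
public class ModelAllocator implements Runnable {
    private volatile boolean shutdown = false;

    private double chanceOfLongLived = 0.02;   // 2% of objects are long-lived
    private int multiplierForLongLived = 20;   // long-lived objects live 20x longer
    private int x = 1024;                      // each allocation is an int[x][y],
    private int y = 1024;                      // i.e. about x * y * 4 bytes of int data
    private int mbPerSec = 50;                 // allocations submitted per second
    private int shortLivedMs = 100;            // lifetime of a short-lived object
    private int nThreads = 8;
    private Executor exec = Executors.newFixedThreadPool(nThreads);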

    public void run() {
        final int mainSleep = (int) (1000.0 / mbPerSec);

        while (!shutdown) {
            for (int i = 0; i < mbPerSec; i++) {
                ModelObjectAllocation to = new ModelObjectAllocation(x, y, lifetime());
                exec.execute(to);
                try {
                    Thread.sleep(mainSleep);
                } catch (InterruptedException ex) {
                    shutdown = true;
                }
            }
        }
    }

    // Simple function to model Weak Generational Hypothesis
    // Returns the expected lifetime of an object - usually this
    // is very short, but there is a small chance of an object
    // being "long-lived"
    public int lifetime() {
        if (Math.random() < chanceOfLongLived) {
            return multiplierForLongLived * shortLivedMs;
        }

        return shortLivedMs;
    }

    static class ModelObjectAllocation implements Runnable {
        private final int[][] allocated;
        private final int lifeTime;

        public ModelObjectAllocation(final int x, final int y, final int liveFor) {
            allocated = new int[x][y];
            lifeTime = liveFor;
        }

        @Override
        public void run() {
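            try {
                // Hold the array for its whole lifetime, then let it become garbage
                Thread.sleep(lifeTime);
                System.err.println(System.currentTimeMillis() +": "+ allocated.length);
            } catch (InterruptedException ex) {
                // Restore the interrupt flag instead of swallowing the interruption
                Thread.currentThread().interrupt();
            }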
        }
    }
}

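Chapter 7: Advanced Garbage Collection

Criteria for choosing a GC

Pause time
Throughput (GC time as a percentage of application run time)
Pause frequency
Reclamation efficiency (how much memory a single pause can reclaim)
Pause consistency (whether all pauses are roughly the same length)

Big data applications should care more about throughput than about pause time. For some batch jobs even a 10 s pause does not matter, so a GC algorithm that favors CPU efficiency and throughput is the better choice.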

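Concurrent GC theory

Safepoint: a point in a thread's execution where its state is fully known, so the JVM can pause it there when a GC begins

The JVM cannot force a thread into the safepoint state
The JVM can prevent a thread from leaving the safepoint state

Reaching a safepoint

The JVM sets a global "time to safepoint" flag
Each application thread polls this flag and sees that it has been set
Each application thread pauses and waits to be woken up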
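
The three steps above describe a cooperative protocol, which can be imitated in plain Java. This is only an analogy under stated assumptions: HotSpot inserts the polls into compiled and interpreted code itself, while here the worker calls a hypothetical `poll()` by hand, and `begin`, `end`, and `demo` are names invented for the sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

final class SafepointSketch {
    private volatile boolean timeToSafepoint;   // the global flag
    private final Object lock = new Object();
    private int threadsAtSafepoint;             // guarded by lock

    // Called by application threads, e.g. at every loop back-edge.
    void poll() throws InterruptedException {
        if (!timeToSafepoint) return;           // fast path: one volatile read
        synchronized (lock) {
            threadsAtSafepoint++;
            lock.notifyAll();                   // report arrival to the VM thread
            while (timeToSafepoint) lock.wait();   // parked until released
            threadsAtSafepoint--;
        }
    }

    // Called by the "VM thread": raises the flag and waits for all threads.
    void begin(int expectedThreads) throws InterruptedException {
        synchronized (lock) {
            timeToSafepoint = true;
            while (threadsAtSafepoint < expectedThreads) lock.wait();
        }
    }

    // Clears the flag and wakes the parked threads.
    void end() {
        synchronized (lock) {
            timeToSafepoint = false;
            lock.notifyAll();
        }
    }

    // Returns how much work the worker did while held at the safepoint.
    static long demo() throws InterruptedException {
        SafepointSketch sp = new SafepointSketch();
        AtomicLong work = new AtomicLong();
        AtomicBoolean done = new AtomicBoolean();
        Thread worker = new Thread(() -> {
            try {
                while (!done.get()) {
                    work.incrementAndGet();
                    sp.poll();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        sp.begin(1);                 // returns only once the worker is parked
        long before = work.get();
        Thread.sleep(20);            // the "GC" would run here
        long after = work.get();
        sp.end();
        done.set(true);
        worker.join();
        return after - before;
    }
}
```

The sketch also shows both rules from the notes: the VM thread cannot push the worker into the safepoint (it has to wait for the next poll), but it can keep the worker there until `end()` is called.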

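Safepoint cases

A thread is automatically at a safepoint when it is blocked on a monitor
A thread is automatically at a safepoint when it is executing JNI code
A thread is not necessarily at a safepoint when it has been interrupted by the OS
A thread is not necessarily at a safepoint when it is partway through executing a bytecode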

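Tri-color marking

GC roots are colored gray
All other objects are colored white
A marking thread picks a gray node and colors each of its white children gray
Once a gray node has no white children left, it is colored black
This repeats until no gray nodes remain
All remaining white nodes are collected
Because application (mutator) threads keep running during marking, a mutator can store a reference to a white object into an object that has already been colored black, which would invalidate the marking.
The invariant to preserve is that, during concurrent marking, no black object holds a reference to a white object.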
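
A single-threaded sketch of the coloring steps, with no mutator running, so the invariant holds trivially. The `Node` class and the `mark` method are hypothetical names; a real concurrent collector additionally needs barriers to preserve the invariant while the application runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TriColorSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        Color color = Color.WHITE;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Colors the graph and returns the nodes that are still white (garbage).
    static List<Node> mark(List<Node> all, List<Node> roots) {
        for (Node n : all) {
            n.color = Color.WHITE;            // everything starts white
        }
        Deque<Node> gray = new ArrayDeque<>();
        for (Node r : roots) {
            r.color = Color.GRAY;             // roots start gray
            gray.add(r);
        }
        while (!gray.isEmpty()) {
            Node n = gray.poll();
            for (Node child : n.children) {
                if (child.color == Color.WHITE) {
                    child.color = Color.GRAY; // discovered, not yet scanned
                    gray.add(child);
                }
            }
            n.color = Color.BLACK;            // no white children remain
        }
        List<Node> garbage = new ArrayList<>();
        for (Node n : all) {
            if (n.color == Color.WHITE) garbage.add(n);
        }
        return garbage;
    }
}
```

At every point in the loop a black node only references gray or black nodes, which is exactly the invariant stated above.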

CMS

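Phases

Initial mark (STW)
Concurrent mark
Concurrent preclean
Remark (STW)
Concurrent sweep
Concurrent reset

Concurrent mode failure (CMF)

If the old generation already holds too many objects and the young generation promotes more than it can take while CMS is still running,
the JVM falls back to a collection scheme like ParallelOld, which is fully STW.

CMS starts a collection cycle when the old generation is 75% full (the default)
CMS does not compact the old generation, so its free space is fragmented
If the old generation has no contiguous free block that is large enough, the JVM also falls back to ParallelOld
CMS is enabled with -XX:+UseConcMarkSweepGC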

Chapter 8: GC Logging, Monitoring, Tuning, and Tools

Introduction to GC logging

-Xloggc:gc.log -XX:+PrintGCDetails 
-XX:+PrintTenuringDistribution
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps

Effect	Flags
Controls which file to log GC events to	-Xloggc:gc.log
Logs GC event details	-XX:+PrintGCDetails
Prints the wall-clock time at which GC events occurred	-XX:+PrintGCDateStamps
Prints the time (in seconds since VM start) at which GC events occurred	-XX:+PrintGCTimeStamps
Adds extra GC event detail that is vital for tooling	-XX:+PrintTenuringDistribution
Switches on log file rotation	-XX:+UseGCLogFileRotation
Sets the maximum number of log files to keep	-XX:NumberOfGCLogFiles=<n>
Sets the maximum size of each file before rotation	-XX:GCLogFileSize=<size>

Log analysis tools

Censum
GCViewer

Basic tuning

Table 8-3. GC heap sizing flags

Effect	Flag
Set the minimum size reserved for the heap	-Xms<size>
Set the maximum size reserved for the heap	-Xmx<size>
Set the maximum size permitted for PermGen (Java 7)	-XX:MaxPermSize=<size>
Set the maximum size permitted for Metaspace (Java 8)	-XX:MaxMetaspaceSize=<size>
Set the size threshold above which objects are allocated directly in the old generation	-XX:PretenureSizeThreshold=<n>
Set the minimum TLAB size	-XX:MinTLABSize=<n>

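GC benchmark code

import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

// JMH benchmark: how fast can a card table sized for a 20 GB heap be scanned?
@State(Scope.Benchmark)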
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.SECONDS)
@Fork(1)
public class SimulateCardTable {

    // OldGen is 3/4 of heap, 2M of card table is required for 1G of old gen
    private static final int SIZE_FOR_20_GIG_HEAP = 15 * 2 * 1024 * 1024;

    private static final byte[] cards = new byte[SIZE_FOR_20_GIG_HEAP];

    @Setup
    public static final void setup() {
        final Random r = new Random(System.nanoTime());
        for (int i=0; i<100_000; i++) {
            cards[r.nextInt(SIZE_FOR_20_GIG_HEAP)] = 1;
        }
    }


    @Benchmark
    public int scanCardTable() {
        int found = 0;
        for (int i=0; i<SIZE_FOR_20_GIG_HEAP; i++) {
            if (cards[i] > 0)
                found++;
        }
        return found;
    }

}

/*
Result "scanCardTable":
  108.904 ±(99.9%) 16.147 ops/s [Average]
  (min, avg, max) = (102.915, 108.904, 114.266), stdev = 4.193
  CI (99.9%): [92.757, 125.051] (assumes normal distribution)


# Run complete. Total time: 00:01:46

Benchmark                         Mode  Cnt    Score    Error  Units
SimulateCardTable.scanCardTable  thrpt    5  108.904 ± 16.147  ops/s
*/

Tuning Parallel GC

Effect	Flag
(Old flag) Set ratio of OldGen to YoungGen	-XX:NewRatio=N
(Old flag) Set ratio of Eden to one Survivor space	-XX:SurvivorRatio=N
(Old flag) Set min size of YoungGen	-XX:NewSize=N
(Old flag) Set max size of YoungGen	-XX:MaxNewSize=N
(Old flag) Set min % of heap free after GC to avoid expanding	-XX:MinHeapFreeRatio
(Old flag) Set max % of heap free after GC to avoid shrinking	-XX:MaxHeapFreeRatio

Flags set:

-XX:NewRatio=N
-XX:SurvivorRatio=K

YoungGen = 1 / (N+1) of heap
OldGen = N / (N+1) of heap

Eden = K / (K + 2) of YoungGen
Survivor1 = 1 / (K + 2) of YoungGen
Survivor2 = 1 / (K + 2) of YoungGen
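
As a worked example, the sketch below computes region sizes from the two flags. It assumes the definitions in the HotSpot tuning guide: NewRatio=N is the ratio of OldGen to YoungGen, and SurvivorRatio=K is the ratio of Eden to one Survivor space, so each Survivor space is 1 / (K + 2) of YoungGen. The class and method names are made up, and a real JVM also rounds these sizes to its own alignment.

```java
final class HeapSizingSketch {
    // YoungGen = 1 / (N + 1) of the heap.
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    // OldGen is whatever the young generation does not take.
    static long oldBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    // Each Survivor space = 1 / (K + 2) of YoungGen.
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Eden is YoungGen minus the two Survivor spaces.
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 2048 * mb;                  // -Xmx2g
        long young = youngBytes(heap, 3);       // -XX:NewRatio=3
        System.out.println("young    = " + young / mb + " MB");
        System.out.println("old      = " + oldBytes(heap, 3) / mb + " MB");
        System.out.println("eden     = " + edenBytes(young, 8) / mb + " MB");   // -XX:SurvivorRatio=8
        System.out.println("survivor = " + survivorBytes(young, 8) / mb + " MB each");
    }
}
```

With a 2 GB heap, NewRatio=3 and SurvivorRatio=8 this gives a layout close to the one used in the Chapter 6 allocation example (about 1.5 GB old, 500 MB young, 400 MB Eden, 50 MB per Survivor space).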

Chapter 9: Code Execution on the JVM

......

Chapter 10: Understanding JIT Compilation

JITWatch

https://github.com/AdoptOpenJDK/jitwatch/

-XX:+UnlockDiagnosticVMOptions 
-XX:+TraceClassLoading 
-XX:+LogCompilation

hsdis

-XX:+PrintAssembly

Inlining

Switch	Default (JDK 8, Linux x86_64)	Explanation
-XX:MaxInlineSize=n	35 bytes of bytecode	Inline methods up to this size
-XX:FreqInlineSize=n	325 bytes of bytecode	Inline “hot” (frequently called) methods up to this size
-XX:InlineSmallCode=n	1,000 bytes of native code (non-tiered); 2,000 bytes of native code (tiered)	Do not inline methods where there is already a final-tier compilation that occupies more than this amount of space in the code cache.
-XX:MaxInlineLevel	9	Maximum number of call frames to inline