Blog

  • Tracking Down a Strange Full GC

    Hunting down full GCs on a Dse server

    • Full GCs occur while reading files
    • Full GCs occur before the old generation reaches capacity

    G1 GC

    For an introduction to how G1 GC works, with plenty of diagrams, see 深入剖析JVM之G1收集器、及回收流程、与推荐用例.

    Flags used to enable G1 GC

    CommandLine flags: -XX:+ExitOnOutOfMemoryError -XX:G1HeapRegionSize=16777216 -XX:G1HeapWastePercent=10 -XX:G1LogLevel=finest -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/octopus/oom/storage -XX:InitialHeapSize=19327352832 -XX:MaxHeapSize=19327352832 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UnlockExperimentalVMOptions -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
    

    Ideally also add the three flags below, though Java 11 does not support them:

    # not supported on JRE 11
    add_jvm_arg "-XX:+PrintAdaptiveSizePolicy"
    add_jvm_arg "-XX:+PrintReferenceGC"
    add_jvm_arg "-XX:+PrintGCTimeStamps"
    
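    On Java 11, the equivalent information comes from unified logging (JEP 271) instead. A sketch of the replacements; the exact tags and levels here are from memory, so double-check them against your JDK:

    # JDK 9+ replacements for the removed -XX:+Print* flags (approximate mapping)
    add_jvm_arg "-Xlog:gc*"              # roughly PrintGC / PrintGCDetails / PrintGCTimeStamps
    add_jvm_arg "-Xlog:gc+ergo*=trace"   # roughly PrintAdaptiveSizePolicy
    add_jvm_arg "-Xlog:gc+ref*=debug"    # roughly PrintReferenceGC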

    Environment

    The test subject is octopus v4.1.7-5 in cluster mode, as a storage server.

    Inspecting memory with jstat

    jstat -gccause <pid>   # cause of the most recent GC
    jstat -gcutil  <pid>   # overall GC utilization summary
    jstat -gcnew   <pid>   # young generation usage
    jstat -gcold   <pid>   # old generation usage
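
    For example, to poll the overall GC numbers once per second (31821 is the storage server's PID used later in this post):

    jstat -gcutil 31821 1000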
    

    Reading the GC logs

    Test group

    With Xmx set to 18 GB, reading a file triggers full GCs. The young generation total stays pinned at 912 MB.

    There are two problems here:

    • Full GCs occur

    • G1's adaptive sizing never kicks in: the logs show the young generation stuck at 912 MB.

      In the control group, once the heap was enlarged, adaptive sizing did take effect; see the control group below.

    [root@node2 octopus]# grep Eden /tmp/dse.log-Survivors
       [Eden: 16.0M(912.0M)->0.0B(912.0M) Survivors: 0.0B->0.0B Heap: 16.0M(18.0G)->601.1K(18.0G)], [Metaspace: 3182K->3182K(1056768K)]
       [Eden: 912.0M(912.0M)->0.0B(800.0M) Survivors: 0.0B->112.0M Heap: 1244.2M(18.0G)->430.1M(18.0G)]
       [Eden: 800.0M(800.0M)->0.0B(784.0M) Survivors: 112.0M->128.0M Heap: 4479.7M(18.0G)->4459.7M(18.0G)]
       [Eden: 272.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 6738.8M(18.0G)->6668.0M(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 10.3G(18.0G)->10126.2M(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 10.7G(18.0G)->10.3G(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 11.1G(18.0G)->10.6G(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 11.3G(18.0G)->10.9G(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 11.6G(18.0G)->11.2G(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 12.0G(18.0G)->11.6G(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 11.5G(18.0G)->11.2G(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 12.0G(18.0G)->11.6G(18.0G)]
    ...
       [Eden: 304.0M(784.0M)->0.0B(912.0M) Survivors: 128.0M->0.0B Heap: 13.2G(18.0G)->13.2G(18.0G)]
       [Eden: 48.0M(912.0M)->0.0B(912.0M) Survivors: 0.0B->0.0B Heap: 13.2G(18.0G)->13.2G(18.0G)]
       [Eden: 0.0B(912.0M)->0.0B(912.0M) Survivors: 0.0B->0.0B Heap: 13.2G(18.0G)->10147.6M(18.0G)], [Metaspace: 20886K->20886K(1069056K)]
       [Eden: 912.0M(912.0M)->0.0B(784.0M) Survivors: 0.0B->128.0M Heap: 10.8G(18.0G)->10.5G(18.0G)]
    ...
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 12.7G(18.0G)->12.8G(18.0G)]
       [Eden: 208.0M(784.0M)->0.0B(912.0M) Survivors: 128.0M->0.0B Heap: 13.0G(18.0G)->13.0G(18.0G)]
       [Eden: 0.0B(912.0M)->0.0B(912.0M) Survivors: 0.0B->0.0B Heap: 13.0G(18.0G)->10143.7M(18.0G)], [Metaspace: 20892K->20892K(1069056K)]
       [Eden: 912.0M(912.0M)->0.0B(784.0M) Survivors: 0.0B->128.0M Heap: 10.8G(18.0G)->10.4G(18.0G)]
       [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 11.2G(18.0G)->10.9G(18.0G)]
    
    

    See the article 当jvm的eden区满了,进行回收时,s0区满了,此时eden区还有存活对象没复制完,会怎样? (what happens when, during a minor GC with Eden full, s0 fills up before all live objects are copied?):

    1. Outside of GC, the young generation looks like this: Eden and s0 hold objects, s1 is empty;
    2. When Eden fills, a minor GC is triggered: live objects in Eden and s0 are copied into s1; if that succeeds, s0 and s1 swap names and we are back to state 1;
      • 2.1 Objects surviving the minor GC beyond the tenuring threshold are promoted to the old generation;
    3. After each minor GC, a suitable tenuring threshold and generation sizes are recomputed, and the sizes are adjusted when appropriate;
    4. During step 2, if s1 fills up early, that is an undesired premature promotion; the JVM tries to avoid it via step 3 but cannot guarantee it;
    5. If the old generation is also full (possibly caused by 2.1 or 4), there is nothing more to say: it goes straight to a full GC.

    Multiple to-space exhausted events were triggered as well:

    [root@node2 octopus]# grep 'to-space exhausted' /tmp/dse.log-Survivors
    77.221: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.0337199 secs]
    78.450: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.0394262 secs]
    78.877: [GC pause (G1 Evacuation Pause) (mixed) (to-space exhausted), 0.0080850 secs]
    78.902: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.0022769 secs]
    92.943: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.0343345 secs]
    94.914: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.0244542 secs]
    95.365: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.0064090 secs]
    110.848: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.0347729 secs]
    113.026: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.0246771 secs]
    113.107: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.0060059 secs]
    124.546: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.0326312 secs]
    128.551: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.0323352 secs]
    128.796: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.0068281 secs]
    136.617: [GC pause (G1 Evacuation Pause) (young) (initial-mark) (to-space exhausted), 0.0337861 secs]
    

    See the article JVM调优实战:G1中的to-space exhausted问题 (阿杜的世界, CSDN blog).

    Another observation: every full GC occurs right after a concurrent marking cycle starts.

    95.371: [GC concurrent-root-region-scan-start]
    95.372: [GC concurrent-root-region-scan-end, 0.0000127 secs]
    95.372: [GC concurrent-mark-start]
    95.372: [Full GC (Allocation Failure)  13G->10143M(18G), 0.1442809 secs]
       [Eden: 0.0B(912.0M)->0.0B(912.0M) Survivors: 0.0B->0.0B Heap: 13.0G(18.0G)->10143.7M(18.0G)], [Metaspace: 20892K->20892K(1069056K)]
     [Times: user=0.15 sys=0.00, real=0.15 secs]
    95.516: [GC concurrent-mark-abort]
    

    See the article 可能是最全面的G1学习笔记.

    A full GC right after GC concurrent-mark-start means that reclamation of old regions is too slow, or that objects are promoted from the young to the old generation too quickly, or that many large objects are allocated directly in the old generation. The corresponding adjustments would be: enlarge the whole heap, trigger the concurrent collection cycle earlier, or put more collector threads to work.

    But since the region size was set explicitly, the program allocates no humongous objects while reading files. That leaves slow old-generation reclamation.

    In G1, the old generation is reclaimed by mixed GCs.

    A mixed GC is not a single collection but a process that begins at initial-mark. The logs show that after concurrent marking completes, no space is reclaimed, because the projected payoff falls short of the 10% threshold.
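    As a quick check against the log below: with the 19327352832-byte heap configured above, the default 10% threshold requires roughly

    $ echo $(( 19327352832 / 10 ))   # 1932735283 bytes (~1.8 GiB) must be reclaimable
    # The best candidate set below only reaches 1896645496 bytes (9.81%), just short of it.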

     68.242: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 1 regions, reclaimable: 16362912 bytes (0.08 %), threshold: 10.00 %]
     75.195: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 55 regions, reclaimable: 473957632 bytes (2.45 %), threshold: 10.00 %]
     80.499: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 72 regions, reclaimable: 743495848 bytes (3.85 %), threshold: 10.00 %]
     83.868: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 74 regions, reclaimable: 772788192 bytes (4.00 %), threshold: 10.00 %]
     87.226: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 85 regions, reclaimable: 948939880 bytes (4.91 %), threshold: 10.00 %]
     92.124: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 98 regions, reclaimable: 1183778416 bytes (6.12 %), threshold: 10.00 %]
     98.282: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 113 regions, reclaimable: 1426981920 bytes (7.38 %), threshold: 10.00 %]
     102.727: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 142 regions, reclaimable: 1896645496 bytes (9.81 %), threshold: 10.00 %]
     108.781: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old 
    

    This shows that the concurrent marking cycle has completed but the Mixed GCs are not done because the reclaimable space is less than tolerable limit set for the waste with G1HeapWastePercent. Its default value is 10%. G1HeapWastePercent can be decreased to be around 2% so that mixed GCs can occur at a lower threshold.

    See the article Tuning G1GC For SOA -> Observations and Recommendations.

    In my control test, raising the heap to 22 GB eliminated the full GCs, yet the same do-not-start-mixed-GCs log lines still appeared. So it is clearly G1HeapWastePercent that G1 uses to decide whether to start mixed GCs.

    Following the article, I set the parameter G1HeapWastePercent=2, as sketched below.
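
    In the startup-script convention used earlier (add_jvm_arg), that is:

    add_jvm_arg "-XX:G1HeapWastePercent=2"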

    I ran the test again and it plainly worked: one pass of reading a file produced only a single to-space exhausted and a single full GC, and repeated reads did not trigger any more full GCs.

    [root@node2 octopus]# grep 'to-space' /tmp/dse.log
     (to-space exhausted), 0.0317402 secs]
    207.971: [SoftReference, 0 refs, 0.0000234 secs]207.971: [WeakReference, 0 refs, 0.0000038 secs]207.971: [FinalReference, 0 refs, 0.0000033 secs]207.971: [PhantomReference, 0 refs, 0 refs, 0.0000039 secs]207.971: [JNI Weak Reference, 0.0000070 secs] (to-space exhausted), 0.0097462 secs]
    [root@node2 octopus]# grep 'Full' /tmp/dse.log
    0.125: [Full GC (System.gc()) 0.169: [SoftReference, 0 refs, 0.0000273 secs]0.169: [WeakReference, 0 refs, 0.0000034 secs]0.169: [FinalReference, 64 refs, 0.0000112 secs]0.169: [PhantomReference, 0 refs, 0 refs, 0.0000034 secs]0.169: [JNI Weak Reference, 0.0000043 secs] 15M->601K(18G), 0.0603628 secs]
    207.979: [Full GC (Allocation Failure) 208.028: [SoftReference, 0 refs, 0.0000299 secs]208.028: [WeakReference, 291 refs, 0.0000103 secs]208.028: [FinalReference, 387 refs, 0.0000247 secs]208.028: [PhantomReference, 0 refs, 100 refs, 0.0000059 secs]208.028: [JNI Weak Reference, 0.0000069 secs] 13G->10143M(18G), 0.2611047 secs]
    
    [root@node2 octopus]# cat /tmp/dse.log |grep 'GC pause' |grep mixed
    36.508: [GC pause (G1 Evacuation Pause) (mixed) 36.508: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3655, predicted base time: 3.80 ms, remaining time: 196.20 ms, target pause time: 200.00 ms]
    39.633: [GC pause (G1 Evacuation Pause) (mixed) 39.633: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3722, predicted base time: 2.50 ms, remaining time: 197.50 ms, target pause time: 200.00 ms]
    40.027: [GC pause (G1 Evacuation Pause) (mixed) 40.027: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3720, predicted base time: 2.35 ms, remaining time: 197.65 ms, target pause time: 200.00 ms]
    40.558: [GC pause (G1 Evacuation Pause) (mixed) 40.558: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 4018, predicted base time: 2.26 ms, remaining time: 197.74 ms, target pause time: 200.00 ms]
    41.119: [GC pause (G1 Evacuation Pause) (mixed) 41.119: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3898, predicted base time: 2.13 ms, remaining time: 197.87 ms, target pause time: 200.00 ms]
    44.982: [GC pause (G1 Evacuation Pause) (mixed) 44.982: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3689, predicted base time: 2.10 ms, remaining time: 197.90 ms, target pause time: 200.00 ms]
    48.523: [GC pause (G1 Evacuation Pause) (mixed) 48.523: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3523, predicted base time: 1.84 ms, remaining time: 198.16 ms, target pause time: 200.00 ms]
    

    Triggering mixed GCs repeatedly does help. So this looks like a failure to release old-generation memory.

    As mentioned above, the log records multiple to-space exhausted events.

    So, are we hitting evacuation failure?

    A promotion failure happens when, while evacuating survivors or promoted objects, the JVM runs out of heap and the heap cannot grow further because it is already at its maximum. With -XX:+PrintGCDetails this shows up in the GC log as to-space overflow.

    Probably not, because our log says to-space exhausted rather than to-space overflow. Then again, some articles say they are all the same thing:

    You might have come across some of the terms “evacuation failure”, “to-space exhausted”, “to-space overflow” or maybe something like “promotion failure” during your GC travels. These terms all refer to the same thing, and the concept is similar in G1 GC as well.

    So let's test some of the tunings recommended for avoiding evacuation failure.

    • Increase the heap size

      This works: the control group used identical flags with only Xmx raised to 22 GB, and no full GCs occurred.

    • Increase -XX:G1ReservePercent=n (default 10). G1 reserves part of the heap as a false ceiling to guard against to-space exhaustion.

      We are indeed seeing to-space exhausted, and Tuning G1GC For SOA makes the same recommendation.

      So I raised -XX:G1ReservePercent to 15; the full GC situation did not improve. Each file-read test still added 4 full GCs. (Unverified.)

    • Start the marking cycle earlier

      Not our situation: concurrent marking already runs quite frequently in this project. (Unverified.)

    • Increase the number of concurrent marking threads via -XX:ConcGCThreads=n

      Setting this alone had little effect. (Unverified.)

    The same article, Tuning G1GC For SOA -> Observations and Recommendations, also discusses to-space exhausted, but none of those adjustments changed anything for me; G1HeapWastePercent=2 was the only effective change.

    Also worth reading: Tips for Tuning the Garbage First Garbage Collector (infoq.com), which raises the idea of starting the marking cycle too early.

    But I tried -XX:InitiatingHeapOccupancyPercent at 70 with no effect, and even pushing it to 90 changed nothing.

    And I tried pinning the young generation size by hand; the full GCs still happened. The flags from this round are summarized below.
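
    To recap, the flags tried in this round, none of which removed the full GCs on the 18 GB heap (the ConcGCThreads value is illustrative only; the text above just says it was raised):

    -XX:G1ReservePercent=15                  # no improvement
    -XX:ConcGCThreads=8                      # value assumed for illustration; little effect
    -XX:InitiatingHeapOccupancyPercent=70    # also tried 90; no effect
    -Xmn<size>                               # young gen pinned by hand; full GCs remained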

    **My conclusion so far: mixed GCs never reclaim space because their projected payoff stays under the threshold, while at the same time the heap can no longer satisfy allocations.**

    Below are two allocation-failure log excerpts; each directly triggered a full GC (see the arithmetic check after them).

    • A full GC from a failed allocation. The allocation request is 4194320 bytes, one block's 4 MB buffer (the program keeps 4 MB per block); the 16777216-byte expansion attempt is just one 16 MB region, the minimum step by which the heap can grow.
    103.744: [GC concurrent-root-region-scan-start]
    103.744: [GC concurrent-root-region-scan-end, 0.0000096 secs]
    103.744: [GC concurrent-mark-start]
     103.744: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 4194320 bytes]
     103.744: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 16777216 bytes, attempted expansion amount: 16777216 bytes]
     103.744: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
    103.744: [Full GC (Allocation Failure) 103.793: [SoftReference, 0 refs, 0.0000302 secs]103.793: [WeakReference, 288 refs, 0.0000086 secs]103.793: [FinalReference, 485 refs, 0.0000460 secs]103.793: [PhantomReference, 0 refs, 100 refs, 0.0000048 secs]103.793: [JNI Weak Reference, 0.0000054 secs] 13G->10143M(18G), 0.2612791 secs]
       [Eden: 0.0B(912.0M)->0.0B(944.0M) Survivors: 0.0B->0.0B Heap: 13.2G(18.0G)->10143.6M(18.0G)], [Metaspace: 20900K->20900K(1069056K)]
     [Times: user=0.27 sys=0.00, real=0.26 secs]
    104.006: [GC concurrent-mark-abort]
    
    • Another similar case: here the request is 4 MB, exactly the memory needed to read one block.
    159.229: [GC pause (G1 Evacuation Pause) (young) 159.229: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3940, predicted base time: 1.79 ms, remaining time: 198.21 ms, target pause time: 200.00 ms]
     159.229: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 49 regions, survivors: 8 regions, predicted young region time: 26.65 ms]
     159.229: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 49 regions, survivors: 8 regions, old: 0 regions, predicted pause time: 28.44 ms, target pause time: 200.00 ms]
     159.244: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 4194320 bytes]
     159.244: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 4194320 bytes, attempted expansion amount: 16777216 bytes]
     159.244: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
    159.246: [SoftReference, 0 refs, 0.0000314 secs]159.246: [WeakReference, 0 refs, 0.0000040 secs]159.246: [FinalReference, 82 refs, 0.0001580 secs]159.246: [PhantomReference, 0 refs, 0 refs, 0.0000040 secs]159.246: [JNI Weak Reference, 0.0000087 secs] 159.256: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 19025362944 bytes, allocation request: 0 bytes, threshold: 8697308760 bytes (45.00 %), source: end of GC]
     (to-space exhausted), 0.0277961 secs]
    
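    A quick sanity check on those request sizes, assuming each block is read into a fresh byte[4 * 1024 * 1024] (the extra 16 bytes match a typical 64-bit array header with compressed oops):

    $ echo $(( 4 * 1024 * 1024 + 16 ))   # 4194320, the allocation request seen above
    $ echo $(( 16 * 1024 * 1024 ))       # 16777216, one region: the minimum heap expansion step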

    Questions that still puzzle me:

    The heap's actual usage is nowhere near 18 GB, yet allocation already fails.

    The JVM does not dynamically rebalance the young and old generation shares, yet once the heap is enlarged that adaptation works again.

    Could this be related to the JVM's promotion guarantee (空间分配担保) mechanism?

    1. Who provides the space guarantee?

      The JVM's generational design splits the heap into young and old generations, collected by different algorithms; the space-allocation guarantee is provided by the old generation.

    2. What is the space-allocation guarantee?

        Before a minor GC, the VM checks whether the largest contiguous free block in the old generation exceeds the total size of all young-generation objects.

        If it does, the minor GC is safe.

        If not, the VM consults HandlePromotionFailure to see whether a guarantee failure may be risked. With HandlePromotionFailure=true, it further checks whether the largest contiguous free old-generation block exceeds the average size promoted by previous minor GCs; if so, it attempts the minor GC, which remains risky. If not, or if HandlePromotionFailure=false, it performs a full GC instead.

    3. Why is the guarantee needed?

        Because the young generation uses a copying collector. If many objects survive a minor GC (in the extreme, all of them), the relatively small Survivor space cannot hold them, and the old generation must take the overflow. That only works if the old generation has enough room, but how many objects will survive is unknowable in advance, so the average promoted size over past collections serves as a proxy: it is compared against the old generation's remaining space to decide whether a full GC is needed first to make room. (Note this description targets the classic generational collectors; in G1 the analogous failure surfaces as evacuation failure, i.e. to-space exhausted.)

    Add the logging flag "-XX:+G1PrintHeapRegions"

    Print the age distribution of surviving objects:

    ➜  ~ grep '\- age' dse.log-18g-region -A 1 -B 1 |head -n 3
    Desired survivor size 67108864 bytes, new threshold 1 (max 15)
    - age   1:   77343736 bytes,   77343736 total
     11.349: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 2051, predicted base time: 11.74 ms, remaining time: 188.26 ms, target pause time: 200.00 ms]
    
    
    ➜  ~ grep '\- age' dse.log-18g-region -A 1 -B 1 |tail  -n 3
    Desired survivor size 67108864 bytes, new threshold 1 (max 15)
    - age   1:  100687440 bytes,  100687440 total
     180.986: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3805, predicted base time: 1.96 ms, remaining time: 198.04 ms, target pause time: 200.00 ms]
    
    grep '\- age' dse.log-18g-region|grep -v 'age   1:'|wc -l
    0
    

    As you can see, surviving objects do not stay in the young generation for long and never reach the promotion threshold of 15. Which is strange: no large-object allocation, no objects reaching the tenuring threshold, and yet the old generation keeps growing. (Note, though, the log's "new threshold 1": survivor usage such as 77343736 bytes exceeds the desired survivor size of 67108864 bytes, so G1 drops the tenuring threshold to 1 and survivors are promoted after a single collection, which would explain the growth.)

    I tried raising the region size to 32 MB, and full GCs became more frequent:

    grep 'Full GC (Allocation Failure' /tmp/dse.log |wc -l
    14
    

    Is that the effect of having fewer regions once the region size grows? Then is the problem we are facing also a matter of region counts? The program does allocate some humongous objects at startup; how many regions do these H objects occupy? (A quick check of the humongous rule follows below.)
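
    G1's humongous rule is easy to check by hand: any single object larger than half a region is humongous and occupies whole contiguous regions. With the 16 MiB regions configured here, the two object sizes appearing in the logs below work out to:

    $ echo $(( (69539616 + 16777215) / 16777216 ))   # 5 regions, matching "length 5" below
    $ echo $(( (17400888 + 16777215) / 16777216 ))   # 2 regions, matching "length 2" below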

    Add the logging flag "-XX:+G1TraceEagerReclaimHumongousObjects"

    Detailed log of a humongous allocation:

     14.328: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 15082717184 bytes, allocation request: 17400888 bytes, threshold: 8697308760 bytes (45.00 %), source: concurrent humongous allocation]
     G1HR ALLOC(StartsH) 0x00000006c3000000 0x00000006c4000000
     G1HR ALLOC(ContinuesH) 0x00000006c4000000 0x00000006c4098438
    

    There are 323 live humongous objects (each line below is one object spanning "length" regions):

    [root@node2 octopus]# cat h |head -n 3
    Live humongous region 1 size 69539616 start 0x0000000341000000 length 5 with remset 1 code roots 0 is marked 0 reclaim candidate 1 type array 1
    Live humongous region 6 size 69539616 start 0x0000000346000000 length 5 with remset 1 code roots 0 is marked 0 reclaim candidate 1 type array 1
    Live humongous region 11 size 69539616 start 0x000000034b000000 length 5 with remset 1 code roots 0 is marked 0 reclaim candidate 1 type array 1
    [root@node2 octopus]# cat h |tail -n 3
    Live humongous region 901 size 17400888 start 0x00000006c5000000 length 2 with remset 1 code roots 0 is marked 0 reclaim candidate 1 type array 1
    Live humongous region 903 size 17400888 start 0x00000006c7000000 length 2 with remset 1 code roots 0 is marked 0 reclaim candidate 1 type array 1
    Live humongous region 905 size 17400888 start 0x00000006c9000000 length 2 with remset 1 code roots 0 is marked 0 reclaim candidate 1 type array 1
    [root@node2 octopus]# cat h |wc -l
    323
    
    ## occupancy here likewise reports 18941476864 bytes of the heap in use
     219.826: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 18941476864 bytes, allocation request: 0 bytes, threshold: 8697308760 bytes (45.00 %), source: end of GC]
     G1HR #EndGC 93
     (to-space exhausted), 0.0405062 secs]
       [Parallel Time: 26.6 ms, GC Workers: 13]
       ....
       [Other: 13.8 ms]
          [Evacuation Failure: 12.3 ms]
             [Recalculate Used: 0.0 ms]
             [Remove Self Forwards: 12.2 ms]
             [Restore RemSet: 0.0 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.4 ms]
          [Ref Enq: 0.0 ms]
          [Redirty Cards: 0.1 ms]
             [Parallel Redirty:  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
              Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
             [Redirtied Cards:  516  389  80  0  0  0  0  0  0  0  0  0  0
              Min: 0, Avg: 75.8, Max: 516, Diff: 516, Sum: 985]
          [Humongous Register: 0.3 ms]
    [Humongous Total: 323] # 323 humongous candidates registered here
             [Humongous Candidate: 323]
          [Humongous Reclaim: 0.5 ms]
             [Humongous Reclaimed: 0]
          [Free CSet: 0.0 ms]
             [Young Free CSet: 0.0 ms]
             [Non-Young Free CSet: 0.0 ms]
    [Eden: 784.0M(784.0M)->0.0B(784.0M) Survivors: 128.0M->128.0M Heap: 12.5G(18.0G)->12.8G(18.0G)] # yet this line shows only 12.5 GB of the heap in use
    
    

    The received wisdom online for this discrepancy: upgrade the JDK. See java - Why have different descriptions of heap usage in G1GC logs - Stack Overflow.

    @Eugene I’m confused exactly how much space the heap uses. And I have solved this problem by upgrading openJDK to 8u232

    Running it on Java 11

    Full GCs happen there too, and the logs are much clearer.

    [107.914s][info][gc,cpu         ] GC(118) User=0.02s Sys=0.00s Real=0.00s
    [107.939s][info][gc,start       ] GC(119) Pause Young (Concurrent Start) (G1 Evacuation Pause) # a young GC starts
    [107.939s][info][gc,task        ] GC(119) Using 13 workers of 13 for evacuation
    [107.940s][info][gc             ] GC(119) To-space exhausted
    [107.940s][info][gc,phases      ] GC(119)   Pre Evacuate Collection Set: 0.2ms
    [107.940s][info][gc,phases      ] GC(119)   Evacuate Collection Set: 0.7ms
    [107.940s][info][gc,phases      ] GC(119)   Post Evacuate Collection Set: 0.6ms
    [107.940s][info][gc,phases      ] GC(119)   Other: 0.2ms
    [107.940s][info][gc,heap        ] GC(119) Eden regions: 4->0(57)
    [107.940s][info][gc,heap        ] GC(119) Survivor regions: 0->0(8)
    [107.940s][info][gc,heap        ] GC(119) Old regions: 301->305 # old regions plus humongous regions now total exactly 1152, the whole heap
    [107.940s][info][gc,heap        ] GC(119) Humongous regions: 847->847
    [107.940s][info][gc,metaspace   ] GC(119) Metaspace: 23563K->23563K(1071104K)
    [107.940s][info][gc             ] GC(119) Pause Young (Concurrent Start) (G1 Evacuation Pause) 18093M->18093M(18432M) 1.732ms
    [107.940s][info][gc,cpu         ] GC(119) User=0.00s Sys=0.00s Real=0.01s
    
    [107.940s][info][gc,task        ] GC(120) Using 13 workers of 13 for full compaction
    [107.941s][info][gc,start       ] GC(120) Pause Full (G1 Evacuation Pause) # a full GC occurs
    [107.958s][info][gc,phases,start] GC(120) Phase 1: Mark live objects
    [107.961s][info][gc,stringtable ] GC(120) Cleaned string and symbol table, strings: 8654 processed, 0 removed, symbols: 51477 processed, 0 removed
    [107.961s][info][gc,phases      ] GC(120) Phase 1: Mark live objects 2.855ms
    [107.961s][info][gc,phases,start] GC(120) Phase 2: Prepare for compaction
    [107.967s][info][gc,phases      ] GC(120) Phase 2: Prepare for compaction 5.843ms
    [107.967s][info][gc,phases,start] GC(120) Phase 3: Adjust pointers
    [107.971s][info][gc,phases      ] GC(120) Phase 3: Adjust pointers 4.399ms
    [107.971s][info][gc,phases,start] GC(120) Phase 4: Compact heap
    [108.008s][info][gc,phases      ] GC(120) Phase 4: Compact heap 37.156ms
    [108.011s][info][gc,heap        ] GC(120) Eden regions: 0->0(57)
    [108.011s][info][gc,heap        ] GC(120) Survivor regions: 0->0(8)
    [108.011s][info][gc,heap        ] GC(120) Old regions: 305->117
    [108.011s][info][gc,heap        ] GC(120) Humongous regions: 847->847 # even the full GC does not bring the humongous region count down
    [108.011s][info][gc,metaspace   ] GC(120) Metaspace: 23563K->23563K(1071104K)
    [108.011s][info][gc             ] GC(120) Pause Full (G1 Evacuation Pause) 18093M->15024M(18432M) 70.173ms
    [108.011s][info][gc,cpu         ] GC(120) User=0.75s Sys=0.00s Real=0.07s
    
    

    So the regions have all been eaten by old regions and humongous regions. Surprise!

    When I first ran this test, Java 11 also hit full GCs. But retesting this Monday produced none, and I assumed I had fat-fingered something. The machine had been rebooted in the meantime, so I hurriedly rebooted and tried again.

    The result was unchanged: full GCs again. Pure voodoo.

    Observing the state after startup, before any file is read: Humongous regions already stands at 815.

    [14.651s][info][gc,start       ] GC(31) Pause Young (Concurrent Start) (G1 Humongous Allocation)
    [14.652s][info][gc,task        ] GC(31) Using 13 workers of 13 for evacuation
    [14.653s][info][gc,phases      ] GC(31)   Pre Evacuate Collection Set: 0.2ms
    [14.653s][info][gc,phases      ] GC(31)   Evacuate Collection Set: 0.7ms
    [14.653s][info][gc,phases      ] GC(31)   Post Evacuate Collection Set: 0.3ms
    [14.653s][info][gc,phases      ] GC(31)   Other: 0.2ms
    [14.653s][info][gc,heap        ] GC(31) Eden regions: 6->0(137)
    [14.653s][info][gc,heap        ] GC(31) Survivor regions: 1->1(19)
    [14.653s][info][gc,heap        ] GC(31) Old regions: 72->72
    [14.653s][info][gc,heap        ] GC(31) Humongous regions: 815->815
    [14.653s][info][gc,metaspace   ] GC(31) Metaspace: 16605K->16605K(1064960K)
    [14.653s][info][gc             ] GC(31) Pause Young (Concurrent Start) (G1 Humongous Allocation) 14233M->14137M(18432M) 1.460ms
    [14.653s][info][gc,cpu         ] GC(31) User=0.01s Sys=0.00s Real=0.00s
    

    Read a file and it climbs to 847, then never changes. Regions are 16 MB.

    Here the To-space exhausted is clearly the young generation running out of regions to hold surviving objects: unused regions = 1152 - (49 + 8 + 234 + 847) = 14, which may genuinely be too few. See the article CDH6.3 HBase: G1 GC Tuning with JDK11:

    When you see to-space exhausted messages in GC logs, it means that G1 does not have enough memory for either survived or tenured objects, or for both, and the Java heap cannot be further expanded since it is already at its max.

    [200.883s][info][gc,start       ] GC(51) Pause Young (Prepare Mixed) (G1 Evacuation Pause)
    [200.883s][info][gc,task        ] GC(51) Using 13 workers of 13 for evacuation
    [200.900s][info][gc             ] GC(51) To-space exhausted
    [200.900s][info][gc,phases      ] GC(51)   Pre Evacuate Collection Set: 0.2ms
    [200.900s][info][gc,phases      ] GC(51)   Evacuate Collection Set: 11.5ms
    [200.900s][info][gc,phases      ] GC(51)   Post Evacuate Collection Set: 4.6ms
    [200.900s][info][gc,phases      ] GC(51)   Other: 0.2ms
    [200.900s][info][gc,heap        ] GC(51) Eden regions: 49->0(49)
    [200.900s][info][gc,heap        ] GC(51) Survivor regions: 8->8(8)
    [200.900s][info][gc,heap        ] GC(51) Old regions: 234->292
    [200.900s][info][gc,heap        ] GC(51) Humongous regions: 847->847
    [200.900s][info][gc,metaspace   ] GC(51) Metaspace: 23822K->23822K(1071104K)
    [200.900s][info][gc             ] GC(51) Pause Young (Prepare Mixed) (G1 Evacuation Pause) 18157M->18301M(18432M) 16.636ms
    [200.900s][info][gc,cpu         ] GC(51) User=0.18s Sys=0.00s Real=0.01s
    

    On Java 11, setting "-XX:G1HeapWastePercent=2" makes mixed GCs reclaim space earlier, and no full GC appears. The old-generation region count drops while the humongous regions do not, leaving the freed regions for the young generation.

    [103.137s][info][gc,start       ] GC(124) Pause Young (Concurrent Start) (G1 Evacuation Pause)
    [103.137s][info][gc,task        ] GC(124) Using 13 workers of 13 for evacuation
    [103.168s][info][gc,phases      ] GC(124)   Pre Evacuate Collection Set: 0.3ms
    [103.168s][info][gc,phases      ] GC(124)   Evacuate Collection Set: 30.9ms
    [103.168s][info][gc,phases      ] GC(124)   Post Evacuate Collection Set: 0.3ms
    [103.168s][info][gc,phases      ] GC(124)   Other: 0.3ms
    [103.168s][info][gc,heap        ] GC(124) Eden regions: 49->0(49)
    [103.168s][info][gc,heap        ] GC(124) Survivor regions: 8->8(8)
    [103.168s][info][gc,heap        ] GC(124) Old regions: 180->211
    [103.168s][info][gc,heap        ] GC(124) Humongous regions: 847->847
    [103.168s][info][gc,metaspace   ] GC(124) Metaspace: 23769K->23769K(1071104K)
    [103.168s][info][gc             ] GC(124) Pause Young (Concurrent Start) (G1 Evacuation Pause) 17310M->17014M(18432M) 31.744ms
    [103.168s][info][gc,cpu         ] GC(124) User=0.40s Sys=0.00s Real=0.03s
    

    Conclusion: humongous objects occupy many more regions than expected: 815 right after startup, rising to 847 once file reads begin, out of only 1152 regions in total.

    Java 11 actually gives the young generation a fairly large allocation at startup; it just shrinks bit by bit.

    [14.746s][info][gc,start       ] GC(31) Pause Young (Concurrent Start) (GCLocker Initiated GC)
    [14.746s][info][gc,task        ] GC(31) Using 13 workers of 13 for evacuation
    [14.748s][info][gc,phases      ] GC(31)   Pre Evacuate Collection Set: 0.2ms
    [14.748s][info][gc,phases      ] GC(31)   Evacuate Collection Set: 0.6ms
    [14.748s][info][gc,phases      ] GC(31)   Post Evacuate Collection Set: 0.2ms
    [14.748s][info][gc,phases      ] GC(31)   Other: 0.2ms
    [14.748s][info][gc,heap        ] GC(31) Eden regions: 9->0(126)
    [14.748s][info][gc,heap        ] GC(31) Survivor regions: 1->1(20)   # (126 + 20) * 16 MB ≈ 2.3 GB for the young gen
    [14.748s][info][gc,heap        ] GC(31) Old regions: 72->72
    [14.748s][info][gc,heap        ] GC(31) Humongous regions: 829->829
    [14.748s][info][gc,metaspace   ] GC(31) Metaspace: 16610K->16610K(1064960K)
    [14.748s][info][gc             ] GC(31) Pause Young (Concurrent Start) (GCLocker Initiated GC) 14504M->14368M(18432M) 1.317ms
    [14.748s][info][gc,cpu         ] GC(31) User=0.00s Sys=0.00s Real=0.00s
    [14.748s][info][gc             ] GC(32) Concurrent Cycle
    [14.748s][info][gc,marking     ] GC(32) Concurrent Clear Claimed Marks
    [14.748s][info][gc,marking     ] GC(32) Concurrent Clear Claimed Marks 0.015ms
    [14.748s][info][gc,marking     ] GC(32) Concurrent Scan Root Regions
    [14.758s][info][gc,marking     ] GC(32) Concurrent Scan Root Regions 10.401ms
    [14.758s][info][gc,marking     ] GC(32) Concurrent Mark (14.758s)
    [14.758s][info][gc,marking     ] GC(32) Concurrent Mark From Roots
    [14.758s][info][gc,task        ] GC(32) Using 3 workers of 3 for marking
    [14.770s][info][gc,marking     ] GC(32) Concurrent Mark From Roots 12.128ms
    [14.770s][info][gc,marking     ] GC(32) Concurrent Preclean
    [14.771s][info][gc,marking     ] GC(32) Concurrent Preclean 0.331ms
    [14.771s][info][gc,marking     ] GC(32) Concurrent Mark (14.758s, 14.771s) 12.517ms
    [14.773s][info][gc,start       ] GC(32) Pause Remark
    [14.774s][info][gc,stringtable ] GC(32) Cleaned string and symbol table, strings: 5135 processed, 0 removed, symbols: 30087 processed, 0 removed
    [14.774s][info][gc             ] GC(32) Pause Remark 14720M->14720M(18432M) 0.985ms
    [14.774s][info][gc,cpu         ] GC(32) User=0.01s Sys=0.00s Real=0.00s
    [14.774s][info][gc,marking     ] GC(32) Concurrent Rebuild Remembered Sets
    [14.784s][info][gc,marking     ] GC(32) Concurrent Rebuild Remembered Sets 9.509ms
    [14.784s][info][gc,start       ] GC(32) Pause Cleanup
    [14.784s][info][gc             ] GC(32) Pause Cleanup 14752M->14752M(18432M) 0.189ms
    [14.784s][info][gc,cpu         ] GC(32) User=0.00s Sys=0.00s Real=0.00s
    [14.784s][info][gc,marking     ] GC(32) Concurrent Cleanup for Next Mark
    [14.808s][info][gc,marking     ] GC(32) Concurrent Cleanup for Next Mark 24.023ms
    [14.808s][info][gc             ] GC(32) Concurrent Cycle 60.487ms
    [51.306s][info][gc,start       ] GC(33) Pause Young (Normal) (G1 Evacuation Pause)
    [51.306s][info][gc,task        ] GC(33) Using 13 workers of 13 for evacuation
    [51.372s][info][gc,phases      ] GC(33)   Pre Evacuate Collection Set: 0.2ms
    [51.372s][info][gc,phases      ] GC(33)   Evacuate Collection Set: 65.2ms
    [51.372s][info][gc,phases      ] GC(33)   Post Evacuate Collection Set: 0.5ms
    [51.372s][info][gc,phases      ] GC(33)   Other: 0.3ms
    [51.372s][info][gc,heap        ] GC(33) Eden regions: 126->0(49)
    [51.372s][info][gc,heap        ] GC(33) Survivor regions: 1->16(16) # after the 126 Eden regions fill and are collected, the young gen is shrunk; survivors that do not fit in survivor space all go to the old gen
    [51.372s][info][gc,heap        ] GC(33) Old regions: 72->99
    [51.372s][info][gc,heap        ] GC(33) Humongous regions: 847->847
    [51.372s][info][gc,metaspace   ] GC(33) Metaspace: 23705K->23705K(1071104K)
    [51.372s][info][gc             ] GC(33) Pause Young (Normal) (G1 Evacuation Pause) 16672M->15342M(18432M) 66.195ms
    [51.372s][info][gc,cpu         ] GC(33) User=0.38s Sys=0.40s Real=0.07s
    
    

    The net effect: the old generation uses ever more space, the mixed-GC payoff never reaches the expected 10% so nothing is reclaimed, and G1's auto-tuning hands the old generation still more room.

    The root of the problem remains: far too many humongous regions.

    Control group

    With the heap set to 22 GB and no other flags changed, reading files causes no full GC, and the young generation visibly resizes itself:

       [Eden: 16.0M(1088.0M)->0.0B(1088.0M) Survivors: 0.0B->0.0B Heap: 16.0M(21.5G)->601.1K(21.5G)], [Metaspace: 3182K->3182K(1056768K)]
       [Eden: 1088.0M(1088.0M)->0.0B(944.0M) Survivors: 0.0B->144.0M Heap: 2149.7M(21.5G)->1335.2M(21.5G)]
       [Eden: 880.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 7952.0M(21.5G)->7886.3M(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 10.5G(21.5G)->10238.2M(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 10.9G(21.5G)->10.4G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 11.3G(21.5G)->10.7G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 11.4G(21.5G)->10.9G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 11.8G(21.5G)->11.3G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 12.3G(21.5G)->11.8G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 11.6G(21.5G)->11.1G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 12.1G(21.5G)->11.6G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 12.5G(21.5G)->12.0G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 11.6G(21.5G)->11.2G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 12.1G(21.5G)->11.7G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 12.6G(21.5G)->12.1G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(944.0M) Survivors: 144.0M->144.0M Heap: 11.7G(21.5G)->11.3G(21.5G)]
       [Eden: 944.0M(944.0M)->0.0B(1424.0M) Survivors: 144.0M->144.0M Heap: 12.2G(21.5G)->11.7G(21.5G)]
       [Eden: 1424.0M(1424.0M)->0.0B(1600.0M) Survivors: 144.0M->208.0M Heap: 13.1G(21.5G)->12.1G(21.5G)]
       [Eden: 1600.0M(1600.0M)->0.0B(1952.0M) Survivors: 208.0M->240.0M Heap: 12.4G(21.5G)->11.4G(21.5G)]
       [Eden: 1952.0M(1952.0M)->0.0B(2224.0M) Survivors: 240.0M->288.0M Heap: 13.3G(21.5G)->11.8G(21.5G)]
       [Eden: 2224.0M(2224.0M)->0.0B(1840.0M) Survivors: 288.0M->320.0M Heap: 14.0G(21.5G)->12.2G(21.5G)]
       [Eden: 1840.0M(1840.0M)->0.0B(2768.0M) Survivors: 320.0M->272.0M Heap: 12.7G(21.5G)->11.3G(21.5G)]
       [Eden: 2768.0M(2768.0M)->0.0B(2320.0M) Survivors: 272.0M->384.0M Heap: 14.0G(21.5G)->11.7G(21.5G)]
       [Eden: 2320.0M(2320.0M)->0.0B(2032.0M) Survivors: 384.0M->352.0M Heap: 13.9G(21.5G)->12.0G(21.5G)]
       [Eden: 2032.0M(2032.0M)->0.0B(2560.0M) Survivors: 352.0M->304.0M Heap: 13.0G(21.5G)->11.3G(21.5G)]
       [Eden: 2560.0M(2560.0M)->0.0B(2256.0M) Survivors: 304.0M->368.0M Heap: 13.8G(21.5G)->11.7G(21.5G)]
       [Eden: 2256.0M(2256.0M)->0.0B(1936.0M) Survivors: 368.0M->336.0M Heap: 13.9G(21.5G)->12.1G(21.5G)]
       [Eden: 1936.0M(1936.0M)->0.0B(2480.0M) Survivors: 336.0M->288.0M Heap: 13.0G(21.5G)->11.4G(21.5G)]
    

    There were no full GCs, and no mixed GCs reclaiming space either:

    [root@node2 octopus]# cat /tmp/dse.log-22g-soa |grep 'GC pause' |grep mixed |wc -l
    0
    
    [root@node2 octopus]# cat /tmp/dse.log-22g-soa |grep Mixed
     175.554: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 2 regions, reclaimable: 22810456 bytes (0.10 %), threshold: 10.00 %]
     178.803: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 54 regions, reclaimable: 440224032 bytes (1.91 %), threshold: 10.00 %]
     181.797: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 59 regions, reclaimable: 515677760 bytes (2.24 %), threshold: 10.00 %]
     183.525: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 67 regions, reclaimable: 654036576 bytes (2.84 %), threshold: 10.00 %]
     186.059: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 69 regions, reclaimable: 683362152 bytes (2.96 %), threshold: 10.00 %]
     190.912: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 75 regions, reclaimable: 825892960 bytes (3.58 %), threshold: 10.00 %]
     196.683: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 64 regions, reclaimable: 758671280 bytes (3.29 %), threshold: 10.00 %]
     202.352: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 63 regions, reclaimable: 758561896 bytes (3.29 %), threshold: 10.00 %]
     208.441: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 66 regions, reclaimable: 808763384 bytes (3.51 %), threshold: 10.00 %]
     212.869: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 71 regions, reclaimable: 888348096 bytes (3.85 %), threshold: 10.00 %]
     218.121: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 76 regions, reclaimable: 959590656 bytes (4.16 %), threshold: 10.00 %]
    

    This shows that with the larger heap, G1's adaptive sizing works: objects are mostly allocated (and die) in the young generation, little is promoted, young GCs meet the demand, and no full GC is triggered.

    And the reason mixed GCs never kick in is visible above: the reclaimable payoff never reaches the threshold.

    Aside: how to tell whether humongous allocations are happening.

    Conveniently, the program allocates some humongous objects at startup:

    ## At startup, a humongous allocation attempts to trigger a concurrent cycle:
    [root@node2 octopus]# cat /tmp/dse.log-22g-soa |grep mixed|head -n 1
     15.834: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 12062818304 bytes, allocation request: 17400888 bytes, threshold: 10380902400 bytes (45.00 %), source: concurrent humongous allocation]
    

    And another thing: the jmap dump size does not match the program's actual memory footprint.

    This is with Java 11.

    /usr/lib/jvm/java-11/bin/jmap  -dump:format=b,file=/data/all.dump  31821
    /usr/lib/jvm/java-11/bin/jmap  -dump:live,format=b,file=/data/live.dump  31821
    
    [root@node2 octopus]# ll -alh /data/*.dump
    -rw------- 1 root root 11G May 20 09:34 /data/all.dump # heap dump of all objects is only 11 GB
    -rw------- 1 root root 10G May 20 09:35 /data/live.dump # heap dump of live objects is only 10 GB
    [root@node2 octopus]# grep Dump /tmp/dse.log
    [46826.131s][info][gc,start       ] GC(150) Pause Full (Heap Dump Initiated GC)
    [46826.157s][info][gc             ] GC(150) Pause Full (Heap Dump Initiated GC) 15099M->15024M(18432M) 25.916ms
    # the GC log shows ~15 GB still in use even after the dump-initiated full GC
    
    [root@node2 octopus]# /usr/lib/jvm/java-11/bin/jstat -gcold 31821 1000
       MC       MU      CCSC     CCSU       OC          OU       YGC    FGC    FGCT    CGC    CGCT     GCT
     25728.0  24667.5   2688.0   2437.7  17891328.0  15385109.5     93     6    0.360    96    0.075    2.605
     25728.0  24667.5   2688.0   2437.7  17891328.0  15385109.5     93     6    0.360    96    0.075    2.605
    # old-generation usage (OU) still shows close to 15 GB
    
    ^C[root@node2 octopus]# /usr/lib/jvm/java-11/bin/jstat -gcnew 31821 1000
     S0C    S1C    S0U    S1U   TT MTT  DSS      EC       EU     YGC     YGCT
       0.0    0.0    0.0    0.0 15  15 65536.0 983040.0      0.0     93    2.170
       0.0    0.0    0.0    0.0 15  15 65536.0 983040.0      0.0     93    2.170
       0.0    0.0    0.0    0.0 15  15 65536.0 983040.0      0.0     93    2.170
    # young-generation usage (EU) is already 0
    
       PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      31821 root      20   0  159.4g  15.0g  25260 S   0.0 31.9   3:42.77 jsvc
    # top also shows the program actually using about 15 GB (RES 15.0g minus SHR 25260 KB)
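
    A plausible contributor to the dump-vs-usage gap, on the assumption that it comes from the humongous allocations (heap "usage" is counted in whole regions, while a dump stores only object contents): each humongous object rounds up to full 16 MiB regions, so every ~17 MB object above strands most of its second region:

    $ echo $(( 2 * 16 * 1024 * 1024 - 17400888 ))   # 16153544 bytes (~15.4 MiB) unusable per such object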
    
  • How to inspect a process's memory on Linux

    I often use top to monitor system processes, but many of its fields remain unclear:
    * Why is VIRT sometimes larger than the machine's total memory?
    * Why is RES larger than the Xmx configured for the JVM?
    * Which top fields actually concern memory?

    Via the top command

    The CODE and DATA columns only appear after pressing F and selecting them with the space bar.

    top -p 1210
    
    PID USER      PR  NI    VIRT    RES    SHR   CODE    DATA   SWAP S %CPU %MEM     TIME+ COMMAND
    1210 mysql     20   0 1158504 130804   7736  22472 1052816      0 S  0.0  7.0   8:53.50 mariadbd
    

    VIRT

    36. VIRT  --  Virtual Memory Size (KiB)
         The total amount of virtual memory used by the task.  It includes all code, data and shared libraries plus pages that have been swapped out and pages that have  been mapped but not used.
    

    VIRT = CODE + DATA + shared libraries + pages that have been swapped out + pages that have been mapped but not used

    SWAP

    27. SWAP  --  Swapped Size (KiB)
            The non-resident portion of a task's address space.
    

    The size of the memory pages that have been swapped out.

    RES

    17. RES  --  Resident Memory Size (KiB)
        The non-swapped physical memory a task is using.
    

    The physical memory a task is using that has not been swapped out.

    CODE

     4. CODE  --  Code Size (KiB)
            The amount of physical memory devoted to executable code, also known as the Text Resident Set size or TRS.
    

    DATA

    6. DATA  --  Data + Stack Size (KiB)
         The amount of physical memory devoted to other than executable code, also known as the Data Resident Set size or DRS.
    

    DATA is the sum of the pages that are VM_WRITE & ~VM_SHARED & ~VM_STACK plus the VM_STACK pages: all writable, non-shared, non-stack pages plus the stack pages.

    ANON = RES - SHR (ANON approximates heap-allocated memory)

    ANON <= DATA (ANON counts physical pages, DATA virtual ones)

    SHR

    21. SHR  --  Shared Memory Size (KiB)
            The amount of shared memory available to a task, not all of which is typically resident.  It simply reflects memory that could be potentially shared with other  processes.
    

    SHR contains all virtual memory that could be shared with other processes, and RSS contains all memory physically in RAM that is used by the process.

    Thus all shared memory currently in RAM is counted both in SHR and in RSS, so SHR + RSS has no meaning since it can contain duplicate counts.

    1. It reflects not only this process's own shared memory but memory shareable with other processes
    2. Even if the process calls only a few functions from a shared library, the whole library's size is included
    3. A formula for the physical memory one process itself occupies: RES - SHR
    4. After pages are swapped out, it drops

    PR

    16. PR  --  Priority
        The scheduling priority of the task.  If you see `rt` in this field, it means the task is running under real time scheduling priority.
    
        Under linux, real time priority is somewhat misleading since traditionally the operating itself was not preemptible.  And while the 2.6 kernel  can  be  made  mostly preemptible, it is not always so.
    

    Real-time priority values range over [1, 99], mapping onto the 0-99 priority band, which is the high-priority end.

    In most cases PR value can be computed by the following formula: PR = 20 + NI.

    Theoretically the kernel can change PR value (but not NI) by itself. For example it may reduce the priority of a process if it consumes too much CPU, or it may increase the priority of a process if that process had no chance to run for a long time because of other higher priority processes. In these cases the PR value will be changed by kernel and NI will remain the same, thus the formula “PR = 20 + NI” will not be correct.

    NI

    11. NI  --  Nice Value
    The nice value of the task.  A negative nice value means higher priority, whereas a positive nice value means lower priority.  Zero in this field simply means priority will not be adjusted in determining a task's dispatch-ability.
    The task's static scheduling priority, ranging over [-20, 19].
    
    

    Linux actually implements 140 priority levels, 0 through 139; the smaller the value, the higher the priority. Nice values -20 to 19 map onto priorities 100-139.

    Via the proc filesystem

    cat /proc/1210/statm
    289626 32701 1934 5618 0 263204 0

    // the OS memory page size

    getconf PAGESIZE
    4096

    Table 1-3: Contents of the statm files (as of 2.6.8-rc3), with the matching top fields:

    Field     Content (all counts are in pages)                      Matching top value
    size      total program size (same as VmSize in status)          VIRT = 289626 * 4096/1024 = 1158504
    resident  resident set size (same as VmRSS in status)            RES  = 32701 * 4096/1024 = 130804
    shared    pages that are shared (i.e. backed by a file,          SHR  = 1934 * 4096/1024 = 7736
              same as RssFile+RssShmem in status)
    trs       pages that are 'code' (not including libs;             CODE = 5618 * 4096/1024 = 22472
              broken, includes data segment)
    lrs       pages of library (always 0 on 2.6)
    drs       pages of data/stack (including libs; broken,           DATA = 263204 * 4096/1024 = 1052816
              includes library text)
    dt        dirty pages (always 0 on 2.6)
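
    A small helper to reproduce top's KiB columns from statm (fields in order: size, resident, shared, text, lib, data, dt; the factor 4 converts 4096-byte pages to KiB):

    awk '{k=4; printf "VIRT=%d RES=%d SHR=%d CODE=%d DATA=%d KiB\n", $1*k, $2*k, $3*k, $4*k, $6*k}' /proc/1210/statm
    # VIRT=1158504 RES=130804 SHR=7736 CODE=22472 DATA=1052816 KiB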

    pmap

    ➜  ~ pmap -X 1210|head -n 5
    1210:   /usr/sbin/mariadbd
             Address Perm   Offset Device Inode    Size    Rss    Pss Referenced Anonymous Swap Locked Mapping
        556335aee000 r-xp 00000000  fd:01 22182   22472   5228   5228       5172         0    0      0 mariadbd
        5563372df000 r--p 015f1000  fd:01 22182    1392   1392   1392       1392      1392    0      0 mariadbd
        55633743b000 rw-p 0174d000  fd:01 22182     720    416    416        416       384    0      0 mariadbd
    ➜  ~ pmap -X 1210|tail -n 5
        7ffc7142e000 rw-p 00000000  00:00     0     132     76     76         76        76    0      0 [stack]
        7ffc714af000 r-xp 00000000  00:00     0       8      4      0          4         0    0      0 [vdso]
    ffffffffff600000 r-xp 00000000  00:00     0       4      0      0          0         0    0      0 [vsyscall]
                                                ======= ====== ====== ========== ========= ==== ======
                                                1158508 131308 128832     131136    123272    0      0 KB
    

    In computing, proportional set size (PSS) is the portion of main memory (RAM) occupied by a process and is composed by the private memory of that process plus the proportion of shared memory with one or more other processes. Unshared memory including the proportion of shared memory is reported as the PSS.

    REF

    top命令显示信息之谜

    Linux top 命令里的内存相关字段(VIRT, RES, SHR, CODE, DATA)

    The /proc Filesystem

    Proportional set size

  • The Rubrik Atlas File System

    Introducing Atlas, Rubrik’s Cloud-Scale File System
    THE ATLAS FILE SYSTEM – THE FOUNDATION OF THE RUBRIK PLATFORM
    Rubrik Atlas File System: Designed to be Masterless, Self-healing, and Cloud-scale

    For a backup product, the foundation matters most. As the epigraph to chapter 5 of Designing Data-Intensive Applications puts it:

    The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.
    Douglas Adams (1992)

    Rubrik is a top-tier backup product, and the Atlas file system is the foundation it is built on. I am especially interested in how Atlas protects data integrity and how it improves performance.

    Saving space

    • Used to support the time machine paradigm.
    • Think of Atlas as storing a series of versioned files: each series represents a virtual machine, and each snapshot corresponds to one version of that file.
    • For incremental backups, the file behind each Atlas snapshot stores only its delta against the previous snapshot.

    Performance details

    • Atlas places the data blocks belonging to a VM file replica on specific nodes rather than scattering them randomly across every node in the cluster.

      for a file with three replicas, three nodes in our system will each contain the entirety of the data representing that file.

    • When writing data to disk, Atlas groups it logically by file.

      • For any given VM, some nodes in the cluster locally store all the data of that VM's entire backup history
      • Atlas places replicas of similar VMs together to allow deduplication across VMs
    • Data balancing
      Capacity is balanced across nodes by adjusting the VM replication scheme and replica placement.

    • Exploiting the hardware

      • Stripe writes across a node's multiple disks to aggregate their performance (roughly 3x better reads and writes):
        a file's data is written in stripes spanning several disks
      • Use SSDs for acceleration:
        • to store the VM files that most affect VM performance
        • as a buffer layer in front of the mechanical disks

    Conjectures

    1. Atlas does not physically merge the data in a snapshot chain; only at instant-recovery time does it merge metadata to present the requested snapshot replica. The system we designed stores VM snapshots similarly, but it merges the data for real, and its performance still lags Rubrik's, so there must be many detail optimizations. My guess: perhaps the merged metadata is kept entirely on SSD.
    2. How is data balancing between nodes implemented? A node needs a VM's whole snapshot-chain data, so rebalancing means migrating the entire chain, and the amount of data moved per rebalance could be very large.
  • GLIBCXX_USE_CXX11_ABI

    The code compiles and runs on CentOS 7, but other colleagues were using that environment for testing, and I happened to have an idle RHEL 8 machine at hand.

    On RHEL 8 the build succeeds, but the program crashes as soon as it runs.

    The log shows:

    fusionsphere/so/libfc.so: undefined symbol: _ZN16CFusionSphereSDK10InitialSDKEPFviPKczEP23CFusionSphereDebugLevelRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESD_SD_
    

    Yet the program clearly compiled without errors, and nothing like template RTTI is involved; it is obviously not a segfault from touching bad memory either. The demangled symbol below gives the game away.
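
    The mangled name itself carries the clue. Demangling it (output abbreviated) shows the std::__cxx11 ABI tag on the string parameters:

    $ c++filt _ZN16CFusionSphereSDK10InitialSDKEPFviPKczEP23CFusionSphereDebugLevelRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESD_SD_
    CFusionSphereSDK::InitialSDK(void (*)(int, char const*, ...), CFusionSphereDebugLevel*,
        std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ...)

    std::__cxx11::basic_string means the freshly built code references the new libstdc++ string ABI (the default since GCC 5), while the prebuilt library exporting InitialSDK was built against the old one. A plausible fix is to rebuild with -D_GLIBCXX_USE_CXX11_ABI=0 so the two sides agree.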

    (more…)

  • How to assign to a pointer inside a function

    Pointers are passed to functions by value too, so assigning to the pointer itself inside a function has no effect outside it. To change what the caller's pointer points at, either write through it (*p = n), or pass the pointer's address (int **) or a reference to it (int *&).

    #include <iostream>
    
    inline static void assignPtr1(int *p) {
        int n = 2;
        p = &n;
    }
    
    inline static void assignPtr2(int *p) {
        int n = 2;
        *p = n;
    }
    
    int main() {
        int v = 1;
        int *p = &v;
    
        std::cout << "ptr p : " << *p << std::endl;
        assignPtr1(p);
        std::cout << "change ptr p by p=&n : " << *p << std::endl;
    
    
        int *q = &v;
    
        std::cout << "ptr q : " << *q << std::endl;
        assignPtr2(q);
        std::cout << "change ptr q by *q=n : " << *q << std::endl;
        return 0;
    }
    
    ptr p : 1
    change ptr p by p=&n : 1
    ptr q : 1
    change ptr q by *q=n : 2