2024-12-28 23点左右,线上其他服务请求 content-cache 出现批量超时。
content-cache-03 机器内存使用率如下:
这里因为JVM参数设置为:
-Xms4g -Xmx4g -XX:MaxNewSize=1g
所以达到42%时,内存的使用率已经达到了3.3G。
查看GC日志如下:
2024-12-27T12:30:51.070+0800: 4745.252: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2206547K(3145728K)] 2268778K(4089472K), 0.0037549 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 2024-12-27T12:30:51.075+0800: 4745.256: [CMS-concurrent-mark-start] 2024-12-27T12:30:51.206+0800: 4745.387: [CMS-concurrent-mark: 0.131/0.131 secs] [Times: user=0.43 sys=0.07, real=0.13 secs] 2024-12-27T12:30:51.206+0800: 4745.388: [CMS-concurrent-preclean-start] 2024-12-27T12:30:51.214+0800: 4745.396: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] 2024-12-27T12:30:51.215+0800: 4745.396: [CMS-concurrent-abortable-preclean-start] 2024-12-27T12:30:51.760+0800: 4745.941: [CMS-concurrent-abortable-preclean: 0.544/0.545 secs] [Times: user=1.22 sys=0.16, real=0.55 secs] 2024-12-27T12:30:51.764+0800: 4745.945: [GC (CMS Final Remark) [YG occupancy: 521124 K (943744 K)]2024-12-27T12:30:51.764+0800: 4745.945: [GC (CMS Final Remark) 2024-12-27T12:30:51.764+0800: 4745.946: [ParNew: 521124K->28400K(943744K), 0.0095777 secs] 2727671K->2234947K(4089472K), 0.0103370 secs] [Times: user=0.07 sys=0.00, real=0.01 secs] 2024-12-27T12:30:51.775+0800: 4745.956: [Rescan (parallel) , 0.0048798 secs]2024-12-27T12:30:51.779+0800: 4745.961: [weak refs processing, 0.0183285 secs]2024-12-27T12:30:51.798+0800: 4745.979: [class unloading, 0.0379042 secs]2024-12-27T12:30:51.836+0800: 4746.017: [scrub symbol table, 0.0131591 secs]2024-12-27T12:30:51.849+0800: 4746.030: [scrub string table, 0.0014270 secs][1 CMS-remark: 2206547K(3145728K)] 2234947K(4089472K), 0.1033068 secs] [Times: user=0.19 sys=0.00, real=0.10 secs] 2024-12-27T12:30:51.868+0800: 4746.050: [CMS-concurrent-sweep-start] 2024-12-27T12:30:52.865+0800: 4747.046: [CMS-concurrent-sweep: 0.997/0.997 secs] [Times: user=2.50 sys=0.46, real=1.00 secs] 2024-12-27T12:30:52.865+0800: 4747.047: [CMS-concurrent-reset-start] 2024-12-27T12:30:52.874+0800: 4747.056: [CMS-concurrent-reset: 0.005/0.009 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
(2)CMS老年代回收期间发生一次新生代GC
2024-12-27T12:39:03.418+0800: 5237.599: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2221631K(3145728K)] 2326535K(4089472K), 0.0164690 secs] [Times: user=0.12 sys=0.00, real=0.02 secs] 2024-12-27T12:39:03.435+0800: 5237.617: [CMS-concurrent-mark-start] 2024-12-27T12:39:03.659+0800: 5237.841: [CMS-concurrent-mark: 0.224/0.224 secs] [Times: user=1.14 sys=0.22, real=0.23 secs] 2024-12-27T12:39:03.660+0800: 5237.842: [CMS-concurrent-preclean-start] 2024-12-27T12:39:03.681+0800: 5237.862: [CMS-concurrent-preclean: 0.020/0.021 secs] [Times: user=0.14 sys=0.01, real=0.02 secs] 2024-12-27T12:39:03.681+0800: 5237.863: [CMS-concurrent-abortable-preclean-start] 2024-12-27T12:39:04.070+0800: 5238.251: [GC (Allocation Failure) 2024-12-27T12:39:04.070+0800: 5238.252: [ParNew: 943744K->85754K(943744K), 0.0289108 secs] 3165375K->2332721K(4089472K), 0.0298252 secs] [Times: user=0.16 sys=0.01, real=0.03 secs] 2024-12-27T12:39:04.790+0800: 5238.972: [CMS-concurrent-abortable-preclean: 1.073/1.109 secs] [Times: user=2.95 sys=0.47, real=1.11 secs] 2024-12-27T12:39:04.795+0800: 5238.976: [GC (CMS Final Remark) [YG occupancy: 556391 K (943744 K)]2024-12-27T12:39:04.795+0800: 5238.976: [GC (CMS Final Remark) 2024-12-27T12:39:04.795+0800: 5238.977: [ParNew: 556391K->33226K(943744K), 0.0142288 secs] 2803358K->2280193K(4089472K), 0.0149618 secs] [Times: user=0.09 sys=0.00, real=0.02 secs] 2024-12-27T12:39:04.810+0800: 5238.992: [Rescan (parallel) , 0.0071785 secs]2024-12-27T12:39:04.817+0800: 5238.999: [weak refs processing, 0.0170401 secs]2024-12-27T12:39:04.834+0800: 5239.016: [class unloading, 0.0380791 secs]2024-12-27T12:39:04.872+0800: 5239.054: [scrub symbol table, 0.0136271 secs]2024-12-27T12:39:04.886+0800: 5239.068: [scrub string table, 0.0014743 secs][1 CMS-remark: 2246966K(3145728K)] 2280193K(4089472K), 0.1099299 secs] [Times: user=0.22 sys=0.00, real=0.11 secs] 2024-12-27T12:39:04.906+0800: 5239.087: [CMS-concurrent-sweep-start] 2024-12-27T12:39:05.848+0800: 5240.030: [CMS-concurrent-sweep: 0.943/0.943 secs] [Times: user=2.13 sys=0.24, real=0.94 secs] 2024-12-27T12:39:05.849+0800: 5240.030: [CMS-concurrent-reset-start] 2024-12-27T12:39:05.854+0800: 5240.035: [CMS-concurrent-reset: 0.005/0.005 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
(3)CMS老年代回收期间发生两次新生代GC(造成延迟的一次)
2024-12-28T23:01:04.020+0800: 128958.201: [GC (Allocation Failure) 2024-12-28T23:01:04.020+0800: 128958.202: [ParNew: 922124K->104832K(943744K), 0.1259681 secs] 3117264K->2392877K(4089472K), 0.1268294 secs] [Times: user=0.73 sys=0.00, real=0.12 secs] 2024-12-28T23:01:04.153+0800: 128958.335: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2288045K(3145728K)] 2398447K(4089472K), 0.0167679 secs] [Times: user=0.11 sys=0.00, real=0.02 secs] 2024-12-28T23:01:04.171+0800: 128958.352: [CMS-concurrent-mark-start] 2024-12-28T23:01:04.480+0800: 128958.661: [GC (Allocation Failure) 2024-12-28T23:01:04.480+0800: 128958.662: [ParNew: 943744K->104832K(943744K), 0.3016805 secs] 3231789K->2686743K(4089472K), 0.3025824 secs] [Times: user=1.74 sys=0.00, real=0.30 secs] 2024-12-28T23:01:05.050+0800: 128959.231: [GC (Allocation Failure) 2024-12-28T23:01:05.050+0800: 128959.232: [ParNew: 943744K->104832K(943744K), 0.2612079 secs] 3525655K->2930593K(4089472K), 0.2621099 secs] [Times: user=1.51 sys=0.00, real=0.26 secs] 2024-12-28T23:01:05.505+0800: 128959.687: [CMS-concurrent-mark: 0.757/1.334 secs] [Times: user=8.24 sys=0.75, real=1.33 secs] 2024-12-28T23:01:05.506+0800: 128959.687: [CMS-concurrent-preclean-start] 2024-12-28T23:01:06.108+0800: 128960.289: [CMS-concurrent-preclean: 0.589/0.602 secs] [Times: user=2.02 sys=0.60, real=0.60 secs] 2024-12-28T23:01:06.108+0800: 128960.290: [CMS-concurrent-abortable-preclean-start] 2024-12-28T23:01:06.108+0800: 128960.290: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2024-12-28T23:01:06.113+0800: 128960.295: [GC (CMS Final Remark) [YG occupancy: 895656 K (943744 K)]2024-12-28T23:01:06.113+0800: 128960.295: [GC (CMS Final Remark) 2024-12-28T23:01:06.114+0800: 128960.295: [ParNew: 895656K->895656K(943744K), 0.0000377 secs] 3721417K->3721417K(4089472K), 0.0006760 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2024-12-28T23:01:06.114+0800: 128960.296: [Rescan (parallel) , 0.6955777 secs]2024-12-28T23:01:06.810+0800: 128960.991: [weak refs processing, 0.0174431 secs]2024-12-28T23:01:06.827+0800: 128961.009: [class unloading, 0.0391055 secs]2024-12-28T23:01:06.867+0800: 128961.048: [scrub symbol table, 0.0128982 secs]2024-12-28T23:01:06.879+0800: 128961.061: [scrub string table, 0.0014971 secs][1 CMS-remark: 2825761K(3145728K)] 3721417K(4089472K), 0.7845246 secs] [Times: user=5.51 sys=0.00, real=0.78 secs] 2024-12-28T23:01:06.899+0800: 128961.080: [CMS-concurrent-sweep-start] 2024-12-28T23:01:11.163+0800: 128965.345: [CMS-concurrent-sweep: 2.170/4.264 secs] [Times: user=13.16 sys=1.91, real=4.27 secs] 2024-12-28T23:01:11.173+0800: 128965.354: [GC (Allocation Failure) 2024-12-28T23:01:11.173+0800: 128965.355: [ParNew: 943744K->103600K(943744K), 0.0749256 secs] 2090310K->1292644K(4089472K), 0.0760357 secs] [Times: user=0.50 sys=0.00, real=0.07 secs] 2024-12-28T23:01:13.551+0800: 128967.733: [CMS-concurrent-reset-start] 2024-12-28T23:01:13.692+0800: 128967.874: [CMS-concurrent-reset: 0.005/0.141 secs] [Times: user=0.19 sys=0.00, real=0.14 secs]
ParNew:处理新生代GC
GC (Allocation Failure):触发新生代GC的原因是由新生代内存分配失败导致
943744K->104832K(943744K):新生代GC前内存大小943744K,回收后104832K,回收量 943744K - 104832K(注意,清空的这些一部分被销毁,一部分被转往了老年代)
3231789K->2686743K(4089472K):堆内存总体情况,回收前 3231789K,回收后 2686743K
0.3025824:耗时 0.3秒
CMS:CMS只处理老年代的垃圾回收
• Step1:初始标记(有STW)
CMS Initial Mark:标记 GC Roots 可以直接到达的对象。
2288045K(3145728K):当前老年代中存活对象的大小为2288045K / 当前老年代最大可用内存容量3145728K
2398447K(4089472K):垃圾回收前总的内存使用量(老年代+新生代)/堆内存的总大小(包括年轻代和老年代)
0.0167679 secs:此阶段耗时0.016毫秒
Times: user=0.11 sys=0.00, real=0.02 secs:用户态、内核态、实际耗时
• Step2:并发标记
CMS-concurrent-mark-start:并发标记,回收线程与用户线程并行执行。
• Step3:并发预处理与可中断的并发预处理
-XX:-CMSPrecleaningEnabled 可通过参数设置是否开启,默认开启。
这一步的主要目的是处理新生代进入老年代的对象,尽量减少最终标记的耗时。
CMS-concurrent-preclean-start:并发预处理开始
CMS-concurrent-preclean: 0.008/0.008 secs:并发预处理结束,耗时 0.008 秒
CMS-concurrent-abortable-preclean-start:可中断的并发预处理开始
CMS-concurrent-abortable-preclean: 0.000/0.000 secs:可中断的并发预处理结束,耗时为0
• Step4:最终标记(STW)
CMS Final Remark) [YG occupancy: 895656 K (943744 K):最终标记阶段,此时新生代内存占用 895656 K,最大可用量 943744 K
Rescan (parallel):并行扫描存活对象,确保标记准确性,耗时 0.6955777
秒
weak refs processing:处理弱引用,耗时 0.0174431
秒。
class unloading:卸载未使用的类,释放元数据空间,耗时 0.0391055
秒。
scrub symbol table:清理符号表,释放相关内存,耗时 0.0128982
秒。
scrub string table:清理字符串表,释放无用的字符串,耗时 0.0014971
秒
CMS-remark: 2825761K(3145728K):当前老年代使用量 2825761K,最大可用量 3145728K
堆内存使用总量:3721417 KB,总容量:4089472 KB
3721417K(4089472K):当前总内存使用量 3721417K,最大可用量 4089472K
0.7845246 secs:重新标记阶段总耗时(0.78 秒)
• Step5:并发清理
CMS-concurrent-sweep-start:CMS 并发清理阶段开始。
CMS-concurrent-sweep: 2.170/4.264 secs(清理任务的耗时/总耗时,包括线程调度等系统开销)
• Step6:并发重置
重置CMS算法相关的内部数据,为下一次GC循环做准备。
CMS-concurrent-reset-start:CMS 并发重置阶段开始。
[CMS-concurrent-reset: 0.005/0.141 secs]:重置结束,耗时
GC对系统的影响主要发生在 【初始标记】和【最终标记】两个阶段,因为这两个阶段是 STW 的。
数据1两阶段耗时:0.0037549 + 0.1033068 = 0.107秒
数据2两阶段耗时:0.0164690 + 0.1099299 = 0.126秒
数据3两阶段耗时:0.1268294 + 0.7845246 = 0.911秒
问题总结:在数据3中,可以明显看到,在并发标记阶段由于新生代内存使用量增长过快,导致发生了两次新生代GC,这就造成重复标记阶段耗时增加,达到了0.78秒。
4、解决思路
解决思路就是增加新生代的大小,同时为避免新生代一次回收量过大对老年代造成压力,老年代也增加,同时降低CMS的触发门槛(由70%—>60%)
启动参数由:
java -jar -Xms4g -Xmx4g -XX:MaxNewSize=1g -XX:MaxPermSize=1g -XX:+AggressiveOpts -XX:+UseBiasedLocking -XX:+DisableExplicitGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -Xloggc:/home/glory/dz-content-cache/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseCMSCompactAtFullCollection -XX:+CMSClassUnloadingEnabled -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSScavengeBeforeRemark -Dnetworkaddress.cache.ttl=60 dz-content-cache.jar --spring.profiles.active=online
更新为:
java -jar -Xms6g -Xmx6g -XX:MaxNewSize=2g -XX:MaxPermSize=1g -XX:+AggressiveOpts -XX:+UseBiasedLocking -XX:+DisableExplicitGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60 -Xloggc:/home/glory/dz-content-cache/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseCMSCompactAtFullCollection -XX:+CMSClassUnloadingEnabled -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSScavengeBeforeRemark -Dnetworkaddress.cache.ttl=60 dz-content-cache.jar --spring.profiles.active=online