online环境发现有个task的微服务白天使用高峰期一直cpu非常高98%-99%
用 如下命令在cpu非常高的时候生成了多个dump文件
jstack <PID> > thread_dump1.txt
jstack <PID> > thread_dump2.txt
jstack <PID> > thread_dump3.txt
jstack <PID> > thread_dump4.txt
jstack <PID> > thread_dump5.txt
jstack <PID> > thread_dump6.txt
在thread_dump文件中发现很多如下的BLOCKED 的地方大概有100多个
"Thread-509207" #544421 prio=5 os_prio=0 cpu=0.10ms elapsed=712.35s tid=0x00007f2fc85419a0 nid=0xf0335 waiting for monitor entry [0x00007f2f2c1c0000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)
"Thread-509214" #544428 prio=5 os_prio=0 cpu=0.12ms elapsed=706.60s tid=0x00007f2fc853e190 nid=0xf033d waiting for monitor entry [0x00007f2f2eae9000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)
"Thread-509221" #544435 prio=5 os_prio=0 cpu=0.11ms elapsed=704.43s tid=0x00007f2fc85385f0 nid=0xf0344 waiting for monitor entry [0x00007f2f2d6d5000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)
"Thread-509230" #544445 prio=5 os_prio=0 cpu=0.10ms elapsed=701.86s tid=0x00007f2fe4416810 nid=0xf034e waiting for monitor entry [0x00007f2f2acab000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)
"Thread-509252" #544467 prio=5 os_prio=0 cpu=0.10ms elapsed=678.16s tid=0x00007f2fc85396e0 nid=0xf0364 waiting for monitor entry [0x00007f2f2c7c6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)
然后发现DataFromRedisImpl.java:1239
这里出现了锁竞争
private class ThreadTask3ReportInstanceFromRedis extends Thread {
private String taskcode;
private Integer taskversion;
private String rediskey;
public ThreadTask3ReportInstanceFromRedis(String taskcode ,Integer taskversion)
{
this.taskcode = taskcode;
this.taskversion=taskversion;
this.rediskey=reportinstanceredispre+taskcode+"."+taskversion;
}
public void run() {
synchronized(reportinstanceredispre){
try {
getTask3ReportInstanceFromRedis(taskcode, taskversion);
} catch (Exception e) {
// TODO Auto-generated catch block
logger.error("error:{}", ExceptionUtil.stack(e));
}
}
}
}
然后改成如下方式,cpu明显下降很多,root cause是之前的代码可能是误写锁住了一个常量字符串,相当于整个method都锁了,改成下面的方式根据taskcode 和version减少了锁的粒度
synchronized(reportinstanceredispre)改为 synchronized(lock)
private static final Map<String, Object> locks = new ConcurrentHashMap<>();
private class ThreadTask3ReportInstanceFromRedis extends Thread {
private String taskcode;
private Integer taskversion;
private String rediskey;
public ThreadTask3ReportInstanceFromRedis(String taskcode ,Integer taskversion)
{
this.taskcode = taskcode;
this.taskversion=taskversion;
this.rediskey=reportinstanceredispre+taskcode+"."+taskversion;
}
public void run() {
Object lock = locks.computeIfAbsent(rediskey, key -> new Object());
synchronized(lock){
try {
logger.info("rediskey======{}",rediskey);
getTask3ReportInstanceFromRedis(taskcode, taskversion);
} catch (Exception e) {
// TODO Auto-generated catch block
logger.error("error:{}", ExceptionUtil.stack(e));
}
finally {
// 清理锁,避免内存泄漏
locks.remove(rediskey); // Clean up lock if no longer needed
}
}
}
}