Background
WeChat Pay on our platform suddenly stopped working. Every time a user tapped Pay, they got a "System busy" error.
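Investigation
The logs showed that HTTP requests from the "payment aggregation service" to the "WeChat Pay service" were failing with a read timeout, so the problem was clearly in the WeChat Pay service. A read timeout, as opposed to a connection failure, means the connection was established. The application was still alive; it was just responding slowly.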
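The difference between a connection failure and a read timeout can be reproduced with plain JDK sockets. In this minimal sketch, the class and method names are illustrative. A local ServerSocket stands in for a service that is alive but stuck: the operating system completes the TCP handshake from the listen backlog, but nothing is ever written back. As a result, the client's connect succeeds and only the read times out.

```java
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    /** Returns true if the connect succeeded but the subsequent read timed out. */
    static boolean connectsButReadTimesOut(int readTimeoutMillis) throws Exception {
        // The server listens (so the handshake completes) but never accepts or responds,
        // standing in for an application that is alive but not answering.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket()) {
            // A connect failure would throw here, outside the read-timeout catch below
            client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()), 1000);
            client.setSoTimeout(readTimeoutMillis); // this is the "read timeout"
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives or the read timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return client.isConnected(); // connected, yet no response in time
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(connectsButReadTimesOut(200)); // true
    }
}
```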
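When an application responds slowly, it has usually either been overwhelmed by a surge of traffic or been dragged down by a slow dependency.
Log volume showed no sudden traffic spike, so the service must have been dragged down.
When an application is dragged down by a dependency, its web container threads typically all show up blocked on the same operation.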
We immediately dumped the application's thread stacks with jstack and found that every web container thread was blocked in com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager.putMerchant:
"XNIO-1 task-8" #298 prio=5 os_prio=0 tid=0x00007f33b0072800 nid=0x12b waiting for monitor entry [0x00007f342051a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager.putMerchant(CertificatesManager.java:142)
        - waiting to lock <0x00000000da83be48> (a com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager)
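jstack is the quickest way to get this picture from a live process. The same data is also available from inside the JVM through the standard ThreadMXBean, for example behind a diagnostic endpoint. The sketch below is an illustration, not something we ran during the incident. blockedByLock counts BLOCKED threads per contended monitor. demo reproduces the shape of the dump above, with a plain Object standing in for the CertificatesManager singleton.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

public class BlockedThreadReport {
    /** Counts BLOCKED threads per contended monitor, keyed like "java.lang.Object@1b6d3586". */
    static Map<String, Integer> blockedByLock() {
        Map<String, Integer> counts = new TreeMap<>();
        // Same per-thread data jstack prints: thread state and "waiting to lock <...>"
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** One thread holds a monitor; {@code waiters} threads pile up BLOCKED on it. Returns how many were seen. */
    static int demo(int waiters) throws InterruptedException {
        Object monitor = new Object(); // stand-in for the CertificatesManager singleton
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                holding.countDown();
                try {
                    release.await(); // like a download that never returns
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        holding.await();
        Thread[] blocked = new Thread[waiters];
        for (int i = 0; i < waiters; i++) {
            blocked[i] = new Thread(() -> { synchronized (monitor) { } });
            blocked[i].start();
        }
        // Lock names use the format class name + "@" + identity hash code in hex
        String key = monitor.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(monitor));
        int seen = 0;
        for (int tries = 0; tries < 200 && seen < waiters; tries++) {
            Thread.sleep(10);
            seen = blockedByLock().getOrDefault(key, 0);
        }
        release.countDown();
        holder.join();
        for (Thread t : blocked) {
            t.join();
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(5));
    }
}
```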
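Root cause analysis
What caused the blocking?
com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager belongs to wechatpay-apache-httpclient, WeChat Pay's official open-source client library. Project repository: https://github.com/wechatpay-apiv3/wechatpay-apache-httpclient
CertificatesManager.putMerchant requests the platform certificates from WeChat. Reading the source shows that CertificatesManager is a singleton and putMerchant is a synchronized method. The method ultimately issues an HTTP request through HttpClient to download the certificates from the WeChat Pay platform. That request sets no timeout, so by default it never times out. If WeChat's certificate service stalls even briefly, the call blocks. Because the blocked call holds the singleton's monitor, every other caller of putMerchant then blocks behind it.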
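Because the method is synchronized on a singleton, downloads are strictly serialized. If one download takes t and N threads call putMerchant at once, the last caller waits roughly N × t. With no timeout, t itself has no upper bound. The model below is hypothetical: CertManagerModel is not the real SDK class, and Thread.sleep stands in for the HTTP call. It only illustrates the arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

public class SerializedFetchDemo {
    /** Hypothetical model: one shared instance whose synchronized method does a slow remote call. */
    static final class CertManagerModel {
        private final long downloadMillis;

        CertManagerModel(long downloadMillis) {
            this.downloadMillis = downloadMillis;
        }

        synchronized void putMerchant(String merchantId) throws InterruptedException {
            // Simulated certificate download. A real call without a read timeout
            // could take arbitrarily long while this monitor is held.
            Thread.sleep(downloadMillis);
        }
    }

    /** Runs {@code callers} concurrent putMerchant calls and returns the wall-clock time in ms. */
    static long elapsedMillis(int callers, long downloadMillis) throws InterruptedException {
        CertManagerModel shared = new CertManagerModel(downloadMillis); // shared, like a singleton
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < callers; i++) {
            String merchantId = "mch-" + i; // even different merchants contend for the same monitor
            Thread t = new Thread(() -> {
                try {
                    shared.putMerchant(merchantId);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Roughly callers * downloadMillis, i.e. at least about 8 * 50 = 400 ms
        System.out.println(elapsedMillis(8, 50) + " ms");
    }
}
```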
The relevant code:
/**
 * Adds a merchant whose platform certificates should be updated automatically.
 *
 * @param merchantId  merchant ID
 * @param credentials credentials
 * @param apiV3Key    APIv3 key
 * @throws IOException              I/O error
 * @throws GeneralSecurityException general security error
 * @throws HttpCodeException        unexpected HTTP status code
 */
public synchronized void putMerchant(String merchantId, Credentials credentials, byte[] apiV3Key)
        throws IOException, GeneralSecurityException, HttpCodeException {
    // ......
    initCertificates(merchantId, credentials, apiV3Key);
    // ......
}
/**
 * Downloads and updates the platform certificates.
 *
 * @param merchantId  merchant ID
 * @param verifier    verifier
 * @param credentials credentials
 * @param apiV3Key    APIv3 key
 * @throws HttpCodeException        unexpected HTTP status code
 * @throws IOException              I/O error
 * @throws GeneralSecurityException general security error
 */
private synchronized void downloadAndUpdateCert(String merchantId, Verifier verifier, Credentials credentials,
        byte[] apiV3Key) throws HttpCodeException, IOException, GeneralSecurityException {
    try (CloseableHttpClient httpClient = WechatPayHttpClientBuilder.create()
            .withCredentials(credentials)
            .withValidator(verifier == null ? (response) -> true : new WechatPay2Validator(verifier))
            .withProxy(proxy)
            .build()) {
        HttpGet httpGet = new HttpGet(CERT_DOWNLOAD_PATH);
        httpGet.addHeader(ACCEPT, APPLICATION_JSON.toString());
        // No timeout is configured for this request, so a stalled certificate
        // service blocks here while the singleton's monitor is held
        try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
            int statusCode = response.getStatusLine().getStatusCode();
            String body = EntityUtils.toString(response.getEntity());
            if (statusCode == SC_OK) {
                Map<BigInteger, X509Certificate> newCertList = CertSerializeUtil.deserializeToCerts(apiV3Key, body);
                if (newCertList.isEmpty()) {
                    log.warn("Cert list is empty");
                    return;
                }
                ConcurrentHashMap<BigInteger, X509Certificate> merchantCertificates = certificates.get(merchantId);
                merchantCertificates.clear();
                merchantCertificates.putAll(newCertList);
            } else {
                log.error("Auto update cert failed, statusCode = {}, body = {}", statusCode, body);
                throw new HttpCodeException("Unexpected status code when downloading platform certificates: " + statusCode);
            }
        }
    }
}
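Why were all threads blocked?
Because we were using CertificatesManager.putMerchant incorrectly.
Our code was copied directly from the wechatpay-apache-httpclient sample code. As a result, every payment request fetched the certificates from WeChat again. While the WeChat Pay platform certificate service was unstable, enough concurrent payment requests were all it took to block every container thread in the WeChat Pay service. WeChat's sample code is shown in the figure below; our code was: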
public static Verifier createVerifier(WechatPayMerchant wechatPayMerchant) {
    Objects.requireNonNull(wechatPayMerchant, "Merchant config must not be null");
    try {
        PrivateKey merchantPrivateKey = PemUtil.loadPrivateKey(
                new ByteArrayInputStream(wechatPayMerchant.getMerchantPrivateKey().getBytes(StandardCharsets.UTF_8)));
        // Get the certificate manager instance
        CertificatesManager certificatesManager = CertificatesManager.getInstance();
        // Register the merchant for automatic platform-certificate updates.
        // This downloads the certificates synchronously on every call.
        certificatesManager.putMerchant(wechatPayMerchant.getPayUsedMchId(),
                new WechatPay2Credentials(wechatPayMerchant.getPayUsedMchId(),
                        new PrivateKeySigner(wechatPayMerchant.getMerchantSerialNumber(), merchantPrivateKey)),
                wechatPayMerchant.getApiV3Key().getBytes(StandardCharsets.UTF_8));
        Verifier verifier = certificatesManager.getVerifier(wechatPayMerchant.getPayUsedMchId());
        return verifier;
    } catch (Exception e) {
        log.error("createVerifier failed", e);
        throw new ServiceException("Failed to create WechatPay Verifier");
    }
}
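The effect on the web container can be reproduced with a fixed-size thread pool standing in for the container's worker threads. The real container and its pool size are not modeled. In this sketch, every "payment request" enters a shared monitor, as the original createVerifier did through putMerchant, and the first request hangs on a simulated download. Once all workers are stuck, even a request that never touches certificates just sits in the queue. From the caller's side, this shows up as a read timeout.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolExhaustionDemo {
    // Stands in for the CertificatesManager singleton's monitor
    private static final Object CERT_MANAGER = new Object();

    /** Returns {servedWhileStuck, servedAfterRecovery} for a request that needs no certificate. */
    static boolean[] run(int workers) throws Exception {
        // Hypothetical stand-in for the web container's worker threads
        ExecutorService container = Executors.newFixedThreadPool(workers);
        CountDownLatch certServiceRecovers = new CountDownLatch(1);
        try {
            for (int i = 0; i < workers; i++) {
                // Each payment request re-downloads certificates under the shared monitor
                container.submit(() -> {
                    synchronized (CERT_MANAGER) {
                        certServiceRecovers.await(); // simulated download with no read timeout
                    }
                    return null;
                });
            }
            Future<String> unrelated = container.submit(() -> "ok");
            boolean servedWhileStuck;
            try {
                unrelated.get(300, TimeUnit.MILLISECONDS);
                servedWhileStuck = true;
            } catch (TimeoutException e) {
                servedWhileStuck = false; // still queued: every worker is blocked
            }
            certServiceRecovers.countDown(); // the certificate service responds again
            boolean servedAfterRecovery = "ok".equals(unrelated.get(5, TimeUnit.SECONDS));
            return new boolean[] {servedWhileStuck, servedAfterRecovery};
        } finally {
            certServiceRecovers.countDown(); // never leave workers hanging, even on failure
            container.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(run(4))); // [false, true]
    }
}
```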
References
Issue link
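Optimization
We changed the code to rely on CertificatesManager's cache and automatic update mechanism. Certificates are now loaded only on first use; after that, CertificatesManager refreshes them automatically every 24 hours.
The adjusted code: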
public static Verifier createVerifier(WechatPayMerchant wechatPayMerchant) {
    Objects.requireNonNull(wechatPayMerchant, "Merchant config must not be null");
    try {
        // Get the certificate manager instance
        CertificatesManager certificatesManager = CertificatesManager.getInstance();
        try {
            // Look in the cache first
            Verifier verifier = certificatesManager.getVerifier(wechatPayMerchant.getPayUsedMchId());
            log.debug("Got certificate from cache: {}", wechatPayMerchant.getPayUsedMchId());
            return verifier;
        } catch (Exception e) {
            log.warn("Failed to get certificate: {}, {}", wechatPayMerchant.getPayUsedMchId(), e.getMessage());
            if (e instanceof NotFoundException) {
                // Certificates not loaded yet for this merchant
                PrivateKey merchantPrivateKey = PemUtil.loadPrivateKey(
                        new ByteArrayInputStream(wechatPayMerchant.getMerchantPrivateKey().getBytes(StandardCharsets.UTF_8)));
                // Register the merchant for automatic platform-certificate updates (downloads once)
                certificatesManager.putMerchant(wechatPayMerchant.getPayUsedMchId(),
                        new WechatPay2Credentials(wechatPayMerchant.getPayUsedMchId(),
                                new PrivateKeySigner(wechatPayMerchant.getMerchantSerialNumber(), merchantPrivateKey)),
                        wechatPayMerchant.getApiV3Key().getBytes(StandardCharsets.UTF_8));
                Verifier verifier = certificatesManager.getVerifier(wechatPayMerchant.getPayUsedMchId());
                log.info("Fetched certificate in real time: {}", wechatPayMerchant.getPayUsedMchId());
                return verifier;
            } else {
                throw e;
            }
        }
    } catch (Exception e) {
        log.error("createVerifier failed", e);
        throw new ServiceException("Failed to create WechatPay Verifier");
    }
}
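One caveat about the adjusted code: several requests for a merchant may arrive before its certificates are cached. Each of them gets NotFoundException and calls putMerchant, so a few downloads can still queue on the singleton at startup. If that matters, a per-merchant single-flight cache is one possible refinement. The sketch below is hypothetical. VerifierCache and its loader are not part of the WeChat Pay SDK; the loader would wrap putMerchant plus getVerifier. The first caller performs the load, concurrent callers wait on the same CompletableFuture, and a failed load is removed so a later call can retry. The loader itself still needs a bounded timeout; this only limits how many threads trigger it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical single-flight cache: at most one in-flight load per merchant ID. */
public class VerifierCache<V> {
    private final ConcurrentMap<String, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader; // stand-in for "putMerchant + getVerifier"

    public VerifierCache(Function<String, V> loader) {
        this.loader = loader;
    }

    public V get(String merchantId) {
        CompletableFuture<V> existing = cache.get(merchantId); // fast path once loaded: no lock
        if (existing != null) {
            return existing.join();
        }
        CompletableFuture<V> created = new CompletableFuture<>();
        existing = cache.putIfAbsent(merchantId, created);
        if (existing != null) {
            return existing.join(); // another thread is loading (or has loaded) this merchant
        }
        try {
            V value = loader.apply(merchantId); // the single load for this merchant
            created.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            cache.remove(merchantId, created); // drop the failed entry so a later call can retry
            created.completeExceptionally(e);   // wake up threads waiting on this load
            throw e;
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        VerifierCache<String> cache = new VerifierCache<>(id -> {
            loads.incrementAndGet();
            return "verifier-" + id;
        });
        cache.get("1900000001"); // hypothetical merchant ID
        cache.get("1900000001");
        System.out.println(loads.get()); // 1
    }
}
```

The design choice here is to keep the lock-held section tiny: putIfAbsent only publishes an empty future, and the slow load runs outside any map lock.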