tidb节点重启后,服务无法重连

大家好,我是烤鸭:

    前几天遇到tidb节点重启后服务无法重连,确切地说是两个服务,一个可以正常重连,一个不行。

问题复现

由于线上执行慢SQL,导致TiDB 单个节点宕机重启。

其中A服务的3个节点和B服务的1个节点开始报错,重启后B服务恢复,A服务还在报错。

使用的组件版本:

A服务:tomcat-jdbc-9.0.71.jar
B服务:HikariCP-4.0.3.jar

数据库配置:

A服务:

spring.xxx-tidb-data-source.url=jdbc:mysql://xxx.xxx.xxx:6000/xxxA?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=false&serverTimezone=GMT%2B8

B服务:

spring.datasource.url=jdbc:mysql://xxx.xxx.xxx:6000/xxxB?useSSL=false&autoReconnect=true

可以看到两边的配置和连接池不同。

报错日志

无法重连服务的报错日志:

Caused by: com.mysql.cj.exceptions.ConnectionIsClosedException: No operations allowed after connection closed.at sun.reflect.GeneratedConstructorAccessor139.newInstance(Unknown Source)at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)at java.lang.reflect.Constructor.newInstance(Constructor.java:423)at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)at com.mysql.cj.NativeSession.checkClosed(NativeSession.java:762)at com.mysql.cj.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:569)at com.mysql.cj.jdbc.ConnectionImpl.prepareStatement$original$McwhO8C0(ConnectionImpl.java:1580)... 125 common frames omitted
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: The last packet successfully received from the server was 259,130,913 milliseconds ago. The last packet sent successfully to the server was 259,130,914 milliseconds ago.may or may not be greater than the server-side timeout (the driver was unable to determine the value of either the 'wait_timeout' or 'interactive_timeout' configuration values from the server.. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)at java.lang.reflect.Constructor.newInstance(Constructor.java:423)at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:520)at com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(NativeProtocol.java:700)at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:639)at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:987)at com.mysql.cj.NativeSession.execSQL(NativeSession.java:666)at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:930)at com.mysql.cj.jdbc.ClientPreparedStatement.execute$original$mDKgOeFa(ClientPreparedStatement.java:371)at com.mysql.cj.jdbc.ClientPreparedStatement.execute$original$mDKgOeFa$accessor$UdYpLG4x(ClientPreparedStatement.java)at com.mysql.cj.jdbc.ClientPreparedStatement$auxiliary$BAL6Kq1Z.call(Unknown Source)at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java)at sun.reflect.GeneratedMethodAccessor261.invoke(Unknown Source)at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)at java.lang.reflect.Method.invoke(Method.java:498)at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:118)

问题猜想

找到变量了,就顺着猜测查一下。

  • 数据库配置参数 autoReconnect=true
  • 连接池 tomcat-jdbc 和 HikariCP

报错日志都来源于 CJCommunicationsException 这个类,其实调用的是 ExceptionFactory.createLinkFailureMessageBasedOnHeuristics,报错日志来源于配置文件。

我保留了一部分一会可能用到的变量

LocalizedErrorMessages.properties

CommunicationsException.2=\ is longer than the server configured value of 
CommunicationsException.3=''wait_timeout''
CommunicationsException.4=''interactive_timeout''
CommunicationsException.5=may or may not be greater than the server-side timeout 
CommunicationsException.6=(the driver was unable to determine the value of either the 
CommunicationsException.7=''wait_timeout'' or ''interactive_timeout'' configuration values from 
CommunicationsException.8=the server.
CommunicationsException.11=. You should consider either expiring and/or testing connection validity 
CommunicationsException.12=before use in your application, increasing the server configured values for client timeouts, 
CommunicationsException.13=or using the Connector/J connection property ''autoReconnect=true'' to avoid this problem.
CommunicationsException.20=Communications link failure...Connection.0=Unable to connect to database.
Connection.1=Cannot connect to MySQL server on {0}:{1}.\n\nMake sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.\n\n
Connection.2=No operations allowed after connection closed.
Connection.3=Can''t call commit when autocommit=true
Connection.4=Communications link failure during commit(). Transaction resolution unknown.
Connection.5=Unknown Java encoding for the character set with index ''{0}''. Use the ''customCharsetMapping'' property to force it.
Connection.6=Unknown character set index ''{0}'' received from server. The appropriate client character set can be forced via the ''characterEncoding'' property.
Connection.7=Can''t map {0} given for characterSetResults to a supported MySQL encoding.
Connection.8=Unable to use encoding: {0}

看看源码

从报错日志入手,根据链接状态和上次发送、接收时间判断,拼一些异常原因。

可以看出报错日志有几个组成部分:

1、上次发送packet的时间距离现在超过8小时,getLastPacketSentTime > 28800秒(8小时)

2、当前链接已关闭:No operations allowed after connection closed

ExceptionFactory

public static String createLinkFailureMessageBasedOnHeuristics(PropertySet propertySet, ServerSession serverSession,PacketSentTimeHolder packetSentTimeHolder, PacketReceivedTimeHolder packetReceivedTimeHolder, Throwable underlyingException) {long serverTimeoutSeconds = 0;boolean isInteractiveClient = false;long lastPacketReceivedTimeMs = packetReceivedTimeHolder == null ? 0L : packetReceivedTimeHolder.getLastPacketReceivedTime();long lastPacketSentTimeMs = packetSentTimeHolder.getLastPacketSentTime();if (lastPacketSentTimeMs > lastPacketReceivedTimeMs) {lastPacketSentTimeMs = packetSentTimeHolder.getPreviousPacketSentTime();}if (propertySet != null) {isInteractiveClient = propertySet.getBooleanProperty(PropertyKey.interactiveClient).getValue();String serverTimeoutSecondsStr = null;if (serverSession != null) {serverTimeoutSecondsStr = isInteractiveClient ? serverSession.getServerVariable("interactive_timeout"): serverSession.getServerVariable("wait_timeout");}if (serverTimeoutSecondsStr != null) {try {serverTimeoutSeconds = Long.parseLong(serverTimeoutSecondsStr);} catch (NumberFormatException nfe) {serverTimeoutSeconds = 0;}}}StringBuilder exceptionMessageBuf = new StringBuilder();long nowMs = System.currentTimeMillis();if (lastPacketSentTimeMs == 0) {lastPacketSentTimeMs = nowMs;}long timeSinceLastPacketSentMs = (nowMs - lastPacketSentTimeMs);long timeSinceLastPacketSeconds = timeSinceLastPacketSentMs / 1000;long timeSinceLastPacketReceivedMs = (nowMs - lastPacketReceivedTimeMs);int dueToTimeout = DUE_TO_TIMEOUT_FALSE;StringBuilder timeoutMessageBuf = null;if (serverTimeoutSeconds != 0) {if (timeSinceLastPacketSeconds > serverTimeoutSeconds) {dueToTimeout = DUE_TO_TIMEOUT_TRUE;timeoutMessageBuf = new StringBuilder();timeoutMessageBuf.append(Messages.getString("CommunicationsException.2"));timeoutMessageBuf.append(Messages.getString(isInteractiveClient ? "CommunicationsException.4" : "CommunicationsException.3"));}// 上次发送成功的时间距离现在超过8小时} else if (timeSinceLastPacketSeconds > DEFAULT_WAIT_TIMEOUT_SECONDS) {dueToTimeout = DUE_TO_TIMEOUT_MAYBE;timeoutMessageBuf = new StringBuilder();timeoutMessageBuf.append(Messages.getString("CommunicationsException.5"));timeoutMessageBuf.append(Messages.getString("CommunicationsException.6"));timeoutMessageBuf.append(Messages.getString("CommunicationsException.7"));timeoutMessageBuf.append(Messages.getString("CommunicationsException.8"));}if (dueToTimeout == DUE_TO_TIMEOUT_TRUE || dueToTimeout == DUE_TO_TIMEOUT_MAYBE) {exceptionMessageBuf.append(lastPacketReceivedTimeMs != 0? Messages.getString("CommunicationsException.ServerPacketTimingInfo",new Object[] { Long.valueOf(timeSinceLastPacketReceivedMs), Long.valueOf(timeSinceLastPacketSentMs) }): Messages.getString("CommunicationsException.ServerPacketTimingInfoNoRecv", new Object[] { Long.valueOf(timeSinceLastPacketSentMs) }));if (timeoutMessageBuf != null) {exceptionMessageBuf.append(timeoutMessageBuf);}exceptionMessageBuf.append(Messages.getString("CommunicationsException.11"));exceptionMessageBuf.append(Messages.getString("CommunicationsException.12"));exceptionMessageBuf.append(Messages.getString("CommunicationsException.13"));} else {//// Attempt to determine the reason for the underlying exception (we can only make a best-guess here)//if (underlyingException instanceof BindException) {String localSocketAddress = propertySet.getStringProperty(PropertyKey.localSocketAddress).getValue();boolean interfaceNotAvaliable;try {interfaceNotAvaliable = localSocketAddress != null && NetworkInterface.getByName(localSocketAddress) == null;} catch (SocketException e1) {interfaceNotAvaliable = false;}exceptionMessageBuf.append(interfaceNotAvaliable ? Messages.getString("CommunicationsException.LocalSocketAddressNotAvailable"): Messages.getString("CommunicationsException.TooManyClientConnections"));}}if (exceptionMessageBuf.length() == 0) {// We haven't figured out a good reason, so copy it.exceptionMessageBuf.append(Messages.getString("CommunicationsException.20"));if (propertySet.getBooleanProperty(PropertyKey.maintainTimeStats).getValue() && !propertySet.getBooleanProperty(PropertyKey.paranoid).getValue()) {exceptionMessageBuf.append("\n\n");exceptionMessageBuf.append(lastPacketReceivedTimeMs != 0? Messages.getString("CommunicationsException.ServerPacketTimingInfo",new Object[] { Long.valueOf(timeSinceLastPacketReceivedMs), Long.valueOf(timeSinceLastPacketSentMs) }): Messages.getString("CommunicationsException.ServerPacketTimingInfoNoRecv", new Object[] { Long.valueOf(timeSinceLastPacketSentMs) }));}}return exceptionMessageBuf.toString();}

再根据上边的两个猜测,找一下对应的源码。

猜想1:autoReconnect 这个参数是有用的

网上很多说这个参数只针对mysql5以下,这个倒是没找到,官网关于这个参数的说明。
https://dev.mysql.com/doc/connectors/en/connector-j-connp-props-high-availability-and-clustering.html#cj-conn-prop_autoReconnect

如果链接断开的情况下,配置了这个参数可以发起重试。如果不配置的话,就会一直报错。

在这里插入图片描述

猜想2:HikariCP 会自动重连,tomcat-jdbc不会

可以看到 tomcat-jdbc 执行 invoke,getConnetion一气呵成,在默认配置下,也没有任何校验,如果因为第三方导致的链接断开,就麻烦了。

public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {if (compare(ISCLOSED_VAL,method)) {return Boolean.valueOf(isClosed());}if (compare(CLOSE_VAL,method)) {if (connection==null){return null; //noop for already closed.}PooledConnection poolc = this.connection;this.connection = null;pool.returnConnection(poolc);return null;} else if (compare(TOSTRING_VAL,method)) {return this.toString();} else if (compare(GETCONNECTION_VAL,method) && connection!=null) {return connection.getConnection();} else if (method.getDeclaringClass().isAssignableFrom(XAConnection.class)) {try {return method.invoke(connection.getXAConnection(),args);}catch (Throwable t) {if (t instanceof InvocationTargetException) {throw t.getCause() != null ? t.getCause() : t;} else {throw t;}}}if (isClosed()) {throw new SQLException("Connection has already been closed.");}if (compare(UNWRAP_VAL,method)) {return unwrap((Class<?>)args[0]);} else if (compare(ISWRAPPERFOR_VAL,method)) {return Boolean.valueOf(this.isWrapperFor((Class<?>)args[0]));}try {PooledConnection poolc = connection;if (poolc!=null) {return method.invoke(poolc.getConnection(),args);} else {throw new SQLException("Connection has already been closed.");}}catch (Throwable t) {if (t instanceof InvocationTargetException) {throw t.getCause() != null ? t.getCause() : t;} else {throw t;}}}

在这里插入图片描述

HikariCP

HikarPool.getConnection 会校验当前是否存活,如果不存活,会关闭当前链接,并会重新创建连接。

在这里插入图片描述

问题解决

当天的解决方案比较简单,重启服务就行。不过如果想彻底解决这个问题的话。

  • 升级连接池,tomcat-jdbc 升级 druid 或者 HikariCP ,springboot 2.0以上默认是 HikariCP

  • 修改配置,如果还使用 tomcat-jdbc :

    增加以下配置:

    # 连接存活检查
    spring.datasource.tomcat.validation-query='SELECT 1'
    # 指定多少ms执行一次连接校验.	
    spring.datasource.tomcat.validation-interval=3000
    # 指定连接校验查询的超时时间
    spring.datasource.tomcat.validation-query-timeout=-1
    
    或者连接地址增加 autoReconnect=true
    

总结

有些问题不经常遇到,即便是预发1比1的环境可能也没办法避免。除了在事故处理的时候可以完善流程,就只能尽量保持架构统一,尤其是老项目。业务侧增加全方面中间件的监控,尤其是存活检测这种。坑是无穷无尽的,踩完了记得遇到相同的跳过去。

相关文章

https://zhuanlan.zhihu.com/p/361130333?utm_id=0

https://blog.csdn.net/zhangxin09/article/details/124901850

https://blog.csdn.net/u014398624/article/details/47746473

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/442631.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

STM32——USART

一、通信 1.1通信是什么&#xff1b; 通信是将一个设备的数据发送到另一个设备中&#xff0c;从而实现硬件的扩展&#xff1b; 1.2通信的目的是什么&#xff1b; 实现硬件的扩展-在STM32中集成了很多功能&#xff0c;例如PWM输出&#xff0c;AD采集&#xff0c;定时器等&am…

springboot142新冠病毒密接者跟踪系统

新冠病毒密接者跟踪系统设计与实现 摘 要 信息数据从传统到当代&#xff0c;是一直在变革当中&#xff0c;突如其来的互联网让传统的信息管理看到了革命性的曙光&#xff0c;因为传统信息管理从时效性&#xff0c;还是安全性&#xff0c;还是可操作性等各个方面来讲&#xff…

【css】设置渐变阴影

css的属性中没有直接设置渐变阴影的&#xff0c;但是可以通过伪元素去实现。 .box-wrap{width: 100%;display: grid;place-content: center; } .box {width: 150px;height: 150px;background: #eee;border: 1px solid #585252;position: relative;transform: translate(0);/* …

Observability:在 Elastic Stack 8.12 中使用 Elastic Agent 性能预设

作者&#xff1a;来自 Elastic Nima Rezainia, Bill Easton 8.12 中 Elastic Agent 性能有了重大改进 最新版本 8.12 标志着 Elastic Agent 和 Beats 调整方面的重大转变。 在此更新中&#xff0c;Elastic 引入了 Performance Presets&#xff0c;旨在简化用户的调整过程并增强…

MATLAB - 仿真单摆的周期性摆动

系列文章目录 前言 本例演示如何使用 Symbolic Math Toolbox™ 模拟单摆的运动。推导摆的运动方程&#xff0c;然后对小角度进行分析求解&#xff0c;对任意角度进行数值求解。 一、步骤 1&#xff1a;推导运动方程 摆是一个遵循微分方程的简单机械系统。摆最初静止在垂直位置…

力扣136、只出现一次的数字(简单)

1 题目描述 图1 题目描述 2 题目解读 在非空整数数组nums中&#xff0c;有一个元素只出现了一次&#xff0c;其余元素均出现两次。要求找出那个只出现一次的元素。 3 解法一&#xff1a;位运算 位运算&#xff0c;是一种非常简便的方法。 3.1 解题思路 异或运算&#xff0c;有…

《PCI Express体系结构导读》随记 —— 第II篇 第4章 PCIe总线概述(3)

接前一篇文章&#xff1a;《PCI Express体系结构导读》随记 —— 第II篇 第4章 PCIe总线概述&#xff08;2&#xff09; 4.1 PCIe总线的基础知识 与PCI总线不同&#xff0c;PCIe总线使用端到端的连接方式&#xff0c;在一条PCIe链路的两端只能各连接一个设备&#xff0c;这两个…

如何开立一个融资融券账户?

第一&#xff1a;什么是融资融券&#xff1f;融资融券交易&#xff0c;又称“证券信用交易”&#xff0c;是指投资者通过向具有融资融券业务资格的证券公司提供担保物&#xff0c;借入资金买入证券&#xff08;融资交易&#xff09;或借入证券并卖出&#xff08;融券交易&#…

【LeetCode: 25. K 个一组翻转链表 + 链表 + 递归】

&#x1f680; 算法题 &#x1f680; &#x1f332; 算法刷题专栏 | 面试必备算法 | 面试高频算法 &#x1f340; &#x1f332; 越难的东西,越要努力坚持&#xff0c;因为它具有很高的价值&#xff0c;算法就是这样✨ &#x1f332; 作者简介&#xff1a;硕风和炜&#xff0c;…

盲盒小程序开发,实现“双收益”

盲盒在我国是一个发展潜力较高的市场&#xff0c;盲盒具有的刺激和收藏价值&#xff0c;深受消费者的喜爱&#xff0c;盲盒的“隐藏款”机制&#xff0c;能够为消费者带来惊喜感。盲盒一般与影视动漫IP合作&#xff0c;推出盲盒商品&#xff0c;这也是深受年轻人追捧的一大特点…

Linux 系统服务

一、CentOS 6 与 CentOS 7开机流程 1.CentOS 6 1.1.打开电源首先通过内核引导开机。 1.2.开机自检&#xff0c;加载 BIOS 硬件信息。 1.3.MBR 记录一般是在磁盘0磁道0扇区&#xff0c;共512个字节。前446个字节是 BootLoder。计算机不知道我们的系统在哪里&#xff0c;所以需…

医用机器人市场调研:预计2029年将达到37亿美元

我国医护人员缺口大、居民医疗资源的需求持续增加&#xff0c;医用机器人的普及是良好的解决方案&#xff0c;换而言之&#xff0c;我国医疗资源紧缺问题为医用机器人市场孕育了增长空间。 医用机器人&#xff0c;是指用于医院、诊所的医疗或辅助医疗的机器人。是一种智能型…