重试机制与 CompletableFuture 拓展
禁止转载。
本文旨在讨论重试机制的特点和策略,分析常用重试类库的实现,讨论为 CompletableFuture 添加重试机制的方法。文章首发同名公众号,欢迎关注。
重试示例
以下是一个常见的使用异步重试的例子,当我们需要重试功能时,只需调用 retry 方法,传入相应的重试策略即可。这里的重试策略为重试 2 次,使用回退策略(backoff),重试间隔为 100ms,抖动因子为 0.75,同时指定了调度器。
// Project Reactor 提供的重试方法
public Mono<String> getUsername(String userId) {// backoff 重试策略var backoffRetry = Retry.backoff(2, Duration.ofMillis(100)).jitter(0.75).scheduler(Schedulers.newSingle("retry scheduler"));return webClient.get().uri("localhost:8080/user/", userId).accept(MediaType.APPLICATION_JSON).retrieve().bodyToMono(String.class)// 若为简单重试可改为调用 retry(n).retryWhen(backoffRetry);
}
以下图片摘自 Mono#retryWhen 文档注释:
Project Reactor 是基于发布-订阅模型的响应式组件。从图中可以看出,每次获取数据失败后,会等待一段时间,然后再次订阅发布者以获取数据,反复以上过程直到达到最终重试次数或者出现成功结果。
Spring Retry 类库提供了重试模版:
RetryTemplate template = RetryTemplate.builder().maxAttempts(3).fixedBackoff(1000).retryOn(RemoteAccessException.class).build();
// 重试
template.execute(ctx -> {// ... do something
});
重试模版需要传入任务,而 Project Reactor 中发布者-订阅者两者解耦,可以实现多次订阅,因此不影响链式调用。
若想为 CompletableFuture 增加重试功能,最好是使用类似 Spring-Retry 的模式,添加工具类方法 retry,参数包括任务、重试策略等。
重试策略
-
触发重试策略
特定异常(如支持黑白名单)、特定返回值、自定义
-
等待策略(backoff 算法)
无等待、固定时间(fixed)、等量增长时间(incremental)、指数增长时间(exponentail backoff)、随机时间(random)、斐波那契数列(fibnonacci) 、自定义
-
终止策略
尝试次数(maxAttempts)、超时终止、自定义
重试策略应该注意区分有状态重试和无状态重试:
有状态重试表示各个重试之间存在相互依赖,比如
-
每次访问网站信息时,返回错误信息包含了下一次可以正常访问的时间
-
输入密码多次错误后,需要等待若干时间再重试
-
共用相同的限流组件;
无状态重试表示每次重试不依赖其他重试的结果,实现容易,某些复杂的有状态重试可以使用无状态重试实现。
重试上下文信息
常见的重试上下文有:重试次数、每次返回结果、日志记录、回调。
回调方法包括每次返回结果时回调、最终返回结果时回调。
简易实现代码
手动实现最简单的方法是调用 exceptionally 或者 exceptionallyCompose 方法,多次传入重试任务。
1. 迭代实现 N 次重试
以下代码使用了迭代法,缺点是造成 CompletableFuture 内部维护的 stack 过深,增加不必要的内存开销;无法实现无限次重试。
public static <T> CompletableFuture<T> retry(Supplier<T> supplier, int attempts) {var cf = supplyAsync(supplier);for (int i = 0; i < attempts; i++) {cf = cf.exceptionally(ex -> supplier.get());}return cf;
}
2. 递归实现 N 次重试
使用递归解决了以上问题:
@Slf4j
class RetryNAttemptsDemo {// 演示用,忽略线程池配置public static void main(String[] args) {// 任务3次重试后返回正确结果var times = new AtomicInteger();Supplier<Integer> task = () -> {if (times.getAndIncrement() < 3) {throw new RuntimeException("异常结果");} else {return 42;}};// 使用重试retry(4, () -> supplyAsync(task)).thenAcceptAsync(r -> log.info("获取结果: {}", r)).whenComplete((__, ex) -> log.error("最终获取结果异常", ex)).join();}public static <T> CompletableFuture<T> retry(int attempts, Supplier<CompletionStage<T>> supplier) {// 使用CompletableFuture的写功能var result = new CompletableFuture<T>();retryNAttempts(result, attempts, supplier);return result;}private static <T> void retryNAttempts(CompletableFuture<T> result, int attempts, Supplier<CompletionStage<T>> supplier) {supplier.get().thenAccept(result::complete).whenComplete((__, throwable) -> {if (attempts > 0L) {log.warn("异常重试");retryNAttempts(result, attempts - 1, supplier);} else {log.error("多次重试异常结果", throwable);result.completeExceptionally(throwable);}});}
}
执行结果如下,符合预期。
> Task :RetryNAttemptsDemo.main()
23:18:32.042 [main] WARN com.example.demo.futures.RetryNAttemptsDemo -- 异常重试
23:18:32.043 [main] WARN com.example.demo.futures.RetryNAttemptsDemo -- 异常重试
23:18:32.044 [main] WARN com.example.demo.futures.RetryNAttemptsDemo -- 异常重试
23:18:32.044 [ForkJoinPool.commonPool-worker-1] INFO com.example.demo.futures.RetryNAttemptsDemo -- 获取结果: 42
3. 递归实现 backoff
思路:
- 正常结果和异常结果分别处理,若有最终结果则记录到 result
- 处理结果为重试等待时间
- 执行重试(使用 ScheduledExecutorService#schedule)
@Slf4j
class BackoffRetryDemo {public static final long STOP_RETRY = -1L;private final int maxAttempts;private final AtomicInteger attempts = new AtomicInteger();// 延迟时间(ms)private final long delay;BackoffRetryDemo(int maxAttempts, long delay) {this.maxAttempts = maxAttempts;this.delay = delay;}public <T> CompletableFuture<T> retry(Supplier<CompletionStage<T>> stageSupplier, ScheduledExecutorService delayer) {CompletableFuture<T> result = new CompletableFuture<>();retry(stageSupplier, delayer, result);return result;}private <T> void retry(Supplier<CompletionStage<T>> stageSupplier, ScheduledExecutorService delayer, CompletableFuture<T> result) {attempts.incrementAndGet();stageSupplier.get().thenApply(r -> {result.complete(r);return STOP_RETRY;}).exceptionally(throwable -> {if (attempts.get() >= maxAttempts) {result.completeExceptionally(throwable);return STOP_RETRY;}log.warn("异常重试");return delay;}).thenAccept(delay -> {if (delay == 0L)delayer.execute(() -> retry(stageSupplier, delayer, result));else if (delay > 0L)delayer.schedule(() -> retry(stageSupplier, delayer, result), delay, TimeUnit.MILLISECONDS);});}public static void main(String[] args) {var times = new AtomicInteger();Supplier<Integer> task = () -> {if (times.getAndIncrement() < 3) {throw new RuntimeException("异常结果");} else {return 42;}};var backoffRetry = new BackoffRetryDemo(4, 500);backoffRetry.retry(() -> supplyAsync(task), Executors.newSingleThreadScheduledExecutor()).thenAcceptAsync(r -> log.info("获取结果: {}", r)).exceptionallyAsync(throwable -> {log.error("最终获取结果异常", throwable);return null;}).join();}
}
执行日志如下:
> Task :BackoffRetryDemo.main()
23:54:12.099 [main] WARN com.example.demo.futures.BackoffRetryDemo -- 异常重试
23:54:12.610 [pool-1-thread-1] WARN com.example.demo.futures.BackoffRetryDemo -- 异常重试
23:54:13.113 [ForkJoinPool.commonPool-worker-1] WARN com.example.demo.futures.BackoffRetryDemo -- 异常重试
23:54:13.621 [ForkJoinPool.commonPool-worker-1] INFO com.example.demo.futures.BackoffRetryDemo -- 获取结果: 42
从结果可以看出,实现了延迟重试,重试等待时间为 500ms,三次尝试后获取到了正确结果。
不同类库的实现浅析
1. Resiliance4J
将 Retry 视为高阶函数装饰器,可以实现对任意方法的增强,如 Supplier, Consumer, CompletableFuture
CheckedFunction0<String> retryableSupplier = Retry.decorateCheckedSupplier(retry, helloWorldService::sayHelloWorld);
// 线程安全类
public interface Retry {// 装饰器方法,为 supplier 增加可重试功能static <T> Supplier<CompletionStage<T>> decorateCompletionStage(Retry retry,ScheduledExecutorService scheduler,Supplier<CompletionStage<T>> supplier) {return () -> {// 这里使用 final 可能是为了兼容 JDK7final CompletableFuture<T> promise = new CompletableFuture<>();final Runnable block = new AsyncRetryBlock<>(scheduler, retry.asyncContext(), supplier,promise);block.run();return promise;};}// 全局管理 Retry 支持String getName();Map<String, String> getTags();// 上下文支持回调<T> Retry.Context<T> context();<T> Retry.AsyncContext<T> asyncContext();// 重试策略RetryConfig getRetryConfig();// 事件支持EventPublisher getEventPublisher();default <T> CompletionStage<T> executeCompletionStage(ScheduledExecutorService scheduler,Supplier<CompletionStage<T>> supplier) {return decorateCompletionStage(this, scheduler, supplier).get();}// 略去其他执行方法,如 executeSupplier,executeRunnable// 监控信息Metrics getMetrics();interface Metrics {long getNumberOfSuccessfulCallsWithoutRetryAttempt();long getNumberOfFailedCallsWithoutRetryAttempt();long getNumberOfSuccessfulCallsWithRetryAttempt();long getNumberOfFailedCallsWithRetryAttempt();}// 回调支持interface AsyncContext<T> {void onComplete();long onError(Throwable throwable);long onResult(T result);}interface Context<T> {void onComplete();boolean onResult(T result);void onError(Exception exception) throws Exception;void onRuntimeError(RuntimeException runtimeException);}// 事件支持,发布订阅模式,实现回调或者异步的另一种机制,发布者和订阅者(消费者)解耦interface EventPublisher extends io.github.resilience4j.core.EventPublisher<RetryEvent> {EventPublisher onRetry(EventConsumer<RetryOnRetryEvent> eventConsumer);EventPublisher onSuccess(EventConsumer<RetryOnSuccessEvent> eventConsumer);EventPublisher onError(EventConsumer<RetryOnErrorEvent> eventConsumer);EventPublisher onIgnoredError(EventConsumer<RetryOnIgnoredErrorEvent> eventConsumer);}// 这个类不知为何放在接口里面,实际上可以提出来class AsyncRetryBlock<T> implements Runnable {// 下一部分分析}
}
不过异步增强的 CompletableFuture 不支持 Error 类型 fallback,封装了异步执行逻辑,实现逻辑和上一节 backoff 简易实现一致。
class AsyncRetryBlock<T> implements Runnable {private final ScheduledExecutorService scheduler;// 调用其回调方法 onResult, onErrorprivate final Retry.AsyncContext<T> retryContext;private final Supplier<CompletionStage<T>> supplier;// 最终结果,使用 CompletableFuture 的写功能private final CompletableFuture<T> promise;// 略去构造器代码@Overridepublic void run() {final CompletionStage<T> stage = supplier.get();stage.whenComplete((result, throwable) -> {if (throwable != null) {// 支持 Exception 类型 fallbackif (throwable instanceof Exception) {onError((Exception) throwable);} else {promise.completeExceptionally(throwable);}} else {onResult(result);}});}// 重试或结束private void onError(Exception t) {final long delay = retryContext.onError(t);if (delay < 1) {promise.completeExceptionally(t);} else {scheduler.schedule(this, delay, TimeUnit.MILLISECONDS);}}// 重试或结束private void onResult(T result) {final long delay = retryContext.onResult(result);if (delay < 1) {try {retryContext.onComplete();promise.complete(result);} catch (Exception e) {promise.completeExceptionally(e);}} else {scheduler.schedule(this, delay, TimeUnit.MILLISECONDS);}}
}
再来看 Context 的具体实现,总结为以下几点:
- 记录执行统计信息(如 numOfAttempts, lastException, succeededWithoutRetryCounter)
- 发布相关事件(publishRetryEvent)
- 每次执行前后支持回调, 如 consumeResultBeforeRetryAttempt
- 代码执行时调用 RetryConfig 指定的策略(策略模式)
// RetryImpl 的内部类, RetryImpl 持有统计信息相关字段,重试策略相关字段
public final class AsyncContextImpl implements Retry.AsyncContext<T> {private final AtomicInteger numOfAttempts = new AtomicInteger(0);private final AtomicReference<Throwable> lastException = new AtomicReference<>();@Overridepublic long onError(Throwable throwable) {totalAttemptsCounter.increment();// Handle the case if the completable future throw CompletionException wrapping the original exception// where original exception is the one to retry not the CompletionException.// 异常解包if (throwable instanceof CompletionException || throwable instanceof ExecutionException) {Throwable cause = throwable.getCause();return handleThrowable(cause);} else {return handleThrowable(throwable);}}// handleThrowable 和 handleOnError 做了类似的逻辑,从名字上无法区分,还不如直接合并成一个方法private long handleThrowable(Throwable throwable) {// 自定义方法判断是否需要 retry,exceptionPredicate 来自 RetryConfigif (!exceptionPredicate.test(throwable)) {failedWithoutRetryCounter.increment();publishRetryEvent(() -> new RetryOnIgnoredErrorEvent(getName(), throwable));return -1;}return handleOnError(throwable);}private long handleOnError(Throwable throwable) {lastException.set(throwable);int attempt = numOfAttempts.incrementAndGet();if (attempt >= maxAttempts) {failedAfterRetryCounter.increment();publishRetryEvent(() -> new RetryOnErrorEvent(name, attempt, throwable));return -1;}// backoff 策略, 来自 RetryConfiglong interval = intervalBiFunction.apply(attempt, Either.left(throwable));if (interval < 0) {publishRetryEvent(() -> new RetryOnErrorEvent(getName(), attempt, throwable));} else {publishRetryEvent(() -> new RetryOnRetryEvent(getName(), attempt, throwable, interval));}return interval;}// 略去其他方法
}
2. Spring Retry
这里不讨论 AOP 实现的重试增强,仅讨论命令式代码实现。
Spring Retry 实现了有状态的重试,很多方法需要显式传参数 RetryContext,有多种 RetryContext 支持,RetrySynchronizationManager 提供了全局 RetryContext 上下文支持,底层使用 ThreadLocal,提供获取上下文、注册上下文等方法。
任务封装为 RetryCallback,不直接支持 CompletableFuture。
// 封装的重试任务
public interface RetryCallback<T, E extends Throwable> {// 无状态重试不需要使用context/*** Execute an operation with retry semantics.*/T doWithRetry(RetryContext context) throws E;/*** A logical identifier for this callback to distinguish retries around business* operations.*/default String getLabel() {return null;}
}
RetryOperation 定义了重试操作:
public interface RetryOperations {<T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback) throws E;<T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RecoveryCallback<T> recoveryCallback) throws E;<T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RetryState retryState) throws E, ExhaustedRetryException;<T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RecoveryCallback<T> recoveryCallback, RetryState retryState) throws E;
}
回调接口定义了回调操作:
public interface RetryListener {// 开始重试时回调/*** Called before the first attempt in a retry. For instance, implementers can set up* state that is needed by the policies in the {@link RetryOperations}. The whole* retry can be vetoed by returning false from this method, in which case a* {@link TerminatedRetryException} will be thrown.*/default <T, E extends Throwable> boolean open(RetryContext context, RetryCallback<T, E> callback) {return true;}// 结束重试时回调/*** Called after the final attempt (successful or not). Allow the listener to clean up* any resource it is holding before control returns to the retry caller.*/default <T, E extends Throwable> void close(RetryContext context, RetryCallback<T, E> callback,Throwable throwable) {}// 成功时回调/*** Called after a successful attempt; allow the listener to throw a new exception to* cause a retry (according to the retry policy), based on the result returned by the* {@link RetryCallback#doWithRetry(RetryContext)}*/default <T, E extends Throwable> void onSuccess(RetryContext context, RetryCallback<T, E> callback, T result) {}// 失败时回调/*** Called after every unsuccessful attempt at a retry.*/default <T, E extends Throwable> void onError(RetryContext context, RetryCallback<T, E> callback,Throwable throwable) {}}
这里仅讨论第一个 execute 方法的实现:
// 不可变类,线程安全类
public class RetryTemplate implements RetryOperations {// 略去 execute 语义外方法,如对象创建与初始化protected final Log logger = LogFactory.getLog(getClass());private volatile BackOffPolicy backOffPolicy = new NoBackOffPolicy();private volatile RetryPolicy retryPolicy = new SimpleRetryPolicy(3);private volatile RetryListener[] listeners = new RetryListener[0];private RetryContextCache retryContextCache = new MapRetryContextCache();private boolean throwLastExceptionOnExhausted;@Overridepublic final <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback) throws E {return doExecute(retryCallback, null, null);}// 方法比较长,模版方法模式protected <T, E extends Throwable> T doExecute(RetryCallback<T, E> retryCallback,RecoveryCallback<T> recoveryCallback, RetryState state) throws E, ExhaustedRetryException {RetryPolicy retryPolicy = this.retryPolicy;BackOffPolicy backOffPolicy = this.backOffPolicy;// Allow the retry policy to initialise itself...// 重试过程中,context 不断变化,每次重试需要初始化RetryContext context = open(retryPolicy, state);if (this.logger.isTraceEnabled()) {this.logger.trace("RetryContext retrieved: " + context);}// Make sure the context is available globally for clients who need// it...// 保证重试执行时可以随时获得 context,使用了 ThreadLocal, context 和线程绑定RetrySynchronizationManager.register(context);Throwable lastException = null;boolean exhausted = false;try {// 一些准备工作// 回调,可提前终止重试// Give clients a chance to enhance the context...boolean running = doOpenInterceptors(retryCallback, context);if (!running) {throw new TerminatedRetryException("Retry terminated abnormally by interceptor before first attempt");}// 设置 context 最大重试数if (!context.hasAttribute(RetryContext.MAX_ATTEMPTS)) {context.setAttribute(RetryContext.MAX_ATTEMPTS, retryPolicy.getMaxAttempts());}// Get or Start the backoff context...BackOffContext backOffContext = null;Object resource = context.getAttribute("backOffContext");if (resource instanceof BackOffContext) {backOffContext = (BackOffContext) resource;}if (backOffContext == null) {backOffContext = backOffPolicy.start(context);if (backOffContext != null) {context.setAttribute("backOffContext", backOffContext);}}Object label = retryCallback.getLabel();String labelMessage = (label != null) ? "; for: '" + label + "'" : "";// 准备工作结束,开始执行 retry 核心代码// 循环内部为任务执行的完整 try-catch 过程,基本思想和函数式基于轨道编程(Railway-Oriented-Programming)的 CompletableFuture 不同/** We allow the whole loop to be skipped if the policy or context already* forbid the first try. This is used in the case of external retry to allow a* recovery in handleRetryExhausted without the callback processing (which* would throw an exception).*/while (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {try {if (this.logger.isDebugEnabled()) {this.logger.debug("Retry: count=" + context.getRetryCount() + labelMessage);}// Reset the last exception, so if we are successful// the close interceptors will not think we failed...lastException = null;// 任务执行T result = retryCallback.doWithRetry(context);// 成功回调doOnSuccessInterceptors(retryCallback, context, result);return result;}catch (Throwable e) {lastException = e;try {// 每次异常回调// 进行的操作一般有:失败次数 + 1, 记录 lastExceptionregisterThrowable(retryPolicy, state, context, e);}catch (Exception ex) {throw new TerminatedRetryException("Could not register throwable", ex);}finally {// RetryListener 失败回调doOnErrorInterceptors(retryCallback, context, e);}// 执行 backoff 策略if (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {try {backOffPolicy.backOff(backOffContext);}catch (BackOffInterruptedException ex) {// back off was prevented by another thread - fail the retryif (this.logger.isDebugEnabled()) {this.logger.debug("Abort retry because interrupted: count=" + context.getRetryCount()+ labelMessage);}throw ex;}}if (this.logger.isDebugEnabled()) {this.logger.debug("Checking for rethrow: count=" + context.getRetryCount() + labelMessage);}if (shouldRethrow(retryPolicy, context, state)) {if (this.logger.isDebugEnabled()) {this.logger.debug("Rethrow in retry for policy: count=" + context.getRetryCount() + labelMessage);}throw RetryTemplate.<E>wrapIfNecessary(e);}} // while 循环内 try-catch 结束// 仅考虑无状态重试(state is null),可以忽略这段代码/** A stateful attempt that can retry may rethrow the exception before now,* but if we get this far in a stateful retry there's a reason for it,* like a circuit breaker or a rollback classifier.*/if (state != null && context.hasAttribute(GLOBAL_STATE)) {break;}} // while 循环末尾if (state == null && this.logger.isDebugEnabled()) {this.logger.debug("Retry failed last attempt: count=" + context.getRetryCount() + labelMessage);}exhausted = true;return handleRetryExhausted(recoveryCallback, context, state);}catch (Throwable e) {// 重试代码抛出异常,无法处理,rethrowthrow RetryTemplate.<E>wrapIfNecessary(e);}finally {close(retryPolicy, context, state, lastException == null || exhausted);// RetryListener 关闭回调doCloseInterceptors(retryCallback, context, lastException);RetrySynchronizationManager.clear();}}
}
总结一下 Spring-Retry 的特点
-
支持回调(RetryListener) 和有状态上下文(RetryContext、backoffContext、RetryState)
-
缺点:不支持异步 backoff,backoff 在同一线程内。
-
上下文和线程绑定,底层使用 ThreadLocal,代码中会有隐式传参问题。
CompletableFuture 和重试机制有关的特点
-
若想实现特定返回值触发重试策略,CompletableFuture 存在成功运算管道和异常管道,推荐的做法是:thenCompose 转化某些错误值到特定异常,配置特定异常触发重试策略。
-
ComletableFuture 中的结果为异常时,需要解包才能获取真实的代码执行时异常。
-
CompletableFuture 提供了限时获取值方法,可以轻松实现超时终止策略。
-
取消机制,上文中的简易实现没有考虑 retry 方法返回结果被取消的情况,此时运行中的任务应该主动 cancel。
-
可以天然地支持异步重试(重试任务执行不限于同一线程中)
-
在单线程中sleep一段时间,再重试也是一种能接受的解决方案
CFFU
CFFU(CompletableFuture Fu )是一个小小的 CompletableFuture(CF)辅助增强库,提升 CF 使用体验并减少误用,在业务中更方便高效安全地使用 CF。
CFFU 并不支持重试,如果你想实现 CompletableFuture 的重试功能,可以使用 Resilience4J。
本文由博客一文多发平台 OpenWrite 发布!