1. Symptom:
org.apache.dubbo.rpc.RpcException: No provider available from registry localhost:9090 for service com.hxy.boot.ticket.articles.api.ArticleService on consumer 192.168.137.1 use dubbo version 2.7.8, please check status of providers(disabled, not registered or in blacklist).
at org.apache.dubbo.registry.integration.RegistryDirectory.doList(RegistryDirectory.java:599)
at org.apache.dubbo.rpc.cluster.directory.AbstractDirectory.list(AbstractDirectory.java:74)
at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.list(AbstractClusterInvoker.java:292)
at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:257)
at org.apache.dubbo.rpc.cluster.interceptor.ClusterInterceptor.intercept(ClusterInterceptor.java:47)
at org.apache.dubbo.rpc.cluster.support.wrapper.AbstractCluster$InterceptorInvokerNode.invoke(AbstractCluster.java:92)
at org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:88)
at org.apache.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:74)
2. Direct cause
When the consumer calls the provider, the `forbidden` property of the consumer's Dubbo service directory, `org.apache.dubbo.registry.integration.RegistryDirectory`, is `true` (see screenshot).
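To make the effect of the `forbidden` flag concrete, here is a minimal plain-Java model of a consumer-side directory. It is only a sketch, not Dubbo's code: `ToyDirectory`, `refresh` and `list` are hypothetical names, and the real `RegistryDirectory` is, as far as I can tell, switched to forbidden by an `empty://` URL rather than by a literally empty list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for a consumer-side service directory (hypothetical, not Dubbo's class). */
class ToyDirectory {
    private volatile boolean forbidden = false;
    private volatile List<String> invokers = Collections.emptyList();

    /** Called when a new provider URL list is pushed to the consumer. */
    void refresh(List<String> providerUrls) {
        if (providerUrls.isEmpty()) {
            // Assumption: an empty push is read as "no provider exists", so all calls are blocked.
            forbidden = true;
            invokers = Collections.emptyList();
        } else {
            forbidden = false;
            invokers = new ArrayList<>(providerUrls);
        }
    }

    /** Called on every RPC to obtain the candidate providers. */
    List<String> list(String service) {
        if (forbidden) {
            throw new IllegalStateException("No provider available for service " + service);
        }
        return invokers;
    }

    boolean isForbidden() {
        return forbidden;
    }
}
```

Once `refresh` has seen an empty list, every later `list` call fails until a non-empty push arrives, which matches the behaviour in the stack trace above.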
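3. Reproducing the problem
The problem is intermittent and hard to catch. After some analysis I found a way to reproduce it every time: in the provider, set a breakpoint at line 31 of `org.apache.dubbo.config.spring.context.DubboBootstrapApplicationListener#onContextRefreshedEvent(ContextRefreshedEvent event)`, set the breakpoint's `suspend` mode to `Thread`, and restart the provider (see screenshot).
4. Root cause
The root cause is that the Spring Cloud Alibaba framework starts Nacos auto service registration earlier than Dubbo service registration starts. The former is triggered by the `WebServerInitializedEvent` (`org.springframework.cloud.client.serviceregistry.AbstractAutoServiceRegistration#bind(WebServerInitializedEvent event)`); the latter is triggered by the `ContextRefreshedEvent` (`org.apache.dubbo.config.spring.context.DubboBootstrapApplicationListener#onContextRefreshedEvent(ContextRefreshedEvent event)`).
In `spring boot 2.2.x`, `ServletWebServerInitializedEvent` is published after `ContextRefreshedEvent` (see screenshot).
In `spring boot 2.3.x`, however, it was moved before `ContextRefreshedEvent` (see screenshot).
After the Nacos server has processed the provider's registration request, it pushes an instance-change notification to the subscribers. At that moment the provider's own Dubbo service export may not have finished. The most direct sign is that the `allExportedURLs` property of the provider's `com.alibaba.cloud.dubbo.metadata.repository.DubboServiceMetadataRepository` does not yet contain the URLs of the corresponding Dubbo services.
In the reproduction from section 3, when execution stops at the breakpoint, inspecting the process with `jprofiler` shows that `allExportedURLs` does not contain the expected values.
With `spring cloud alibaba + dubbo`, Dubbo services are exported into the `allExportedURLs` property of the local `com.alibaba.cloud.dubbo.metadata.repository.DubboServiceMetadataRepository` and are never sent to the registry server. So when the export finally completes, the Nacos server has no way of knowing that the Dubbo services are ready and cannot notify the subscribers. In this situation, when the consumer makes a generic call to the `DubboMetadataService` interface to fetch the services exported by the provider, what it gets from `allExportedURLs` is an empty `List`. The consumer then concludes that there is no provider and sets the `forbidden` property of its local Dubbo service directory `RegistryDirectory` to `true`.
From then on, consumer calls to the provider fail with the error shown in section 1.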
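The ordering argument above can be sketched as a tiny simulation. This is a toy model under stated assumptions: the event names mirror Spring's, but `StartupOrderDemo`, `urlsSeenByConsumer` and the sample URL are made up for illustration, and the consumer is assumed to query the provider immediately after the registry notification.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of provider startup (hypothetical; only the event names come from Spring). */
class StartupOrderDemo {

    /**
     * Replays the startup events in the given order and returns the exported URLs a
     * consumer would see if it queried the provider right after registry registration.
     * Returns null if the registration event never fires.
     */
    static List<String> urlsSeenByConsumer(List<String> eventOrder) {
        List<String> allExportedURLs = new ArrayList<>(); // provider-local metadata
        List<String> seen = null;
        for (String event : eventOrder) {
            if (event.equals("ContextRefreshedEvent")) {
                // Dubbo bootstrap starts here and exports the services.
                allExportedURLs.add("dubbo://provider:20880/ArticleService");
            } else if (event.equals("WebServerInitializedEvent")) {
                // Nacos registration happens here; subscribers are notified and
                // (by assumption) ask the provider for its exported URLs right away.
                seen = new ArrayList<>(allExportedURLs);
            }
        }
        return seen;
    }
}
```

With the 2.2.x order (`ContextRefreshedEvent` first) the consumer sees the exported URL; with the 2.3.x order (`WebServerInitializedEvent` first) it sees an empty list.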
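5.1 Application-side workarounds
1. Add an aspect whose pointcut is the `spring cloud` service registration entry point, and start `dubbo` before the `nacos` registration so that the Dubbo services are exported first: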
@Before("execution(* org.springframework.cloud.client.serviceregistry.ServiceRegistry.register(*)) && args(registration)")
public void beforeRegister(Registration registration) {
DubboBootstrap dubboBootstrap = DubboBootstrap.getInstance();
dubboBootstrap.start();
}
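Testing showed that this approach has two defects:
A. When `dubbo.protocol.port` is `-1`, starting Dubbo a second time changes the port.
B. When `dubboBootstrap.start()` runs a second time, the Dubbo services are shut down and restarted, and during the restart the port does not serve requests. A `DubboMetadataService` request that arrives in this window fails with the following error: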
org.apache.dubbo.rpc.RpcException: Failed to invoke the method getExportedURLs in the service org.apache.dubbo.rpc.service.GenericService. Tried 3 times of the providers [10.46.187.177:39053] (1/1) from the registry localhost:9090 on the consumer 10.46.187.177 using the dubbo version 2.7.8. Last error is: Failed to invoke remote method: $invoke, provider: dubbo://10.46.187.177:39053/com.alibaba.cloud.dubbo.service.DubboMetadataService?anyhost=true&application=financing-app&bind.ip=10.46.187.177&bind.port=39053&check=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=true&group=order-app&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&methods=getAllServiceKeys,getServiceRestMetadata,getExportedURLs,getAllExportedURLs&pid=1135&qos.enable=false&register.ip=10.46.187.177&release=2.7.8&remote.application=order-app&revision=2.2.3.RELEASE&side=consumer&sticky=false&timeout=5000&timestamp=1604642760476&version=1.0.0, cause: message can not send, because channel is closed . url:dubbo://10.46.187.177:39053/com.alibaba.cloud.dubbo.service.DubboMetadataService?anyhost=true&application=financing-app&bind.ip=10.46.187.177&bind.port=39053&check=false&codec=dubbo&deprecated=false&dubbo=2.0.2&dynamic=true&generic=true&group=order-app&heartbeat=60000&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&methods=getAllServiceKeys,getServiceRestMetadata,getExportedURLs,getAllExportedURLs&pid=1135&qos.enable=false&register.ip=10.46.187.177&release=2.7.8&remote.application=order-app&revision=2.2.3.RELEASE&side=consumer&sticky=false&timeout=5000&timestamp=1604634459861&version=1.0.0
at org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker.doInvoke(FailoverClusterInvoker.java:113)
at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:260)
at org.apache.dubbo.rpc.cluster.interceptor.ClusterInterceptor.intercept(ClusterInterceptor.java:47)
at org.apache.dubbo.rpc.cluster.support.wrapper.AbstractCluster$InterceptorInvokerNode.invoke(AbstractCluster.java:92)
at org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:88)
at org.apache.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:74)
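2. After the application has started, in the `run` method of an `ApplicationRunner`, call the `setStatus` method of the `NacosServiceRegistry` class from the `springCloudAlibaba` framework to update the instance's status in the registry:
Approach 1: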
import com.alibaba.cloud.nacos.registry.NacosRegistration;
import com.alibaba.cloud.nacos.registry.NacosServiceRegistry;
import com.alibaba.nacos.api.exception.NacosException;
import com.alibaba.nacos.common.lifecycle.Closeable;
import com.alibaba.nacos.common.utils.ThreadUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
@Component
public class NacosServiceInstanceUpAndDownOperator implements ApplicationRunner, Closeable {
protected Logger logger = LoggerFactory.getLogger(this.getClass());
/**
* Status value that brings the nacos service instance online
*/
private static final String OPERATOR_UP = "UP";
/**
* Status value that takes the nacos service instance offline
*/
private static final String OPERATOR_DOWN = "DOWN";
@Resource
NacosServiceRegistry nacosServiceRegistry;
@Resource
NacosRegistration nacosRegistration;
private ScheduledExecutorService executorService;
@PostConstruct
public void init() {
int poolSize = 1;
this.executorService = new ScheduledThreadPoolExecutor(poolSize, new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
Thread thread = new Thread(r);
thread.setDaemon(true);
thread.setName("NacosServiceInstanceUpAndDownOperator");
return thread;
}
});
}
@Override
public void run(ApplicationArguments args) throws Exception {
long delayDown = 5000L; // delay before the DOWN (offline) task, in milliseconds
long delayUp = 10000L; // delay before the UP (online) task, in milliseconds
this.executorService.schedule(new InstanceDownAndUpTask(nacosServiceRegistry, nacosRegistration, OPERATOR_DOWN), delayDown, TimeUnit.MILLISECONDS);
this.executorService.schedule(new InstanceDownAndUpTask(nacosServiceRegistry, nacosRegistration, OPERATOR_UP), delayUp, TimeUnit.MILLISECONDS);
}
@Override
public void shutdown() throws NacosException {
ThreadUtils.shutdownThreadPool(executorService, logger);
}
/**
* Task that takes the service instance offline or brings it online
*/
class InstanceDownAndUpTask implements Runnable {
private NacosServiceRegistry nacosServiceRegistry;
private NacosRegistration nacosRegistration;
// Target status for the service instance: UP or DOWN
private String nacosServiceInstanceOperator;
InstanceDownAndUpTask(NacosServiceRegistry nacosServiceRegistry, NacosRegistration nacosRegistration, String nacosServiceInstanceOperator) {
this.nacosServiceRegistry = nacosServiceRegistry;
this.nacosRegistration = nacosRegistration;
this.nacosServiceInstanceOperator = nacosServiceInstanceOperator;
}
@Override
public void run() {
logger.info("=== updating nacos service instance status to: {} === start", nacosServiceInstanceOperator);
this.nacosServiceRegistry.setStatus(nacosRegistration, nacosServiceInstanceOperator);
logger.info("=== updating nacos service instance status to: {} === end", nacosServiceInstanceOperator);
// Once the instance is back online, shut down the thread pool
if (NacosServiceInstanceUpAndDownOperator.OPERATOR_UP.equals(nacosServiceInstanceOperator)) {
ThreadUtils.shutdownThreadPool(NacosServiceInstanceUpAndDownOperator.this.executorService, NacosServiceInstanceUpAndDownOperator.this.logger);
}
}
}
}
Approach 2:
import com.alibaba.cloud.nacos.NacosDiscoveryProperties;
import com.alibaba.cloud.nacos.NacosServiceManager;
import com.alibaba.cloud.nacos.registry.NacosRegistration;
import com.alibaba.cloud.nacos.registry.NacosServiceRegistry;
import com.alibaba.nacos.api.naming.pojo.Instance;
import com.alibaba.nacos.common.lifecycle.Closeable;
import com.alibaba.nacos.common.utils.ThreadUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.cloud.client.serviceregistry.Registration;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import java.util.Properties;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
@Component
public class NacosUpDown implements ApplicationRunner, Closeable {
private static final Logger logger = LoggerFactory.getLogger(NacosUpDown.class);
/**
* Status value that brings the nacos service instance online
*/
private static final String OPERATOR_UP = "UP";
/**
* Status value that takes the nacos service instance offline
*/
private static final String OPERATOR_DOWN = "DOWN";
@Resource
NacosServiceRegistry nacosServiceRegistry;
@Resource
NacosRegistration nacosRegistration;
@Resource
private NacosServiceManager nacosServiceManager;
@Resource
private NacosDiscoveryProperties nacosDiscoveryProperties;
private ScheduledExecutorService executorService;
@PostConstruct
public void init() {
int poolSize = 1;
this.executorService = new ScheduledThreadPoolExecutor(poolSize, r -> {
Thread thread = new Thread(r);
thread.setDaemon(true);
thread.setName("NacosUpAndDown");
return thread;
});
}
@Override
public void run(ApplicationArguments args){
// Delay before the DOWN (offline) task, in milliseconds
long delayDown = 15000L;
// Delay before the UP (online) task, in milliseconds
long delayUp = 21000L;
this.executorService.schedule(new InstanceDownAndUpTask(nacosServiceRegistry, nacosRegistration, OPERATOR_DOWN), delayDown, TimeUnit.MILLISECONDS);
this.executorService.schedule(new InstanceDownAndUpTask(nacosServiceRegistry, nacosRegistration, OPERATOR_UP), delayUp, TimeUnit.MILLISECONDS);
}
@Override
public void shutdown() {
ThreadUtils.shutdownThreadPool(executorService, logger);
}
/**
* Task that takes the service instance offline or brings it online
*/
class InstanceDownAndUpTask implements Runnable {
private final NacosServiceRegistry nacosServiceRegistry;
private final NacosRegistration nacosRegistration;
// Target status for the service instance: UP or DOWN
private final String nacosServiceInstanceOperator;
InstanceDownAndUpTask(NacosServiceRegistry nacosServiceRegistry, NacosRegistration nacosRegistration, String nacosServiceInstanceOperator) {
this.nacosServiceRegistry = nacosServiceRegistry;
this.nacosRegistration = nacosRegistration;
this.nacosServiceInstanceOperator = nacosServiceInstanceOperator;
}
@Override
public void run() {
logger.info("=== updating nacos service instance status to: {} === start", nacosServiceInstanceOperator);
setStatus(nacosRegistration, nacosServiceInstanceOperator);
logger.info("=== updating nacos service instance status to: {} === end", nacosServiceInstanceOperator);
// Once the instance is back online, shut down the thread pool
if (NacosUpDown.OPERATOR_UP.equals(nacosServiceInstanceOperator)) {
ThreadUtils.shutdownThreadPool(NacosUpDown.this.executorService, logger);
}
}
}
public void setStatus(Registration registration, String status) {
// Only UP and DOWN are handled; any other status is silently ignored.
if (status.equalsIgnoreCase(OPERATOR_UP) || status.equalsIgnoreCase(OPERATOR_DOWN)) {
String serviceId = registration.getServiceId();
Instance instance = this.getNacosInstanceFromRegistration(registration);
// DOWN disables the instance in nacos, UP enables it again.
instance.setEnabled(!status.equalsIgnoreCase(OPERATOR_DOWN));
try {
Properties nacosProperties = this.nacosDiscoveryProperties.getNacosProperties();
this.nacosServiceManager.getNamingMaintainService(nacosProperties).updateInstance(serviceId, nacosProperties.getProperty("group"), instance);
} catch (Exception e) {
throw new RuntimeException("update nacos instance status fail", e);
}
}
}
private Instance getNacosInstanceFromRegistration(Registration registration) {
Instance instance = new Instance();
instance.setIp(registration.getHost());
instance.setPort(registration.getPort());
instance.setWeight(this.nacosDiscoveryProperties.getWeight());
instance.setClusterName(this.nacosDiscoveryProperties.getClusterName());
instance.setEnabled(this.nacosDiscoveryProperties.isInstanceEnabled());
instance.setMetadata(registration.getMetadata());
instance.setEphemeral(this.nacosDiscoveryProperties.isEphemeral());
return instance;
}
}
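5.2 Some thoughts on a framework-side fix
a. Swap the trigger points of `spring cloud` auto service registration and Dubbo service registration
Make Dubbo service export start before Spring Cloud auto service registration. This would require changing the source of both `spring cloud commons` and the Dubbo framework, and it touches their foundations, which does not feel right.
b. In spring cloud alibaba, publish an update notification to the Nacos registry once Dubbo service export has completed
c. In spring cloud alibaba, add an aspect whose pointcut is the spring cloud service registration entry point, and export the Dubbo services before the Nacos registration
The `spring cloud alibaba` framework already has a ready-made aspect, `DubboServiceRegistrationEventPublishingAspect#beforeRegister(Registration registration)`, so the Dubbo service export could simply be added to its before advice. The export process in the Dubbo framework would need some adjustment, though, to avoid repeating work after the `ContextRefreshedEvent`.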
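For option b, the intended effect can be modelled in plain Java. This is only an illustrative sketch: `ToyRegistry`, `ToyProvider`, `ToyConsumer` and all of their methods are hypothetical and do not correspond to any Nacos, Dubbo or Spring Cloud Alibaba API. It shows that a second notification, published after export completes, lets the consumer clear its forbidden state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical registry that only forwards change notifications to subscribers. */
class ToyRegistry {
    private final List<Runnable> subscribers = new ArrayList<>();

    void subscribe(Runnable onChange) {
        subscribers.add(onChange);
    }

    /** Any register or update call pushes a change notification to every subscriber. */
    void publishInstanceChange() {
        for (Runnable subscriber : subscribers) {
            subscriber.run();
        }
    }
}

/** Hypothetical provider that keeps its exported URLs locally, as described in section 4. */
class ToyProvider {
    private final ToyRegistry registry;
    private final List<String> exportedUrls = new ArrayList<>();

    ToyProvider(ToyRegistry registry) {
        this.registry = registry;
    }

    /** Early registration: happens before any service has been exported. */
    void registerInstance() {
        registry.publishInstanceChange();
    }

    void exportServices() {
        exportedUrls.add("dubbo://provider:20880/ArticleService");
        // Option b: notify the registry again once export has finished.
        registry.publishInstanceChange();
    }

    List<String> getExportedUrls() {
        return new ArrayList<>(exportedUrls);
    }
}

/** Hypothetical consumer that re-reads the provider's URLs on every notification. */
class ToyConsumer {
    private boolean forbidden;

    ToyConsumer(ToyRegistry registry, ToyProvider provider) {
        registry.subscribe(() -> forbidden = provider.getExportedUrls().isEmpty());
    }

    boolean isForbidden() {
        return forbidden;
    }
}
```

In this model the consumer is forbidden after `registerInstance()` and recovers after `exportServices()`. The DOWN/UP toggle in section 5.1 achieves the same second notification from the application side, at the cost of a fixed delay.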