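"Without accumulating small steps, one cannot travel a thousand miles."
Once a Eureka Client starts, the first thing it must do is contact the Eureka Server to register itself and fetch the registry. This article focuses on the Eureka Client's service registration flow.
The code architecture of this part of Eureka Client is not particularly well designed. Why do I say so? I am not nitpicking. Eureka is a service registry, so service registration is surely its most important mechanism, and it ought to sit somewhere prominent where it is easy to read and debug. In Eureka's code, however, this logic is buried quite deep and even the naming is unconventional, to the point that it is very hard to locate just by reading the source statically.
First, some reasoning: this logic must be in the EurekaClient initialization code, that is, the DiscoveryClient constructor, which we have already looked at before: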
@Inject
DiscoveryClient(ApplicationInfoManager applicationInfoManager, EurekaClientConfig config, AbstractDiscoveryClientOptionalArgs args,
Provider<BackupRegistry> backupRegistryProvider, EndpointRandomizer endpointRandomizer) {
if (args != null) {
this.healthCheckHandlerProvider = args.healthCheckHandlerProvider;
this.healthCheckCallbackProvider = args.healthCheckCallbackProvider;
this.eventListeners.addAll(args.getEventListeners());
this.preRegistrationHandler = args.preRegistrationHandler;
} else {
this.healthCheckCallbackProvider = null;
this.healthCheckHandlerProvider = null;
this.preRegistrationHandler = null;
}
this.applicationInfoManager = applicationInfoManager;
InstanceInfo myInfo = applicationInfoManager.getInfo();
clientConfig = config;
staticClientConfig = clientConfig;
transportConfig = config.getTransportConfig();
instanceInfo = myInfo;
if (myInfo != null) {
appPathIdentifier = instanceInfo.getAppName() + "/" + instanceInfo.getId();
} else {
logger.warn("Setting instanceInfo to a passed in null value");
}
this.backupRegistryProvider = backupRegistryProvider;
this.endpointRandomizer = endpointRandomizer;
this.urlRandomizer = new EndpointUtils.InstanceInfoBasedUrlRandomizer(instanceInfo);
localRegionApps.set(new Applications());
fetchRegistryGeneration = new AtomicLong(0);
remoteRegionsToFetch = new AtomicReference<String>(clientConfig.fetchRegistryForRemoteRegions());
remoteRegionsRef = new AtomicReference<>(remoteRegionsToFetch.get() == null ? null : remoteRegionsToFetch.get().split(","));
if (config.shouldFetchRegistry()) {
this.registryStalenessMonitor = new ThresholdLevelsMetric(this, METRIC_REGISTRY_PREFIX + "lastUpdateSec_", new long[]{15L, 30L, 60L, 120L, 240L, 480L});
} else {
this.registryStalenessMonitor = ThresholdLevelsMetric.NO_OP_METRIC;
}
if (config.shouldRegisterWithEureka()) {
this.heartbeatStalenessMonitor = new ThresholdLevelsMetric(this, METRIC_REGISTRATION_PREFIX + "lastHeartbeatSec_", new long[]{15L, 30L, 60L, 120L, 240L, 480L});
} else {
this.heartbeatStalenessMonitor = ThresholdLevelsMetric.NO_OP_METRIC;
}
... ...
}
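I searched this method and found no obvious service registration logic. There is, however, a call that initializes the scheduled tasks. Since a registered service has to keep sending heartbeats, my guess was that registration might live among the scheduled tasks, so let's step in and see whether what we want is there: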
// finally, init the schedule tasks (e.g. cluster resolvers, heartbeat, instanceInfo replicator, fetch
initScheduledTasks();
private void initScheduledTasks() {
if (clientConfig.shouldFetchRegistry()) {
// registry cache refresh timer
int registryFetchIntervalSeconds = clientConfig.getRegistryFetchIntervalSeconds();
int expBackOffBound = clientConfig.getCacheRefreshExecutorExponentialBackOffBound();
cacheRefreshTask = new TimedSupervisorTask(
"cacheRefresh",
scheduler,
cacheRefreshExecutor,
registryFetchIntervalSeconds,
TimeUnit.SECONDS,
expBackOffBound,
new CacheRefreshThread()
);
scheduler.schedule(
cacheRefreshTask,
registryFetchIntervalSeconds, TimeUnit.SECONDS);
}
if (clientConfig.shouldRegisterWithEureka()) {
int renewalIntervalInSecs = instanceInfo.getLeaseInfo().getRenewalIntervalInSecs();
int expBackOffBound = clientConfig.getHeartbeatExecutorExponentialBackOffBound();
logger.info("Starting heartbeat executor: " + "renew interval is: {}", renewalIntervalInSecs);
// Heartbeat timer
heartbeatTask = new TimedSupervisorTask(
"heartbeat",
scheduler,
heartbeatExecutor,
renewalIntervalInSecs,
TimeUnit.SECONDS,
expBackOffBound,
new HeartbeatThread()
);
scheduler.schedule(
heartbeatTask,
renewalIntervalInSecs, TimeUnit.SECONDS);
// InstanceInfo replicator
instanceInfoReplicator = new InstanceInfoReplicator(
this,
instanceInfo,
clientConfig.getInstanceInfoReplicationIntervalSeconds(),
2); // burstSize
statusChangeListener = new ApplicationInfoManager.StatusChangeListener() {
@Override
public String getId() {
return "statusChangeListener";
}
@Override
public void notify(StatusChangeEvent statusChangeEvent) {
logger.info("Saw local status change event {}", statusChangeEvent);
instanceInfoReplicator.onDemandUpdate();
}
};
if (clientConfig.shouldOnDemandUpdateStatusChange()) {
applicationInfoManager.registerStatusChangeListener(statusChangeListener);
}
instanceInfoReplicator.start(clientConfig.getInitialInstanceInfoReplicationIntervalSeconds());
} else {
logger.info("Not registering with Eureka server per configuration");
}
}
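instanceInfoReplicator! Yes, after all that searching, this is the only thing that could plausibly be doing service registration. Replicator? That word normally suggests copying or replica-related functionality, and it is hardly the name you would expect for this job.
At the end of the method, this component is started: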
instanceInfoReplicator.start(clientConfig.getInitialInstanceInfoReplicationIntervalSeconds());
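In start(), the replicator schedules itself as a task on a scheduled thread pool, by default with a 40-second delay, and also sets isDirty to true. Note that although the parameter is named initialDelayMs, it is scheduled with TimeUnit.SECONDS: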
public void start(int initialDelayMs) {
if (started.compareAndSet(false, true)) {
instanceInfo.setIsDirty(); // for initial register
Future next = scheduler.schedule(this, initialDelayMs, TimeUnit.SECONDS);
scheduledPeriodicRef.set(next);
}
}
InstanceInfoReplicator implements the Runnable interface, so once it is on the scheduled thread pool, its run method gets called:
public void run() {
try {
discoveryClient.refreshInstanceInfo();
Long dirtyTimestamp = instanceInfo.isDirtyWithTime();
if (dirtyTimestamp != null) {
discoveryClient.register();
instanceInfo.unsetIsDirty(dirtyTimestamp);
}
} catch (Throwable t) {
logger.warn("There was a problem with the instance info replicator", t);
} finally {
Future next = scheduler.schedule(this, replicationIntervalSeconds, TimeUnit.SECONDS);
scheduledPeriodicRef.set(next);
}
}
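The pattern in start() and run() can be sketched in isolation. This is a minimal sketch, not Eureka's implementation: ReplicatorSketch, markDirty, runs and registerCalls are hypothetical names, and a plain boolean flag stands in for Eureka's dirty timestamp.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for InstanceInfoReplicator; names are illustrative, not Eureka's API.
class ReplicatorSketch implements Runnable {
    private final ScheduledExecutorService scheduler;
    private final long intervalMillis;
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    final AtomicInteger runs = new AtomicInteger();
    final AtomicInteger registerCalls = new AtomicInteger();

    ReplicatorSketch(ScheduledExecutorService scheduler, long intervalMillis) {
        this.scheduler = scheduler;
        this.intervalMillis = intervalMillis;
    }

    // Mark the info dirty so the first run registers, then schedule that first run once.
    void start(long initialDelayMillis) {
        if (started.compareAndSet(false, true)) {
            dirty.set(true);
            scheduler.schedule(this, initialDelayMillis, TimeUnit.MILLISECONDS);
        }
    }

    void markDirty() {
        dirty.set(true);
    }

    @Override
    public void run() {
        try {
            runs.incrementAndGet();
            if (dirty.get()) {
                registerCalls.incrementAndGet(); // the real client calls discoveryClient.register() here
                dirty.set(false); // Eureka compares a dirty timestamp instead, so a concurrent update is not lost
            }
        } finally {
            // Re-schedule from finally so that one failed run does not stop the loop.
            if (!scheduler.isShutdown()) {
                scheduler.schedule(this, intervalMillis, TimeUnit.MILLISECONDS);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        ReplicatorSketch replicator = new ReplicatorSketch(scheduler, 50);
        replicator.start(0);
        Thread.sleep(300);
        scheduler.shutdownNow();
        System.out.println("runs=" + replicator.runs + ", registerCalls=" + replicator.registerCalls);
    }
}
```

The point of the design is that the task is a one-shot that re-arms itself rather than a fixed-rate job, and that registration only happens on runs where the local info has been marked dirty.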
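The refreshInstanceInfo() method mainly does two things:
(1) It calls a few ApplicationInfoManager methods to refresh the service instance's configuration, checking whether anything has changed and refreshing it if so;
(2) It runs the health check handler and writes the resulting status into ApplicationInfoManager, updating the service instance's status;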
void refreshInstanceInfo() {
applicationInfoManager.refreshDataCenterInfoIfRequired();
applicationInfoManager.refreshLeaseInfoIfRequired();
InstanceStatus status;
try {
status = getHealthCheckHandler().getStatus(instanceInfo.getStatus());
} catch (Exception e) {
logger.warn("Exception from healthcheckHandler.getStatus, setting status to DOWN", e);
status = InstanceStatus.DOWN;
}
if (null != status) {
applicationInfoManager.setInstanceStatus(status);
}
}
Back in the run method, the core line is:
discoveryClient.register();
This is where registration actually happens, by calling DiscoveryClient's register method:
boolean register() throws Throwable {
logger.info(PREFIX + "{}: registering service...", appPathIdentifier);
EurekaHttpResponse<Void> httpResponse;
try {
//private final EurekaTransport eurekaTransport;
//private EurekaHttpClient registrationClient;
httpResponse = eurekaTransport.registrationClient.register(instanceInfo);
} catch (Exception e) {
logger.warn(PREFIX + "{} - registration failed {}", appPathIdentifier, e.getMessage(), e);
throw e;
}
if (logger.isInfoEnabled()) {
logger.info(PREFIX + "{} - registration status: {}", appPathIdentifier, httpResponse.getStatusCode());
}
return httpResponse.getStatusCode() == Status.NO_CONTENT.getStatusCode();
}
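Clearly, registration goes through the EurekaHttpClient held by the underlying EurekaTransport component: register() is called with the InstanceInfo as its argument, the HTTP client sends an HTTP request to the RESTful endpoint exposed by Eureka Server, and the registration result comes back. The method returns true only when the status code is 204 (No Content).
So where is this EurekaTransport initialized?
The answer is in the DiscoveryClient constructor, through the scheduleServerEndpointTask method: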
eurekaTransport = new EurekaTransport();
// registrationClient is assigned inside this method
scheduleServerEndpointTask(eurekaTransport, args);
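Back to the register method once more. The HTTP client in use is the subclass AbstractJersey2EurekaHttpClient. I could only find this out by setting breakpoints and debugging, because the class hierarchy of these HTTP clients is so tangled that static reading quickly leaves you dizzy. Knowing that Eureka uses the Jersey framework for network communication, and with breakpoints on the relevant methods, I soon located the code that actually sends the register request: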
@Override
public EurekaHttpResponse<Void> register(InstanceInfo info) {
String urlPath = "apps/" + info.getAppName();
Response response = null;
try {
Builder resourceBuilder = jerseyClient.target(serviceUrl).path(urlPath).request();
addExtraProperties(resourceBuilder);
addExtraHeaders(resourceBuilder);
response = resourceBuilder
.accept(MediaType.APPLICATION_JSON)
.acceptEncoding("gzip")
// sends a POST request
.post(Entity.json(info));
return anEurekaHttpResponse(response.getStatus()).headers(headersOf(response)).build();
} finally {
if (logger.isDebugEnabled()) {
logger.debug("Jersey2 HTTP POST {}/{} with instance {}; statusCode={}", serviceUrl, urlPath, info.getId(),
response == null ? "N/A" : response.getStatus());
}
if (response != null) {
response.close();
}
}
}
serviceUrl is something like "http://localhost:8080/v2", and appName is the service name you configured, so the request finally sent has the form "POST http://localhost:8080/v2/apps/ServiceA";
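To make the shape of that request concrete, here is a minimal sketch using only the JDK (Java 11+), not Eureka's Jersey client. RegisterCallSketch and its methods are hypothetical names, the JSON body is an illustrative placeholder rather than a full InstanceInfo, and the throwaway local server simply answers 204 to any POST under /v2/apps/ so the example runs without a real Eureka Server.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RegisterCallSketch {
    // Builds serviceUrl + "apps/" + appName, mirroring the urlPath logic described above.
    static String registrationUrl(String serviceUrl, String appName) {
        String base = serviceUrl.endsWith("/") ? serviceUrl : serviceUrl + "/";
        return base + "apps/" + appName;
    }

    // Sends the POST and returns the HTTP status code.
    static int register(String serviceUrl, String appName, String instanceJson) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(registrationUrl(serviceUrl, appName)))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(instanceJson))
                .build();
        try {
            HttpResponse<Void> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Starts a stub server on a free local port, registers against it, and stops it again.
    static int registerAgainstLocalStub() {
        HttpServer server;
        try {
            server = HttpServer.create(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/v2/apps/", exchange -> {
            exchange.getRequestBody().readAllBytes(); // drain the request body
            int status = "POST".equals(exchange.getRequestMethod()) ? 204 : 405;
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        try {
            String serviceUrl = "http://127.0.0.1:" + server.getAddress().getPort() + "/v2";
            return register(serviceUrl, "ServiceA", "{\"instance\":{\"app\":\"ServiceA\"}}");
        } finally {
            server.stop(0);
        }
    }
}
```

Calling RegisterCallSketch.registerAgainstLocalStub() performs one round trip; a caller would treat the registration as successful when the returned status is 204, which is the same check DiscoveryClient.register() makes.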
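In the resources package of eureka-core there is a whole set of XXXResource classes. These Resources play the role of Spring MVC Controllers and receive the HTTP requests; a Resource is essentially Jersey's counterpart of a controller.
Requests sent by EurekaClient over HTTP to the Server first pass through Jersey's core ServletContainer (registered as a filter in web.xml), which then dispatches them by request URL to the matching Resource: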
<filter>
<filter-name>jersey</filter-name>
<filter-class>com.sun.jersey.spi.container.servlet.ServletContainer</filter-class>
<init-param>
<param-name>com.sun.jersey.config.property.WebPageContentRegex</param-name>
<param-value>/(flex|images|js|css|jsp)/.*</param-value>
</init-param>
<init-param>
<param-name>com.sun.jersey.config.property.packages</param-name>
<param-value>com.sun.jersey;com.netflix</param-value>
</init-param>
... ...
</filter>
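So our request is finally dispatched to ApplicationResource and handled by its addInstance() method. I won't analyze the dispatch flow in detail. Frankly, Jersey is rarely used in China, and things would have been simpler had Netflix used Spring MVC. Let's look at the implementation of addInstance: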
@POST
@Consumes({"application/json", "application/xml"})
public Response addInstance(InstanceInfo info,
@HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
logger.debug("Registering instance {} (replication={})", info.getId(), isReplication);
// validate that the instanceinfo contains all the necessary required fields
if (isBlank(info.getId())) {
return Response.status(400).entity("Missing instanceId").build();
} else if (isBlank(info.getHostName())) {
return Response.status(400).entity("Missing hostname").build();
} else if (isBlank(info.getIPAddr())) {
return Response.status(400).entity("Missing ip address").build();
} else if (isBlank(info.getAppName())) {
return Response.status(400).entity("Missing appName").build();
} else if (!appName.equals(info.getAppName())) {
return Response.status(400).entity("Mismatched appName, expecting " + appName + " but was " + info.getAppName()).build();
} else if (info.getDataCenterInfo() == null) {
return Response.status(400).entity("Missing dataCenterInfo").build();
} else if (info.getDataCenterInfo().getName() == null) {
return Response.status(400).entity("Missing dataCenterInfo Name").build();
}
// handle cases where clients may be registering with bad DataCenterInfo with missing data
DataCenterInfo dataCenterInfo = info.getDataCenterInfo();
if (dataCenterInfo instanceof UniqueIdentifier) {
String dataCenterInfoId = ((UniqueIdentifier) dataCenterInfo).getId();
if (isBlank(dataCenterInfoId)) {
boolean experimental = "true".equalsIgnoreCase(serverConfig.getExperimental("registration.validation.dataCenterInfoId"));
if (experimental) {
String entity = "DataCenterInfo of type " + dataCenterInfo.getClass() + " must contain a valid id";
return Response.status(400).entity(entity).build();
} else if (dataCenterInfo instanceof AmazonInfo) {
AmazonInfo amazonInfo = (AmazonInfo) dataCenterInfo;
String effectiveId = amazonInfo.get(AmazonInfo.MetaDataKey.instanceId);
if (effectiveId == null) {
amazonInfo.getMetadata().put(AmazonInfo.MetaDataKey.instanceId.getName(), info.getId());
}
} else {
logger.warn("Registering DataCenterInfo of type {} without an appropriate id", dataCenterInfo.getClass());
}
}
}
registry.register(info, "true".equals(isReplication));
return Response.status(204).build(); // 204 to be backwards compatible
}
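The method starts with a series of validations. This kind of defensive programming keeps the code robust. Mixing such checks with the core business logic is not ideal, though; it would be better to move them into a dedicated utility class.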
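As a rough idea of what such a utility class could look like, here is a minimal sketch. RegistrationValidatorSketch and its nested Registration type are hypothetical; only the field names and error messages are taken from addInstance(), and the DataCenterInfo checks are left out for brevity.

```java
import java.util.Optional;

class RegistrationValidatorSketch {
    // Hypothetical minimal view of the InstanceInfo fields that addInstance() validates.
    static final class Registration {
        final String id;
        final String hostName;
        final String ipAddr;
        final String appName;

        Registration(String id, String hostName, String ipAddr, String appName) {
            this.id = id;
            this.hostName = hostName;
            this.ipAddr = ipAddr;
            this.appName = appName;
        }
    }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns the first validation error, or Optional.empty() when the registration is acceptable.
    static Optional<String> validate(String expectedAppName, Registration r) {
        if (isBlank(r.id)) {
            return Optional.of("Missing instanceId");
        }
        if (isBlank(r.hostName)) {
            return Optional.of("Missing hostname");
        }
        if (isBlank(r.ipAddr)) {
            return Optional.of("Missing ip address");
        }
        if (isBlank(r.appName)) {
            return Optional.of("Missing appName");
        }
        if (!expectedAppName.equals(r.appName)) {
            return Optional.of("Mismatched appName, expecting " + expectedAppName + " but was " + r.appName);
        }
        return Optional.empty();
    }
}
```

With a helper like this, the resource method would reduce to one call that maps a present error to a 400 response, leaving registry.register(...) as the only business logic in the method body.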
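Eureka is heavily coupled to Amazon's cloud and hard-codes it in many places, which is also undesirable. So when reading source code, absorb its strengths but keep your own standards: not everything in the source is gold, some of it is dross, and you have to sift and judge for yourself. That takes solid technical fundamentals, which can only be built up by reading a lot of open-source code.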
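Next we enter the register method of PeerAwareInstanceRegistryImpl (the registry). The default lease duration, that is, how long a lease stays valid without a renewal, is 90s; if the client has configured its own value, that value overrides the default: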
@Override
public void register(final InstanceInfo info, final boolean isReplication) {
//public static final int DEFAULT_DURATION_IN_SECS = 90;
int leaseDuration = Lease.DEFAULT_DURATION_IN_SECS;
if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
leaseDuration = info.getLeaseInfo().getDurationInSecs();
}
super.register(info, leaseDuration, isReplication);
replicateToPeers(Action.Register, info.getAppName(), info.getId(), info, null, isReplication);
}
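The override rule is small enough to isolate. In this sketch, LeaseDurationSketch and both method names are hypothetical; the 90-second default comes from the code above, and the 30-second heartbeat interval used in the usage note is the client default mentioned later in this article.

```java
class LeaseDurationSketch {
    static final int DEFAULT_DURATION_IN_SECS = 90;

    // A null argument models a registration that carries no LeaseInfo at all.
    static int effectiveLeaseDuration(Integer clientDurationInSecs) {
        if (clientDurationInSecs != null && clientDurationInSecs > 0) {
            return clientDurationInSecs;
        }
        return DEFAULT_DURATION_IN_SECS;
    }

    // How many whole heartbeat intervals fit into one lease duration.
    static int heartbeatIntervalsPerLease(int leaseDurationInSecs, int renewalIntervalInSecs) {
        return leaseDurationInSecs / renewalIntervalInSecs;
    }
}
```

For example, effectiveLeaseDuration(null) falls back to 90, and heartbeatIntervalsPerLease(90, 30) is 3, which is why the defaults are usually described as tolerating roughly three missed heartbeats before a lease can expire.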
It then calls the register method of the parent class AbstractInstanceRegistry to complete the registration:
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
read.lock();
try {
Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
REGISTER.increment(isReplication);
if (gMap == null) {
final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
if (gMap == null) {
gMap = gNewMap;
}
}
Lease<InstanceInfo> existingLease = gMap.get(registrant.getId());
// Retain the last dirty timestamp without overwriting it, if there is already a lease
if (existingLease != null && (existingLease.getHolder() != null)) {
Long existingLastDirtyTimestamp = existingLease.getHolder().getLastDirtyTimestamp();
Long registrationLastDirtyTimestamp = registrant.getLastDirtyTimestamp();
logger.debug("Existing lease found (existing={}, provided={}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
// this is a > instead of a >= because if the timestamps are equal, we still take the remote transmitted
// InstanceInfo instead of the server local copy.
if (existingLastDirtyTimestamp > registrationLastDirtyTimestamp) {
logger.warn("There is an existing lease and the existing lease's dirty timestamp {} is greater" +
" than the one that is being registered {}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
logger.warn("Using the existing instanceInfo instead of the new instanceInfo as the registrant");
registrant = existingLease.getHolder();
}
} else {
// The lease does not exist and hence it is a new registration
synchronized (lock) {
if (this.expectedNumberOfClientsSendingRenews > 0) {
// Since the client wants to register it, increase the number of clients sending renews
this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
updateRenewsPerMinThreshold();
}
}
logger.debug("No previous lease information found; it is new registration");
}
Lease<InstanceInfo> lease = new Lease<InstanceInfo>(registrant, leaseDuration);
if (existingLease != null) {
lease.setServiceUpTimestamp(existingLease.getServiceUpTimestamp());
}
gMap.put(registrant.getId(), lease);
recentRegisteredQueue.add(new Pair<Long, String>(
System.currentTimeMillis(),
registrant.getAppName() + "(" + registrant.getId() + ")"));
// This is where the initial state transfer of overridden status happens
if (!InstanceStatus.UNKNOWN.equals(registrant.getOverriddenStatus())) {
logger.debug("Found overridden status {} for instance {}. Checking to see if needs to be add to the "
+ "overrides", registrant.getOverriddenStatus(), registrant.getId());
if (!overriddenInstanceStatusMap.containsKey(registrant.getId())) {
logger.info("Not found overridden id {} and hence adding it", registrant.getId());
overriddenInstanceStatusMap.put(registrant.getId(), registrant.getOverriddenStatus());
}
}
InstanceStatus overriddenStatusFromMap = overriddenInstanceStatusMap.get(registrant.getId());
if (overriddenStatusFromMap != null) {
logger.info("Storing overridden status {} from map", overriddenStatusFromMap);
registrant.setOverriddenStatus(overriddenStatusFromMap);
}
// Set the status based on the overridden status rules
InstanceStatus overriddenInstanceStatus = getOverriddenInstanceStatus(registrant, existingLease, isReplication);
registrant.setStatusWithoutDirty(overriddenInstanceStatus);
// If the lease is registered with UP status, set lease service up timestamp
if (InstanceStatus.UP.equals(registrant.getStatus())) {
lease.serviceUp();
}
registrant.setActionType(ActionType.ADDED);
recentlyChangedQueue.add(new RecentlyChangedItem(lease));
registrant.setLastUpdatedTimestamp();
invalidateCache(registrant.getAppName(), registrant.getVIPAddress(), registrant.getSecureVipAddress());
logger.info("Registered instance {}/{} with status {} (replication={})",
registrant.getAppName(), registrant.getId(), registrant.getStatus(), isReplication);
} finally {
read.unlock();
}
}
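It first takes a read lock, which means the registration method allows multiple threads to run concurrently. Why? Look at what the registry actually is and you will see: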
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
= new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
ConcurrentHashMap! A thread-safe Map is the registry's true form. Since JDK 8 this Map has been optimized with CAS operations, so its performance is not bad.
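The combination of a read lock with a concurrent map can look backwards at first, so here is a minimal sketch of the idea. RegistryLockSketch and its methods are hypothetical; the assumption is that writers to the map may share the read lock because the map itself is thread-safe, while an operation that needs a stable view of everything takes the write lock to shut all of them out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RegistryLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ConcurrentHashMap<String, String> instances = new ConcurrentHashMap<>();

    // Many threads may hold the read lock at once; the map handles their concurrent puts.
    void register(String instanceId, String status) {
        lock.readLock().lock();
        try {
            instances.put(instanceId, status);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive, so no registration can run while the copy is taken.
    Map<String, String> snapshot() {
        lock.writeLock().lock();
        try {
            return new HashMap<>(instances);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Registers threads * perThread distinct instances in parallel and returns the final count.
    static int registerConcurrently(int threads, int perThread) {
        RegistryLockSketch registry = new RegistryLockSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.register("instance-" + worker + "-" + i, "UP");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return registry.snapshot().size();
    }
}
```

So "read" and "write" here describe shared versus exclusive access rather than whether the map is mutated.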
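First it looks up a Map<String, Lease<InstanceInfo>> by service name. The key of this Map is the service instance id and the value is a Lease object parameterized with InstanceInfo. On the very first registration of a service the lookup returns null, so a new ConcurrentHashMap is created: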
Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
REGISTER.increment(isReplication);
if (gMap == null) {
final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
if (gMap == null) {
gMap = gNewMap;
}
}
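The get / putIfAbsent / null-check sequence is a standard race-safe way to create the inner map. The sketch below isolates it with String values in place of Lease<InstanceInfo>; AppMapSketch and instancesFor are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AppMapSketch {
    // Returns the per-application map, creating it if needed without ever replacing an existing one.
    static Map<String, String> instancesFor(ConcurrentHashMap<String, Map<String, String>> registry,
                                            String appName) {
        Map<String, String> gMap = registry.get(appName);
        if (gMap == null) {
            ConcurrentHashMap<String, String> gNewMap = new ConcurrentHashMap<>();
            // putIfAbsent returns the previous value, or null if our new map was stored.
            gMap = registry.putIfAbsent(appName, gNewMap);
            if (gMap == null) {
                gMap = gNewMap;
            }
        }
        return gMap;
    }
}
```

If two threads register the first instances of the same service at the same moment, both may create a candidate map, but only one putIfAbsent wins and both end up using the winner's map. On Java 8 and later, registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>()) expresses the same thing in one call.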
Then we reach this piece of code:
// The lease does not exist and hence it is a new registration
synchronized (lock) {
if (this.expectedNumberOfClientsSendingRenews > 0) {
// Since the client wants to register it, increase the number of clients sending renews
this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
updateRenewsPerMinThreshold();
}
}
logger.debug("No previous lease information found; it is new registration");
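This piece of code is rather interesting. Let me first explain what it does. Anyone who has used Eureka knows it has a self-preservation mechanism: if, over a period of time, Eureka sees what looks like a large-scale failure of service instances, it assumes its own network is at fault, enters self-preservation mode, and stops evicting any instances. expectedNumberOfClientsSendingRenews is the number of instances it expects to be sending heartbeats, and updateRenewsPerMinThreshold() below computes the expected heartbeat threshold; if the heartbeats actually received from the cluster fall below that threshold, self-preservation kicks in. Why is it interesting? When I read version 1.7.2, guess how this was written: +2. The expected heartbeat count went up by 2 for every service instance that came online. Why 2 rather than 3 or anything else? Because the default EurekaClient heartbeat interval is 30s, which is two heartbeats per minute, and that assumption was hard-coded, so the count was wrong for any client configured with a different interval. Isn't that absurd? I seriously suspect this part of the earlier versions was written by a Netflix intern. Perhaps it was considered too outrageous, and later versions fixed it.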
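The difference between the two approaches is easy to see with numbers. This is a sketch under stated assumptions: the formula is my understanding of what updateRenewsPerMinThreshold() computes in newer versions (expected clients, times heartbeats per minute, times a percentage threshold), the 0.85 threshold used in the usage note is assumed to be the default, and RenewalThresholdSketch and its method names are hypothetical.

```java
class RenewalThresholdSketch {
    // Newer style: derive heartbeats per minute from the configured renewal interval.
    static int renewsPerMinThreshold(int expectedClients,
                                     int renewalIntervalSeconds,
                                     double renewalPercentThreshold) {
        return (int) (expectedClients * (60.0 / renewalIntervalSeconds) * renewalPercentThreshold);
    }

    // 1.7.2 style: every client is assumed to send exactly 2 heartbeats per minute.
    static int legacyRenewsPerMinThreshold(int expectedClients, double renewalPercentThreshold) {
        int expectedRenewsPerMin = expectedClients * 2;
        return (int) (expectedRenewsPerMin * renewalPercentThreshold);
    }
}
```

With 10 clients and a 0.85 threshold, both versions give 17 when the interval is 30s. If the clients heartbeat every 15s instead, the interval-aware version gives 34 while the legacy version still gives 17, so the server would only notice trouble after more than half of the heartbeats had already disappeared.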
Then a new Lease is created and put into the Map<String, Lease<InstanceInfo>> with the instance id as the key:
Lease<InstanceInfo> lease = new Lease<InstanceInfo>(registrant, leaseDuration);
if (existingLease != null) {
lease.setServiceUpTimestamp(existingLease.getServiceUpTimestamp());
}
gMap.put(registrant.getId(), lease);
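Next, the current timestamp and a string built from the service name and instance id are wrapped in a Pair and added to recentRegisteredQueue, which holds the most recently registered services: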
private final CircularQueue<Pair<Long, String>> recentRegisteredQueue;
recentRegisteredQueue.add(new Pair<Long, String>(
System.currentTimeMillis(),
registrant.getAppName() + "(" + registrant.getId() + ")"));
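A "most recent N" queue of this kind can be sketched in a few lines. RecentQueueSketch is a hypothetical class, not Eureka's CircularQueue; the assumption is only that the queue is bounded and drops its oldest entry once full.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class RecentQueueSketch<E> {
    private final ArrayDeque<E> delegate = new ArrayDeque<>();
    private final int capacity;

    RecentQueueSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Evicts the oldest element first when the queue is already full.
    synchronized void add(E element) {
        if (delegate.size() == capacity) {
            delegate.pollFirst();
        }
        delegate.addLast(element);
    }

    // Returns a copy ordered from oldest to newest.
    synchronized List<E> snapshot() {
        return new ArrayList<>(delegate);
    }
}
```

With a capacity of 3, adding 1 through 5 leaves 3, 4 and 5 in the queue. A structure like this suits a "recent registrations" view because memory stays bounded no matter how many instances register over the server's lifetime.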
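At the same time, the Lease is wrapped in a RecentlyChangedItem and added to recentlyChangedQueue, the queue of recent changes. This will come up again later; for now just keep the concept in mind: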
private ConcurrentLinkedQueue<RecentlyChangedItem> recentlyChangedQueue = new ConcurrentLinkedQueue<RecentlyChangedItem>();
recentlyChangedQueue.add(new RecentlyChangedItem(lease));
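Finally the read-write (RW) cache is invalidated. The registry's RW and RO caches will be covered later when we look at fetching the registry; for now just note that they exist: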
invalidateCache(registrant.getAppName(), registrant.getVIPAddress(), registrant.getSecureVipAddress());
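Finally, to summarize the data structure of the service registry:
ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>
It looks like the structure below. InstanceInfo mainly holds the host name, IP address, port, lease information and so on.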
{
    "ServiceA": {
        "001": Lease<InstanceInfo>,
        "002": Lease<InstanceInfo>,
        "003": Lease<InstanceInfo>
    },
    "ServiceB": {
        "001": Lease<InstanceInfo>,
        "002": Lease<InstanceInfo>,
        "003": Lease<InstanceInfo>
    }
}
Finally, here is a flow chart I drew of the EurekaClient service registration process to aid understanding.

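This article walked through the Eureka client's service registration flow, starting from EurekaClient initialization and covering each step: how InstanceInfoReplicator is started, how EurekaHttpClient is used, and how the registration is sent to Eureka Server as an HTTP POST request. On the server side, the registration is validated and then stored in a registry built on ConcurrentHashMap.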