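Eureka: How It Works and a Walkthrough of the Core Code
I. Preface
In my view, reading source code means following the main line of logic. There is no need to understand every detail unless you actually need it. We read source to learn how a component works and how it is implemented, which helps us track down bugs and learn from its design.
Don't fixate on what a single spot means. Look at the big picture first, then the details.
II. How the Eureka Client Registers
Registering a service with Eureka normally takes just four steps:
1. Create a Spring Boot project 2. Add the dependency 3. Configure the yml file 4. Start the project
Nothing in these steps shows where the service actually registers itself with the Eureka server, so we need to understand how Spring works. Without that, the Spring Cloud components are hard to follow.
1. Lifecycle and SmartLifecycle
Lifecycle and SmartLifecycle belong to Spring and are not covered in detail here. See: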
https://blog.youkuaiyun.com/bronze5/article/details/106558309
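2. Registration code at startup
In the final phase of startup, Spring calls the start method of every bean in the container that implements SmartLifecycle. The Eureka client relies on this: registration is driven from the start method of EurekaAutoServiceRegistration.
Overall, the start method of EurekaAutoServiceRegistration does three things:
1. Changes the instance status 2. Publishes an event (this step is what actually triggers the Eureka client's registration) 3. Registers the health check mechanism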
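The start-up hook described above can be pictured with a small stand-alone sketch. It deliberately does not use Spring: LifecycleSketch, Startable and AutoRegistration are hypothetical names that only imitate how a container calls start() on lifecycle beans at the end of startup, and the three recorded steps simply mirror the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Spring-free sketch of the SmartLifecycle idea:
// the container calls start() on every lifecycle bean at the end of startup.
final class LifecycleSketch {

    /** Stand-in for Spring's SmartLifecycle (only the parts used here). */
    interface Startable {
        void start();
        boolean isRunning();
    }

    /** Stand-in for EurekaAutoServiceRegistration; it records what start() did. */
    static final class AutoRegistration implements Startable {
        final List<String> steps = new ArrayList<>();
        private boolean running;

        @Override
        public void start() {
            if (running) {
                return; // calling start() twice must not register twice
            }
            steps.add("change instance status");
            steps.add("publish registration event");
            steps.add("register health check");
            running = true;
        }

        @Override
        public boolean isRunning() {
            return running;
        }
    }

    /** What the container does in its last startup phase. */
    static void finishStartup(List<? extends Startable> beans) {
        for (Startable bean : beans) {
            bean.start();
        }
    }

    public static void main(String[] args) {
        AutoRegistration registration = new AutoRegistration();
        finishStartup(List.of(registration));
        System.out.println(registration.steps);
    }
}
```

The point of the sketch is only that application code never calls start() itself; the container does, which is why no explicit registration entry point shows up in a normal project.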
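We would expect the Eureka client to send its own instance information to the Eureka server when it registers, but the start method contains no such code. That part lives in the DiscoveryClient class in the com.netflix.discovery package.
DiscoveryClient has a register() method that registers with the Eureka server over HTTP. Its code is as follows: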
/**
 * Register with the eureka service by making the appropriate REST call.
 */
boolean register() throws Throwable {
    logger.info(PREFIX + "{}: registering service...", appPathIdentifier);
    EurekaHttpResponse<Void> httpResponse;
    try {
        httpResponse = eurekaTransport.registrationClient.register(instanceInfo);
    } catch (Exception e) {
        logger.warn(PREFIX + "{} - registration failed {}", appPathIdentifier, e.getMessage(), e);
        throw e;
    }
    if (logger.isInfoEnabled()) {
        logger.info(PREFIX + "{} - registration status: {}", appPathIdentifier, httpResponse.getStatusCode());
    }
    return httpResponse.getStatusCode() == Status.NO_CONTENT.getStatusCode();
}
The HTTP call itself is made in AbstractJerseyEurekaHttpClient#register:
@Override
public EurekaHttpResponse<Void> register(InstanceInfo info) {
    String urlPath = "apps/" + info.getAppName();
    ClientResponse response = null;
    try {
        Builder resourceBuilder = jerseyClient.resource(serviceUrl).path(urlPath).getRequestBuilder();
        addExtraHeaders(resourceBuilder);
        response = resourceBuilder
                .header("Accept-Encoding", "gzip")
                .type(MediaType.APPLICATION_JSON_TYPE)
                .accept(MediaType.APPLICATION_JSON)
                .post(ClientResponse.class, info);
        return anEurekaHttpResponse(response.getStatus()).headers(headersOf(response)).build();
    } finally {
        if (logger.isDebugEnabled()) {
            logger.debug("Jersey HTTP POST {}/{} with instance {}; statusCode={}", serviceUrl, urlPath, info.getId(),
                    response == null ? "N/A" : response.getStatus());
        }
        if (response != null) {
            response.close();
        }
    }
}
Tracing upward to find the callers of register() in DiscoveryClient, we find it is called from the run() method of InstanceInfoReplicator, which implements Runnable. The run() method looks like this:
public void run() {
    try {
        discoveryClient.refreshInstanceInfo();
        Long dirtyTimestamp = instanceInfo.isDirtyWithTime();
        if (dirtyTimestamp != null) {
            discoveryClient.register();
            instanceInfo.unsetIsDirty(dirtyTimestamp);
        }
    } catch (Throwable t) {
        logger.warn("There was a problem with the instance info replicator", t);
    } finally {
        Future next = scheduler.schedule(this, replicationIntervalSeconds, TimeUnit.SECONDS);
        scheduledPeriodicRef.set(next);
    }
}
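The pattern in run() (register only when the instance info is dirty, then schedule the next run from the finally block) can be reproduced in a few lines. The following is a minimal sketch, not Eureka's code: ReplicatorSketch, markDirty and registerCalls are hypothetical names, the counter stands in for discoveryClient.register(), and the interval is in milliseconds only to keep the demo short.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the "register only when dirty, then reschedule" loop.
final class ReplicatorSketch implements Runnable {
    private final ScheduledExecutorService scheduler;
    private final long intervalMillis;
    private volatile Long dirtySince;                 // null means "nothing to push"
    final AtomicInteger registerCalls = new AtomicInteger();

    ReplicatorSketch(ScheduledExecutorService scheduler, long intervalMillis) {
        this.scheduler = scheduler;
        this.intervalMillis = intervalMillis;
    }

    /** Marks the local instance info as changed. */
    void markDirty() {
        dirtySince = System.currentTimeMillis();
    }

    @Override
    public void run() {
        try {
            Long stamp = dirtySince;
            if (stamp != null) {
                registerCalls.incrementAndGet();      // stands in for discoveryClient.register()
                if (stamp.equals(dirtySince)) {       // best-effort: keep the flag if re-dirtied meanwhile
                    dirtySince = null;
                }
            }
        } finally {
            // The task reschedules itself, so one failed run does not stop the loop.
            if (!scheduler.isShutdown()) {
                scheduler.schedule(this, intervalMillis, TimeUnit.MILLISECONDS);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        ReplicatorSketch replicator = new ReplicatorSketch(scheduler, 50);
        replicator.markDirty();
        scheduler.schedule(replicator, 0, TimeUnit.MILLISECONDS);
        Thread.sleep(200);
        scheduler.shutdownNow();
        System.out.println("register calls: " + replicator.registerCalls.get());
    }
}
```

However many times the task fires, register is attempted only while the dirty flag is set, which is why the periodic task is cheap when nothing has changed.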
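InstanceInfoReplicator is used while DiscoveryClient initializes, in the initScheduledTasks() method of DiscoveryClient. That method mainly starts a few scheduled tasks:
- cacheRefreshTask: pulls the service list from the server, every 30 seconds by default; configurable through eureka.client.registryFetchIntervalSeconds;
- heartbeatTask: renews the lease with the server (the heartbeat mechanism), every 30 seconds by default; configurable through eureka.instance.leaseRenewalIntervalInSeconds;
- InstanceInfoReplicator: syncs the InstanceInfo to the server. The first run happens after an initial delay (40 seconds by default, eureka.client.initialInstanceInfoReplicationIntervalSeconds), and later runs follow the replication interval (30 seconds by default, eureka.client.instanceInfoReplicationIntervalSeconds). A change of instanceStatus also triggers a sync on demand;
Strictly speaking, "every N seconds" is not accurate. Eureka wraps the first two tasks in TimedSupervisorTask, a periodic task that adjusts its own interval, so it is worth reading about that class before the code below.
What the TimedSupervisorTask class does: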
https://blog.youkuaiyun.com/boling_cavalry/article/details/82795825
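As I understand the class, the "self-adjusting interval" comes down to a simple rule: when a run times out, the delay before the next run doubles up to a ceiling, and after a successful run it falls back to the configured value. The sketch below shows only that rule. BackoffSketch and nextDelay are hypothetical names, and both the 30-second base and the bound of 10 are assumed values used for illustration.

```java
// Hypothetical sketch of the delay rule used by a self-adjusting periodic task.
final class BackoffSketch {
    private final long baseDelayMs;
    private final long maxDelayMs;
    private long currentDelayMs;

    BackoffSketch(long baseDelayMs, int expBackOffBound) {
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = baseDelayMs * expBackOffBound; // ceiling for the backoff
        this.currentDelayMs = baseDelayMs;
    }

    /** Call after each run; returns the delay to wait before the next run. */
    long nextDelay(boolean timedOut) {
        if (timedOut) {
            currentDelayMs = Math.min(maxDelayMs, currentDelayMs * 2);
        } else {
            currentDelayMs = baseDelayMs;
        }
        return currentDelayMs;
    }

    public static void main(String[] args) {
        BackoffSketch heartbeat = new BackoffSketch(30_000, 10);
        System.out.println(heartbeat.nextDelay(true));  // 60000
        System.out.println(heartbeat.nextDelay(true));  // 120000
        System.out.println(heartbeat.nextDelay(false)); // 30000
    }
}
```

This is why the intervals quoted above are only the defaults for a healthy server: a slow or unreachable server stretches them.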
/**
 * Initializes all scheduled tasks.
 */
private void initScheduledTasks() {
    if (clientConfig.shouldFetchRegistry()) {
        // registry cache refresh timer
        int registryFetchIntervalSeconds = clientConfig.getRegistryFetchIntervalSeconds();
        int expBackOffBound = clientConfig.getCacheRefreshExecutorExponentialBackOffBound();
        cacheRefreshTask = new TimedSupervisorTask(
                "cacheRefresh",
                scheduler,
                cacheRefreshExecutor,
                registryFetchIntervalSeconds,
                TimeUnit.SECONDS,
                expBackOffBound,
                new CacheRefreshThread()
        );
        //----------- cacheRefreshTask: fetch the latest registry from the server -----------
        scheduler.schedule(
                cacheRefreshTask,
                registryFetchIntervalSeconds, TimeUnit.SECONDS);
    }
    if (clientConfig.shouldRegisterWithEureka()) {
        int renewalIntervalInSecs = instanceInfo.getLeaseInfo().getRenewalIntervalInSecs();
        int expBackOffBound = clientConfig.getHeartbeatExecutorExponentialBackOffBound();
        logger.info("Starting heartbeat executor: " + "renew interval is: {}", renewalIntervalInSecs);
        // Heartbeat timer
        heartbeatTask = new TimedSupervisorTask(
                "heartbeat",
                scheduler,
                heartbeatExecutor,
                renewalIntervalInSecs,
                TimeUnit.SECONDS,
                expBackOffBound,
                new HeartbeatThread()
        );
        //----------- heartbeatTask: heartbeat (lease renewal) -----------
        scheduler.schedule(
                heartbeatTask,
                renewalIntervalInSecs, TimeUnit.SECONDS);
        // InstanceInfo replicator
        instanceInfoReplicator = new InstanceInfoReplicator(
                this,
                instanceInfo,
                clientConfig.getInstanceInfoReplicationIntervalSeconds(),
                2); // burstSize
        statusChangeListener = new ApplicationInfoManager.StatusChangeListener() {
            @Override
            public String getId() {
                return "statusChangeListener";
            }
            @Override
            public void notify(StatusChangeEvent statusChangeEvent) {
                if (InstanceStatus.DOWN == statusChangeEvent.getStatus() ||
                        InstanceStatus.DOWN == statusChangeEvent.getPreviousStatus()) {
                    // log at warn level if DOWN was involved
                    logger.warn("Saw local status change event {}", statusChangeEvent);
                } else {
                    logger.info("Saw local status change event {}", statusChangeEvent);
                }
                instanceInfoReplicator.onDemandUpdate();
            }
        };
        if (clientConfig.shouldOnDemandUpdateStatusChange()) {
            applicationInfoManager.registerStatusChangeListener(statusChangeListener);
        }
        //----------- starts instanceInfoReplicator, whose run() registers this instance with the Eureka server -----------
        instanceInfoReplicator.start(clientConfig.getInitialInstanceInfoReplicationIntervalSeconds());
    } else {
        logger.info("Not registering with Eureka server per configuration");
    }
}
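3. Summary of the flow
When the Eureka client registers a service, the registration task can be triggered from two places:
- During Spring Boot startup, the refresh method eventually leads to StatusChangeListener.notify, which listens for service status changes. When the listener receives an event, it performs registration.
- During Spring Boot startup, auto-configuration puts CloudEurekaClient into the container and runs its constructor. The constructor starts a scheduled task that, after an initial 40-second delay, periodically checks whether the instance information has changed (every 30 seconds by default) and, if so, starts the registration flow.
III. How the Eureka Server Stores Service Addresses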
As shown above, the Eureka client sends its registration request from AbstractJerseyEurekaHttpClient#register, which POSTs the InstanceInfo to the path apps/{appName} on the server.
1. What the Eureka server does with the request
The entry point is com.netflix.eureka.resources.ApplicationResource.addInstance().
The REST service here is implemented with Jersey. You can think of ApplicationResource as the equivalent of a Spring MVC Controller.
When the Eureka client calls register, it sends a POST request carrying the current instance information, and that request is handled by the addInstance method of ApplicationResource.
@POST
@Consumes({"application/json", "application/xml"})
public Response addInstance(InstanceInfo info,
                            @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
    logger.debug("Registering instance {} (replication={})", info.getId(), isReplication);
    // validate that the instanceinfo contains all the necessary required fields
    if (isBlank(info.getId())) {
        return Response.status(400).entity("Missing instanceId").build();
    } else if (isBlank(info.getHostName())) {
        return Response.status(400).entity("Missing hostname").build();
    } else if (isBlank(info.getIPAddr())) {
        return Response.status(400).entity("Missing ip address").build();
    } else if (isBlank(info.getAppName())) {
        return Response.status(400).entity("Missing appName").build();
    } else if (!appName.equals(info.getAppName())) {
        return Response.status(400).entity("Mismatched appName, expecting " + appName + " but was " + info.getAppName()).build();
    } else if (info.getDataCenterInfo() == null) {
        return Response.status(400).entity("Missing dataCenterInfo").build();
    } else if (info.getDataCenterInfo().getName() == null) {
        return Response.status(400).entity("Missing dataCenterInfo Name").build();
    }
    // handle cases where clients may be registering with bad DataCenterInfo with missing data
    DataCenterInfo dataCenterInfo = info.getDataCenterInfo();
    if (dataCenterInfo instanceof UniqueIdentifier) {
        String dataCenterInfoId = ((UniqueIdentifier) dataCenterInfo).getId();
        if (isBlank(dataCenterInfoId)) {
            boolean experimental = "true".equalsIgnoreCase(serverConfig.getExperimental("registration.validation.dataCenterInfoId"));
            if (experimental) {
                String entity = "DataCenterInfo of type " + dataCenterInfo.getClass() + " must contain a valid id";
                return Response.status(400).entity(entity).build();
            } else if (dataCenterInfo instanceof AmazonInfo) {
                AmazonInfo amazonInfo = (AmazonInfo) dataCenterInfo;
                String effectiveId = amazonInfo.get(AmazonInfo.MetaDataKey.instanceId);
                if (effectiveId == null) {
                    amazonInfo.getMetadata().put(AmazonInfo.MetaDataKey.instanceId.getName(), info.getId());
                }
            } else {
                logger.warn("Registering DataCenterInfo of type {} without an appropriate id", dataCenterInfo.getClass());
            }
        }
    }
    registry.register(info, "true".equals(isReplication));
    return Response.status(204).build();  // 204 to be backwards compatible
}
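In addInstance, registry.register ultimately calls PeerAwareInstanceRegistryImpl.register, which does the following:
- leaseDuration is the lease expiry time, 90s by default. If the server goes more than 90s without receiving a heartbeat from the client, it evicts that instance
- Calls super.register to register the instance
- Copies the information to the other machines in the Eureka server cluster. The synchronization is simple: get all the nodes in the cluster and register with each one in turn. One point to note: when Eureka sends a synchronization request, it adds the custom x-netflix-discovery-replication header. If that header is true, the receiving node does not replicate the request again, which prevents an endless synchronization loop.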
@Override
public void register(final InstanceInfo info, final boolean isReplication) {
    int leaseDuration = Lease.DEFAULT_DURATION_IN_SECS;
    // The lease duration defaults to 90 seconds; if the client specifies its own value, use that instead
    if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
        leaseDuration = info.getLeaseInfo().getDurationInSecs();
    }
    // Register the instance
    super.register(info, leaseDuration, isReplication);
    // Replicate to the other nodes in the Eureka server cluster
    replicateToPeers(Action.Register, info.getAppName(), info.getId(), info, null, isReplication);
}
AbstractInstanceRegistry#register
This is the core registration logic. Client address information is stored in this class's private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry = new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>() field.
/**
 * Registers a new instance with a given duration.
 *
 * @see com.netflix.eureka.lease.LeaseManager#register(java.lang.Object, int, boolean)
 */
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
    try {
        read.lock();
        // Look up the instances of this appName in registry (registry is the container that stores service information)
        Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
        // Add to the registration count in the monitoring data
        REGISTER.increment(isReplication);
        if (gMap == null) {
            // First registration for this appName: initialize a ConcurrentHashMap
            final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
            gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
            if (gMap == null) {
                gMap = gNewMap;
            }
        }
        // Look up any existing Lease in gMap. A Lease wraps the provider's instance information and manages the lease for that service instance
        Lease<InstanceInfo> existingLease = gMap.get(registrant.getId());
        // If the instance already exists, compare it with the client's instance info; the one with the newer timestamp wins
        if (existingLease != null && (existingLease.getHolder() != null)) {
            Long existingLastDirtyTimestamp = existingLease.getHolder().getLastDirtyTimestamp();
            Long registrationLastDirtyTimestamp = registrant.getLastDirtyTimestamp();
            logger.debug("Existing lease found (existing={}, provided={}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
            // this is a > instead of a >= because if the timestamps are equal, we still take the remote transmitted
            // InstanceInfo instead of the server local copy.
            if (existingLastDirtyTimestamp > registrationLastDirtyTimestamp) {
                logger.warn("There is an existing lease and the existing lease's dirty timestamp {} is greater" +
                        " than the one that is being registered {}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
                logger.warn("Using the existing instanceInfo instead of the new instanceInfo as the registrant");
                registrant = existingLease.getHolder();
            }
        } else {
            // No lease exists yet, so this branch runs
            synchronized (lock) {
                if (this.expectedNumberOfClientsSendingRenews > 0) {
                    // Since the client wants to register it, increase the number of clients sending renews
                    this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
                    updateRenewsPerMinThreshold();
                }
            }
            logger.debug("No previous lease information found; it is new registration");
        }
        // Build a lease
        Lease<InstanceInfo> lease = new Lease<InstanceInfo>(registrant, leaseDuration);
        if (existingLease != null) {
            // If a lease already existed, carry over its serviceUpTimestamp so the service-up time stays that of the first registration
            lease.setServiceUpTimestamp(existingLease.getServiceUpTimestamp());
        }
        // Store it
        gMap.put(registrant.getId(), lease);
        recentRegisteredQueue.add(new Pair<Long, String>(
                System.currentTimeMillis(),
                registrant.getAppName() + "(" + registrant.getId() + ")"));
        // If the registrant carries an overridden status that is not yet recorded, add it to the override map
        if (!InstanceStatus.UNKNOWN.equals(registrant.getOverriddenStatus())) {
            logger.debug("Found overridden status {} for instance {}. Checking to see if needs to be add to the "
                    + "overrides", registrant.getOverriddenStatus(), registrant.getId());
            if (!overriddenInstanceStatusMap.containsKey(registrant.getId())) {
                logger.info("Not found overridden id {} and hence adding it", registrant.getId());
                overriddenInstanceStatusMap.put(registrant.getId(), registrant.getOverriddenStatus());
            }
        }
        InstanceStatus overriddenStatusFromMap = overriddenInstanceStatusMap.get(registrant.getId());
        if (overriddenStatusFromMap != null) {
            logger.info("Storing overridden status {} from map", overriddenStatusFromMap);
            registrant.setOverriddenStatus(overriddenStatusFromMap);
        }
        // Set the status based on the overridden status rules
        InstanceStatus overriddenInstanceStatus = getOverriddenInstanceStatus(registrant, existingLease, isReplication);
        registrant.setStatusWithoutDirty(overriddenInstanceStatus);
        // If the resulting instance status is UP, record the service-up time
        if (InstanceStatus.UP.equals(registrant.getStatus())) {
            lease.serviceUp();
        }
        registrant.setActionType(ActionType.ADDED);
        recentlyChangedQueue.add(new RecentlyChangedItem(lease));
        registrant.setLastUpdatedTimestamp();
        // Invalidate the cache
        invalidateCache(registrant.getAppName(), registrant.getVIPAddress(), registrant.getSecureVipAddress());
        logger.info("Registered instance {}/{} with status {} (replication={})",
                registrant.getAppName(), registrant.getId(), registrant.getStatus(), isReplication);
    } finally {
        read.unlock();
    }
}
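Stripped of logging, status overrides and locking, the storage side of this method is a map of maps keyed by application name and then by instance id, with each entry wrapped in a lease that has an expiry time. The following is a minimal sketch of that structure only. RegistrySketch and its Lease are hypothetical simplifications, the holder is a plain String instead of InstanceInfo, and computeIfAbsent is used where the real code uses putIfAbsent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the two-level registry: appName -> (instanceId -> lease).
final class RegistrySketch {

    /** Simplified lease: the holder plus the time after which it counts as expired. */
    static final class Lease {
        final String holder;
        final long expiresAtMs;

        Lease(String holder, int durationSecs, long nowMs) {
            this.holder = holder;
            this.expiresAtMs = nowMs + durationSecs * 1000L;
        }

        boolean isExpired(long nowMs) {
            return nowMs > expiresAtMs;
        }
    }

    private final ConcurrentHashMap<String, Map<String, Lease>> registry = new ConcurrentHashMap<>();

    void register(String appName, String instanceId, int leaseDurationSecs, long nowMs) {
        // Creates the per-application map atomically on first registration.
        Map<String, Lease> leases = registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>());
        // A repeated registration simply replaces the previous lease.
        leases.put(instanceId, new Lease(instanceId, leaseDurationSecs, nowMs));
    }

    Lease find(String appName, String instanceId) {
        Map<String, Lease> leases = registry.get(appName);
        return leases == null ? null : leases.get(instanceId);
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        long now = System.currentTimeMillis();
        registry.register("ORDER-SERVICE", "host-1:8080", 90, now);
        Lease lease = registry.find("ORDER-SERVICE", "host-1:8080");
        System.out.println("expired after 91s without renewal: " + lease.isExpired(now + 91_000));
    }
}
```

Passing the current time in as a parameter is only a convenience for the sketch; it makes the 90-second expiry easy to check without waiting.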
PeerAwareInstanceRegistryImpl#replicateToPeers
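The loop-avoidance idea described earlier (a replicated request carries a flag, and a node that receives a flagged request stores it but does not forward it) can be shown without any networking. The sketch below is an illustration under that assumption, not the body of replicateToPeers: PeerReplicationSketch and Node are hypothetical, and the boolean parameter plays the role of the x-netflix-discovery-replication header.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a replication flag stops peers from re-forwarding a registration.
final class PeerReplicationSketch {

    static final class Node {
        final String name;
        final List<Node> peers = new ArrayList<>();
        final List<String> instances = new ArrayList<>();
        int received; // how many register calls reached this node

        Node(String name) {
            this.name = name;
        }

        void register(String instanceId, boolean isReplication) {
            received++;
            if (!instances.contains(instanceId)) {
                instances.add(instanceId);
            }
            if (isReplication) {
                return; // came from a peer: store it, do not forward again
            }
            for (Node peer : peers) {
                peer.register(instanceId, true); // the flag stands in for the replication header
            }
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.peers.add(b); a.peers.add(c);
        b.peers.add(a); b.peers.add(c);
        c.peers.add(a); c.peers.add(b);

        a.register("order-1", false); // the original client request hits node a
        System.out.println(a.received + " " + b.received + " " + c.received);
    }
}
```

Without the early return, b would forward to a and c, which would forward again, and the calls would never stop.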
2. Eureka Server multi-level cache
References:
https://www.cnblogs.com/shihaiming/p/11590748.html
https://www.shared-code.com/article/53
Summary:
The Eureka server keeps service registration information in three variables: registry, readWriteCacheMap and readOnlyCacheMap.
Class AbstractInstanceRegistry
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
= new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
Class ResponseCacheImpl
private final ConcurrentMap<Key, Value> readOnlyCacheMap = new ConcurrentHashMap<Key, Value>();
private final LoadingCache<Key, Value> readWriteCacheMap;
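When services register and update at large scale, reading and writing the same ConcurrentHashMap would inevitably cause lock contention and hurt performance. Eureka is an AP system, so it only needs eventual consistency, and it uses a multi-level cache here to separate reads from writes.
registry: updated when a service goes offline, expires, registers or changes status. The service information shown on the Eureka server web page is read from registry.
readWriteCacheMap:
1. Entries in this cache are cleared whenever a service goes offline, expires, registers or changes status.
2. Entries expire automatically after 180 seconds by default.
3. Every 30 seconds readOnlyCacheMap synchronizes with readWriteCacheMap. If readWriteCacheMap has no data for a key, it reloads the data from registry, so the 30-second task also repopulates readWriteCacheMap entries that were invalidated and cleared.
readOnlyCacheMap: a read-only cache held in a plain ConcurrentHashMap in the JVM, mainly used to serve clients fetching registration information. It is updated by a timer (every 30 seconds by default) that compares each value with the one in readWriteCacheMap; if they differ, the readWriteCacheMap value wins.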
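The interaction of the three layers is easier to see in code. The following is a minimal sketch under simplifying assumptions, not ResponseCacheImpl itself: ResponseCacheSketch is a hypothetical class, keys and values are plain strings, the 180-second expiry is left out, and the 30-second timer is replaced by a syncReadOnly() method that the caller invokes by hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the three layers: registry -> read-write cache -> read-only cache.
final class ResponseCacheSketch {
    private final Map<String, String> registry = new ConcurrentHashMap<>();
    private final Map<String, String> readWriteCache = new ConcurrentHashMap<>();
    private final Map<String, String> readOnlyCache = new ConcurrentHashMap<>();

    /** A write goes to the registry and invalidates the read-write layer. */
    void register(String appName, String payload) {
        registry.put(appName, payload);
        readWriteCache.remove(appName);
    }

    /** Clients read the read-only layer first and fall back to the read-write layer. */
    String get(String appName) {
        String cached = readOnlyCache.get(appName);
        if (cached != null) {
            return cached;
        }
        String loaded = loadThroughReadWrite(appName);
        if (loaded != null) {
            readOnlyCache.put(appName, loaded);
        }
        return loaded;
    }

    /** Stand-in for the periodic timer: bring known read-only entries up to date. */
    void syncReadOnly() {
        for (String appName : readOnlyCache.keySet()) {
            String latest = loadThroughReadWrite(appName);
            if (latest != null) {
                readOnlyCache.put(appName, latest);
            }
        }
    }

    /** The read-write layer reloads a missing entry from the registry. */
    private String loadThroughReadWrite(String appName) {
        return readWriteCache.computeIfAbsent(appName, registry::get);
    }

    public static void main(String[] args) {
        ResponseCacheSketch cache = new ResponseCacheSketch();
        cache.register("ORDER-SERVICE", "v1");
        System.out.println(cache.get("ORDER-SERVICE")); // v1
        cache.register("ORDER-SERVICE", "v2");
        System.out.println(cache.get("ORDER-SERVICE")); // v1 (stale until the next sync)
        cache.syncReadOnly();
        System.out.println(cache.get("ORDER-SERVICE")); // v2
    }
}
```

The stale read in the middle is the price of the design: clients may see data that is up to one sync interval old, which is acceptable under eventual consistency.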
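IV. Summary
When I read source code myself I am often lost, and the more I read the more confused I get. Knowing the general logic and flow is usually good enough; there is no need to understand every corner. After reading the Eureka source there is not that much to sum up, so here is a rough outline.
Eureka client
At startup the Eureka client registers itself with the Eureka server, which amounts to sending an HTTP request. When the server receives the request, it stores the client's information in a ConcurrentHashMap. The same applies when the client stops.
Besides registering with the server, the other important thing the client does at startup is to start a few scheduled tasks through auto-configuration (TimedSupervisorTask, a periodic task that adjusts its own interval): periodic lease renewal (the heartbeat mechanism), periodic fetching of registration information from the server, and periodic pushing of its own information to the server.
Eureka server
The most important job of the Eureka server is storing each instance's information in a ConcurrentHashMap. To avoid the performance problems caused by contention, it also introduces a multi-level cache.
Also worth knowing is how a Eureka server cluster synchronizes registration information, and how the endless loop that synchronization could cause is avoided.
Finally, there is the server's self-preservation mechanism: https://blog.youkuaiyun.com/qq_35080214/article/details/109443045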