题主的问题描述太绕了,我们先把集群中的角色定义下:
Eureka架构
比较细节的架构图如下所示:
在配置多个EurekaServer的Service Provider,每次Service Provider启动的时候会选择一个Eureka Server,之后如果这个Eureka Server挂了,才会切换Eureka Server,在当前使用的Eureka Server挂掉之前,不会切换。
被Service Provider选择用来发送请求Eureka Server其实比其他Server多了一项工作,就是发客户端发来的请求,转发到集群中其他的Eureka Server。其实这个压力并没有太大,但是如果集群中实例个数比较多,或者心跳间隔比较短的情况下,的确有不小的压力。可以考虑每个服务配置的Eureka Server顺序不一样。
但是其实仔细想想,只是个请求转发,能有多大压力啊。。。。
最后,我们详细分析下服务注册与取消的源代码(可以直接参考下我的博客关于Eureka的系列分析张哈希的博客 - CSDN博客blog.csdn.net
):
关于服务注册开启/关闭服务注册配置:eureka.client.register-with-eureka = true (默认)
什么时候注册?应用第一次启动时,初始化EurekaClient时,应用状态改变:从STARTING变为UP会触发这个Listener,调用instanceInfoReplicator.onDemandUpdate(); 可以推测出,实例状态改变时,也会通过注册接口更新实例状态信息
statusChangeListener = new ApplicationInfoManager.StatusChangeListener() {
@Override
public String getId() {
return "statusChangeListener";
}
@Override
public void notify(StatusChangeEvent statusChangeEvent) {
if (InstanceStatus.DOWN == statusChangeEvent.getStatus() ||
InstanceStatus.DOWN == statusChangeEvent.getPreviousStatus()) {
// log at warn level if DOWN was involved
logger.warn("Saw local status change event {}", statusChangeEvent);
} else {
logger.info("Saw local status change event {}", statusChangeEvent);
}
instanceInfoReplicator.onDemandUpdate();
}
};定时任务,如果InstanceInfo发生改变,也会通过注册接口更新信息
public void run() {
try {
discoveryClient.refreshInstanceInfo();
//如果实例信息发生改变,则需要调用register更新InstanceInfo
Long dirtyTimestamp = instanceInfo.isDirtyWithTime();
if (dirtyTimestamp != null) {
discoveryClient.register();
instanceInfo.unsetIsDirty(dirtyTimestamp);
}
} catch (Throwable t) {
logger.warn("There was a problem with the instance info replicator", t);
} finally {
Future next = scheduler.schedule(this, replicationIntervalSeconds, TimeUnit.SECONDS);
scheduledPeriodicRef.set(next);
}
}在定时renew时,如果renew接口返回404(代表这个实例在EurekaServer上面找不到),可能是之前注册失败或者注册过期导致的。这时需要调用register重新注册
boolean renew() {
EurekaHttpResponse httpResponse;
try {
httpResponse = eurekaTransport.registrationClient.sendHeartBeat(instanceInfo.getAppName(), instanceInfo.getId(), instanceInfo, null);
logger.debug("{} - Heartbeat status: {}", PREFIX + appPathIdentifier, httpResponse.getStatusCode());
//如果renew接口返回404(代表这个实例在EurekaServer上面找不到),可能是之前注册失败或者注册过期导致的
if (httpResponse.getStatusCode() == 404) {
REREGISTER_COUNTER.increment();
logger.info("{} - Re-registering apps/{}", PREFIX + appPathIdentifier, instanceInfo.getAppName());
long timestamp = instanceInfo.setIsDirtyWithTime();
boolean success = register();
if (success) {
instanceInfo.unsetIsDirty(timestamp);
}
return success;
}
return httpResponse.getStatusCode() == 200;
} catch (Throwable e) {
logger.error("{} - was unable to send heartbeat!", PREFIX + appPathIdentifier, e);
return false;
}
}
向Eureka发送注册请求EurekaServer发生了什么?
主要有两个存储,一个是之前提到过的registry,还有一个最近变化队列,后面我们会知道,这个最近变化队列里面就是客户端获取增量实例信息的内容:
# 整体注册信息缓存
private final ConcurrentHashMap>> registry = new ConcurrentHashMap>>();
# 最近变化队列
private ConcurrentLinkedQueue recentlyChangedQueue = new ConcurrentLinkedQueue();
EurekaServer收到实例注册主要分两步:调用父类方法注册
同步到其他EurekaServer实例
public void register(InstanceInfo info, boolean isReplication) {
int leaseDuration = 90;
if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
leaseDuration = info.getLeaseInfo().getDurationInSecs();
}
//调用父类方法注册
super.register(info, leaseDuration, isReplication);
//同步到其他EurekaServer实例
this.replicateToPeers(PeerAwareInstanceRegistryImpl.Action.Register, info.getAppName(), info.getId(), info, (InstanceStatus)null, isReplication);
}
我们先看同步到其他EurekaServer实例
其实就是,注册到的EurekaServer再依次调用其他集群内的EurekaServer的Register方法将实例信息同步过去
private void replicateToPeers(Action action, String appName, String id,
InstanceInfo info /* optional */,
InstanceStatus newStatus /* optional */, boolean isReplication) {
Stopwatch tracer = action.getTimer().start();
try {
if (isReplication) {
numberOfReplicationsLastMin.increment();
}
// If it is a replication already, do not replicate again as this will create a poison replication
if (peerEurekaNodes == Collections.EMPTY_LIST || isReplication) {
return;
}
for (final PeerEurekaNode node : peerEurekaNodes.getPeerEurekaNodes()) {
// If the url represents this host, do not replicate to yourself.
if (peerEurekaNodes.isThisMyUrl(node.getServiceUrl())) {
continue;
}
replicateInstanceActionsToPeers(action, appName, id, info, newStatus, node);
}
} finally {
tracer.stop();
}
}
private void replicateInstanceActionsToPeers(Action action, String appName,
String id, InstanceInfo info, InstanceStatus newStatus,
PeerEurekaNode node) {
try {
InstanceInfo infoFromRegistry = null;
CurrentRequestVersion.set(Version.V2);
switch (action) {
case Cancel:
node.cancel(appName, id);
break;
case Heartbeat:
InstanceStatus overriddenStatus = overriddenInstanceStatusMap.get(id);
infoFromRegistry = getInstanceByAppAndId(appName, id, false);
node.heartbeat(appName, id, infoFromRegistry, overriddenStatus, false);
break;
case Register:
node.register(info);
break;
case StatusUpdate:
infoFromRegistry = getInstanceByAppAndId(appName, id, false);
node.statusUpdate(appName, id, newStatus, infoFromRegistry);
break;
case DeleteStatusOverride:
infoFromRegistry = getInstanceByAppAndId(appName, id, false);
node.deleteStatusOverride(appName, id, infoFromRegistry);
break;
}
} catch (Throwable t) {
logger.error("Cannot replicate information to {} for action {}", node.getServiceUrl(), action.name(), t);
}
}
然后看看调用父类方法注册:
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
try {
//register虽然看上去好像是修改,但是这里用的是读锁,后面会解释
read.lock();
//从registry中查看这个app是否存在
Map> gMap = registry.get(registrant.getAppName());
//不存在就创建
if (gMap == null) {
final ConcurrentHashMap> gNewMap = new ConcurrentHashMap>();
gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
if (gMap == null) {
gMap = gNewMap;
}
}
//查看这个app的这个实例是否已存在
Lease existingLease = gMap.get(registrant.getId());
if (existingLease != null && (existingLease.getHolder() != null)) {
//如果已存在,对比时间戳,保留比较新的实例信息......
} else {
// 如果不存在,证明是一个新的实例
//更新自我保护监控变量的值的代码.....
}
Lease lease = new Lease(registrant, leaseDuration);
if (existingLease != null) {
lease.setServiceUpTimestamp(existingLease.getServiceUpTimestamp());
}
//放入registry
gMap.put(registrant.getId(), lease);
//加入最近修改的记录队列
recentlyChangedQueue.add(new RecentlyChangedItem(lease));
//初始化状态,记录时间等相关代码......
//主动让Response缓存失效
invalidateCache(registrant.getAppName(), registrant.getVIPAddress(), registrant.getSecureVipAddress());
} finally {
read.unlock();
}
}
总结起来,就是主要三件事:
1.将实例注册信息放入或者更新registry
2.将实例注册信息加入最近修改的记录队列
3.主动让Response缓存失效
我们来类比下服务取消
服务取消CANCEL
protected boolean internalCancel(String appName, String id, boolean isReplication) {
try {
//cancel虽然看上去好像是修改,但是这里用的是读锁,后面会解释
read.lock();
//从registry中剔除这个实例
Map> gMap = registry.get(appName);
Lease leaseToCancel = null;
if (gMap != null) {
leaseToCancel = gMap.remove(id);
}
if (leaseToCancel == null) {
logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
return false;
} else {
//改变状态,记录状态修改时间等相关代码......
if (instanceInfo != null) {
instanceInfo.setActionType(ActionType.DELETED);
//加入最近修改的记录队列
recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
}
//主动让Response缓存失效
invalidateCache(appName, vip, svip);
logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
return true;
}
} finally {
read.unlock();
}
}
总结起来,也是主要三件事:
1.从registry中剔除这个实例
2.将实例注册信息加入最近修改的记录队列
3.主动让Response缓存失效
这里我们注意到了这个最近修改队列,我们来详细看看
最近修改队列
这个最近修改队列和消费者定时获取服务实例列表有着密切的关系
private TimerTask getDeltaRetentionTask() {
return new TimerTask() {
@Override
public void run() {
Iterator it = recentlyChangedQueue.iterator();
while (it.hasNext()) {
if (it.next().getLastUpdateTime() <
System.currentTimeMillis() - serverConfig.getRetentionTimeInMSInDeltaQueue()) {
it.remove();
} else {
break;
}
}
}
};
}
这个RetentionTimeInMSInDeltaQueue默认是180s(配置是eureka.server.retention-time-in-m-s-in-delta-queue,默认是180s,官网写错了),可以看出这个队列是一个长度为180s的滑动窗口,保存最近180s以内的应用实例信息修改,后面我们会看到,客户端调用获取增量信息,实际上就是从这个queue中读取,所以可能一段时间内读取到的信息都是一样的。
关于服务与实例列表获取
EurekaClient端
我们从Ribbon说起:EurekaClient也存在缓存,应用服务实例列表信息在每个EurekaClient服务消费端都有缓存。一般的,Ribbon的LoadBalancer会读取这个缓存,来知道当前有哪些实例可以调用,从而进行负载均衡。这个loadbalancer同样也有缓存。
首先看这个LoadBalancer的缓存更新机制,相关类是PollingServerListUpdater:
final Runnable wrapperRunnable = new Runnable() {
@Override
public void run() {
if (!isActive.get()) {
if (scheduledFuture != null) {
scheduledFuture.cancel(true);
}
return;
}
try {
//从EurekaClient缓存中获取服务实例列表,保存在本地缓存
updateAction.doUpdate();
lastUpdated = System.currentTimeMillis();
} catch (Exception e) {
logger.warn("Failed one update cycle", e);
}
}
};
//定时调度
scheduledFuture = getRefreshExecutor().scheduleWithFixedDelay(
wrapperRunnable,
initialDelayMs,
refreshIntervalMs,
TimeUnit.MILLISECONDS
);
这个updateAction.doUpdate();就是从EurekaClient缓存中获取服务实例列表,保存在BaseLoadBalancer的本地缓存:
protected volatile List allServerList = Collections.synchronizedList(new ArrayList());
public void setServersList(List lsrv) {
//写入allServerList的代码,这里略
}
@Override
public List getAllServers() {
return Collections.unmodifiableList(allServerList);
}
这里的getAllServers会在每个负载均衡规则中被调用,例如RoundRobinRule:
public Server choose(ILoadBalancer lb, Object key) {
if (lb == null) {
log.warn("no load balancer");
return null;
}
Server server = null;
int count = 0;
while (server == null && count++ < 10) {
List reachableServers = lb.getReachableServers();
//获取服务实例列表,调用的就是刚刚提到的getAllServers
List allServers = lb.getAllServers();
int upCount = reachableServers.size();
int serverCount = allServers.size();
if ((upCount == 0) || (serverCount == 0)) {
log.warn("No up servers available from load balancer: " + lb);
return null;
}
int nextServerIndex = incrementAndGetModulo(serverCount);
server = allServers.get(nextServerIndex);
if (server == null) {
/* Transient. */
Thread.yield();
continue;
}
if (server.isAlive() && (server.isReadyToServe())) {
return (server);
}
// Next.
server = null;
}
if (count >= 10) {
log.warn("No available alive servers after 10 tries from load balancer: "
+ lb);
}
return server;
}
这个缓存需要注意下,有时候我们只修改了EurekaClient缓存的更新时间,但是没有修改这个LoadBalancer的刷新本地缓存时间,就是ribbon.ServerListRefreshInterval,这个参数可以设置的很小,因为没有从网络读取,就是从一个本地缓存刷到另一个本地缓存(如何配置缓存配置来实现服务实例快速下线快速感知快速刷新,可以参考我的另一篇文章)。
然后我们来看一下EurekaClient本身的缓存,直接看关键类DiscoveryClient的相关源码,我们这里只关心本地Region的,多Region配置我们先忽略:
//本地缓存,可以理解为是一个软链接
private final AtomicReference localRegionApps = new AtomicReference();
private void initScheduledTasks() {
//如果配置为需要拉取服务列表,则设置定时拉取任务,这个配置默认是需要拉取服务列表
if (clientConfig.shouldFetchRegistry()) {
// registry cache refresh timer
int registryFetchIntervalSeconds = clientConfig.getRegistryFetchIntervalSeconds();
int expBackOffBound = clientConfig.getCacheRefreshExecutorExponentialBackOffBound();
scheduler.schedule(
new TimedSupervisorTask(
"cacheRefresh",
scheduler,
cacheRefreshExecutor,
registryFetchIntervalSeconds,
TimeUnit.SECONDS,
expBackOffBound,
new CacheRefreshThread()
),
registryFetchIntervalSeconds, TimeUnit.SECONDS);
}
//其他定时任务初始化的代码,忽略
}
//定时从EurekaServer拉取服务列表的任务
class CacheRefreshThread implements Runnable {
public void run() {
refreshRegistry();
}
}
void refreshRegistry() {
try {
//多Region配置处理代码,忽略
boolean success = fetchRegistry(remoteRegionsModified);
if (success) {
registrySize = localRegionApps.get().size();
lastSuccessfulRegistryFetchTimestamp = System.currentTimeMillis();
}
//日志代码,忽略
} catch (Throwable e) {
logger.error("Cannot fetch registry from server", e);
}
}
//定时从EurekaServer拉取服务列表的核心方法
private boolean fetchRegistry(boolean forceFullRegistryFetch) {
Stopwatch tracer = FETCH_REGISTRY_TIMER.start();
try {
Applications applications = getApplications();
//判断,如果是第一次拉取,或者app列表为空,就进行全量拉取,否则就会进行增量拉取
if (clientConfig.shouldDisableDelta()
|| (!Strings.isNullOrEmpty(clientConfig.getRegistryRefreshSingleVipAddress()))
|| forceFullRegistryFetch
|| (applications == null)
|| (applications.getRegisteredApplications().size() == 0)
|| (applications.getVersion() == -1)) //Client application does not have latest library supporting delta
{
getAndStoreFullRegistry();
} else {
getAndUpdateDelta(applications);
}
applications.setAppsHashCode(applications.getReconcileHashCode());
logTotalInstances();
} catch (Throwable e) {
logger.error(PREFIX + appPathIdentifier + " - was unable to refresh its cache! status = " + e.getMessage(), e);
return false;
} finally {
if (tracer != null) {
tracer.stop();
}
}
//缓存更新完成,发送个event给观察者,目前没啥用
onCacheRefreshed();
// 检查下远端的服务实例列表里面包括自己,并且状态是否对,这里我们不关心
updateInstanceRemoteStatus();
// registry was fetched successfully, so return true
return true;
}
//全量拉取代码
private void getAndStoreFullRegistry() throws Throwable {
long currentUpdateGeneration = fetchRegistryGeneration.get();
Applications apps = null;
//访问/eureka/apps接口,拉取所有服务实例信息
EurekaHttpResponse httpResponse = clientConfig.getRegistryRefreshSingleVipAddress() == null
? eurekaTransport.queryClient.getApplications(remoteRegionsRef.get())
: eurekaTransport.queryClient.getVip(clientConfig.getRegistryRefreshSingleVipAddress(), remoteRegionsRef.get());
if (httpResponse.getStatusCode() == Status.OK.getStatusCode()) {
apps = httpResponse.getEntity();
}
logger.info("The response status is {}", httpResponse.getStatusCode());
if (apps == null) {
logger.error("The application is null for some reason. Not storing this information");
} else if (fetchRegistryGeneration.compareAndSet(currentUpdateGeneration, currentUpdateGeneration + 1)) {
localRegionApps.set(this.filterAndShuffle(apps));
logger.debug("Got full registry with apps hashcode {}", apps.getAppsHashCode());
} else {
logger.warn("Not updating applications as another thread is updating it already");
}
}
//增量拉取代码
private void getAndUpdateDelta(Applications applications) throws Throwable {
long currentUpdateGeneration = fetchRegistryGeneration.get();
Applications delta = null;
//访问/eureka/delta接口,拉取所有服务实例增量信息
EurekaHttpResponse httpResponse = eurekaTransport.queryClient.getDelta(remoteRegionsRef.get());
if (httpResponse.getStatusCode() == Status.OK.getStatusCode()) {
delta = httpResponse.getEntity();
}
if (delta == null) {
//如果delta为空,拉取增量失败,就全量拉取
logger.warn("The server does not allow the delta revision to be applied because it is not safe. "
+ "Hence got the full registry.");
getAndStoreFullRegistry();
} else if (fetchRegistryGeneration.compareAndSet(currentUpdateGeneration, currentUpdateGeneration + 1)) {
//这里设置原子锁的原因是怕某次调度网络请求时间过长,导致同一时间有多线程拉取到增量信息并发修改
//拉取增量成功,检查hashcode是否一样,不一样的话也会全量拉取
logger.debug("Got delta update with apps hashcode {}", delta.getAppsHashCode());
String reconcileHashCode = "";
if (fetchRegistryUpdateLock.tryLock()) {
try {
updateDelta(delta);
reconcileHashCode = getReconcileHashCode(applications);
} finally {
fetchRegistryUpdateLock.unlock();
}
} else {
logger.warn("Cannot acquire update lock, aborting getAndUpdateDelta");
}
// There is a diff in number of instances for some reason
if (!reconcileHashCode.equals(delta.getAppsHashCode()) || clientConfig.shouldLogDeltaDiff()) {
reconcileAndLogDifference(delta, reconcileHashCode); // this makes a remoteCall
}
} else {
logger.warn("Not updating application delta as another thread is updating it already");
logger.debug("Ignoring delta update with apps hashcode {}, as another thread is updating it already", delta.getAppsHashCode());
}
}
以上就是对于EurekaClient拉取服务实例信息的源代码分析,总结EurekaClient 重要缓存如下:EurekaClient第一次全量拉取,定时增量拉取应用服务实例信息,保存在缓存中。
EurekaClient增量拉取失败,或者增量拉取之后对比hashcode发现不一致,就会执行全量拉取,这样避免了网络某时段分片带来的问题。
同时对于服务调用,如果涉及到ribbon负载均衡,那么ribbon对于这个实例列表也有自己的缓存,这个缓存定时从EurekaClient的缓存更新
EurekaServer端
在EurekaServer端,所有的读取请求都是读的ReadOnlyMap(这个可以配置) 有定时任务会定时从ReadWriteMap同步到ReadOnlyMap这个时间配置是:
#eureka server刷新readCacheMap的时间,注意,client读取的是readCacheMap,这个时间决定了多久会把readWriteCacheMap的缓存更新到readCacheMap上
#默认30s
eureka.server.responseCacheUpdateInvervalMs=3000
相关代码:
if (shouldUseReadOnlyResponseCache) {
timer.schedule(getCacheUpdateTask(),
new Date(((System.currentTimeMillis() / responseCacheUpdateIntervalMs) * responseCacheUpdateIntervalMs)
+ responseCacheUpdateIntervalMs),
responseCacheUpdateIntervalMs);
}
private TimerTask getCacheUpdateTask() {
return new TimerTask() {
@Override
public void run() {
logger.debug("Updating the client cache from response cache");
for (Key key : readOnlyCacheMap.keySet()) {
if (logger.isDebugEnabled()) {
Object[] args = {key.getEntityType(), key.getName(), key.getVersion(), key.getType()};
logger.debug("Updating the client cache from response cache for key : {} {} {} {}", args);
}
try {
CurrentRequestVersion.set(key.getVersion());
Value cacheValue = readWriteCacheMap.get(key);
Value currentCacheValue = readOnlyCacheMap.get(key);
if (cacheValue != currentCacheValue) {
readOnlyCacheMap.put(key, cacheValue);
}
} catch (Throwable th) {
logger.error("Error while updating the client cache from response cache", th);
}
}
}
};
}
ReadWriteMap是一个LoadingCache,将Registry中的服务实例信息封装成要返回的http响应(分别是经过gzip压缩和非压缩的),同时还有两个特殊key,ALL_APPS和ALL_APPS_DELTA ALL_APPS就是所有服务实例信息 ALL_APPS_DELTA就是之前讲注册说的RecentlyChangedQueue里面的实例列表封装的http响应信息