In Part 10 we took a broad look at how data synchronization works over HTTP long polling. This article digs into the technical details.
Without further ado, straight to the source code.
For background: HttpSyncDataService is the implementation class for data synchronization, and its constructor calls the instance method start() during initialization.
Let's begin with the start method and walk through the points worth analyzing.
Code snippet 1: the start method
private void start() {
// It could be initialized multiple times, so you need to control that.
if (RUNNING.compareAndSet(false, true)) { //1
// fetch all group configs.
this.fetchGroupConfig(ConfigGroupEnum.values());
int threadSize = serverList.size();
this.executor = new ThreadPoolExecutor(threadSize, threadSize, 60L, TimeUnit.SECONDS,
new LinkedBlockingQueue<>(),
SoulThreadFactory.create("http-long-polling", true)); //2
// start long polling, each server creates a thread to listen for changes.
this.serverList.forEach(server -> this.executor.execute(new HttpLongPollingTask(server)));
} else {
log.info("soul http long polling was started, executor=[{}]", executor);
}
}
Two technical points in code snippet 1:
- An AtomicBoolean is used as a concurrency-safe switch. This pattern is handy whenever a service or thread needs to be started and stopped. The same could be achieved with a synchronized lock, as long as the switch operation is mutually exclusive and visible across threads, but the atomic type is backed by CAS and therefore performs better (see the sketch after the thread-factory code below).
- A thread pool is created with as many threads as there are admin servers, and a custom thread factory is used to create the thread objects. The created threads are marked as daemon threads; daemon threads are destroyed automatically once all non-daemon threads have exited (think of how the JVM's garbage-collection threads terminate).
public static ThreadFactory create(final String namePrefix, final boolean daemon, final int priority) {
    return new SoulThreadFactory(namePrefix, daemon, priority);
}

@Override
public Thread newThread(final Runnable runnable) {
    Thread thread = new Thread(THREAD_GROUP, runnable,
            THREAD_GROUP.getName() + "-" + namePrefix + "-" + THREAD_NUMBER.getAndIncrement());
    thread.setDaemon(daemon);
    thread.setPriority(priority);
    return thread;
}
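To make the first point concrete, here is a minimal, self-contained sketch (not Soul's code; class and method names are made up) of a start-once switch built on AtomicBoolean.compareAndSet, next to the synchronized alternative mentioned above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo of the start/stop switch idea used in HttpSyncDataService.
public class StartSwitchDemo {

    private final AtomicBoolean running = new AtomicBoolean(false);

    public void start() {
        // CAS: only the first caller flips false -> true and runs the initialization.
        if (running.compareAndSet(false, true)) {
            System.out.println("started by " + Thread.currentThread().getName());
        } else {
            System.out.println("already running, skipped");
        }
    }

    // Same guarantee with a lock: correct, but every caller has to acquire the monitor.
    private boolean started = false;

    public synchronized void startSynchronized() {
        if (!started) {
            started = true;
            System.out.println("started (synchronized) by " + Thread.currentThread().getName());
        }
    }

    public static void main(String[] args) {
        StartSwitchDemo demo = new StartSwitchDemo();
        for (int i = 0; i < 3; i++) {
            new Thread(demo::start).start();
        }
    }
}
```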
Code snippet 2: fetchGroupConfig
private void fetchGroupConfig(final ConfigGroupEnum... groups) throws SoulException {
for (int index = 0; index < this.serverList.size(); index++) {
String server = serverList.get(index);
try {
this.doFetchGroupConfig(server, groups);
break;
} catch (SoulException e) {
// no available server, throw exception.
if (index >= serverList.size() - 1) {
throw e;
}
log.warn("fetch config fail, try another one: {}", serverList.get(index + 1));
}
}
}
private void doFetchGroupConfig(final String server, final ConfigGroupEnum... groups) {
StringBuilder params = new StringBuilder();
for (ConfigGroupEnum groupKey : groups) {
params.append("groupKeys").append("=").append(groupKey.name()).append("&");
}
String url = server + "/configs/fetch?" + StringUtils.removeEnd(params.toString(), "&");
log.info("request configs: [{}]", url);
String json = null;
try {
json = this.httpClient.getForObject(url, String.class);
} catch (RestClientException e) {
String message = String.format("fetch config fail from server[%s], %s", url, e.getMessage());
log.warn(message);
throw new SoulException(message, e);
}
// update local cache
boolean updated = this.updateCacheWithJson(json);
if (updated) {
log.info("get latest configs: [{}]", json);
return;
}
// not updated. it is likely that the current config server has not been updated yet. wait a moment.
log.info("The config of the server[{}] has not been updated or is out of date. Wait for 30s to listen for changes again.", server);
ThreadUtils.sleep(TimeUnit.SECONDS, 30);
}
private boolean updateCacheWithJson(final String json) {
JsonObject jsonObject = GSON.fromJson(json, JsonObject.class);
JsonObject data = jsonObject.getAsJsonObject("data");
// if the config cache will be updated?
return factory.executor(data);
}
public boolean executor(final JsonObject data) {
final boolean[] success = {false};
ENUM_MAP.values().parallelStream().forEach(dataRefresh -> success[0] = dataRefresh.refresh(data));//1
return success[0];
}
One technical point in code snippet 2:
- The executor method uses Java 8's parallelStream to run the data refreshes in parallel, **but the overall flow is still blocking: the parallelStream.forEach branches execute concurrently, yet forEach is a terminal operation, so success already reflects the result of the stream processing by the time the method returns.** The result is collected into a one-element final boolean array because a lambda can only capture (effectively) final local variables and cannot reassign a plain local boolean; mutating the contents of the captured array works because the array object lives on the heap, which is shared by all threads, whereas a local primitive exists only on its own thread's stack. A small illustrative sketch of this pattern follows.
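Below is a tiny, hypothetical example (the group names and refresh logic are made up) of that pattern: forEach on a parallel stream runs on worker threads, and the outcome is written into a one-element final array.

```java
import java.util.Arrays;
import java.util.List;

public class ParallelRefreshDemo {

    public static void main(String[] args) {
        // stand-ins for the config groups handled by the data refreshers
        List<String> groups = Arrays.asList("PLUGIN", "SELECTOR", "RULE");

        // a plain local boolean could not be reassigned inside the lambda
        // (captured locals must be effectively final), so a single-element
        // array on the shared heap is used as a mutable slot instead
        final boolean[] success = {false};

        groups.parallelStream().forEach(group -> {
            // pretend each refresh succeeds; in Soul this is dataRefresh.refresh(data)
            success[0] = !group.isEmpty();
        });

        // forEach is a terminal operation: all branches have finished by this point
        System.out.println("updated = " + success[0]);
    }
}
```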
Code snippet 3: the doLongPolling logic on the soul gateway side
@Override
public void run() {
while (RUNNING.get()) {
for (int time = 1; time <= retryTimes; time++) { //1
try {
doLongPolling(server);
} catch (Exception e) {
// print warnning log.
if (time < retryTimes) {
log.warn("Long polling failed, tried {} times, {} times left, will be suspended for a while! {}",
time, retryTimes - time, e.getMessage());
ThreadUtils.sleep(TimeUnit.SECONDS, 5);
continue;
}
// print error, then suspended for a while.
log.error("Long polling failed, try again after 5 minutes!", e);
ThreadUtils.sleep(TimeUnit.MINUTES, 5);
}
}
}
log.warn("Stop http long polling.");
}
@SuppressWarnings("unchecked")
private void doLongPolling(final String server) {
MultiValueMap<String, String> params = new LinkedMultiValueMap<>(8);
//2
for (ConfigGroupEnum group : ConfigGroupEnum.values()) {
ConfigData<?> cacheConfig = factory.cacheConfigData(group);
String value = String.join(",", cacheConfig.getMd5(), String.valueOf(cacheConfig.getLastModifyTime()));
params.put(group.name(), Lists.newArrayList(value));
}
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_FORM_URLENCODED);
HttpEntity httpEntity = new HttpEntity(params, headers);
String listenerUrl = server + "/configs/listener";
log.debug("request listener configs: [{}]", listenerUrl);
JsonArray groupJson = null;
try {
String json = this.httpClient.postForEntity(listenerUrl, httpEntity, String.class).getBody();
log.debug("listener result: [{}]", json);
groupJson = GSON.fromJson(json, JsonObject.class).getAsJsonArray("data");
} catch (RestClientException e) {
...
}
}
Technical points in code snippet 3:
- fetchGroupConfig has already pulled the full configuration, so next the thread pool runs an asynchronous task per admin server to keep syncing with it; that is the logic in the run method. run polls in a while loop, and inside it a for loop performs the doLongPolling operation up to retryTimes (3) times.
- In doLongPolling, the local cache data (md5 and lastModifyTime per group) is packed into a map and posted to the admin's /configs/listener endpoint, fetching the admin's configuration once more; that part is easy to follow. The question is: why execute doLongPolling 3 times in a row? To answer that, we need code snippet 4, the logic behind the admin's /configs/listener endpoint.
Code snippet 4: the doLongPolling handling logic on the admin side
public void doLongPolling(final HttpServletRequest request, final HttpServletResponse response) {
// compare group md5
List<ConfigGroupEnum> changedGroup = compareChangedGroup(request); //1
String clientIp = getRemoteIp(request);
// response immediately.
if (CollectionUtils.isNotEmpty(changedGroup)) {
this.generateResponse(response, changedGroup);
log.info("send response with the changed group, ip={}, group={}", clientIp, changedGroup);
return;
}
// listen for configuration changed.
final AsyncContext asyncContext = request.startAsync(); //2
// AsyncContext.settimeout() does not timeout properly, so you have to control it yourself
asyncContext.setTimeout(0L);
// block client's thread.
scheduler.execute(new LongPollingClient(asyncContext, clientIp, HttpConstants.SERVER_MAX_HOLD_TIMEOUT)); //3
}
public HttpLongPollingDataChangedListener(final HttpSyncProperties httpSyncProperties) {
this.clients = new ArrayBlockingQueue<>(1024);
this.scheduler = new ScheduledThreadPoolExecutor(1,
SoulThreadFactory.create("long-polling", true));
this.httpSyncProperties = httpSyncProperties;
}
class LongPollingClient implements Runnable {
...
@Override
public void run() {
this.asyncTimeoutFuture = scheduler.schedule(() -> {
clients.remove(LongPollingClient.this);
List<ConfigGroupEnum> changedGroups = compareChangedGroup((HttpServletRequest) asyncContext.getRequest());
sendResponse(changedGroups);
}, timeoutTime, TimeUnit.MILLISECONDS);
clients.add(this);
}
...
}
Three technical points in code snippet 4:
- compareChangedGroup compares the data sent by the soul gateway with the admin's cache; simply put, it checks whether the two are consistent, and if not, the changed groups are written into the response and returned to the soul caller immediately.
- If nothing has changed, the request is switched into asynchronous mode with request.startAsync(), a feature that requires Servlet 3.0 or above. The purpose is to hand the servlet request/response context over to an asynchronous thread, so that thread can still read the request and write the response later (think about how this works under the hood; a minimal sketch follows this list).
- From the constructor we can see that the asynchronous work runs on a ScheduledThreadPoolExecutor with a single core thread. The task logic is in the run method of LongPollingClient: it registers a delayed task on the scheduler (so how are scheduled executors implemented?), then adds the current caller's client to a blocking queue capped at 1024 entries. The delayed task fires after the hold timeout (60 seconds): it removes the LongPollingClient enqueued 60 seconds earlier, runs compareChangedGroup against the asyncContext's request once more, and sends the result back to the soul caller. Because the queue is bounded, when requests come in too frequently and the queue fills up, the extra requests fail with an exception, which acts as a simple flow-control mechanism.
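To make points 2 and 3 more tangible, here is a minimal, hypothetical long-polling servlet (the class name, URL, queue size, and timeout are assumptions for illustration, not Soul's implementation) showing the Servlet 3.0 async pattern: suspend the request with startAsync(), park it in a bounded queue, and answer it later from a scheduler thread.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical long-polling endpoint, for illustration only.
@WebServlet(urlPatterns = "/configs/listener", asyncSupported = true)
public class LongPollingServlet extends HttpServlet {

    // bounded queue: once full, add() throws, which acts as crude flow control
    private final BlockingQueue<AsyncContext> clients = new ArrayBlockingQueue<>(1024);

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        // suspend the request: the container thread returns, but the response stays open
        AsyncContext asyncContext = req.startAsync();
        asyncContext.setTimeout(0L); // manage the timeout ourselves
        clients.add(asyncContext);

        // answer from a scheduler thread once the hold timeout (60s here) expires
        scheduler.schedule(() -> {
            if (clients.remove(asyncContext)) {
                try {
                    asyncContext.getResponse().getWriter().write("{\"data\":[]}");
                } catch (IOException ignored) {
                    // swallowed for brevity in this sketch
                }
                asyncContext.complete(); // finish the suspended request
            }
        }, 60, TimeUnit.SECONDS);
    }
}
```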
Code snippet 5: compareChangedGroup
private List<ConfigGroupEnum> compareChangedGroup(final HttpServletRequest request) {
List<ConfigGroupEnum> changedGroup = new ArrayList<>(ConfigGroupEnum.values().length);
for (ConfigGroupEnum group : ConfigGroupEnum.values()) {
// md5,lastModifyTime
String[] params = StringUtils.split(request.getParameter(group.name()), ',');
if (params == null || params.length != 2) {
throw new SoulException("group param invalid:" + request.getParameter(group.name()));
}
String clientMd5 = params[0];
long clientModifyTime = NumberUtils.toLong(params[1]); //1
ConfigDataCache serverCache = CACHE.get(group.name());
// do check.
if (this.checkCacheDelayAndUpdate(serverCache, clientMd5, clientModifyTime)) {
changedGroup.add(group);
}
}
return changedGroup;
}
private boolean checkCacheDelayAndUpdate(final ConfigDataCache serverCache, final String clientMd5, final long clientModifyTime) {
// is the same, doesn't need to be updated
if (StringUtils.equals(clientMd5, serverCache.getMd5())) { //2
return false;
}
// if the md5 value is different, it is necessary to compare lastModifyTime.
long lastModifyTime = serverCache.getLastModifyTime();
if (lastModifyTime >= clientModifyTime) {
// the client's config is out of date.
return true;
}
// the lastModifyTime before client, then the local cache needs to be updated.
// Considering the concurrency problem, admin must lock,
// otherwise it may cause the request from soul-web to update the cache concurrently, causing excessive db pressure
boolean locked = false;
try {
locked = LOCK.tryLock(5, TimeUnit.SECONDS); //3
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return true;
}
if (locked) {
try {
ConfigDataCache latest = CACHE.get(serverCache.getGroup());
if (latest != serverCache) {
// the cache of admin was updated. if the md5 value is the same, there's no need to update.
return !StringUtils.equals(clientMd5, latest.getMd5());
}
// load cache from db.
this.refreshLocalCache();
latest = CACHE.get(serverCache.getGroup());
return !StringUtils.equals(clientMd5, latest.getMd5());
} finally {
LOCK.unlock();
}
}
// not locked, the client need to be updated.
return true;
}
Three technical points in code snippet 5:
Code snippet 5 checks whether the caller's data matches the server's; if not, the server's data is reloaded as needed and the new data is sent back to the soul caller.
- Parse the soul caller's md5 and clientModifyTime values from the request.
- The md5 of the admin's cached data is compared against the caller's md5; if they are equal, false is returned and nothing needs to be synced. If the md5 values differ, the last-modified times are compared: when the admin server's lastModifyTime is greater than or equal to the client's, the client's config is out of date, so the group is marked as changed and the admin's data will be pushed to the client; otherwise the admin's own cache may be stale, so it is reloaded from the database and the md5 comparison is repeated (see the sketch after this list for how such an md5 fingerprint could be computed).
- A ReentrantLock guards the cache refresh so that concurrent requests from soul-web cannot all reload the cache at once and hammer the database.
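As an aside, one simple way such an md5 fingerprint of a config group could be computed is shown below; this is an assumption for illustration, not necessarily how Soul derives its md5 values.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ConfigMd5Demo {

    // hex md5 digest of the serialized config, e.g. a group's JSON string
    public static String md5Hex(String configJson) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(configJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String serverConfig = "[{\"id\":\"1\",\"name\":\"divide\"}]";
        String clientMd5 = md5Hex(serverConfig);
        // identical md5 values mean the client cache is up to date, no sync needed
        System.out.println("needs sync = " + !clientMd5.equals(md5Hex(serverConfig)));
    }
}
```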
Summary
To sum up, soul-web and admin synchronize data purely over HTTP. Because HTTP is not full-duplex, the server cannot push data to the client on its own; relying on client pulls inevitably means the data is not always synchronized in real time. So if you want to synchronize data over HTTP, the only practical approach is a while-loop of polling with a well-controlled request interval.
Beyond that, plenty of concurrency and failure cases have to be handled, for example:
- The retry strategy after a failed request (code snippet 3: retry on exception).
- Server-side flow control when clients poll too frequently (code snippet 4: a bounded blocking queue).
- How the server decides whether data needs to be synchronized (code snippet 5: double check with md5 + last-modified time).