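How to guarantee data consistency: the Raft algorithm
To keep a service highly available in a distributed system, it has to run as a cluster rather than on a single node. With a single node, one crash makes the whole service unavailable and can cause serious losses.
1. Raft and Paxos
Components that use Raft or a Raft-style election include Redis Sentinel and etcd. On the Paxos side there is ZooKeeper, whose ZAB protocol is Paxos-like.
1.1 Leader election
The code is as follows (RaftCore.java):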
/**
* Init raft core.
*
* @throws Exception any exception during init
*/
@PostConstruct // runs after the bean has been loaded
public void init() throws Exception {
Loggers.RAFT.info("initializing Raft sub-system");
final long start = System.currentTimeMillis();
// Load the data from the local log
raftStore.loadDatums(notifier, datums);
// Load the current term
setTerm(NumberUtils.toLong(raftStore.loadMeta().getProperty("term"), 0L));
Loggers.RAFT.info("cache loaded, datum count: {}, current term: {}", datums.size(), peers.getTerm());
initialized = true;
Loggers.RAFT.info("finish to load data from disk, cost: {} ms.", (System.currentTimeMillis() - start));
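// Check the leader's state every 500 ms and run an election if needed; see the MasterElection class
masterTask = GlobalExecutor.registerMasterElection(new MasterElection());
// Set up the heartbeat mechanism between the cluster nodes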
heartbeatTask = GlobalExecutor.registerHeartbeat(new HeartBeat());
versionJudgement.registerObserver(isAllNewVersion -> {
stopWork = isAllNewVersion;
if (stopWork) {
try {
shutdown();
} catch (NacosException e) {
throw new NacosRuntimeException(NacosException.SERVER_ERROR, e);
}
}
}, 100);
NotifyCenter.registerSubscriber(notifier);
Loggers.RAFT.info("timer started: leader timeout ms: {}, heart-beat timeout ms: {}",
GlobalExecutor.LEADER_TIMEOUT_MS, GlobalExecutor.HEARTBEAT_INTERVAL_MS);
}
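1.1.1 Election at startup
Every node keeps a term number. The term increases during voting and also during data synchronization. Each node runs a randomized election countdown (the Raft paper suggests 150-300 ms; the Nacos code below starts from a random value of up to 15000 ms and reduces it by 500 ms on every tick). The node whose countdown runs out first increments its term, votes for itself, and asks the other nodes for their votes. Once it holds more than half of the votes it becomes the leader and starts the heartbeat.
// Election task; it runs on its own thread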
public class MasterElection implements Runnable {
@Override
public void run() {
try {
if (stopWork) {
return;
}
if (!peers.isReady()) {
return;
}
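// Get the local peer's info
RaftPeer local = peers.local();
// Countdown until this node starts a vote:
// it starts at a random value in 0-15000 ms and is reduced by 500 ms on every tick
local.leaderDueMs -= GlobalExecutor.TICK_PERIOD_MS;
// The countdown has not run out yet, so no election
if (local.leaderDueMs > 0) {
return; // not yet eligible to start an election
}
// Reset the leader election countdown
local.resetLeaderDue();
// Reset the heartbeat countdown
local.resetHeartbeatDue();
sendVote(); // start the vote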
} catch (Exception e) {
Loggers.RAFT.warn("[RAFT] error while master election {}", e);
}
}
private void sendVote() {
// Get the local peer
RaftPeer local = peers.get(NetUtils.localServer());
Loggers.RAFT.info("leader timeout, start voting,leader: {}, term: {}", JacksonUtils.toJson(getLeader()),
local.term);
peers.reset(); // reset
local.term.incrementAndGet(); // increase the term
local.voteFor = local.ip; // vote for myself
local.state = RaftPeer.State.CANDIDATE; // switch to the candidate state
Map<String, String> params = new HashMap<>(1);
params.put("vote", JacksonUtils.toJson(local));
// Iterate over every node except the local one
for (final String server : peers.allServersWithoutMySelf()) {
final String url = buildUrl(server, API_VOTE);
try {
// Send the HTTP request asynchronously
HttpClient.asyncHttpPost(url, null, params, new Callback<String>() {
@Override
public void onReceive(RestResult<String> result) {
if (!result.ok()) {
Loggers.RAFT.error("NACOS-RAFT vote failed: {}, url: {}", result.getCode(), url);
return;
}
RaftPeer peer = JacksonUtils.toObj(result.getData(), RaftPeer.class);
Loggers.RAFT.info("received approve from peer: {}", JacksonUtils.toJson(peer));
// Decide who the leader is
peers.decideLeader(peer);
}
@Override
public void onError(Throwable throwable) {
Loggers.RAFT.error("error while sending vote to server: {}", server, throwable);
}
@Override
public void onCancel() {
}
});
} catch (Exception e) {
Loggers.RAFT.warn("error while sending vote to server: {}", server);
}
}
}
}
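Next, how the vote request is sent. In HttpClient.asyncHttpPost(url, null, params, new Callback<String>() {...}), url is the target address. Because it is built from API_VOTE, the request is handled by the vote method of RaftController, which leads to receivedVote: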
public synchronized RaftPeer receivedVote(RaftPeer remote) {
if (stopWork) {
throw new IllegalStateException("old raft protocol already stop work");
}
// Check whether the remote peer's IP is a known member
if (!peers.contains(remote)) {
throw new IllegalStateException("can not find peer: " + remote.ip);
}
// Get the local peer's info
RaftPeer local = peers.get(NetUtils.localServer());
// Compare terms: if the remote term is not greater than the local term, the remote vote request is stale
if (remote.term.get() <= local.term.get()) {
String msg = "received illegitimate vote" + ", voter-term:" + remote.term + ", votee-term:" + local.term;
Loggers.RAFT.info(msg);
if (StringUtils.isEmpty(local.voteFor)) { // the local node has not voted yet
local.voteFor = local.ip; // vote for itself
}
// Example: A (term 1), B (term 2), C
// The current node is B and receives a vote request from A; B sets voteFor = B
return local; // reply to A with B's info, voteFor = B
}
}
local.resetLeaderDue();
local.state = RaftPeer.State.FOLLOWER;
local.voteFor = remote.ip; // voteFor = A, the candidate
// Sync the local term up to the larger remote term
local.term.set(remote.term.get());
Loggers.RAFT.info("vote {} as leader, term: {}", remote.ip, remote.term);
return local;
}
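The vote-handling rule above can be condensed into a small standalone sketch. VoteRuleSketch, Peer, and onVoteRequest are hypothetical names, not Nacos APIs; the sketch only mirrors the term comparison in receivedVote and leaves out networking, locking, and timer resets.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified model of the term check in receivedVote; all names are illustrative. */
final class VoteRuleSketch {

    enum State { FOLLOWER, CANDIDATE, LEADER }

    static final class Peer {
        final String ip;
        final AtomicLong term = new AtomicLong(0L);
        String voteFor;                 // null means "has not voted yet"
        State state = State.FOLLOWER;

        Peer(String ip, long term) {
            this.ip = ip;
            this.term.set(term);
        }
    }

    /** Applies a vote request from remote to local and returns local; local.voteFor is the answer. */
    static Peer onVoteRequest(Peer local, Peer remote) {
        if (remote.term.get() <= local.term.get()) {
            // The candidate's term is not newer: refuse, and vote for ourselves if we have not voted.
            if (local.voteFor == null || local.voteFor.isEmpty()) {
                local.voteFor = local.ip;
            }
            return local;
        }
        // The candidate's term is newer: become its follower and adopt its term.
        local.state = State.FOLLOWER;
        local.voteFor = remote.ip;
        local.term.set(remote.term.get());
        return local;
    }

    public static void main(String[] args) {
        Peer b = new Peer("B", 2);
        System.out.println(onVoteRequest(b, new Peer("A", 1)).voteFor); // stale request from A
        System.out.println(onVoteRequest(b, new Peer("C", 3)).voteFor); // newer term from C
    }
}
```

In this model a stale request never changes the local term, while a request with a newer term both wins the vote and pulls the local term forward.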
After the votes come back, the next step is deciding the leader: peers.decideLeader(peer);
public RaftPeer decideLeader(RaftPeer candidate) {
peers.put(candidate.ip, candidate);
SortedBag ips = new TreeBag();
int maxApproveCount = 0;
String maxApprovePeer = null;
/**
* Example with peers A, B, C
* Round 1: A->a, B->b, C->c
* Round 2: A->a, B->a, C->c
*/
for (RaftPeer peer : peers.values()) {
if (StringUtils.isEmpty(peer.voteFor)) {
continue;
}
ips.add(peer.voteFor);
// Applies to both the first and the second iteration
if (ips.getCount(peer.voteFor) > maxApproveCount) {
maxApproveCount = ips.getCount(peer.voteFor);
maxApprovePeer = peer.voteFor;
}
}
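// For example, after round 1: maxApproveCount=1, maxApprovePeer=a
// Check whether the highest vote count reaches a majority
if (maxApproveCount >= majorityCount()) {
// The peer with the most votes becomes the leader
RaftPeer peer = peers.get(maxApprovePeer);
peer.state = RaftPeer.State.LEADER; // mark that peer as the leader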
if (!Objects.equals(leader, peer)) {
leader = peer;
ApplicationUtils.publishEvent(new LeaderElectFinishedEvent(this, leader, local()));
Loggers.RAFT.info("{} has become the LEADER", leader.ip);
}
}
return leader; // without a majority the leader stays null and another election round follows
}
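The tally in decideLeader can be tried out in isolation. The sketch below is a simplified stand-in, not the Nacos implementation: VoteTallySketch and its methods are hypothetical names, a plain HashMap replaces the TreeBag, and the majority is assumed to be clusterSize / 2 + 1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Counts votes and reports a winner only when it holds a majority; names are illustrative. */
final class VoteTallySketch {

    /** Smallest vote count that is more than half of clusterSize (assumed majority rule). */
    static int majorityCount(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /**
     * votes holds one entry per peer: the id that peer voted for, or null if it has not voted.
     * Returns the winner's id, or null when nobody has reached a majority yet.
     */
    static String decideLeader(List<String> votes, int clusterSize) {
        Map<String, Integer> tally = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String vote : votes) {
            if (vote == null || vote.isEmpty()) {
                continue; // this peer has not voted
            }
            int count = tally.merge(vote, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = vote;
            }
        }
        return bestCount >= majorityCount(clusterSize) ? best : null;
    }

    public static void main(String[] args) {
        // Round 1: everyone votes for itself, so there is no majority.
        System.out.println(decideLeader(Arrays.asList("a", "b", "c"), 3));
        // Round 2: B now votes for A, so A holds 2 of 3 votes.
        System.out.println(decideLeader(Arrays.asList("a", "a", "c"), 3));
    }
}
```

The two calls in main correspond to the two rounds in the comment inside decideLeader: a split vote yields no leader, and the leader appears once one peer collects a majority.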
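1.1.2 Election when the leader goes down
The process is the same as at startup. Every node keeps a term number and a randomized election countdown. When the countdown of some node runs out first, that node increments its term, votes for itself, and asks the other nodes for their votes. Once it holds more than half of the votes it becomes the leader and starts the heartbeat. Note that a node does not grant its vote to a node whose term is not higher than its own.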
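The countdown described here can be sketched as a small timer object. This is an illustration under assumptions, not the Nacos code: ElectionTimerSketch and its members are hypothetical, the 500 ms tick and 15000 ms base follow the values mentioned in this post, and the 5000 ms random jitter is an assumed value.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Randomized election countdown driven by a fixed tick; names and the jitter value are assumptions. */
final class ElectionTimerSketch {

    static final long TICK_MS = 500L;        // tick period mentioned in the post
    static final long TIMEOUT_MS = 15_000L;  // base timeout mentioned in the post
    static final long RANDOM_MS = 5_000L;    // assumed jitter so nodes rarely time out together

    long leaderDueMs;
    long term;

    /** Restarts the countdown with a fresh random jitter. */
    void resetLeaderDue() {
        leaderDueMs = TIMEOUT_MS + ThreadLocalRandom.current().nextLong(0, RANDOM_MS);
    }

    /** A heartbeat from a live leader keeps followers from starting an election. */
    void onHeartbeat() {
        resetLeaderDue();
    }

    /** Advances the timer by one tick; returns true if this tick starts an election. */
    boolean tick() {
        leaderDueMs -= TICK_MS;
        if (leaderDueMs > 0) {
            return false;
        }
        resetLeaderDue();
        term++; // a new election always runs in a new term
        return true;
    }

    public static void main(String[] args) {
        ElectionTimerSketch timer = new ElectionTimerSketch();
        timer.leaderDueMs = 1_000L;
        System.out.println(timer.tick()); // countdown still positive
        System.out.println(timer.tick()); // countdown ran out, election starts
    }
}
```

As long as heartbeats keep arriving, onHeartbeat pushes the deadline back and tick never fires; when the leader goes down the heartbeats stop, and the first node whose countdown runs out starts the election.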
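1.2 Consistent data synchronization
Data synchronization follows a 2PC-style protocol. When a write reaches the leader, the leader first writes the data to its local log and then replicates it to the followers. Once more than half of the nodes have written it to their logs, the leader commits and the term is incremented. Note that in the signalPublish code below the caller is not acknowledged immediately: the method waits for a majority of acknowledgements and throws an exception if they do not arrive within the timeout.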
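The "wait for more than half of the nodes" step can be sketched with a CountDownLatch, the same primitive signalPublish uses. Everything else here is hypothetical: MajorityAckSketch, replicate, and the Predicate standing in for the network call are illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Leader-side majority acknowledgement; the transport is faked with a Predicate. */
final class MajorityAckSketch {

    /**
     * Sends a write to every follower and reports whether a majority of the cluster
     * (followers plus the leader) acknowledged it within timeoutMs.
     */
    static boolean replicate(List<String> followers, Predicate<String> sendToFollower, long timeoutMs) {
        int clusterSize = followers.size() + 1;
        int majority = clusterSize / 2 + 1;
        CountDownLatch latch = new CountDownLatch(majority);
        latch.countDown(); // the leader's own local write counts as one acknowledgement
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, followers.size()));
        try {
            for (String follower : followers) {
                pool.execute(() -> {
                    if (sendToFollower.test(follower)) {
                        latch.countDown(); // extra countDown calls after zero are harmless
                    }
                });
            }
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<String> followers = Arrays.asList("b", "c");
        // Only follower b acknowledges: leader + b is 2 of 3 nodes.
        System.out.println(replicate(followers, "b"::equals, 1_000L));
    }
}
```

The latch starts at the majority count rather than the cluster size, so the write succeeds as soon as enough nodes have answered and slow or failed followers do not block it.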
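1.2.1 Keeping the data on all cluster nodes consistent
Next, a walk through the source code for data consistency. Consistency comes into play on the write that happens when an instance is registered.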
public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips)
throws NacosException {
String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
Service service = getService(namespaceId, serviceName);
synchronized (service) {
List<Instance> instanceList = addIpAddresses(service, ephemeral, ips);
Instances instances = new Instances();
instances.setInstanceList(instanceList);
// This is the consistency synchronization step
consistencyService.put(key, instances);
}
}
public void signalPublish(String key, Record value) throws Exception {
if (stopWork) {
throw new IllegalStateException("old raft protocol already stop work");
}
/* If the current node is not the leader, forward the request to the leader, which does the synchronization */
if (!isLeader()) {
ObjectNode params = JacksonUtils.createEmptyJsonNode();
params.put("key", key);
params.replace("value", JacksonUtils.transferToJsonNode(value));
Map<String, String> parameters = new HashMap<>(1);
parameters.put("key", key);
// Get the leader node
final RaftPeer leader = getLeader();
// Forward the data to the leader node
raftProxy.proxyPostLarge(leader.ip, API_PUB, params.toString(), parameters);
return;
}
OPERATE_LOCK.lock();
try {
final long start = System.currentTimeMillis();
final Datum datum = new Datum();
datum.key = key;
datum.value = value;
if (getDatum(key) == null) {
datum.timestamp.set(1L);
} else {
datum.timestamp.set(getDatum(key).timestamp.incrementAndGet());
}
ObjectNode json = JacksonUtils.createEmptyJsonNode();
json.replace("datum", JacksonUtils.transferToJsonNode(datum));
json.replace("source", JacksonUtils.transferToJsonNode(peers.local()));
// Publish: 1. update the datum,
// 2. persist the content (write it to the local log)
onPublish(datum, peers.local());
final String content = json.toString();
final CountDownLatch latch = new CountDownLatch(peers.majorityCount());
for (final String server : peers.allServersIncludeMyself()) { // broadcast to every node
if (isLeader(server)) {
latch.countDown();
continue;
}
final String url = buildUrl(server, API_ON_PUB);
HttpClient.asyncHttpPostLarge(url, Arrays.asList("key", key), content, new Callback<String>() {
@Override
public void onReceive(RestResult<String> result) {
if (!result.ok()) {
Loggers.RAFT
.warn("[RAFT] failed to publish data to peer, datumId={}, peer={}, http code={}",
datum.key, server, result.getCode());
return;
}
latch.countDown();
}
@Override
public void onError(Throwable throwable) {
Loggers.RAFT.error("[RAFT] failed to publish data to peer", throwable);
}
@Override
public void onCancel() {
}
});
}
if (!latch.await(UtilsAndCommons.RAFT_PUBLISH_TIMEOUT, TimeUnit.MILLISECONDS)) {
// only majority servers return success can we consider this update success
Loggers.RAFT.error("data publish failed, caused failed to notify majority, key={}", key);
throw new IllegalStateException("data publish failed, caused failed to notify majority, key=" + key);
}
long end = System.currentTimeMillis();
Loggers.RAFT.info("signalPublish cost {} ms, key: {}", (end - start), key);
} finally {
OPERATE_LOCK.unlock();
}
}
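2. Raft principles
It is based on a 2PC-style approach with weak consistency, in order to guarantee high availability.
2.1 Node states
- leader: the node that leads the cluster
- follower: a node that follows the leader
- candidate: a node that is running for leader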
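How a node moves between these three states can be written as a tiny state machine. The transitions below follow the standard Raft description rather than any specific Nacos code, and RaftStateSketch, Event, and next are hypothetical names.

```java
/** Standard Raft state transitions as a pure function; names are illustrative. */
final class RaftStateSketch {

    enum State { FOLLOWER, CANDIDATE, LEADER }

    enum Event { ELECTION_TIMEOUT, WON_MAJORITY, SAW_HIGHER_TERM }

    /** Returns the state a node is in after handling event while in current. */
    static State next(State current, Event event) {
        switch (event) {
            case ELECTION_TIMEOUT:
                // Followers and candidates start (or restart) an election; a leader does not time out.
                return current == State.LEADER ? State.LEADER : State.CANDIDATE;
            case WON_MAJORITY:
                // Only a candidate can be promoted by winning the vote.
                return current == State.CANDIDATE ? State.LEADER : current;
            case SAW_HIGHER_TERM:
                // Any node that sees a newer term steps down.
                return State.FOLLOWER;
            default:
                return current;
        }
    }

    public static void main(String[] args) {
        State s = State.FOLLOWER;
        s = next(s, Event.ELECTION_TIMEOUT);
        s = next(s, Event.WON_MAJORITY);
        System.out.println(s);
    }
}
```

Keeping the transitions in one pure function makes it easy to check that a follower can never jump straight to leader without first becoming a candidate.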