DARE: High-Performance State Machine Replication on RDMA Networks

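DARE is a method for high-throughput state machine replication on RDMA networks. Once a leader failure has been detected by all non-faulty servers, a leader election is triggered. Randomized timeouts ensure that a new leader is eventually elected. Each server periodically checks its heartbeats; if its own term is smaller, leadership has changed, and the server updates its own term to signal its support. The leader detects failed servers through the Queue Pair (QP) timeouts of the Reliable Connection (RC) transport mechanism.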


The log is described by four dynamic pointers, among them:
  • commit points to the first not-committed log entry; it is updated by the leader during log replication
struct dare_log_t
{
    uint64_t write;
    
    uint64_t len;
};

static int
rc_memory_reg()
{
    /* Register memory for local log */    
    IBDEV->lcl_mr[LOG_QP] = ibv_reg_mr(IBDEV->rc_pd,
            SRV_DATA->log, sizeof(dare_log_t) + SRV_DATA->log->len, 
            IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_ATOMIC | 
            IBV_ACCESS_REMOTE_READ | IBV_ACCESS_LOCAL_WRITE);
    
    /* !!length = sizeof(dare_log_t) + SRV_DATA->log->len */
    
    return 0;
}
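The length passed to ibv_reg_mr above covers the log header plus the log's data buffer. A minimal sketch of that size computation follows. It assumes that the data buffer sits directly behind the header in one contiguous allocation; the names sketch_log_t, sketch_log_alloc, and sketch_log_reg_len are hypothetical and do not come from the DARE source.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for dare_log_t: a header followed by the data buffer. */
typedef struct {
    uint64_t write;      /* field shown in the excerpt above */
    uint64_t len;        /* size of the data buffer in bytes */
    uint8_t  entries[];  /* assumption: data is contiguous with the header */
} sketch_log_t;

/* Allocate header and buffer as one zeroed region, so that a single
 * memory registration can cover both. Returns NULL on failure. */
static sketch_log_t *sketch_log_alloc(uint64_t len)
{
    sketch_log_t *log = calloc(1, sizeof(sketch_log_t) + (size_t)len);
    if (log != NULL) {
        log->len = len;
    }
    return log;
}

/* Number of bytes to register: header plus data buffer. */
static size_t sketch_log_reg_len(const sketch_log_t *log)
{
    return sizeof(sketch_log_t) + (size_t)log->len;
}
```

Registering header and data as one region would let a remote peer reach both the pointers and the entries through the same memory key, which may be why the excerpt adds the two sizes.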


When the leader is suspected to have failed, the servers elect another leader. Each election causes the beginning of a new term, a period of time in which at most one leader exists. A server that wins an election during a term becomes the leader of that term.

Imagine you have a TCP connection and you want a so-called idle timeout, that is, you want to be called when there have been, say, 60 seconds of inactivity on the socket. The easiest way to do this is to configure an ev_timer with a repeat value of 60 and then call ev_timer_again each time you successfully read or write some data.
/** Example: Create a timeout timer that times out after 10 seconds of inactivity. */
static void timeout_cb (struct ev_loop *loop, struct ev_timer *w, int revents)
{
    /* .. ten seconds without any activity */
}

struct ev_timer mytimer;
ev_timer_init (&mytimer, timeout_cb, 0., 10.); /* note, only repeat used */
ev_timer_again (loop, &mytimer); /* start timer */
ev_loop (loop, 0);
// and in some piece of code that gets executed on any "activity":
// reset the timeout to start ticking again at 10 seconds
ev_timer_again (loop, &mytimer);

The candidate sends vote requests to the other servers: It updates its corresponding entry in the vote request array (one of the control data arrays) at all other servers by issuing RDMA write operations.
struct ctrl_data_t {
    /* State identifier (SID) */
    uint64_t    sid;
    
    /* DARE arrays */
    vote_req_t    vote_req[MAX_SERVER_COUNT];       /* vote requests */
};

/* Set remote offset */
uint32_t offset = (uint32_t) (offsetof(ctrl_data_t, vote_req) + sizeof(vote_req_t) * idx);
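The offset above selects the candidate's own slot in the remote vote request array: the start of the array inside the control data, plus idx slots. A self-contained sketch of that arithmetic is given here. The struct layouts, the server limit, and all sketch_ names are assumptions made for illustration, not the DARE definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SERVERS 13  /* hypothetical limit */

/* Hypothetical vote request layout; the real vote_req_t is not shown above. */
typedef struct {
    uint64_t sid;
    uint64_t index;
    uint64_t term;
} sketch_vote_req_t;

/* Hypothetical control data, mirroring the shape of ctrl_data_t. */
typedef struct {
    uint64_t          sid;
    sketch_vote_req_t vote_req[SKETCH_MAX_SERVERS];
} sketch_ctrl_data_t;

/* Byte offset, from the start of the control data, of the slot that
 * belongs to the server with index idx. */
static uint32_t sketch_vote_req_offset(uint8_t idx)
{
    return (uint32_t)(offsetof(sketch_ctrl_data_t, vote_req)
                      + sizeof(sketch_vote_req_t) * idx);
}
```

Because every server has a fixed slot, a candidate can write its request with a one-sided RDMA write to a known remote address without coordinating with other candidates.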
Servers not aware of a leader periodically check the vote request array for incoming requests. They only consider requests for the leadership of a higher (more recent) term than their own.
static void poll_vote_requests()
{
    if (SID_GET_L(data.cached_sid)) {
        /* Active leader known; just ignore vote requests */
        return;
    }
    
    /* No leader known; make sure about this. */

    /* ... */

    /* Okay, so there is no known leader... 
    ...look for vote requests. */
}
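The filtering rule described above (ignore requests while a leader is known, and otherwise consider only requests for a higher term) can be sketched as follows. The request layout, the convention that term 0 marks an empty slot, and the choice to prefer the highest term among several valid requests are all assumptions of this sketch rather than confirmed DARE behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vote request: only the term matters for this sketch.
 * Assumption: term 0 marks an empty slot. */
typedef struct {
    uint64_t term;
} sketch_req_t;

/* Return the index of the request with the highest term that is strictly
 * greater than own_term, or -1 if a leader is known or no request qualifies. */
static int sketch_pick_vote_request(const sketch_req_t reqs[], int n,
                                    uint64_t own_term, bool leader_known)
{
    if (leader_known) {
        return -1;  /* active leader known; ignore vote requests */
    }

    int best = -1;
    uint64_t best_term = own_term;
    for (int i = 0; i < n; i++) {
        if (reqs[i].term > best_term) {
            best_term = reqs[i].term;
            best = i;
        }
    }
    return best;
}
```

For example, with request terms {0, 3, 5, 4} and an own term of 3, the sketch selects index 2; with an own term of 5 no request qualifies.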

  • A faulty leader is eventually detected by all the non-faulty servers; thus, a leader election starts. By using randomized timeouts [1] for restarting the election, DARE ensures that a leader is eventually elected.
  • Every other server checks its heartbeat array regularly, with a period ∆: If its own term is smaller, then a change in leadership occurred; thus, the server updates its own term to indicate its support.
  • Removing a server: The leader detects failed servers by using the Queue Pair (QP) timeouts provided by the Reliable Connection (RC) transport mechanism.
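Two of the points above lend themselves to a short sketch: the periodic heartbeat check that adopts a higher term, and the randomized timeout used when restarting an election. The array of heartbeat terms, the timeout range of [base, 2 * base), and all sketch_ names are assumptions for illustration; the actual DARE data layout and timeout distribution may differ.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scan the heartbeat terms once. If any heartbeat carries a term higher
 * than *own_term, adopt the highest one to signal support for the new
 * leader. Returns 1 if the term changed, 0 otherwise. */
static int sketch_check_heartbeats(const uint64_t hb_terms[], int n,
                                   uint64_t *own_term)
{
    int changed = 0;
    for (int i = 0; i < n; i++) {
        if (hb_terms[i] > *own_term) {
            *own_term = hb_terms[i];
            changed = 1;
        }
    }
    return changed;
}

/* Randomized election timeout in the range [base_ms, 2 * base_ms).
 * The range is an assumption; rand() is used only for brevity. */
static unsigned sketch_election_timeout_ms(unsigned base_ms)
{
    if (base_ms == 0) {
        return 0;
    }
    return base_ms + (unsigned)rand() % base_ms;
}
```

Randomizing the timeout makes it unlikely that two candidates keep restarting their elections at the same moment, which is the property that the argument for eventual leader election relies on.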

[1] Ongaro, Diego, and John Ousterhout. "In search of an understandable consensus algorithm." Proc. USENIX Annual Technical Conference. 2014.