Article link: https://blog.youkuaiyun.com/weixin_42442768/article/details/112008635

ZooKeeper Source Code Analysis: Restoring the Transaction Log

Preface
Source Code Analysis
Commands for Viewing the Transaction Log
Summary

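Preface

This article builds on the earlier analysis of the ZooKeeper cluster startup process (https://blog.youkuaiyun.com/weixin_42442768/article/details/109247622) and walks through the source code that reads files from disk and restores them into ZooKeeper's in-memory data structures. Snapshot recovery was covered in the previous article (https://blog.youkuaiyun.com/weixin_42442768/article/details/110134663); this one focuses on how the transaction log is restored.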

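Source Code Analysis

Start with the restore method of the FileTxnSnapLog class. Its job is to load the snapshot files and transaction log files on disk into the in-memory structures held by ZKDatabase (the DataTree and the session map), so that the server can resume normal work.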

    public long restore(DataTree dt, Map<Long, Integer> sessions,
            PlayBackListener listener) throws IOException {
        long deserializeResult = snapLog.deserialize(dt, sessions);
        FileTxnLog txnLog = new FileTxnLog(dataDir);
        if (-1L == deserializeResult) {
            /* this means that we couldn't find any snapshot, so we need to
             * initialize an empty database (reported in ZOOKEEPER-2325) */
            if (txnLog.getLastLoggedZxid() != -1) {
                throw new IOException(
                        "No snapshot found, but there are log entries. " +
                        "Something is broken!");
            }
            /* TODO: (br33d) we should either put a ConcurrentHashMap on restore()
             *       or use Map on save() */
            save(dt, (ConcurrentHashMap<Long, Integer>)sessions);
            /* return a zxid of zero, since we the database is empty */
            return 0;
        }
        return fastForwardFromEdits(dt, sessions, listener);
    }
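
The three outcomes of restore can be separated from the I/O and looked at on their own. The following is only a minimal sketch: the class RestoreDecisionSketch, the Action enum and the decide method are hypothetical names that do not exist in ZooKeeper, and an unchecked IllegalStateException stands in for the IOException thrown by the real method.

```java
/** Hypothetical stand-in for the branching at the top of restore(); not ZooKeeper code. */
class RestoreDecisionSketch {

    /** The two non-failing outcomes of the decision. */
    enum Action { INIT_EMPTY_DB, REPLAY_LOG }

    /**
     * @param snapshotZxid   zxid returned by snapshot deserialization, -1 if no snapshot was found
     * @param lastLoggedZxid highest zxid present in the transaction log, -1 if the log is empty
     */
    static Action decide(long snapshotZxid, long lastLoggedZxid) {
        if (snapshotZxid == -1L) {
            if (lastLoggedZxid != -1L) {
                // Log entries without any snapshot: the data directory is inconsistent.
                // (Assumption: the real code throws a checked IOException here.)
                throw new IllegalStateException("No snapshot found, but there are log entries.");
            }
            // Nothing on disk at all: save an empty database and report zxid 0.
            return Action.INIT_EMPTY_DB;
        }
        // A snapshot was loaded: replay the transactions logged after it.
        return Action.REPLAY_LOG;
    }

    public static void main(String[] args) {
        System.out.println(decide(-1L, -1L));
        System.out.println(decide(100L, 120L));
    }
}
```

Only the last branch reaches fastForwardFromEdits, which is why the rest of the analysis concentrates on that method.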

The previous article already covered snapshot recovery, so we go straight to the fastForwardFromEdits method to see how the transaction log is restored.

    public long fastForwardFromEdits(DataTree dt, Map<Long, Integer> sessions,
                                     PlayBackListener listener) throws IOException {
        // 1. Open the transaction log and create the log file iterator `TxnIterator`
        TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1);
        long highestZxid = dt.lastProcessedZxid;
        TxnHeader hdr;
        try {
            // 2. Walk the transactions in the log files in zxid order
            while (true) {
                // iterator points to
                // the first valid txn when initialized
                hdr = itr.getHeader();
                if (hdr == null) {
                    //empty logs
                    return dt.lastProcessedZxid;
                }
                // 3. Update the zxid and process the transaction
                if (hdr.getZxid() < highestZxid && highestZxid != 0) {
                    LOG.error("{}(highestZxid) > {}(next log) for type {}",
                            highestZxid, hdr.getZxid(), hdr.getType());
                } else {
                    highestZxid = hdr.getZxid();
                }
                try {
                    processTransaction(hdr,dt,sessions, itr.getTxn());
                } catch(KeeperException.NoNodeException e) {
                   throw new IOException("Failed to process transaction type: " +
                         hdr.getType() + " error: " + e.getMessage(), e);
                }
                // 4. Notify the listener of the replayed transaction
                listener.onTxnLoaded(hdr, itr.getTxn());
                if (!itr.next())
                    break;
            }
        } finally {
            if (itr != null) {
                itr.close();
            }
        }
        // 5. Return the latest zxid
        return highestZxid;
    }

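The overall transaction log recovery flow is:

1. Open the transaction log and create the log file iterator, TxnIterator
2. Walk the transactions in the log files in zxid order
3. Update the zxid and process the transaction
4. Notify the listener of each replayed transaction
5. Return the latest zxid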
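
The same flow can be tried out on a toy in-memory log. This is a rough sketch under clear assumptions: a TreeMap keyed by zxid stands in for the on-disk log files, a list of strings stands in for the DataTree, and ReplaySketch, Listener and replay are hypothetical names that only loosely mirror fastForwardFromEdits and PlayBackListener. Because a TreeMap is always sorted, the out-of-order zxid check of the real method is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of replaying a transaction log on top of a snapshot; not ZooKeeper code. */
class ReplaySketch {

    /** Hypothetical callback, loosely modelled on PlayBackListener. */
    interface Listener {
        void onTxnLoaded(long zxid, String op);
    }

    /**
     * Applies every logged operation with a zxid greater than lastProcessedZxid
     * to state and returns the highest zxid that was applied. If nothing newer
     * is logged, lastProcessedZxid itself is returned.
     */
    static long replay(TreeMap<Long, String> log, long lastProcessedZxid,
                       List<String> state, Listener listener) {
        long highestZxid = lastProcessedZxid;
        // Steps 1 and 2: start at lastProcessedZxid + 1 and walk in ascending zxid order.
        for (Map.Entry<Long, String> e : log.tailMap(lastProcessedZxid + 1, true).entrySet()) {
            // Step 3: remember the zxid and apply the transaction.
            highestZxid = e.getKey();
            state.add(e.getValue());
            // Step 4: tell the listener what was replayed.
            listener.onTxnLoaded(e.getKey(), e.getValue());
        }
        // Step 5: report how far the state has advanced.
        return highestZxid;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> log = new TreeMap<>();
        log.put(101L, "create /a");
        log.put(102L, "setData /a");
        log.put(103L, "delete /a");
        List<String> state = new ArrayList<>();
        long highest = replay(log, 101L, state,
                (zxid, op) -> System.out.println("replayed " + zxid + ": " + op));
        System.out.println("highest zxid = " + highest + ", applied = " + state);
    }
}
```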

The sections below look in detail at step 1 (creating the TxnIterator), step 3 (processTransaction handling a transaction), and step 4 (the PlayBackListener).

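Creating the TxnIterator

txnLog is a FileTxnLog, which wraps the directory where the transaction log files are stored. The argument passed to read is lastProcessedZxid+1 from the DataTree, and the DataTree is the in-memory structure that was just restored from the snapshot file, so replay begins with the first transaction after the snapshot. Now step into the read method: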

    public TxnIterator read(long zxid) throws IOException {
        return read(zxid, true);
    }

    public TxnIterator read(long zxid, boolean fastForward) throws IOException {
        return new FileTxnIterator(logDir, zxid, fastForward);
    }
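
The one-argument overload always asks for fast-forwarding. As an illustration of the general idea only, and not the actual FileTxnIterator logic, skipping ahead to a target zxid in an ordered sequence might look like the sketch below. FastForwardSketch and firstAtOrAfter are hypothetical names, and a plain list of zxids stands in for the log entries.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrates skipping ahead to a target zxid; not ZooKeeper code. */
class FastForwardSketch {

    /**
     * Returns the first zxid in sortedZxids that is greater than or equal to
     * target, or -1 if every entry is older than target.
     */
    static long firstAtOrAfter(List<Long> sortedZxids, long target) {
        for (long zxid : sortedZxids) {
            if (zxid >= target) {
                // Entries before this point were already covered by the snapshot.
                return zxid;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        List<Long> zxids = Arrays.asList(5L, 6L, 7L, 8L);
        System.out.println(firstAtOrAfter(zxids, 7L));
    }
}
```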

Setting fastForward to true means that recovery starts from the specified zxid. read then creates a FileTxnIterator object; first, its class definition and member variables:

    public static class FileTxnIterator implements TxnLog.TxnIterator {
        File logDir;    // transaction log directory
        long zxid;      // zxid from which recovery should start
        TxnHeader hdr;  // header of the current transaction
        Record record;  // body of the current transaction
        File logFile;   // transaction log file currently being read
        InputArchive ia;    // deserialization interface
        static final String CRC_ERROR="CRC check failed";

        PositionInputStream inputStream=null;   // input stream
        //stored files is the list of files greater than
        //the zxid we are looking for.
        private ArrayList<File> storedFiles;    // transaction log files to be restored
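
To make the role of storedFiles more concrete, here is a small sketch of how the relevant log files could be picked for a given start zxid. It rests on one assumption about the on-disk layout: each log file is named log.<hex zxid>, where the suffix is the first zxid written to that file. LogFileSelectionSketch, zxidOf and select are hypothetical helpers, not ZooKeeper methods, and the sketch works on file names rather than File objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Picks the log files that can contain transactions at or after a start zxid; not ZooKeeper code. */
class LogFileSelectionSketch {

    /** Parses the hexadecimal zxid suffix of a name such as "log.1a". */
    static long zxidOf(String fileName) {
        return Long.parseLong(fileName.substring(fileName.indexOf('.') + 1), 16);
    }

    /**
     * Returns, oldest first, every file that starts after targetZxid plus the
     * newest file that starts at or before it, since that file may still hold
     * the target transaction.
     */
    static List<String> select(List<String> fileNames, long targetZxid) {
        List<String> newestFirst = new ArrayList<>(fileNames);
        newestFirst.sort((a, b) -> Long.compare(zxidOf(b), zxidOf(a)));
        List<String> selected = new ArrayList<>();
        for (String name : newestFirst) {
            selected.add(name);
            if (zxidOf(name) <= targetZxid) {
                // Older files end before targetZxid, so they are not needed.
                break;
            }
        }
        Collections.reverse(selected);
        return selected;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("log.1", "log.65", "log.c9");
        System.out.println(select(files, 150L));
    }
}
```

Reading then proceeds through the selected files in order, with the start zxid deciding where inside the oldest selected file replay actually begins.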

The constructor is as follows:

        public FileTxnIterator(File logDir, long zxid, boolean fastForward)
                throws IOException {
   
            this.logDir = logDir;
            this.zxid = zxid;
            init();

            if (fastForward && hdr != null) {
   
                while