Redis 7.0 Multi Part AOF Source Code Walkthrough

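Redis's AOF (append-only file) persistence keeps data safe by continuously appending write commands to a file. During a rewrite, a child process builds the new AOF file so that the main thread is not blocked. Commands that arrive while the rewrite is running are recorded separately, and the old AOF is replaced atomically at the end so that the data stays consistent. Once the rewrite completes, the commands accumulated during it are part of the new AOF.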

AOF

First, we need to enable AOF in the configuration:

appendonly yes         enable AOF
appendfsync always     fsync after every write

Redis supports three appendfsync policies. The policy determines when the AOF file is written and when it is synced to disk.

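  • always: the buffer is written once per event loop, and an fsync follows every write. It is the slowest policy, but a crash loses at most one event loop's worth of data.
  • everysec: the buffer is written once per event loop, and fsync runs in the background at most once per second. If a background fsync is still running, the write is postponed for up to 2 seconds.
  • no: the buffer is written once per event loop; when the data actually reaches the disk is left to the operating system.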
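The sync side of the three policies can be sketched as a small decision function. This is only an illustration: the enum and function names are hypothetical and are not taken from the Redis source, which uses the `AOF_FSYNC_*` constants and does the work inside `flushAppendOnlyFile`.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names for illustration; Redis itself uses AOF_FSYNC_* constants. */
typedef enum { FSYNC_NO, FSYNC_EVERYSEC, FSYNC_ALWAYS } fsync_policy;
typedef enum { SYNC_NONE, SYNC_BLOCKING, SYNC_BACKGROUND } sync_action;

/* Decide what kind of sync follows a write in the current event-loop pass. */
sync_action sync_after_write(fsync_policy policy, time_t now,
                             time_t last_fsync, bool fsync_in_progress) {
    switch (policy) {
    case FSYNC_ALWAYS:
        /* fsync in the main thread after every write */
        return SYNC_BLOCKING;
    case FSYNC_EVERYSEC:
        /* at most one background fsync per second, never two at once */
        if (now > last_fsync && !fsync_in_progress) return SYNC_BACKGROUND;
        return SYNC_NONE;
    case FSYNC_NO:
        /* leave flushing to the operating system */
        return SYNC_NONE;
    }
    assert(0 && "unknown policy");
    return SYNC_NONE;
}
```

The time values are in whole seconds, matching the `server.unixtime > server.aof_last_fsync` comparison in the source below.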

1) Writing

Without further ado, let's go straight to the source: aof.c#flushAppendOnlyFile

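  1. Handle the case where the AOF buffer is empty.
    1. If the policy is everysec, the fsynced offset differs from the current AOF file size, the current time is later than the last fsync, and no fsync is in progress, perform one fsync.
  2. If the policy is everysec, the flush is not forced, and an fsync is running in the background, postpone the write. The write waits for the fsync for at most 2 seconds.
    1. If the postponement start time is 0, set it to the current time and return.
    2. For the next 2 seconds after that start time, any call to this function returns immediately.
  3. Write the buffer to the AOF file.
  4. Handle a write that stored fewer bytes than the AOF buffer holds.
    1. If the write failed outright, log it and record the error.
    2. On a short write, try to truncate the file back to its previous length, which removes the bytes just appended.
    3. If the policy is ALWAYS, exit.
    4. For the other policies, mark the write status as failed. If the truncation also failed, drop the already-written bytes from the AOF buffer. Then return.
      1. This keeps only the data after nwritten in the buffer. Without this step, the next attempt would write those bytes to the AOF file a second time.
  5. If the write succeeded and the previous write status was an error, reset the status to OK.
  6. If the AOF buffer is small, reuse it; otherwise free it and allocate a new one.
  7. If no-appendfsync-on-rewrite is enabled and a child process is active, skip the fsync.
  8. Sync the data.
    1. Policy ALWAYS: run fsync, then update the fsync offset and time.
    2. Policy everysec: if the current time is later than the last fsync and no fsync is running, start a background fsync, then update the fsync offset and time.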
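Steps 2 and 4.4 are the least obvious ones, so here is a minimal sketch of both in isolation. It assumes the everysec policy throughout, and `flush_state`, `should_write_now`, and `drop_written_prefix` are hypothetical names made up for this example. Redis keeps the equivalent state in `server.aof_flush_postponed_start` and trims its sds buffer with `sdsrange`; it also clears the postponement marker after the write rather than before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the single server field that step 2 touches. */
typedef struct {
    time_t postponed_start;   /* 0 means no postponement is in progress */
} flush_state;

/* Step 2: returns true when the caller should write now,
 * false when the write is postponed to a later call. */
bool should_write_now(flush_state *st, time_t now, bool force,
                      bool fsync_in_progress) {
    assert(st != NULL);
    if (!force && fsync_in_progress) {
        if (st->postponed_start == 0) {
            st->postponed_start = now;      /* first postponement: remember when */
            return false;
        }
        if (now - st->postponed_start < 2)  /* still inside the 2-second window */
            return false;
        /* waited 2 seconds or more: fall through and write anyway */
    }
    st->postponed_start = 0;
    return true;
}

/* Step 4.4: after a short write of nwritten bytes that could not be undone,
 * keep only the unwritten tail so the retry does not duplicate data.
 * Returns the new buffer length. */
size_t drop_written_prefix(char *buf, size_t len, size_t nwritten) {
    assert(nwritten <= len);
    memmove(buf, buf + nwritten, len - nwritten);
    return len - nwritten;
}
```

With a background fsync still running, calls at seconds 100 and 101 are postponed and the call at second 102 goes ahead, which mirrors the `< 2` check in the source.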
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;

    // 1: the AOF buffer is empty
    if (sdslen(server.aof_buf) == 0) {
        /* Check if we need to do fsync even the aof buffer is empty,
         * because previously in AOF_FSYNC_EVERYSEC mode, fsync is
         * called only when aof buffer is not empty, so if users
         * stop write commands before fsync called in one second,
         * the data in page cache cannot be flushed in time. */
        if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&    // policy is everysec
            server.aof_fsync_offset != server.aof_current_size &&  // fsynced offset differs from the current AOF file size
            server.unixtime > server.aof_last_fsync &&  // current time is later than the last fsync
            !(sync_in_progress = aofFsyncInProgress())) {  // no fsync is running in the background
            goto try_fsync; // 1.1: perform one fsync
        } else {
            return;
        }
    }
    // 2: handling for the everysec policy
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC) {
        // Is an fsync already running in the background?
        sync_in_progress = aofFsyncInProgress();
    }
    // Policy is everysec and the flush is not forced
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        // An fsync is running in the background
        if (sync_in_progress) {
            // 2.1: no postponement has been recorded yet
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                // Remember when the postponement started
                server.aof_flush_postponed_start = server.unixtime;
                return;
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                // 2.2: the write has been postponed for less than 2 seconds, so return
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                return;
            }
            /* Otherwise fall through, and go write since we can't wait
             * over two seconds. */
            // Reaching here means an fsync is still running and the write has been
            // postponed for 2 seconds or more. Write anyway; the write may block behind the fsync.
            server.aof_delayed_fsync++;
            serverLog(LL_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }
    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike */

    if (server.aof_flush_sleep && sdslen(server.aof_buf)) {
        usleep(server.aof_flush_sleep);
    }

    latencyStartMonitor(latency);
    // 3: write the buffer and get back the number of bytes written
    nwritten = aofWrite(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    latencyEndMonitor(latency);
    /* We want to capture different events for delayed writes:
     * when the delay happens with a pending fsync, or with a saving child
     * active, and when the above two conditions are missing.
     * We also use an additional event name to save all samples which is
     * useful for graphing / monitoring purposes. */
    if (sync_in_progress) {
        latencyAddSampleIfNeeded("aof-write-pending-fsync",latency);
    } else if (hasActiveChildProcess()) {  // a child process is active
        latencyAddSampleIfNeeded("aof-write-active-child",latency);
    } else {
        latencyAddSampleIfNeeded("aof-write-alone",latency);
    }
    latencyAddSampleIfNeeded("aof-write",latency);

    /* We performed the write so reset the postponed flush sentinel to zero. */
    // Clear the postponement start time now that the write has been done
    server.aof_flush_postponed_start = 0;
    // 4: the write stored fewer bytes than the AOF buffer holds
    if (nwritten != (ssize_t)sdslen(server.aof_buf)) {
        static time_t last_write_error_log = 0;
        int can_log = 0;

        /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
        // 4.1: limit logging to one line every AOF_WRITE_LOG_ERROR_RATE seconds
        if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) {
            can_log = 1;
            last_write_error_log = server.unixtime;
        }

        /* Log the AOF write error and record the error code. */
        // 4.2: the write failed; log it and record the error code
        if (nwritten == -1) {
            if (can_log) {
                serverLog(LL_WARNING,"Error writing to the AOF file: %s",
                    strerror(errno));
            }
            // Record the error code of the failed write
            server.aof_last_write_errno = errno;
        } else {
            if (can_log) {
                serverLog(LL_WARNING,"Short write while writing to "
                                       "the AOF file: (nwritten=%lld, "
                                       "expected=%lld)",
                                       (long long)nwritten,
                                       (long long)sdslen(server.aof_buf));
            }
            // 4.3: truncate the file back to its previous length, removing the bytes just appended
            if (ftruncate(server.aof_fd, server.aof_last_incr_size) == -1) {
                if (can_log) {
                    serverLog(LL_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            } else {
                /* If the ftruncate() succeeded we can set nwritten to
                 * -1 since there is no longer partial data into the AOF. */
                nwritten = -1;
            }
            // Record the short write as an out-of-space error
            server.aof_last_write_errno = ENOSPC;
        }

        /* Handle the AOF write error. */
        // 4.4: handle the AOF write error
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the reply
             * for the client is already in the output buffers (both writes and
             * reads), and the changes to the db can't be rolled back. Since we
             * have a contract with the user that on acknowledged or observed
             * writes are is synced on disk, we must exit. */
            serverLog(LL_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            /* Recover from failed write leaving data into the buffer. However
             * set an error to stop accepting writes as long as the error
             * condition is not cleared. */
            server.aof_last_write_status = C_ERR;

            /* Trim the sds buffer if there was a partial write, and there
             * was no way to undo it with ftruncate(2). */
            if (nwritten > 0) {
                // Reaching here means part of the buffer reached the AOF file
                // and ftruncate could not remove it.
                server.aof_current_size += nwritten; // count the bytes that remain in the AOF file
                server.aof_last_incr_size += nwritten;
                sdsrange(server.aof_buf,nwritten,-1); // keep only the unwritten tail of the buffer
            }
            return; /* We'll try again on the next call... */
        }
    } else {
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        // 5: the write succeeded; if the previous write status was an error, reset it to OK
        if (server.aof_last_write_status == C_ERR) {
            serverLog(LL_WARNING,
                "AOF write error looks solved, Redis can write again.");
            server.aof_last_write_status = C_OK;
        }
    }
    // Update the current AOF size
    server.aof_current_size += nwritten;
    // Add the bytes written to the incremental AOF size
    server.aof_last_incr_size += nwritten;

    /* Re-use AOF buffer when it is small enough. The maximum comes from the
     * arena size of 4k minus some overhead (but is otherwise arbitrary). */
    // 6: reuse the AOF buffer if it is small
    if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
        // Clear the buffer contents but keep the allocation
        sdsclear(server.aof_buf);
    } else {
        // Free the buffer
        sdsfree(server.aof_buf);
        // Allocate a fresh, empty buffer
        server.aof_buf = sdsempty();
    }

try_fsync:
    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background. */
    // 7: if no-appendfsync-on-rewrite is enabled and a child process is active, skip the fsync
    if (server.aof_no_fsync_on_rewrite && hasActiveChildProcess())
        return;

    /* Perform the fsync if needed. */
    // 8.1: policy is always
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* redis_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        latencyStartMonitor(latency);
        /* Let's try to get this data on the disk. To guarantee data safe when
         * the AOF fsync policy is 'always', we should exit if failed to fsync
         * AOF (see comment next to the exit(1) after write error above). */
        // Run fsync
        if (redis_fsync(server.aof_fd) == -1) {
            // On failure, log the error and exit
            serverLog(LL_WARNING,"Can't persist AOF for fsync error when the "
              "AOF fsync policy is 'always': %s. Exiting...", strerror(errno));
            exit(1);
        }
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
        // Record the AOF offset that has been fsynced
        server.aof_fsync_offset = server.aof_current_size;
        // Update the time of the most recent fsync
        server.aof_last_fsync = server.unixtime;
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        // 8.2: policy is everysec and the current time is later than the last fsync
        // No fsync is running in the background
        if (!sync_in_progress) {
            // Run the fsync in the background
            aof_background_fsync(server.aof_fd);
            // Record