How the CKPT process works

Reposted article.

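A checkpoint is a database event that writes modified data from the buffer cache to disk and updates the control file and the data file headers.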

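There are three kinds of checkpoints:
1) Local checkpoint: a single instance checkpoints all data files of the database, and all dirty buffers belonging to this instance are written to the data files.
Command:
svrmgrl>alter system checkpoint local;
This command explicitly triggers a local checkpoint.
2) Global checkpoint: all instances (with Oracle Parallel Server) checkpoint all data files of the database, and the dirty buffers of every instance are written to the data files.
Command:
svrmgrl>alter system checkpoint global;
This command explicitly triggers a global checkpoint.
3) File checkpoint: all instances checkpoint a set of data files. For example, the hot backup command alter tablespace USERS begin backup, or the offline command alter tablespace USERS offline, performs a checkpoint of all data files belonging to the USERS tablespace.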
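The three kinds differ in which buffers they cover. The Python sketch below is a rough illustration of that scope. It models each dirty buffer only by two things: the instance whose cache holds it and the data file it belongs to. It then selects the buffers each kind of checkpoint would have to write. `Buffer` and `buffers_to_flush` are hypothetical names chosen for this example; Oracle exposes nothing like them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    instance: int       # instance whose buffer cache holds the block
    file_id: int        # data file the block belongs to
    dirty: bool = True


def buffers_to_flush(kind, buffers, instance=None, files=frozenset()):
    """Return the dirty buffers a checkpoint of the given kind must write.

    kind: "local"  -> dirty buffers of `instance` only
          "global" -> dirty buffers of every instance
          "file"   -> dirty buffers of the given files, on every instance
    """
    dirty = [b for b in buffers if b.dirty]
    if kind == "local":
        return [b for b in dirty if b.instance == instance]
    if kind == "global":
        return dirty
    if kind == "file":
        return [b for b in dirty if b.file_id in files]
    raise ValueError(f"unknown checkpoint kind: {kind}")
```

For example, with dirty buffers (instance 1, file 4), (instance 2, file 4) and (instance 2, file 5), a local checkpoint on instance 1 picks only the first. A global checkpoint picks all three, and a file checkpoint of file 5 picks only the last.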

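Checkpoint processing steps:
1) Acquire the instance state enqueue: this enqueue is taken when the instance changes state. Oracle acquires it to guarantee that the database stays open while the checkpoint runs.
2) Get the current checkpoint information: obtain the structure that records the checkpoint. It includes the current checkpoint time, the active threads, the thread currently performing the checkpoint, and the redo log address of the recovery cut-off point.
3) Mark buffers: identify all dirty buffers. Whenever the checkpoint finds a dirty buffer, it marks it as needing to be flushed. The DBWR background process writes the marked buffers to the data files.
4) Flush dirty buffers: after DBWR has written all the marked dirty buffers to disk, it sets a flag indicating that the write is complete. LGWR and CKPT keep checking until DBWR has finished.
5) Update the control file and the data files.
Note: the control file and the data file headers contain the checkpoint structure.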
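Steps 2 to 5 can be sketched in a few lines of Python. This is a single-process toy model with these assumptions:
- SCNs are plain integers.
- "Writing" a buffer just records its block id.
- The SCN is bumped once per write, standing in for other sessions that keep changing data while DBWR works.

Note that the headers are stamped only after all writes finish, and with the SCN captured at the start rather than the current SCN. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    header_ckpt_scn: int = 0            # checkpoint SCN in the file header


@dataclass
class Database:
    current_scn: int                    # keeps advancing as sessions work
    dirty: dict                         # block id -> name of its data file
    files: dict                         # data file name -> DataFile
    controlfile_ckpt_scn: int = 0
    disk_writes: list = field(default_factory=list)


def full_checkpoint(db):
    # Step 1 (instance state enqueue) is not modelled here.
    # Step 2: capture the checkpoint SCN before the bulk of the work.
    ckpt_scn = db.current_scn
    # Step 3: mark every buffer that is dirty at this moment.
    marked = sorted(db.dirty)
    # Step 4: "DBWR" writes the marked buffers while the SCN moves on.
    for block in marked:
        db.disk_writes.append(block)
        del db.dirty[block]
        db.current_scn += 1             # simulated concurrent activity
    # Step 5: only now are headers and control file stamped, and with the
    # SCN captured in step 2, not the current SCN.
    for f in db.files.values():
        f.header_ckpt_scn = ckpt_scn
    db.controlfile_ckpt_scn = ckpt_scn
    return ckpt_scn
```

Starting at SCN 100 with two dirty blocks, the function returns 100 and stamps 100 into the header and control file, even though the simulated current SCN has moved on to 102 by then.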

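In two cases, the checkpoint information in a file header (obtained in step 2) is not updated:
1) The data file is in hot backup mode. Oracle then cannot know when the operating system will read the file header, and the backup copy must carry the checkpoint SCN as of the moment the copy started.
Oracle keeps a checkpoint counter in each data file header. In normal operation it ensures that the current version of the data file is used. During recovery, it prevents a wrong version of the data file from being recovered. The counter keeps incrementing even in hot backup mode. Each data file's checkpoint counter is also stored in that file's entry in the control file.
2) The checkpoint SCN is lower than the checkpoint SCN already in the file header. This means the changes covered by this checkpoint are already on disk. It can happen during a global checkpoint if a hot backup fast checkpoint updates the file header in the meantime. Note that Oracle captures the checkpoint SCN before doing most of the actual checkpoint work. A command that performs a fast checkpoint, such as alter tablespace USERS begin backup, can therefore easily interrupt it.
Before updating a data file header, Oracle verifies the file's consistency. Once verification succeeds, it updates the header to reflect the current checkpoint. Data files that were not verified, and data files that hit a write error, are skipped. If the redo log has been overwritten, such a file may need media recovery. In that case, the DBWR process takes the data file offline.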
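The two exceptions can be written as a small decision function. In this Python sketch:
- `FileHeader` and `apply_checkpoint` are hypothetical.
- SCNs are plain integers.
- The consistency check and offline handling are left out.

One assumption is flagged in the code. The text only says the counter keeps incrementing in hot backup mode, but the sketch bumps it on every call.

```python
from dataclasses import dataclass


@dataclass
class FileHeader:
    ckpt_scn: int
    ckpt_count: int
    in_hot_backup: bool = False


def apply_checkpoint(header, ckpt_scn):
    """Apply a checkpoint to one data file header.

    Returns True if the header's checkpoint SCN was advanced.
    """
    # Assumption: the counter is bumped on every call; the text only
    # guarantees that it keeps incrementing in hot backup mode.
    header.ckpt_count += 1
    # Case 1: hot backup mode freezes the header checkpoint SCN so the
    # backup copy carries the SCN from the moment the copy started.
    if header.in_hot_backup:
        return False
    # Case 2: a later checkpoint (e.g. the fast checkpoint of BEGIN BACKUP)
    # already stamped a higher SCN, so these changes are already on disk.
    if ckpt_scn < header.ckpt_scn:
        return False
    header.ckpt_scn = ckpt_scn
    return True
```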

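Checkpoint algorithm:
Dirty buffers are linked on a separate queue, called the checkpoint queue. Every change to a buffer has an associated redo value (its position in the redo log). The checkpoint queue holds the dirty buffers ordered by their position in the redo log, that is, by their low redo value. A buffer is linked into the queue in the order it first became dirty. If it is changed again before being written out, its link does not move. Once a buffer is on the checkpoint queue, it stays in that position until it is written out.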
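Here is a minimal Python model of this queue. It assumes RBAs are plain integers handed out in increasing order, which is why appending at the tail keeps the queue in low-RBA order. A re-dirtied buffer only updates its high RBA and keeps its place. `CheckpointQueue` is a hypothetical class for illustration.

```python
from collections import OrderedDict


class CheckpointQueue:
    """Dirty buffers kept in the order they first became dirty."""

    def __init__(self):
        self._queue = OrderedDict()     # block id -> (low_rba, high_rba)

    def record_change(self, block, rba):
        if block in self._queue:
            # Already dirty: only the high RBA moves, the position does not.
            low, _ = self._queue[block]
            self._queue[block] = (low, rba)
        else:
            # First change since the block was clean: link at the tail.
            self._queue[block] = (rba, rba)

    def write_head(self):
        """Unlink the oldest buffer, as DBWR would after writing it."""
        block, (low, high) = self._queue.popitem(last=False)
        return block, low, high

    def order(self):
        return list(self._queue)
```

After changes to A at 10, B at 20 and A again at 30, the order is still A, B, and writing the head returns A with low RBA 10 and high RBA 30.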

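In response to a checkpoint request, the DBWR process writes out buffers in ascending order of low redo value. Each checkpoint request specifies a redo value. Once DBWR has written a buffer whose redo value is equal to or greater than the checkpoint's redo value, the checkpoint is complete and is recorded in the control file and data files.
The buffers on the checkpoint queue are sorted by low redo value, and DBWR writes them in that same order, so several checkpoint requests can be active at once. As DBWR writes buffers, it compares the low redo value of the buffer at the head of the checkpoint queue with the redo value of each checkpoint. Every checkpoint request whose redo value is lower than that low redo value is complete. As long as uncompleted checkpoint requests remain, DBWR keeps writing checkpoint buffers.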
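This Python sketch simulates DBWR serving several checkpoint requests in one pass over the queue. It takes two inputs: the low RBAs of the queued buffers (head first, as plain integers) and the target RBA of each active request. It reports how many buffers had been written when each request completed.

The completion rule is the one described above: a request is complete once its redo value is lower than the low redo value at the head of the queue. `dbwr_serve` is a hypothetical name.

```python
def dbwr_serve(queue_low_rbas, request_rbas):
    """Simulate DBWR draining a checkpoint queue for several requests.

    queue_low_rbas: low RBAs of the dirty buffers, head of the queue first.
    request_rbas:   target RBA of each active checkpoint request.
    Returns {request RBA: buffers written when that request completed}.
    """
    pending = sorted(request_rbas)
    done = {}
    written = 0
    while pending:
        # Low RBA at the head; an empty queue counts as infinitely far ahead.
        if written < len(queue_low_rbas):
            head = queue_low_rbas[written]
        else:
            head = float("inf")
        # Every request below the head buffer's low RBA is complete.
        while pending and pending[0] < head:
            done[pending.pop(0)] = written
        if pending:
            written += 1                # write out the head buffer
    return done
```

Take buffers at low RBAs 10, 20, 30 and 40 with requests at 15 and 25. The request at 15 completes after one write and the one at 25 after two, both served by the same ordered pass.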

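Here it should be only the control file that is updated; the every-3-seconds update does not touch the data files.
Saying that CKPT "records how the checkpoint is progressing" is not wrong, but it is not detailed enough. Incremental checkpoints and the checkpoint queue change how this works in two ways:
- CKPT only tells DBWR how far it must write dirty buffers, that is, it merely hands DBWR an end point in the checkpoint queue.
- Every 3 seconds, CKPT records in the control file the latest position DBWR has written up to.

When the database needs recovery (instance recovery), it can then start from this latest position instead of from the checkpoint SCN in the data files. This shortens recovery, and after an instance crash in particular, startup is faster.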
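The heartbeat can be sketched as follows. All of these assumptions are for illustration:
- RBAs are plain integers.
- The checkpoint queue is given as a list of low RBAs, head first.
- When nothing is dirty, the position falls back to the on-disk RBA.

`ControlFile` and `heartbeat` are hypothetical names. The function writes only to the control file record and touches no data file header.

```python
from dataclasses import dataclass


@dataclass
class ControlFile:
    ckpt_progress_rba: int = 0          # checkpoint progress record


def heartbeat(controlfile, queue_low_rbas, on_disk_rba):
    """One CKPT heartbeat (every 3 seconds): record DBWR's progress.

    queue_low_rbas: low RBAs of the buffers still on the checkpoint queue,
    head first. The head is the oldest change not yet written, so instance
    recovery can start there.
    """
    if queue_low_rbas:
        position = queue_low_rbas[0]
    else:
        # Assumption: with nothing dirty, all flushed redo is covered.
        position = on_disk_rba
    controlfile.ckpt_progress_rba = position    # control file only
    return position
```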

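Also note what CKPT writes into the data file headers and the control file when a checkpoint occurs. It is not the SCN of the checkpoint that is just occurring. It is the SCN of the previous checkpoint, whose DBWR writes have already completed. In other words, the update of the control file and data file headers lags behind the checkpoint itself. This follows naturally from how recovery works: when a checkpoint occurs, the dirty buffers have not been written yet, so the headers cannot be stamped with the current SCN.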

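A checkpoint serves two purposes. First, it establishes data consistency. Second, it leaves a marker showing the point from which recovery has to start, because every change before it is already in the data files.
The checkpoint triggers DBWn to write the data to the data files. CKPT then writes the checkpoint SCN to the data file headers and the control file. In old releases that ran without a CKPT process, LGWR did this.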

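Whether incremental checkpointing is used, and its interval in time or in number of blocks, is presumably determined by the FAST_START_IO_TARGET or FAST_START_MTTR_TARGET parameter.
The 3 seconds is just the default.
However, if FAST_START_IO_TARGET or LOG_CHECKPOINT_INTERVAL is specified, FAST_START_MTTR_TARGET no longer takes effect.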

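The incremental checkpoint itself does not run every 3 seconds; the 3 seconds and incremental checkpointing are different concepts.
The 3 seconds is only CKPT recording in the control file how far DBWR has written. For CKPT, this is called the heartbeat.
The 3-second interval can be seen as continuously checking and recording the progress of the checkpoint (DBWR's write progress).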

Something about the checkpoint queue latch

wanghai, the following is just for your reference. :)

Firstly, we have two queue structures associated with checkpoints: the checkpoint queue (or thread queue, CKPTQ) and the file queue. Each buffer to be checkpointed is linked to both queues. The CKPTQ contains all buffers that need to be checkpointed for this instance. The file queue contains all buffers of a specific file that need to be checkpointed, and there is one file queue per file. The file queues are used by tablespace checkpoint requests.

Together, these queues make up a set of checkpoint queues.

Before a process can put a buffer on a checkpoint queue, it must make sure that the queue is not being used. Access to the queues is controlled by one checkpoint queue latch per set of checkpoint queues. To reduce contention on this latch, the set of thread and file queues is replicated once per working set of the instance.
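The Python sketch below shows one set of checkpoint queues guarded by a single lock that stands in for the checkpoint queue latch. The set is replicated once per working set. `CheckpointQueueSet`, `WorkingSets`, and the rule that picks a working set from the block number are all hypothetical; the note does not say how a buffer's working set is chosen.

```python
import threading
from collections import OrderedDict, defaultdict


class CheckpointQueueSet:
    """One thread queue (CKPTQ) plus one file queue per file, one latch."""

    def __init__(self):
        self.latch = threading.Lock()                # stands in for the latch
        self.thread_queue = OrderedDict()            # block -> file id
        self.file_queues = defaultdict(OrderedDict)  # file id -> {block: None}

    def link(self, block, file_id):
        # Hold the latch so no other process uses these queues meanwhile.
        with self.latch:
            # A buffer to be checkpointed is linked to both queues.
            self.thread_queue[block] = file_id
            self.file_queues[file_id][block] = None


class WorkingSets:
    """The queue set replicated once per working set to spread latch load."""

    def __init__(self, n_working_sets):
        self.sets = [CheckpointQueueSet() for _ in range(n_working_sets)]

    def link(self, block, file_id):
        # Hypothetical placement rule: choose the working set by block number.
        qs = self.sets[block % len(self.sets)]
        qs.link(block, file_id)
        return qs

    def buffers_for_file(self, file_id):
        # A tablespace checkpoint walks that file's queue in every set.
        out = []
        for qs in self.sets:
            with qs.latch:
                out.extend(qs.file_queues.get(file_id, {}))
        return out
```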

The way the number of working sets is determined has changed across DB versions. For 9.2, the default number is calculated internally as #CPUs / 2 * 8.

The maximum number of DBWRs (db_writer_processes) you can have in 9.2 is 20. Working sets are assigned to DBWRs in a round-robin fashion at startup.
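The two numbers quoted for 9.2 can be turned into a tiny Python helper. It applies the quoted formula literally, assuming integer division for `#CPUs / 2`, and maps working sets to DBWR processes round-robin. The function names are hypothetical.

```python
def default_working_sets(n_cpus):
    # Quoted 9.2 formula, #CPUs / 2 * 8; integer division is an assumption.
    return n_cpus // 2 * 8


def assign_round_robin(n_working_sets, n_dbwr):
    """Map each working set to a DBWR process in round-robin order."""
    # Upper bound of 20 writers as quoted for 9.2.
    if not 1 <= n_dbwr <= 20:
        raise ValueError("db_writer_processes must be between 1 and 20 in 9.2")
    return {ws: ws % n_dbwr for ws in range(n_working_sets)}
```

On a 4-CPU host this gives 16 working sets. With 2 DBWRs, even-numbered sets go to the first writer and odd-numbered sets to the second.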
Posting the following article to supplement the points above.

Oracle Internals Notes
Redo Byte Address (RBA)
Recent entries in the redo thread of an Oracle instance are addressed using a 3-part redo byte address, or RBA. An RBA consists of:
the log file sequence number (4 bytes)
the log file block number (4 bytes)
the byte offset into the block at which the redo record starts (2 bytes)
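Here is a Python sketch of the three fields and their sizes. The byte layout used by `pack` (big-endian, no padding) is only an assumption to show that the fields add up to 10 bytes; it is not Oracle's on-disk format. Because the fields are compared in order, two RBAs from the same incarnation sort by their position in the redo thread. After a RESETLOGS, that ordering no longer holds across incarnations.

```python
import struct
from typing import NamedTuple


class RBA(NamedTuple):
    seq: int      # log file sequence number (4 bytes)
    block: int    # log file block number (4 bytes)
    offset: int   # byte offset of the redo record in the block (2 bytes)

    def pack(self):
        # Illustrative layout only: big-endian, no padding, 4 + 4 + 2 bytes.
        return struct.pack(">IIH", self.seq, self.block, self.offset)

    @classmethod
    def unpack(cls, raw):
        return cls(*struct.unpack(">IIH", raw))
```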
RBAs are not necessarily unique within their thread, because the log file sequence number may be reset to 1 in all threads if a database is opened with the RESETLOGS option.
RBAs are used in the following important ways.

With respect to a dirty block in the buffer cache, the low RBA is the address of the redo for the first change that was applied to the block since it was last clean, and the high RBA is the address of the redo for the most recent change to have been applied to the block.
Dirty buffers are maintained on the buffer cache checkpoint queues in low RBA order. The checkpoint RBA is the point up to which DBWn has written buffers from the checkpoint queues if incremental checkpointing is enabled -- otherwise it is the RBA of last full thread checkpoint. The checkpoint RBA is copied into the checkpoint progress record of the controlfile by the checkpoint heartbeat once every 3 seconds. Instance recovery, when needed, begins from the checkpoint RBA recorded in the controlfile. The target RBA is the point up to which DBWn should seek to advance the checkpoint RBA to satisfy instance recovery objectives.

The on-disk RBA is the point up to which LGWR has flushed the redo thread to the online log files. DBWn may not write a block for which the high RBA is beyond the on-disk RBA. Otherwise transaction recovery (rollback) would not be possible, because the redo needed to undo a change is always in the same redo record as the redo for the change itself.
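The write rule can be checked with a few lines of Python. RBAs are modelled as (sequence, block, offset) tuples that compare field by field, and `dbwn_may_write` and `writable_buffers` are hypothetical helpers. The comparison follows the wording above: a block is held back only if its high RBA is beyond the on-disk RBA.

```python
from typing import NamedTuple


class RBA(NamedTuple):
    # (log sequence, block, offset); tuples compare field by field
    seq: int
    block: int
    offset: int


def dbwn_may_write(high_rba, on_disk_rba):
    """True unless the block's high RBA is beyond the on-disk RBA."""
    return high_rba <= on_disk_rba


def writable_buffers(buffers, on_disk_rba):
    """Split (block, high_rba) pairs into writable now / must wait."""
    ready, wait = [], []
    for block, high in buffers:
        (ready if dbwn_may_write(high, on_disk_rba) else wait).append(block)
    return ready, wait
```

Blocks in the second list have to wait until LGWR has flushed redo past their high RBA.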

The term sync RBA is sometimes used to refer to the point up to which LGWR is required to sync the thread. However, this is not a full RBA -- only a redo block number is used at this point.

The low and high RBAs for dirty buffers can be seen in X$BH. (There is also a recovery RBA which is used to record the progress of partial block recovery by PMON.) The incremental checkpoint RBA, the target RBA and the on-disk RBA can all be seen in X$TARGETRBA. The incremental checkpoint RBA and the on-disk RBA can also be seen in X$KCCCP. The full thread checkpoint RBA can be seen in X$KCCRT.