Thoughts prompted by the Percona 5.5 parameter innodb_adaptive_flushing_method...

Latest recommended article published 2021-06-29 10:47:43


License: CC 4.0 BY-SA


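This article looks closely at buffer management and log handling in InnoDB, including parameter settings, the flushing strategy, how checkpoints work, and how to tune performance.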


What follows are my own scattered, unorganized notes; apologies for the mess.

-------------------

Parameter: innodb_adaptive_flushing_method.

It controls how dirty pages are flushed and can be changed dynamically.

It accepts the following three values:

0(native)

This setting causes checkpointing to operate exactly as it does in native InnoDB

1(estimate)

If the oldest modified age exceeds 1/2 of the maximum age capacity, InnoDB starts flushing blocks every second. The number of blocks flushed is determined by [number of modified blocks], [LSN progress speed] and [average age of all modified blocks]. So, this behavior is independent of the innodb_io_capacity variable.

2(keep_average)

This method attempts to keep the I/O rate constant by using a much shorter loop cycle (0.1 second) than that of the other methods (1.0 second). It is designed for use with SSD cards.

Percona 5.5.18 does not have the reflex option.

Depending on the value of this option, the thread function srv_master_thread takes a different branch.

srv0srv.c:3286
if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
                > srv_max_buf_pool_modified_pct)) {
    /* The dirty-page ratio exceeds 75%: flush dirty data by calling
    n_pages_flushed = buf_flush_list(PCT_IO(100), IB_ULONGLONG_MAX) */
} else if (srv_adaptive_flushing && srv_adaptive_flushing_method == 0) {
    /* Set to the default value, native: first compute the rate at which
    dirty pages are produced (buf_flush_get_desired_flush_rate), then flush */
} else if (srv_adaptive_flushing && srv_adaptive_flushing_method == 1) {
    /* Set to estimate */
} else if (srv_adaptive_flushing && srv_adaptive_flushing_method == 2) {
    /* Set to keep_average */
}

The code shows that many of the ratio values are hard-coded.

What each branch does is determine the LSN range of dirty pages to flush and then call buf_flush_list.
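The branch selection above can be sketched as a small standalone function. This is only an illustration of the decision order described in these notes; the enum and the function name `choose_flush_branch` are hypothetical and do not exist in the InnoDB source.

```c
/* Hypothetical labels for the branches of srv_master_thread shown above. */
typedef enum {
    FLUSH_MAX_DIRTY,    /* dirty ratio above the configured maximum */
    FLUSH_NATIVE,       /* innodb_adaptive_flushing_method = 0 */
    FLUSH_ESTIMATE,     /* innodb_adaptive_flushing_method = 1 */
    FLUSH_KEEP_AVERAGE, /* innodb_adaptive_flushing_method = 2 */
    FLUSH_NONE          /* no adaptive branch applies */
} flush_branch_t;

/* Mirrors the if / else-if chain: the dirty-ratio check always wins,
   and the method value only matters when adaptive flushing is enabled. */
flush_branch_t choose_flush_branch(double dirty_pct, double max_dirty_pct,
                                   int adaptive_flushing, int method)
{
    if (dirty_pct > max_dirty_pct) {
        return FLUSH_MAX_DIRTY;
    }
    if (!adaptive_flushing) {
        return FLUSH_NONE;
    }
    switch (method) {
    case 0:  return FLUSH_NATIVE;
    case 1:  return FLUSH_ESTIMATE;
    case 2:  return FLUSH_KEEP_AVERAGE;
    default: return FLUSH_NONE;
    }
}
```

For example, with a dirty ratio of 80% against a 75% limit the first branch is taken regardless of the method setting.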

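An interesting macro is PCT_IO. Expanded, it looks like this:

#define PCT_IO(p) ((ulong) (srv_io_capacity * ((double) p / 100.0)))

Here srv_io_capacity defaults to 200 and represents the IOPS capability of the server's disks. If your disks are fast, go ahead and raise innodb_io_capacity.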
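To make the macro concrete, here is a minimal standalone sketch. The global `srv_io_capacity` is a local stand-in for the server variable, `io_budget_pages` is a hypothetical helper, and the parentheses around `p` are added here for macro safety; none of this is copied from the InnoDB source.

```c
/* Stand-in for the server variable; 200 is the default quoted above. */
static unsigned long srv_io_capacity = 200;

/* Same arithmetic as the macro above: a percentage of the I/O capacity. */
#define PCT_IO(p) ((unsigned long) (srv_io_capacity * ((double) (p) / 100.0)))

/* Hypothetical helper: the page budget for a given capacity and percentage. */
unsigned long io_budget_pages(unsigned long io_capacity, unsigned pct)
{
    srv_io_capacity = io_capacity;
    return PCT_IO(pct);
}
```

With the default capacity, `PCT_IO(100)` in the first branch of srv_master_thread asks buf_flush_list for up to 200 pages; raising innodb_io_capacity scales that request proportionally.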

Another question is how checkpoints work in InnoDB.

The following is taken from the web, with my own clumsy translation:

/*----------------------------------------------------------------begin--------------------------------------------------------------------------------------*/

As we know, there are two kinds of checkpoint: the sharp checkpoint and the fuzzy checkpoint. A sharp checkpoint flushes to disk all pages modified by committed transactions and records the LSN of the most recently committed transaction; pages modified by uncommitted transactions are not flushed. During crash recovery we can then start from the LSN recorded by the checkpoint.

A sharp checkpoint is called “sharp” because everything that is flushed to disk for the checkpoint is consistent as of a single point in time: the checkpoint LSN

A fuzzy checkpoint is more complex than a sharp checkpoint. It records two LSNs: the LSN at which the checkpoint started and the LSN at which it ended. The fuzzy checkpoint is described like this:

A fuzzy checkpoint is more complex. It flushes pages as time passes, until it has flushed all pages that a sharp checkpoint would have done. It completes by writing down two LSNs: when the checkpoint started and when it ended. But the pages it flushed might not all be consistent with each other as of a single point in time, which is why it’s called “fuzzy.” A page that got flushed early might have been modified since then, and a page that got flushed late might have a newer LSN than the starting LSN. A fuzzy checkpoint can conceptually be converted into a sharp checkpoint by performing REDO from the starting LSN to the ending LSN. Upon recovery, then, REDO can begin from the LSN at which the checkpoint started

InnoDB performs a sharp checkpoint at shutdown and fuzzy checkpoints during normal operation, and its implementation also departs from the theoretical description.

InnoDB keeps file pages in a large buffer pool, and a modified page is not written to disk immediately. Instead the dirty page stays in memory in the hope that several modifications can be merged into one write. InnoDB tracks the pages in the buffer pool with several lists:

the free list notes which pages are available to be used;

the LRU list notes which pages have been used least recently;

the flush list contains all of the dirty pages in LSN order, least-recently-modified first.

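When a page has to be read from disk and the buffer pool has no free slot left, dirty pages must be flushed to disk to make room, which is a slow operation.

It flushes the oldest-modified pages from the flush list on a regular basis, trying to keep from hitting certain high-water marks. It chooses the pages based on their physical locations on disk and their LSN (which is their modification time).

Besides staying away from the high-water marks, InnoDB must also avoid hitting the low-water mark, which would cost even more I/O. InnoDB writes its log circularly into fixed-size log files.

When InnoDB flushes dirty pages to disk, it finds the oldest LSN and uses it as the checkpoint low-water mark, then writes that LSN to the transaction log header (in log_checkpoint_margin() or log_checkpoint()).

Therefore, every time InnoDB flushes dirty pages from the head of the flush list, it is actually making a checkpoint by advancing the oldest LSN in the system. And that is how continual fuzzy checkpointing is implemented without ever “doing a checkpoint” as a separate event. If there is a server crash, then recovery simply proceeds from the oldest LSN onwards.

When InnoDB shuts down, it first stops all updates from transactions, then flushes all dirty pages to disk, and then writes the current LSN to the transaction log header. In addition, it writes the LSN to the header of every data file.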

/*----------------------------------------------------------------------------------------end--------------------------------------------------------------------------------------*/
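The continual fuzzy checkpointing described in the excerpt above can be pictured with a toy model. The sketch below is not InnoDB code: `toy_flush_list_t` and both functions are hypothetical, and the flush list is reduced to an array of oldest-modification LSNs sorted oldest first.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy flush list: the oldest-modification LSN of each dirty page,
   sorted ascending. Pages before `head` have already been flushed. */
typedef struct {
    const uint64_t *lsns; /* oldest_modification of each dirty page */
    size_t          n;    /* total number of entries in lsns */
    size_t          head; /* index of the oldest page still dirty */
} toy_flush_list_t;

/* Flush up to `count` pages, starting with the oldest-modified ones. */
void toy_flush_oldest(toy_flush_list_t *fl, size_t count)
{
    size_t remaining = fl->n - fl->head;
    fl->head += (count < remaining) ? count : remaining;
}

/* The LSN from which recovery would have to start: the oldest modification
   still in memory, or the current LSN when nothing is dirty. */
uint64_t toy_checkpoint_lsn(const toy_flush_list_t *fl, uint64_t current_lsn)
{
    return (fl->head < fl->n) ? fl->lsns[fl->head] : current_lsn;
}
```

In this model no separate checkpoint event exists: each call to `toy_flush_oldest` moves the recoverable starting point forward simply because the oldest dirty page changes.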

srv_master_thread calls log_free_check to check whether the log buffer needs to be flushed or the checkpoint needs to be updated. Its comment reads:

/*Checks if there is need for a log buffer flush or a new checkpoint, and does this if yes. Any database operation should call this when it has modified more than about 4 pages. NOTE that this function may only be called when the OS thread owns no synchronization objects except the dictionary mutex.*/
UNIV_INLINE
void
log_free_check(void)
/*================*/
{
 
    if (log_sys->check_flush_or_checkpoint) {
        log_check_margins();
    }  
}

log_sys->check_flush_or_checkpoint must be TRUE for this to trigger.

log_sys is a global structure (log_struct).

The comment on check_flush_or_checkpoint reads:

this is set to TRUE when there may be need to flush the log buffer, or preflush buffer pool pages, or make a checkpoint; this MUST be TRUE when lsn - last_checkpoint_lsn > max_checkpoint_age; this flag is peeked at by log_free_check(), which does not reserve the log mutex

check_flush_or_checkpoint may be set to TRUE in the following functions:

log_init(void)

log_close(void)

log_checkpoint_margin

It is set to FALSE in log_checkpoint_margin.

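log_check_margins() does two things:

-------- Flush the log:

log_flush_margin();

If a flush is already in progress it does nothing; otherwise it performs the flush:

lsn = log->lsn

log_write_up_to(lsn, LOG_NO_WAIT, FALSE);

------- Set the checkpoint:

log_checkpoint_margin(); mainly does two things:

Flush dirty data pages

Write the checkpoint

oldest_lsn = log_buf_pool_get_oldest_modification();

It first finds the oldest LSN in the buffer pool; the function actually called is buf_pool_get_oldest_modification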

for (i = 0; i < srv_buf_pool_instances; i++) {
         buf_pool_t*   buf_pool;
         buf_pool = buf_pool_from_array(i);
         buf_flush_list_mutex_enter(buf_pool);
         bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
         if (bpage != NULL) {
              ut_ad(bpage->in_flush_list);
              lsn = bpage->oldest_modification;
         }
         buf_flush_list_mutex_exit(buf_pool);
         if (!oldest_lsn || oldest_lsn > lsn) {
              oldest_lsn = lsn;
         }
     }

The logic here is simple: find the oldest LSN across all buffer pool instances. Now back to log_checkpoint_margin to continue the analysis:
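The same minimum-over-instances idea can be written as a standalone function. This is a simplified sketch, not the InnoDB routine: `toy_oldest_modification` is a hypothetical name, each instance is reduced to the LSN at the tail of its flush list, and 0 is assumed to mean "this instance has no dirty pages". Unlike the excerpt, the sketch skips empty instances explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* tail_lsn[i] is the oldest_modification at the tail of instance i's
   flush list, or 0 when that instance has no dirty pages (assumption). */
uint64_t toy_oldest_modification(const uint64_t *tail_lsn, size_t n_instances)
{
    uint64_t oldest_lsn = 0; /* 0 means no dirty page found so far */

    for (size_t i = 0; i < n_instances; i++) {
        uint64_t lsn = tail_lsn[i];

        if (lsn == 0) {
            continue; /* empty flush list: nothing to compare */
        }
        if (oldest_lsn == 0 || lsn < oldest_lsn) {
            oldest_lsn = lsn;
        }
    }
    return oldest_lsn;
}
```

A return value of 0 means every flush list was empty, in which case the caller would fall back to the current log LSN.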

if (age > log->max_modified_age_sync) {

/* A flush is urgent: we have to do a synchronous preflush */

sync = TRUE;

advance = 2 * (age - log->max_modified_age_sync);

When log->lsn - oldest_lsn > (log space size * 15/16), dirty pages are forcibly flushed so that the oldest LSN advances by 2 * (age - max_modified_age_sync); transactions stop executing while this happens.

} else if (age > log_max_modified_age_async()) {
/* A flush is not urgent: we do an asynchronous preflush */

advance = age - log_max_modified_age_async();

When age > 7/8 * min(log space size, srv_checkpoint_age_target), the flush is asynchronous and transactions are not blocked.

} else {

advance = 0;

}

This first step computes the LSN range that needs to be flushed (advance).
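The three-way decision can be isolated into a small function for experimentation. This is a sketch under assumptions: `preflush_plan_t` and `plan_preflush` are hypothetical names, and the two thresholds are passed in by the caller rather than derived from the log size.

```c
#include <stdint.h>

/* Hypothetical result type: how far to advance the oldest LSN, and
   whether the caller must wait for the flush to finish. */
typedef struct {
    int      sync;    /* nonzero: synchronous preflush */
    uint64_t advance; /* LSN distance to flush past the current oldest LSN */
} preflush_plan_t;

/* age = current LSN - oldest modification LSN in the buffer pool. */
preflush_plan_t plan_preflush(uint64_t age,
                              uint64_t max_age_sync,
                              uint64_t max_age_async)
{
    preflush_plan_t plan = {0, 0};

    if (age > max_age_sync) {
        /* Urgent: flush twice the overshoot and wait for it. */
        plan.sync = 1;
        plan.advance = 2 * (age - max_age_sync);
    } else if (age > max_age_async) {
        /* Not urgent: flush just the overshoot in the background. */
        plan.advance = age - max_age_async;
    }
    return plan;
}
```

As a worked example using the ratios quoted above, a log space of 1600 units gives a synchronous threshold of 1500 (15/16) and an asynchronous threshold of 1400 (7/8). An age of 1450 then yields an asynchronous advance of 50, while an age of 1550 yields a synchronous advance of 100.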

if (checkpoint_age > log->max_checkpoint_age) {
       /* A checkpoint is urgent: we do it synchronously */
       checkpoint_sync = TRUE;
       do_checkpoint = TRUE;

    } else if (checkpoint_age > log_max_checkpoint_age_async()) {
       /* A checkpoint is not urgent: do it asynchronously */

       do_checkpoint = TRUE;

       log->check_flush_or_checkpoint = FALSE;
    } else {
       log->check_flush_or_checkpoint = FALSE;
    }

In the same way, it also decides whether a checkpoint needs to be made.
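The checkpoint decision follows the same pattern and can be sketched the same way. The struct and function names below are hypothetical, the thresholds are supplied by the caller, and `flag_stays_set` models whether log->check_flush_or_checkpoint is left TRUE after the decision.

```c
#include <stdint.h>

/* Hypothetical summary of the checkpoint branch in log_checkpoint_margin. */
typedef struct {
    int do_checkpoint;   /* nonzero: a checkpoint should be written */
    int checkpoint_sync; /* nonzero: write it synchronously */
    int flag_stays_set;  /* nonzero: check_flush_or_checkpoint is not cleared */
} checkpoint_plan_t;

/* checkpoint_age = current LSN - LSN of the last checkpoint. */
checkpoint_plan_t plan_checkpoint(uint64_t checkpoint_age,
                                  uint64_t max_age,
                                  uint64_t max_age_async)
{
    checkpoint_plan_t plan = {0, 0, 1};

    if (checkpoint_age > max_age) {
        /* Urgent: checkpoint synchronously and keep the flag raised. */
        plan.do_checkpoint = 1;
        plan.checkpoint_sync = 1;
    } else if (checkpoint_age > max_age_async) {
        /* Not urgent: checkpoint asynchronously and clear the flag. */
        plan.do_checkpoint = 1;
        plan.flag_stays_set = 0;
    } else {
        /* Nothing to do: just clear the flag. */
        plan.flag_stays_set = 0;
    }
    return plan;
}
```

Only the urgent branch leaves the flag raised, so later calls to log_free_check keep entering log_check_margins until the checkpoint age drops back under the limit.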

Then the actual work is done:

    ib_uint64_t   new_oldest = oldest_lsn + advance;
    success = log_preflush_pool_modified_pages(new_oldest, sync);

Flush the dirty pages in the buffer pool

if (do_checkpoint) {
    log_checkpoint(checkpoint_sync, FALSE);
}

Write the checkpoint

log_preflush_pool_modified_pages calls buf_flush_list -> buf_flush_batch -> buf_flush_buffered_writes to advance the current oldest LSN to the new position. If sync is true, it blocks until the flush completes (buf_flush_wait_batch_end).

log_checkpoint() records the checkpoint. It looks up the LSN of the earliest modification still in the buffer pool and writes that LSN to the log file.

As an aside, the following is taken from the web:

--------------------------------

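Starting with MySQL 5.5.4, a new variable, innodb_buffer_pool_instances, specifies the number of independent buffer pools.

MySQL 5.5 introduced the innodb_buffer_pool_instances parameter. When it is set, InnoDB splits the buffer pool into several smaller pools, each with its own LRU list, free list, and flush list, which reduces contention for buffer pool resources.

The parameter reduces contention by hashing pages across the pools. Sometimes, however, we know that most of the contention is concentrated on a single table, and in that case innodb_buffer_pool_instances cannot help. InnoDB independent buffer pool lets you place specific tables in a separate buffer pool of a given size, which reduces contention on those particular tables. At present, using independent buffer pool requires innodb_file_per_table to be ON.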