What follows are my own scattered notes. They are disorganized and rough, so bear with me.
-------------------
Parameter: innodb_adaptive_flushing_method
Controls how dirty pages are flushed. It can be changed dynamically.
It accepts the following three values:
0 (native)
This setting causes checkpointing to operate exactly as it does in native InnoDB.
1 (estimate)
If the oldest modified age exceeds 1/2 of the maximum age capacity, InnoDB starts flushing blocks every second. The number of blocks flushed is determined by [number of modified blocks], [LSN progress speed] and [average age of all modified blocks]. So, this behavior is independent of the innodb_io_capacity variable.
2 (keep_average)
This method attempts to keep the I/O rate constant by using a much shorter loop cycle (0.1 second) than that of the other methods (1.0 second). It is designed for use with SSD cards.
Percona Server 5.5.18 does not have the reflex option.
Depending on the value of this option, the thread function srv_master_thread takes a different branch.
srv0srv.c:3286
if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
                  > srv_max_buf_pool_modified_pct)) {
    /* Dirty pages exceed innodb_max_dirty_pages_pct (75% by default):
       flush them with
       n_pages_flushed = buf_flush_list(PCT_IO(100), IB_ULONGLONG_MAX) */
} else if (srv_adaptive_flushing && srv_adaptive_flushing_method == 0) {
    /* native (the stock InnoDB behavior): first compute the rate at which
       dirty pages are produced (buf_flush_get_desired_flush_rate),
       then flush accordingly */
} else if (srv_adaptive_flushing && srv_adaptive_flushing_method == 1) {
    /* estimate */
} else if (srv_adaptive_flushing && srv_adaptive_flushing_method == 2) {
    /* keep_average */
}
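The branch selection above can be sketched as a small standalone function. This is only an illustration of the control flow: the enum, its labels and the function name choose_flush_branch are hypothetical and do not exist in InnoDB, and the inputs stand in for buf_get_modified_ratio_pct(), srv_max_buf_pool_modified_pct, srv_adaptive_flushing and srv_adaptive_flushing_method.

```c
#include <stdbool.h>

/* Hypothetical labels for the branches; InnoDB defines no such enum. */
enum flush_branch {
    FLUSH_MAX_DIRTY,     /* dirty ratio above the limit */
    FLUSH_NATIVE,        /* method 0 */
    FLUSH_ESTIMATE,      /* method 1 */
    FLUSH_KEEP_AVERAGE,  /* method 2 */
    FLUSH_NONE           /* no branch taken in this iteration */
};

/* Mirrors the order of the if / else-if chain: the dirty-ratio test
   wins over the adaptive flushing method. */
enum flush_branch
choose_flush_branch(unsigned long dirty_pct, unsigned long max_dirty_pct,
                    bool adaptive_flushing, unsigned long method)
{
    if (dirty_pct > max_dirty_pct) {
        return FLUSH_MAX_DIRTY;
    }
    if (!adaptive_flushing) {
        return FLUSH_NONE;
    }
    switch (method) {
    case 0:  return FLUSH_NATIVE;
    case 1:  return FLUSH_ESTIMATE;
    case 2:  return FLUSH_KEEP_AVERAGE;
    default: return FLUSH_NONE;
    }
}
```

For example, choose_flush_branch(80, 75, true, 1) picks the dirty-ratio branch even though the method is estimate, because that test comes first.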
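As the code shows, many of the ratios are hard-coded.
What each branch does is determine the LSN range of dirty pages to flush and then call buf_flush_list.
One interesting macro is PCT_IO. Expanded, it looks like this: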
#define PCT_IO(p) ((ulong) (srv_io_capacity * ((double) p / 100.0)))
Here srv_io_capacity defaults to 200 and represents the IOPS the server's storage can sustain. If your disks are fast, go ahead and raise innodb_io_capacity.
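To see what the macro evaluates to, here is the same arithmetic as a plain function. The name pct_io and the explicit io_capacity argument are assumptions made for the sketch; in InnoDB the macro reads the global srv_io_capacity directly.

```c
/* Same arithmetic as PCT_IO, with the I/O capacity passed in explicitly:
   p percent of io_capacity, truncated to a whole number of pages. */
unsigned long
pct_io(unsigned long io_capacity, unsigned long p)
{
    return (unsigned long) (io_capacity * ((double) p / 100.0));
}
```

With the default capacity of 200, pct_io(200, 100) is 200 and pct_io(200, 5) is 10, so buf_flush_list(PCT_IO(100), ...) asks for at most 200 pages per call. Raising the capacity to 2000 scales both numbers by ten.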
Another question is how checkpointing works in InnoDB.
The following is taken from an article on the web, with my own rough translation:
/*----------------------------------------------------------------begin--------------------------------------------------------------------------------------*/
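There are two kinds of checkpoint: the sharp checkpoint and the fuzzy checkpoint. A sharp checkpoint flushes to disk every page modified by committed transactions and records the LSN of the most recently committed transaction. Pages modified by uncommitted transactions are not flushed. Crash recovery can then start from the LSN recorded by the checkpoint.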
A sharp checkpoint is called “sharp” because everything that is flushed to disk for the checkpoint is consistent as of a single point in time: the checkpoint LSN.
A fuzzy checkpoint is more complex than a sharp checkpoint. It records two LSNs: the LSN at which the checkpoint started and the LSN at which it ended. It is described as follows:
A fuzzy checkpoint is more complex. It flushes pages as time passes, until it has flushed all pages that a sharp checkpoint would have done. It completes by writing down two LSNs: when the checkpoint started and when it ended. But the pages it flushed might not all be consistent with each other as of a single point in time, which is why it’s called “fuzzy.” A page that got flushed early might have been modified since then, and a page that got flushed late might have a newer LSN than the starting LSN. A fuzzy checkpoint can conceptually be converted into a sharp checkpoint by performing REDO from the starting LSN to the ending LSN. Upon recovery, then, REDO can begin from the LSN at which the checkpoint started.
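InnoDB performs a sharp checkpoint at shutdown and fuzzy checkpoints during normal operation, and its implementation also differs somewhat from the theoretical description.
InnoDB keeps file pages in a large buffer pool, and a modified page is not written to disk immediately. Instead, the dirty page stays in memory in the hope that several modifications can be merged into one write. InnoDB tracks the pages in the buffer pool with several lists: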
the free list notes which pages are available to be used;
the LRU list notes which pages have been used least recently;
the flush list contains all of the dirty pages in LSN order, least-recently-modified first.
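As a rough sketch of the flush list idea, the structure below keeps dirty pages ordered by the LSN of their first unflushed change. All type and function names are hypothetical and much simpler than InnoDB's buf_page_t and UT_LIST machinery. The sketch assumes that newly dirtied pages are added at the head, so the least recently modified page sits at the tail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified dirty-page descriptor. */
struct dirty_page {
    uint64_t           oldest_modification; /* LSN of first unflushed change */
    struct dirty_page *prev;                /* toward the head (newer) */
    struct dirty_page *next;                /* toward the tail (older) */
};

struct flush_list {
    struct dirty_page *head; /* most recently dirtied page */
    struct dirty_page *tail; /* least recently dirtied page */
};

/* A page enters the list when it is first dirtied. LSNs only grow, so
   adding at the head keeps the list ordered by oldest_modification. */
void
flush_list_add(struct flush_list *list, struct dirty_page *page, uint64_t lsn)
{
    page->oldest_modification = lsn;
    page->prev = NULL;
    page->next = list->head;
    if (list->head != NULL) {
        list->head->prev = page;
    } else {
        list->tail = page;
    }
    list->head = page;
}

/* Removes and returns the oldest dirty page, or NULL if the list is empty. */
struct dirty_page *
flush_list_pop_oldest(struct flush_list *list)
{
    struct dirty_page *page = list->tail;
    if (page == NULL) {
        return NULL;
    }
    list->tail = page->prev;
    if (list->tail != NULL) {
        list->tail->next = NULL;
    } else {
        list->head = NULL;
    }
    page->prev = NULL;
    page->next = NULL;
    return page;
}
```

Flushing from the tail means the oldest_modification of whatever page is left at the tail is always the oldest unflushed change in the pool.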
When a page has to be read from disk and the buffer pool has no free slot left, dirty pages must be flushed to disk to make room, which is a slow operation.
To avoid this, InnoDB flushes the oldest-modified pages from the flush list on a regular basis, trying to keep from hitting certain high-water marks. It chooses the pages based on their physical locations on disk and their LSN (which is their modification time).
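Besides keeping away from the high-water marks, InnoDB also has to avoid hitting the low-water mark, or it pays for more expensive I/O. InnoDB writes its log circularly into fixed-size log files.
When InnoDB flushes dirty pages to disk, it finds the oldest LSN and uses it as the checkpoint low-water mark, then writes that LSN to the transaction log header (see the functions log_checkpoint_margin() and log_checkpoint()).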
Therefore, every time InnoDB flushes dirty pages from the head of the flush list, it is actually making a checkpoint by advancing the oldest LSN in the system. And that is how continual fuzzy checkpointing is implemented without ever “doing a checkpoint” as a separate event. If there is a server crash, then recovery simply proceeds from the oldest LSN onwards.
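The relationship between the oldest dirty page and the checkpoint can be written down in a few lines. The function names are hypothetical. The sketch assumes that an oldest_modification of 0 means there are no dirty pages, in which case the checkpoint can sit at the current LSN.

```c
#include <stdint.h>

/* The checkpoint LSN is the oldest unflushed modification, or the
   current LSN when nothing is dirty (oldest_modification == 0). */
uint64_t
checkpoint_lsn(uint64_t oldest_modification, uint64_t current_lsn)
{
    return oldest_modification != 0 ? oldest_modification : current_lsn;
}

/* Amount of log, in LSN units, that recovery would have to replay
   after a crash at current_lsn. */
uint64_t
redo_span(uint64_t oldest_modification, uint64_t current_lsn)
{
    return current_lsn - checkpoint_lsn(oldest_modification, current_lsn);
}
```

Flushing the oldest dirty page raises oldest_modification, which raises the checkpoint LSN and shrinks the span that crash recovery has to replay.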
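When InnoDB shuts down, it first stops all updates from transactions, then flushes every dirty page to disk, and then writes the current LSN to the transaction log header. In addition, it writes the LSN to the header of every data file.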
/*----------------------------------------------------------------------------------------end--------------------------------------------------------------------------------------*/
srv_master_thread calls log_free_check to check whether the log buffer needs to be flushed or the checkpoint advanced. Its comment reads:
/* Checks if there is need for a log buffer flush or a new checkpoint, and does this if yes. Any database operation should call this when it has modified more than about 4 pages. NOTE that this function may only be called when the OS thread owns no synchronization objects except the dictionary mutex. */
UNIV_INLINE
void
log_free_check(void)
/*================*/
{
    if (log_sys->check_flush_or_checkpoint) {
        log_check_margins();
    }
}
The check only fires when log_sys->check_flush_or_checkpoint is TRUE.
log_sys is a global structure (log_struct).
The comment on check_flush_or_checkpoint reads:
this is set to TRUE when there may be need to flush the log buffer, or preflush buffer pool pages, or make a checkpoint; this MUST be TRUE when lsn - last_checkpoint_lsn > max_checkpoint_age; this flag is peeked at by log_free_check(), which does not reserve the log mutex
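The pattern described by that comment is a cheap flag that writers raise and that the hot path only peeks at. The sketch below shows the idea with a hypothetical log_state structure. It is single-threaded, ignores the mutex question entirely, and replaces the real work of log_check_margins() with a counter and a pretend checkpoint.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for log_struct. */
struct log_state {
    uint64_t lsn;
    uint64_t last_checkpoint_lsn;
    uint64_t max_checkpoint_age;
    bool     check_flush_or_checkpoint;
    unsigned margin_checks; /* how often the slow path ran (illustrative) */
};

/* Writer side: after appending to the log, raise the flag once the
   checkpoint age exceeds its limit. */
void
log_state_advance(struct log_state *log, uint64_t bytes)
{
    log->lsn += bytes;
    if (log->lsn - log->last_checkpoint_lsn > log->max_checkpoint_age) {
        log->check_flush_or_checkpoint = true;
    }
}

/* Hot path: a single flag test, the slow path runs only when needed. */
void
log_state_free_check(struct log_state *log)
{
    if (log->check_flush_or_checkpoint) {
        log->margin_checks++;                /* stand-in for log_check_margins() */
        log->last_checkpoint_lsn = log->lsn; /* pretend a checkpoint was made */
        log->check_flush_or_checkpoint = false;
    }
}
```

The point of the design is that callers can invoke the check very often, since in the common case it costs one branch.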
check_flush_or_checkpoint may be set to TRUE in the following functions:
log_init(void)
log_close(void)
log_checkpoint_margin
It is set to FALSE in log_checkpoint_margin.
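log_check_margins() does two things:
-------- Flush the log:
If a flush is already in progress, do nothing. Otherwise perform one:
lsn = log->lsn;
log_write_up_to(lsn, LOG_NO_WAIT, FALSE);
------- Make a checkpoint:
log_checkpoint_margin() does two main things:
flush dirty data pages
write the checkpoint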
oldest_lsn = log_buf_pool_get_oldest_modification();
First it finds the oldest LSN in the buffer pool. The function actually called is buf_pool_get_oldest_modification:
for (i = 0; i < srv_buf_pool_instances; i++) {
    buf_pool_t* buf_pool;
    buf_pool = buf_pool_from_array(i);
    buf_flush_list_mutex_enter(buf_pool);
    bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
    if (bpage != NULL) {
        ut_ad(bpage->in_flush_list);
        lsn = bpage->oldest_modification;
    }
    buf_flush_list_mutex_exit(buf_pool);
    if (!oldest_lsn || oldest_lsn > lsn) {
        oldest_lsn = lsn;
    }
}
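Stripped of the mutexes and list macros, the loop above is a minimum search over per-instance values. The sketch below assumes each instance has already reported the oldest_modification at the tail of its flush list, with 0 meaning the list is empty. The function name and the array interface are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* tail_lsn[i] is the oldest_modification at the tail of instance i's
   flush list, or 0 if that instance has no dirty pages.
   Returns the smallest non-zero value, or 0 if every instance is clean. */
uint64_t
oldest_modification_across(const uint64_t *tail_lsn, size_t n_instances)
{
    uint64_t oldest = 0;
    for (size_t i = 0; i < n_instances; i++) {
        uint64_t lsn = tail_lsn[i];
        if (lsn == 0) {
            continue; /* empty flush list: nothing to contribute */
        }
        if (oldest == 0 || lsn < oldest) {
            oldest = lsn;
        }
    }
    return oldest;
}
```

Unlike the excerpt, this version skips empty instances explicitly rather than reusing the lsn variable from a previous iteration. That is a choice made for the sketch, not a statement about the original.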
The logic here is simple: find the oldest LSN across all buffer pool instances. Back in log_checkpoint_margin, the analysis continues:
if (age > log->max_modified_age_sync) {
    /* A flush is urgent: we have to do a synchronous preflush */
    sync = TRUE;
    advance = 2 * (age - log->max_modified_age_sync);
    /* When log->lsn - oldest_lsn > (log space size * 15/16), dirty pages
       are forcibly flushed until the oldest LSN has advanced by
       2 * (age - max_modified_age_sync). Transactions stall meanwhile. */
} else if (age > log_max_modified_age_async()) {
    /* A flush is not urgent: we do an asynchronous preflush */
    advance = age - log_max_modified_age_async();
    /* When age > 7/8 * min(log space size, srv_checkpoint_age_target),
       flush asynchronously without blocking transactions. */
} else {
    advance = 0;
}
This first step computes the LSN range that needs to be flushed (advance).
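The thresholds and the resulting advance can be worked through with a small sketch. The struct and function names are hypothetical. The margin argument stands for the log space size mentioned in the notes (or its minimum with srv_checkpoint_age_target), and the divisors 8 and 16 are assumptions chosen to reproduce the 7/8 and 15/16 ratios quoted there.

```c
#include <stdbool.h>
#include <stdint.h>

struct preflush_plan {
    bool     sync;    /* true: callers wait for the flush to finish */
    uint64_t advance; /* how far the oldest LSN should move forward */
};

/* 7/8 of the margin: above this, start flushing asynchronously. */
uint64_t
max_age_async(uint64_t margin)
{
    return margin - margin / 8;
}

/* 15/16 of the margin: above this, flush synchronously. */
uint64_t
max_age_sync(uint64_t margin)
{
    return margin - margin / 16;
}

/* age is current LSN minus the oldest modification LSN. */
struct preflush_plan
plan_preflush(uint64_t age, uint64_t margin)
{
    struct preflush_plan plan = { false, 0 };
    if (age > max_age_sync(margin)) {
        plan.sync = true;
        plan.advance = 2 * (age - max_age_sync(margin));
    } else if (age > max_age_async(margin)) {
        plan.advance = age - max_age_async(margin);
    }
    return plan;
}
```

With a margin of 1600 the thresholds are 1400 and 1500. An age of 1450 yields an asynchronous advance of 50, while an age of 1550 yields a synchronous advance of 100, twice the overshoot.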
if (checkpoint_age > log->max_checkpoint_age) {
    /* A checkpoint is urgent: we do it synchronously */
    checkpoint_sync = TRUE;
    do_checkpoint = TRUE;
} else if (checkpoint_age > log_max_checkpoint_age_async()) {
    /* A checkpoint is not urgent: do it asynchronously */
    do_checkpoint = TRUE;
    log->check_flush_or_checkpoint = FALSE;
} else {
    log->check_flush_or_checkpoint = FALSE;
}
In the same way, it decides whether a checkpoint is needed.
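The checkpoint decision has the same three-way shape and can be sketched the same way. The names are hypothetical. The clear_flag field records whether the branch resets check_flush_or_checkpoint to FALSE, which the urgent branch in the excerpt does not do.

```c
#include <stdbool.h>
#include <stdint.h>

struct checkpoint_plan {
    bool do_checkpoint;   /* write a checkpoint at all */
    bool checkpoint_sync; /* wait for the checkpoint write to complete */
    bool clear_flag;      /* reset check_flush_or_checkpoint to FALSE */
};

/* checkpoint_age is current LSN minus the last checkpoint LSN. */
struct checkpoint_plan
plan_checkpoint(uint64_t checkpoint_age,
                uint64_t max_checkpoint_age_async,
                uint64_t max_checkpoint_age)
{
    struct checkpoint_plan plan = { false, false, false };
    if (checkpoint_age > max_checkpoint_age) {
        /* urgent: synchronous checkpoint, flag left as it is */
        plan.do_checkpoint = true;
        plan.checkpoint_sync = true;
    } else if (checkpoint_age > max_checkpoint_age_async) {
        /* not urgent: asynchronous checkpoint */
        plan.do_checkpoint = true;
        plan.clear_flag = true;
    } else {
        /* nothing to do */
        plan.clear_flag = true;
    }
    return plan;
}
```

Leaving the flag set in the urgent case means the next log_free_check() call re-enters the margin check.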
Then the actual work is done:
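ib_uint64_t new_oldest = oldest_lsn + advance;
success = log_preflush_pool_modified_pages(new_oldest, sync);
/* flush dirty pages from the buffer pool up to new_oldest */
if (do_checkpoint) {
    log_checkpoint(checkpoint_sync, FALSE);
    /* write the checkpoint */
}
log_preflush_pool_modified_pages calls buf_flush_list -> buf_flush_batch -> buf_flush_buffered_writes to flush dirty pages until the oldest LSN has advanced to the new position. If sync is true, it blocks until the flush completes (buf_flush_wait_batch_end).
log_checkpoint() records the checkpoint. It looks up the LSN of the oldest modification in the buffer pool and writes that LSN into the log file.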
As an aside, the following is taken from the web:
--------------------------------
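Starting with MySQL 5.5.4 there is a new variable, innodb_buffer_pool_instances, which sets the number of independent buffer pools.
With this parameter set, InnoDB splits the buffer pool into several smaller pools, each with its own LRU list, free list and flush list, which reduces contention on buffer pool resources.
The parameter reduces contention by hashing. Sometimes, though, we know that most of the contention is concentrated on a single table, and then innodb_buffer_pool_instances cannot help. InnoDB independent buffer pool lets you place specific tables in a dedicated buffer pool of a given size, reducing contention on those particular tables. At present, using independent buffer pool requires innodb_file_per_table to be ON.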
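To make the hashing idea concrete, here is a minimal sketch of mapping a page to a buffer pool instance. The function name and the particular hash are illustrative assumptions, not necessarily what InnoDB computes. The only property the sketch relies on is that a given (space_id, page_no) pair always maps to the same instance.

```c
#include <stdint.h>

/* Illustrative hash: fold the tablespace id and the page number (with
   the low 6 bits dropped, so 64 neighbouring pages stay together) into
   one value, then take it modulo the number of instances. */
unsigned
pool_instance_for(uint32_t space_id, uint32_t page_no, unsigned n_instances)
{
    uint64_t fold = ((uint64_t) space_id << 20) + space_id + (page_no >> 6);
    return (unsigned) (fold % n_instances);
}
```

A lookup then only takes the locks of the instance it hashes to, which is where the reduced contention comes from.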