Innodb Double Write

InnoDB使用双写机制确保数据安全,通过两次写入数据防止部分页面写入失败导致的数据不一致。此机制适用于表空间写入操作,首次将数据写入临时日志文件,确保数据同步到磁盘后,再写入实际位置。这种方式可以有效避免因电源故障或操作系统崩溃导致的部分写入问题。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >


本文转自:http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/


Innodb Double Write

One of very interesting techniques Innodb uses is technique called “doublewrite” It means Innodb will write data twice when it performs table space writes – writes to log files are done only once.

So why doublewrite is needed ? It is needed to archive data safety in case of partial page writes. Innodb does not log full pages to the log files, but uses what is called “physiological” logging which means log records contain page number for the operation as well as operation data (ie update the row) and log sequence information. Such logging structure is geat as it require less data to be written to the log, however it requires pages to be internally consistent. It does not matter which page version it is – it could be “current” version in which case Innodb will skip page upate operation or “former” in which case Innodb will perform update. If page is inconsistent recovery can’t proceed.

Now lets talk a bit about partial page writes – what are they and why are they happening. Partial page writes is when page write request submited to OS completes only partially. For example out of 16K Innodb page only first 4KB are updated and other parts remain in their former state. Most typically partial page writes happen when power failure happens. It also can happen on OS crash – there is a chance operation system will split your 16K write into several writes and failure happens just between their execution. Reasons for splitting could be file fragmentation – most file systems use 4K block sizes by default so 16K could use more than one fragment. Also if software RAID is used page may come on the stripe border requiring multiple IO requests. Same happens with Hardware RAID on power failure if it does not have battery backed up cache. If there is single write issued to the disk itself it should be in theory completed even if power goes down as there should be enough power accomulated inside the drive to complete it. I honestly do not know if this is always the case – it is hard to check as it is not the only reason for partial page writes. I just know they tend to happen and before Innodb doublewirite was implemented I had couple of data corruptions due to it.

So how does double write works ? You can think about it as about one more short term log file allocated inside Innodb tablespace – it contains space for 100 pages. When Innodb flushes pages from Innodb buffer pool it does so by multiple pages. So several pages will be written to double write buffer (sequentially), fsync() called to ensure they make it to the disk, then pages written to their real location and fsync() called the second time. Now on recovery Innodb checks doublewrite buffer contents and pages in their original location. If page is inconsistent in double write buffer it is simply discarded, if it is inconsistent in the tablespace it is recovered from double write buffer.

How much does double write buffer affect MySQL Performance ? Even though double write requires each page written twice its overhead is far less than double. Write to double write buffer is sequential so it is pretty cheap. It also allows Innodb to save on fsync()s – instead of calling fsync() for each page write Innodb submits multiple page writes and calls fsync() which allows Operating System to optimize in which order writes are executed and use multiple devices in parallel. This optimization could be used without doublewrite though, it was just implemented at the same time. So in general I would expect no more than 5-10% performance loss due to use of doublewrite.

Can you disable doublewrite ? If you do not care about your data (ie slaves on RAID0) or if your file system guarantees you no partial page writes could exist you can disable doublewrite by setting innodb_doublewrite=0 It is however not worth the trouble in most cases.


注:近期参加MySQL运维学习,老师推荐该文章作为学习和技术提高的扩展阅读,先记录到自己的博客中,随后慢慢消化、学习、提高。


评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值