innodb 存储布局

Data Organization in InnoDB

Introduction

This article will explain how the data is organized in InnoDB storage engine. First we will look at the various files that are created by InnoDB, then we look at the logical data organization like tablespaces, pages, segments and extents. We will explore each of them in some detail and discuss about their relationship with each other. At the end of this article, the reader will have a high level view of the data layout within the InnoDB storage engine.

The Files

MySQL will store all data within the data directory. The data directory can be specified using the command line option –data-dir or in the configuration file as datadir. Refer to the Server Command Options for complete details.

By default, when InnoDB is initialized, it creates 3 important files in the data directory – ibdata1, ib_logfile0 and ib_logfile1. The ibdata1 is the data file in which system and user data will be stored. The ib_logfile0 and ib_logfile1 are the redo log files. The location and size of these files are configurable. Refer to Configuring InnoDB for more details.

The data file ibdata1 belongs to the system tablespace with tablespace id (space_id) of 0. The system tablespace can contain more than 1 data file. As of MySQL 5.6, only the system tablespace can contain more than 1 data file. All other tablespaces can contain only one data file. Also, only the system tablespace can contain more than one table, while all other tablespaces can contain only one table.

The data files and the redo log files are represented in the memory by the C structure fil_node_t.

Tablespaces

By default, InnoDB contains only one tablespace called the system tablespace whose identifier is 0. More tablespaces can be created indirectly using the innodb_file_per_table configuration parameter. In MySQL 5.6, this configuration parameter is ON by default. When it is ON, each table will be created in its own tablespace in a separate data file.

The relationship between the tablespace and data files is explained in the InnoDB source code comment (storage/innobase/fil/fil0fil.cc) which is quoted here for reference:

A tablespace consists of a chain of files. The size of the files does not have to be divisible by the database block size, because we may just leave the last incomplete block unused. When a new file is appended to the tablespace, the maximum size of the file is also specified. At the moment, we think that it is best to extend the file to its maximum size already at the creation of the file, because then we can avoid dynamically extending the file when more space is needed for the tablespace.”

The last statement about avoiding dynamic extension is applicable only to the redo log files and not the data files. Data files are dynamically extended, but redo log files are pre-allocated. Also, as already mentioned earlier, only the system tablespace can have more than one data file.

It is also clearly mentioned that even though the tablespace can have multiple files, they are thought of as one single large file concatenated together. So the order of files within the tablespace is important.

Pages

A data file is logically divided into equal sized pages. The first page of the first data file is identified with page number of 0, and the next page would be 1 and so on. A page within a tablespace is uniquely identified by the page identifier or page number (page_no). And each tablespace is uniquely identified by the tablespace identifier (space_id). So a page is uniquely identified throughout InnoDB by using the (space_id, page_no) combination. And any location within InnoDB can be uniquely identified by the (space_id, page_no, page_offset) combination, where page_offset is the number of bytes within the given page.

How the pages from different data files relate to one another is explained in another source code comment: “A block's position in the tablespace is specified with a 32-bit unsigned integer. The files in the chain are thought to be catenated, and the block corresponding to an address n is the nth block in the catenated file (where the first block is named the 0th block, and the incomplete block fragments at the end of files are not taken into account). A tablespace can be extended by appending a new file at the end of the chain.” This means that the first page of all the data files will not have page_no of 0 (zero). Only the first page of the first data file in a tablespace will have the page_no as 0 (zero).

Also in the above comment it is mentioned that the page_no is a 32-bit unsigned integer. This is the size of the page_no when stored on the disk.

Every page has a page header (page_header_t). For more details on this please refer to the Jeremy Cole's blog “The basics of InnoDB space file layout. 

Extents

An extent is 1MB of consecutive pages. The size of one extent is defined as follows (1048576 bytes = 1MB):

#define FSP_EXTENT_SIZE (1048576U / UNIV_PAGE_SIZE)

The macro UNIV_PAGE_SIZE used to be a compile time constant. From mysql-5.6 onwards it is a global variable. The number of pages in an extent depends on the page size used. If the page size is 16K (the default), then an extent would contain 64 pages.

Page Types

One page can be used for many purposes. The page type will identify the purpose for which the page is being used. The page type of each page will be stored in the page header. The page types are available in the header file storage/innobase/include/fil0fil.h. The following table provides a brief description of the page types.

Page Type

Description

FIL_PAGE_INDEX

The page is a B-tree node

FIL_PAGE_UNDO_LOG

The page stores undo logs

FIL_PAGE_INODE

contains an array of fseg_inode_t objects.

FIL_PAGE_IBUF_FREE_LIST

The page is in the free list of insert buffer or change buffer.

FIL_PAGE_TYPE_ALLOCATED

Freshly allocated page.

FIL_PAGE_IBUF_BITMAP

Insert buffer or change buffer bitmap

FIL_PAGE_TYPE_SYS

System page

FIL_PAGE_TYPE_TRX_SYS

Transaction system data

FIL_PAGE_TYPE_FSP_HDR

File space header

FIL_PAGE_TYPE_XDES

Extent Descriptor Page

FIL_PAGE_TYPE_BLOB

Uncompressed BLOB page

FIL_PAGE_TYPE_ZBLOB

First compressed BLOB page

FIL_PAGE_TYPE_ZBLOB2

Subsequent compressed BLOB page


Each page type is used for different purposes. It is beyond the scope of this article, to explore each page type. For now, it is sufficient to know that all pages have a page header (page_header_t) and they store the page type in it, and based on the page type the contents and the layout of the page would be decided.

Tablespace Header

Each tablespace will have a header of type fsp_header_t. This data structure is stored in the first page of a tablespace.

  • The table space identifier (space_id)

  • Current size of the table space in pages.

  • List of free extents

  • List of full extents not belonging to any segment.

  • List of partially full/free extents not belonging to any segment.

  • List of pages containing segment headers, where all the segment inode slots are reserved. (pages of type FIL_PAGE_INODE)

  • List of pages containing segment headers, where not all the segment inode slots are reserved. (pages of type FIL_PAGE_INODE).

InnoDB Tablespace Header Structure

From the tablespace header, we can access the list of segments available in the tablespace. The total space occupied by the tablespace header is given by the macro FSP_HEADER_SIZE, which is equal to 16*7 = 112 bytes.

Reserved Pages of Tablespace

As mentioned earlier, InnoDB will always contain one tablespace called the system tablespace with identifier 0. This is a special tablespace and is always kept open as long as the MySQL server is running. The first few pages of the system tablespace is reserved for internal usage. This information can be obtained from the header storage/innobase/include/fsp0types.h. They are listed below with a short description.


Page Number

The Page Name

Description

0

FSP_XDES_OFFSET

The extent descriptor page.

1

FSP_IBUF_BITMAP_OFFSET

The insert buffer bitmap page.

2

FSP_FIRST_INODE_PAGE_NO

The first inode page number.

3

FSP_IBUF_HEADER_PAGE_NO

Insert buffer header page in system tablespace.

4

FSP_IBUF_TREE_ROOT_PAGE_NO

Insert buffer B-tree root page in system tablespace.

5

FSP_TRX_SYS_PAGE_NO

Transaction system header in system tablespace.

6

FSP_FIRST_RSEG_PAGE_NO

First rollback segment page, in system tablespace.

7

FSP_DICT_HDR_PAGE_NO

Data dictionary header page in system tablespace.


As can be noted from above, the first 3 pages will be there in any tablespace. But the last 5 pages are reserved only in the case of system tablespace. In the case of other tablespaces only 3 pages are reserved.

When the option innodb_file_per_table is enabled, then for each table a separate tablespace with one data file would be created. The source code comment in the function dict_build_table_def_step() states the following:

                /* We create a new single-table tablespace for the table. 
                We initially let it be 4 pages: 
                - page 0 is the fsp header and an extent descriptor page, 
                - page 1 is an ibuf bitmap page, 
                - page 2 is the first inode page, 
                - page 3 will contain the root of the clustered index of the 
                table we create here. */ 

File Segments

A tablespace can contain many file segments. File segments (or just segments) is a logical entity. Each segment has a segment header (fseg_header_t), which points to the inode (fseg_inode_t) describing the file segment. The file segment header contains the following information:

  • The space to which the inode belongs

  • The page_no of the inode

  • The byte offset of the inode

  • The length of the file segment header (in bytes).

Note: It would have been really more readable (at source code level) if fseg_header_t and fseg_inode_t had proper C-style structures defined for them.

The fseg_inode_t object contains the following information:

  • The segment id to which it belongs.

  • List of full extents.

  • List of free extents of this segment.

  • List of partially full/free extents

  • Array of individual pages belonging to this segment. The size of this array is half an extent.

When a segment wants to grow, it will get free extents or pages from the tablespace to which it belongs.

Table

In InnoDB, when a table is created, a clustered index (B-tree) is created internally. This B-tree contains two file segments, one for the non-leaf pages and the other for the leaf pages. From the source code documentation:

In the root node of a B-tree there are two file segment headers. The leaf pages of a tree are allocated from one file segment, to make them consecutive on disk if possible. From the other file segment we allocate pages for the non-leaf levels of the tree.”

For a given table, the root page of a B-tree will be obtained from the data dictionary. So in InnoDB, each table exists within a tablespace, and contains one B-tree (the clustered index), which contains 2 file segments. Each file segment can contain many extents, and each extent contains 1MB of consecutive pages.

Conclusion

This article discussed the details about the data organization within InnoDB. We first looked at the files created by InnoDB, and then discussed about the various logical entities like tablespaces, pages, page types, extents, segments and tables. We also looked at the relationship between each one of them.

Send in your comments and feedback to annamalai.gurusami@oracle.

【资源说明】 1.项目代码功能经验证ok,确保稳定可靠运行。欢迎下载使用!在使用过程中,如有问题或建议,请及时私信沟通。 2.主要针对各个计算机相关专业,包括计科、信息安全、数据科学与大数据技术、人工智能、通信、物联网等领域的在校学生、专业教师或企业员工使用。 3.项目具有丰富的拓展空间,不仅可作为入门进阶,也可直接作为毕设、课程设计、大作业、初期项目立项演示等用途。 4.当然也鼓励大家基于此进行二次开发。 5.期待你能在项目中找到乐趣和灵感,也欢迎你的分享和反馈! 本文介绍了基于QEM(Quadric Error Metrics,二次误差度量)的优化网格简化算法的C和C++实现源码及其相关文档。这一算法主要应用于计算机图形学领域,用于优化三维模型的多边形数量,使之在保持原有模型特征的前提下实现简化。简化的目的是为了提高渲染速度,减少计算资源消耗,以及便于网络传输等。 本项目的核心是网格简化算法的实现,而QEM作为该算法的核心,是一种衡量简化误差的数学方法。通过计算每个顶点的二次误差矩阵来评估简化操作的误差,并以此来指导网格简化过程。QEM算法因其高效性和准确性在计算机图形学中广泛应用,尤其在实时渲染和三维打印领域。 项目代码包含C和C++两种语言版本,这意味着它可以在多种开发环境中运行,增加了其适用范围。对于计算机相关专业的学生、教师和行业从业者来说,这个项目提供了丰富的学习和实践机会。无论是作为学习编程的入门材料,还是作为深入研究计算机图形学的项目,该项目都具有实用价值。 此外,项目包含的论文文档为理解网格简化算法提供了理论基础。论文详细介绍了QEM算法的原理、实施步骤以及与其他算法的对比分析。这不仅有助于加深对算法的理解,也为那些希望将算法应用于自己研究领域的人员提供了参考资料。 资源说明文档强调了项目的稳定性和可靠性,并鼓励用户在使用过程中提出问题或建议,以便不断地优化和完善项目。文档还提醒用户注意查看,以获取使用该项目的所有必要信息。 项目的文件名称列表中包含了加水印的论文文档、资源说明文件和实际的项目代码目录,后者位于名为Mesh-Simplification-master的目录下。用户可以将这些资源用于多种教学和研究目的,包括课程设计、毕业设计、项目立项演示等。 这个项目是一个宝贵的资源,它不仅提供了一个成熟的技术实现,而且为进一步的研究和学习提供了坚实的基础。它鼓励用户探索和扩展,以期在计算机图形学领域中取得更深入的研究成果。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值