《MySQL技术内幕:InnoDB存储引擎》4. 索引

最新推荐文章于 2022-09-05 20:01:46 发布

larva_s

最新推荐文章于 2022-09-05 20:01:46 发布

阅读量378

点赞数

CC 4.0 BY-SA版权

分类专栏： MySQL

本文链接：https://blog.youkuaiyun.com/larva_s/article/details/87870534

MySQL 专栏收录该内容

8 篇文章

订阅专栏

本文详细探讨了数据库索引的工作原理，包括B+树结构、覆盖索引、聚集索引和辅助索引的作用，以及分裂、基数、联合索引、MRR优化、ICP优化和哈希算法的应用。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

not null的索引为聚集索引，null的索引为辅助索引?

1. 索引

1.1 索引I/O次数

索引是B+树结构，目的是为了减少I/O次数，叶子节点也就是最后一层存放的是数据，非叶子节点应该保存的是下一层的地址，所以每次经过索引查找所需的I/O次数其实就是该B+树的高度的，要是三层B+树，则每次查找都需要经过三次的I/O。

1.2 覆盖索引为什么就不需要回表

覆盖索引就是索引中已经有对应所需的字段，则不需要回表查询其字段值了，比如联合索引 (a,b)，select a,b from table where a = ? and b = ?;这样就不需要回表查询了，因为找到的索引中有a,b值了，而select * from table where a = ? and b = ?；这样就需要再回一次表查询，虽然在索引中找到了对应a和b的值，但其他字段也要查询出来，就需要回表查询了。

2. 聚集（聚簇）索引

叶子节点存放的不止是索引还有行记录数据，即数据页。
1. 物理上不需要连续，逻辑上连续即可：通过双向链表连接实现逻辑连续
2. 范围查询：通过主键查找某一范围的数据，通过叶子节点的上层中间节点就能得到页的范围，直接读取数据页即可

3. 辅助（非聚集）索引

索引页存放的是键值以及书签（bookmark），书签能够找到索引对应的行数据。

1. 每张表可以有多个辅助索引，辅助索引可以指向聚集索引，若是在一颗高度为3的辅助索引树查找数据，则最多3次IO，然后再去聚集索引树查找页，可能总共需要6次IO
2. 辅助索引的分析：找图片

辅助索引使用不了:如下数据中，通过select * 查找的数据，使用辅助索引时，却发现是type是all，即全表扫描，
这是因为辅助索引并不包含整行记录的所有数据，则*是想找到其他列的数据，此时优化器就会使用聚集索引了，
当使用select creation_date时，就能够使用到辅助索引了，或者通过force index强制指定索引

mysql> show index from jiashi_guess;
+--------------+------------+------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name   | Seq_in_index | Column_name       | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| jiashi_guess |          0 | PRIMARY    |            1 | user_id           | A         |       26010 |     NULL | NULL   |      | BTREE      |         |               |
| jiashi_guess |          0 | PRIMARY    |            2 | guess_type_id     | A         |       58456 |     NULL | NULL   |      | BTREE      |         |               |
| jiashi_guess |          1 | idx_create |            1 | creation_date     | A         |       50279 |     NULL | NULL   | YES  | BTREE      |         |               |
| jiashi_guess |          1 | idx_modifi |            1 | modification_date | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |               |
+--------------+------------+------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

mysql> explain select * from jiashi_guess where creation_date > '2018-07-10';
+----+-------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table        | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | jiashi_guess | NULL       | ALL  | idx_create    | NULL | NULL    | NULL | 58456 |    49.61 | Using where |
+----+-------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.11 sec)

mysql> explain select creation_date from jiashi_guess where creation_date > '2018-07-10';
+----+-------------+--------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+
| id | select_type | table        | partitions | type  | possible_keys | key        | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+--------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | jiashi_guess | NULL       | range | idx_create    | idx_create | 6       | NULL | 29000 |   100.00 | Using where; Using index |
+----+-------------+--------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.02 sec)

mysql> explain select * from jiashi_guess force index(idx_create) where creation_date > '2018-07-10';
+----+-------------+--------------+------------+-------+---------------+------------+---------+------+-------+----------+-----------------------+
| id | select_type | table        | partitions | type  | possible_keys | key        | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+--------------+------------+-------+---------------+------------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | jiashi_guess | NULL       | range | idx_create    | idx_create | 6       | NULL | 29000 |   100.00 | Using index condition |
+----+-------------+--------------+------------+-------+---------------+------------+---------+------+-------+----------+-----------------------+

4. 分裂

增加值之后，若是节点满了则会分裂，重新构造B+树，InnoDB的page header中能决定分裂是向左或者向右，若是只能向左则会导致空间的浪费，因为最左的节点一直不用分裂

只索引一个列的指定长度字段：alter table t add key idx_b （b（100））,只索引b列的前100个字段

5. cardinality：基数

第一次查看
mysql> show index from jiashi_guess;
+--------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| jiashi_guess |          0 | PRIMARY  |            1 | user_id       | A         |       58624 |     NULL | NULL   |      | BTREE      |         |               |
| jiashi_guess |          0 | PRIMARY  |            2 | guess_type_id | A         |       58624 |     NULL | NULL   |      | BTREE      |         |               |
+--------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
第二次查看
mysql> show index from jiashi_guess;
+--------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| jiashi_guess |          0 | PRIMARY  |            1 | user_id       | A         |       58994 |     NULL | NULL   |      | BTREE      |         |               |
| jiashi_guess |          0 | PRIMARY  |            2 | guess_type_id | A         |       58994 |     NULL | NULL   |      | BTREE      |         |               |
+--------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

jiashi_guess是联合主键，上表说明其cardinality值很高，总行数数据也是，58624
则cardinality/rows_in_table = 1，越接近1则辨识度越高，效果越好。

①cardinality是采样预估的：cardinality是根据采样去计算得出的一个预估值，在update以及insert时就会发生cardinality采样的操作，不过一般计算cardinality的时间成本消耗大，所以就采用采样的方式了，InnoDB默认是8页数据作为样本，若A为所有页数，则cardinality = 8页数据的不同记录个数总和*A/8，所以有可能每次cardinality的值都不一样。

②cardinality是策略性更新：并不是每次update或insert之后，都会计算cardinality的值，而是有两个策略，
策略一：表中1/16的数据改变了，不过这种情况有可能是同一行数据频繁变更，所以有了第二个策略
策略二：stat_modified_count>2 000 000 000,stat_modified_count是InnoDB内部的一个计数器，表示发生变化的次数

6. 联合索引

6.1 键是排序的

联合索引的特质：已经对键进行排序了，包括第二个键（在第一个键值一样的前提下才是排序的），此时，若是联合索引为下表的user_id,day，则语句select * from buy where user_id=1 order by day desc;节省了一次排序，虽然用了order by，当explain结果能知道，extra中没有用到filter sort，这是联合索引的一个好处。

create table buy( user_id int , day date)engine=innodb;
alter table buy add key idx_u(user_id);
alter table buy add key idx_u2(user_id,day);
explain select * from buy where user_id=1;
-- 使用到的索引是idx_u,
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | buy   | NULL       | ref  | idx_u,idx_u2  | idx_u | 5       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------+

explain select * from buy where user_id=1 order by day desc;
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key    | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | buy   | NULL       | ref  | idx_u,idx_u2  | idx_u2 | 5       | const |    3 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+

6.2 第二个键无法用到索引的原因

联合索引：第二个键没办法用到索引的原因，如下图所示，联合索引后，第一个键是顺序排列的，所以能够用到索引：select * from table where 第一个键；而单独的第二个键，则不是顺序的，1，2，1，4，1，2，所以无法用到索引：select * from table where 第二个键=；而在第一个键确定的情况下，比如是1，则第二个键是顺序的，也就是能用到索引：1，2，顺序 select * from table where 第一个键= and 第二个键；

7. 覆盖索引

覆盖索引：即从辅助索引中得到查询的记录，而不需要查询聚集索引中的记录

好处1：覆盖索引不包含整行记录的所有信息，大小远小于聚集索引，能够减少IO操作

好处2: 对于某些统计信息而言，优化器会选择使用覆盖索引而不是聚集索引，如下数据，possible_key未null，而
key却是idx_u，Extra也表明使用到了索引，这就是优化器自动选择覆盖索引的例子。

mysql> explain select count(*) from buy;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | buy   | NULL       | index | NULL          | idx_u | 5       | NULL |    6 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+-------------+

8. MRR优化:multi-range read

MMR优化目的：减少磁盘的随机访问，并将随机访问转换成较为顺序的数据访问

select * from salary where key1>1000 and key1<2000 
and key2 = 1000
-- 若是没启动MMR时，是获取key1访问在1000-2000的所有数据后，再进行key2=1000的过滤，无用数据被获取到
-- 启用MMR：优化器将查询条件进行拆分，然后再进行数据查询，拆分：(1000,1000),(1001,1000)...(1999,1000)