索引长度与区分度
以material_data的表为例,这个表中有多大45万的数据;
对matertial_name的字段建立索引;
以matertial_name字段建模普通索引,
mysql> alter table material_data add index idx_material_name(material_name);
Query OK, 0 rows affected
Records: 0 Duplicates: 0 Warnings: 0
执行
select * from material_data where material_name='光缆交接箱改造';
mysql> explain select * from material_data where material_name='光缆交接箱改造';
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | material_data | NULL | ref | idx_material_name | idx_material_name | 363 | const | 6 | 100 | NULL |
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------+
1 row in set
可以发现 索引的长度达到363个长度,这会影响索引的使用效率,那么多个长度合适呢?
统计各个长度索引的区分度:
mysql> SELECT count(distinct LEFT(t.material_name ,110))/count(1) from material_data t;
|
索引长度 |
区分度 |
|
1 |
0.0047 |
|
2 |
0.034 |
|
3 |
0.1082 |
|
4 |
0.1635 |
|
5 |
0.2166 |
|
6 |
0.2607 |
|
7 |
0.299 |
|
8 |
0.3335 |
|
9 |
0.3646 |
|
10 |
0.3936 |
|
11 |
0.419 |
|
12 |
0.4411 |
|
13 |
0.462 |
|
14 |
0.4812 |
|
15 |
0.4991 |
|
16 |
0.5158 |
|
17 |
0.5303 |
|
18 |
0.5426 |
|
19 |
0.5536 |
|
20 |
0.5651 |
|
25 |
0.6057 |
|
30 |
0.6303 |
|
40 |
0.6426 |
|
50 |
0.6426 |
|
60 |
0.6426 |
|
70 |
0.6426 |
|
80 |
0.6426 |
|
90 |
0.6426 |
|
100 |
0.6426 |
|
110 |
0.6426 |
|
120 |
0.6426 |
可以发现大概在15个长度左右比较高效;
mysql> alter table material_data add index idx_material_name(material_name(15));
mysql> explain select * from material_data where material_name='光缆交接箱改造';
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | material_data | NULL | ref | idx_material_name | idx_material_name | 48 | const | 6 | 100 | Using where |
+----+-------------+---------------+------------+------+-------------------+-------------------+---------+-------+------+----------+-------------+
1 row in set
本文探讨了在material_data表中对material_name字段建立索引的效率问题,通过对比不同长度索引的区分度,发现15个字符长度的索引在保持高区分度的同时,也提高了查询效率。
1226

被折叠的 条评论
为什么被折叠?



