A simple way to tune MySQL performance for a table with several hundred thousand rows

Latest recommended article published 2022-04-24 17:04:09

License: CC 4.0 BY-SA

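This article is about database optimization. Using MySQL as the example, it describes optimizing a GROUP BY query on a table with several hundred thousand rows. Once operating-system and similar factors are ruled out, the core of the work is adjusting configuration parameters. Analysis of the slow log and the EXPLAIN output showed that the problem was related to temporary tables; after the tmp_table_size parameter was adjusted, database performance improved markedly. The article also points out that ample TMP space is a principle that applies to systems in general.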

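Posted by 徐永久 on 2005-05-12 00:30. Database optimization is probably among the most challenging tasks in system administration, because it demands competence in almost every area; a good DBA needs a broad mix of skills. Once performance problems caused by the operating system, the application, and so on have been ruled out, the core of database optimization is really the adjustment of configuration parameters. This article gives an example in which one simple parameter change optimizes a GROUP BY on a table with several hundred thousand rows. With this simple adjustment, database performance improved by leaps and bounds.
This example is specific to MySQL. Unlike commercial databases, MySQL (as of this writing) has no views, whereas Oracle in particular can use materialized views to speed up queries, and it has no stored procedures. Performance tuning therefore comes down almost entirely to setting suitable configuration parameters.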

The tuning steps (the example targets the pLog 0.3x blog system):

The most frequent entry in the slow log was:
SELECT category_id, COUNT(*) AS 'count' FROM plog_articles WHERE blog_id = 2 AND status = 'PUBLISHED' group by category_id;
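It usually took more than 20s, sometimes even 30s.
With blog_id=1 or any other value, however, the result came back quickly.
So I suspected a problem with the index and rebuilt it, but that did not help. The EXPLAIN result was: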
mysql> EXPLAIN SELECT category_id, COUNT(*) AS 'count' FROM plog_articles WHERE blog_id = 2 AND status = 'PUBLISHED' group by category_id;
+---------------+------+------------------+------------------+---------+-------------+------+----------------------------------------------+
| table         | type | possible_keys    | key              | key_len | ref         | rows | Extra                                        |
+---------------+------+------------------+------------------+---------+-------------+------+----------------------------------------------+
| plog_articles | ref  | idx_article_blog | idx_article_blog |       5 | const,const | 4064 | Using where; Using temporary; Using filesort |
+---------------+------+------------------+------------------+---------+-------------+------+----------------------------------------------+
1 row in set (0.00 sec)
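
The index check and rebuild described before the EXPLAIN output might be done along the following lines. This is only a sketch: the article does not say which statements were actually used or which columns idx_article_blog covers, so the block sticks to inspection and standard table maintenance commands.

```sql
-- List the indexes on the table, including the columns behind idx_article_blog.
SHOW INDEX FROM plog_articles;

-- Refresh the key distribution statistics that the optimizer relies on.
ANALYZE TABLE plog_articles;

-- Defragment the table and tidy its indexes (may lock the table while it runs).
OPTIMIZE TABLE plog_articles;
```

None of these changes the "Using temporary; Using filesort" part of the plan, which is consistent with the observation that rebuilding the index did not help.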

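That reminded me that the system load rose, with heavy swap usage, whenever the blog with blog_id = 2 was viewed. So I looked up material on temporary tables and, sure enough, found the following: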

If you create a lot of disk-based temporary tables, increase the size of tmp_table_size if you can do so safely. Keep in mind that setting the value too high may result in excessive swapping or MySQL running out of memory if too many threads attempt to allocate in-memory temporary tables at the same time. Otherwise, make sure that tmpdir points to a very fast disk that's not already doing lots of I/O.
Another problem that doesn't show up in the slow query log is an excessive use of disk-based temporary tables. In the output of EXPLAIN, you'll often see Using temporary. It indicates that MySQL must create a temporary table to complete the query. However, it doesn't tell you whether that temporary table will be in memory or on disk. That's controlled by the size of the table and MySQL's tmp_table_size variable.
If the space required to build the temporary table is less than or equal to tmp_table_size, MySQL keeps it in memory rather than incur the overhead and time required to write the data to disk and read it again. However, if the space required exceeds tmp_table_size, MySQL creates a disk-based table in its tmpdir directory (often /tmp on Unix systems.) The default tmp_table_size size is 32 MB.
To find out how often that happens, compare the relative sizes of the Created_tmp_tables and Created_tmp_disk_tables counters:
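
The comparison that the quoted passage leads up to can be run from the mysql client. A minimal sketch, assuming a session that is allowed to read server status. On MySQL 5.0.2 and later a plain SHOW STATUS reports session values, so SHOW GLOBAL STATUS would be needed there to see server-wide counters.

```sql
-- Counters accumulated since the server started.
-- Created_tmp_tables: internal temporary tables created.
-- Created_tmp_disk_tables: those that ended up on disk.
SHOW STATUS LIKE 'Created_tmp%';

-- The current in-memory limit and the directory used for on-disk temporary tables.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'tmpdir';
```

If Created_tmp_disk_tables is a large fraction of Created_tmp_tables, many temporary tables are outgrowing the in-memory limit.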

After tmp_table_size was raised to about 80M, the statement above completes in 14s.
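
One way to try such a change is sketched below. The article only mentions tmp_table_size; raising max_heap_table_size as well is an assumption added here, because on many MySQL versions the in-memory limit is the smaller of the two variables. The value is 80 MB written out in bytes.

```sql
-- 80 MB in bytes: 80 * 1024 * 1024 = 83886080.
-- Try the new limit in the current session first.
SET SESSION tmp_table_size = 83886080;
SET SESSION max_heap_table_size = 83886080;  -- assumption, not from the article

-- Re-run the query from the slow log and compare the timing.
SELECT category_id, COUNT(*) AS 'count' FROM plog_articles
WHERE blog_id = 2 AND status = 'PUBLISHED' GROUP BY category_id;

-- If it helps, apply it server-wide (needs administrative privileges;
-- only connections opened afterwards pick up the new value).
SET GLOBAL tmp_table_size = 83886080;
SET GLOBAL max_heap_table_size = 83886080;   -- assumption, not from the article
```

A value set with SET GLOBAL is lost when the server restarts. To make it permanent, the usual approach is a line such as tmp_table_size = 80M in the [mysqld] section of the server's option file. As the quoted passage warns, the value should stay small enough that many concurrent in-memory temporary tables cannot push the machine into swapping.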

This is a parameter that DBAs easily overlook.

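In fact, not only databases but operating systems too are heavily affected by tmp. For example, when installing software to drive d:, if the TMP environment variable points to drive c: and c: is short of space, the installation may still fail.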

So giving TMP enough space can be called a universally applicable principle in computer systems (the same goes for writing programs).