100x faster Postgres performance by changing 1 line


A simple but non-obvious one-line change (ANY(ARRAY[...]) to ANY(VALUES(...))) in a (bad) PostgreSQL 9.0 query cuts query time from 20s to 0.2s. Starting with low-level metrics we make our way to your best friend: EXPLAIN ANALYZE. The amount of time invested will pay off a hundred times over. The Postgres community is your second best friend.

Monitoring slow Postgres queries with Postgres

Earlier this week the performance of one of our (many) databases was plagued by a few pathologically large, primary-key queries on a smallish table (10 GB, 15 million rows) used to feed our graph editor.

In 99.9% of accounts these queries would be zippy. In a few cases where the number of tags used to annotate metrics is large, these queries would take up to 20 seconds. That much time spent in a database meant that someone was waiting in front of their browser for the graph editor to respond. This is obviously not the best experience for the unlucky 0.1%.

Here is the surprisingly simple culprit:
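A minimal sketch of the query's shape, with hypothetical table and column names (c and x stand in for the Table C and Table X described below):

    -- hypothetical names; a sketch of the query's shape, not the exact production SQL
    SELECT c.key, c.x_key, c.tags, x.name
      FROM c
      JOIN x ON c.x_key = x.key
     WHERE c.key = ANY (ARRAY[1001, 1002 /* ... thousands more keys ... */])
       AND c.x_key = 1
       AND c.tags @> ARRAY['some-tag'];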

Table X contains a few thousand rows; Table C contains 15 million rows. Both "key" columns are primary keys with proper indexing. Pretty straightforward: it's a simple primary-key lookup. Where things get interesting is when you increase the number of keys. At 11,000 keys we get the following plan, by prefixing EXPLAIN (ANALYZE, BUFFERS) to the query:
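The full plan is long; an abridged, illustrative skeleton of its shape (costs, timings, and buffer counts elided) looks like this:

    Nested Loop  (actual time=... rows=... loops=1)
      ->  Bitmap Heap Scan on c  (actual time=... rows=... loops=1)
            Recheck Cond: (key = ANY ('{1001,1002,...}'::integer[]))
            ->  Bitmap Index Scan on c_pkey  (actual time=... rows=... loops=1)
                  Index Cond: (key = ANY ('{1001,1002,...}'::integer[]))
      ->  Index Scan using x_pkey on x  (actual time=... rows=... loops=...)
            Index Cond: (key = c.x_key)
    Total runtime: ~22000 ms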

As you can see at the bottom of the plan, the query took 22 seconds to execute. These 22 seconds can be visualized on the following graph as pure CPU execution split 90/10 between Postgres and the OS; very little disk I/O.

At the lowest level these queries would look like these spikes of CPU utilization. CPU graphs are seldom useful but in this case it confirmed a crucial point: the database was not waiting for the disks to read data. It was doing things like sorting, hashing and comparing rows.

The second interesting metric that tracks the spikes very closely is the number of rows that are “fetched” by Postgres (in this case not returned, just looked at and discarded).

Clearly something is regularly and methodically going through a lot of rows: our query.

The Postgres performance problem: Bitmap Heap Scan

The rows_fetched metric is consistent with the following part of the plan:
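That is the Bitmap Heap Scan step from the illustrative skeleton above:

    ->  Bitmap Heap Scan on c  (actual time=... rows=... loops=1)
          Recheck Cond: (key = ANY ('{1001,1002,...}'::integer[]))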

Postgres is reading Table C using a Bitmap Heap Scan. When the number of keys to check stays small, it can efficiently use the index to build the bitmap in memory. If the bitmap gets too large, the query optimizer changes the way it looks up data. In our case it has a large number of keys to check, so it uses a more approximate way to retrieve the candidate rows and checks each row individually for a match on x_key and tags. All this "loading in memory" and "checking individual rows" takes time (the Recheck Cond in the plan). Luckily for us the table is 30% loaded in RAM, so it is not as bad as retrieving the rows from disk, but it still has a very noticeable impact on Postgres performance.

Remember that the query is quite simple. It's a primary-key lookup, so there aren't many obvious ways to fix it without dramatically re-architecting the database or the application. We turned to the community for help via the PGSQL-Performance mailing list.

The solution

This is yet another reason why we love open source and its culture of helping users. Tom Lane, one of the most prolific open source authors around, suggested trying this instead:
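Using the same hypothetical names as in the sketch above, the rewritten query becomes:

    -- hypothetical names; the only change is ARRAY[...] -> VALUES (...), (...)
    SELECT c.key, c.x_key, c.tags, x.name
      FROM c
      JOIN x ON c.x_key = x.key
     WHERE c.key = ANY (VALUES (1001), (1002) /* ... thousands more keys ... */)
       AND c.x_key = 1
       AND c.tags @> ARRAY['some-tag'];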

Can you spot the difference? ARRAY has been changed to VALUES.

The query optimizer is fooled by the use of ARRAY[...] to list all the primary keys to look up. VALUES (...), (...) lets the optimizer make full use of the primary key index instead: rather than treating the key list as one giant filter expression, the planner sees a proper relation that it can de-duplicate and then join against the primary key, one index lookup per key. It is literally a one-line change, and it makes no semantic difference.

Here is the plan of the new query. The two major differences: the VALUES list is hashed into a small in-memory relation, and Table C is now reached through a plain index scan on its primary key instead of a Bitmap Heap Scan.
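An abridged, illustrative skeleton of the new plan's shape (same hypothetical names; costs and timings elided):

    Nested Loop  (actual time=... rows=... loops=1)
      ->  HashAggregate  (actual time=... rows=... loops=1)
            ->  Values Scan on "*VALUES*"  (actual time=... rows=... loops=1)
      ->  Index Scan using c_pkey on c  (actual time=... rows=... loops=...)
            Index Cond: (key = "*VALUES*".column1)
            Filter: (x_key = 1)
    Total runtime: ~200 ms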

From 22,000ms to 200ms. That’s a 100x speedup for a simple one-line change.

The new query in production

A short code deploy later:

And the database looks much happier.

The takeaway

The slow Postgres query is gone. The 0.1% unlucky few who would have been affected by the issue are happy too. Seeing the impact of the change using Datadog allowed us to instantly validate that altering that part of the query was the right thing to do.

Update: this change is only needed on PostgreSQL 9.0. As of Postgres 9.3, ANY(ARRAY[...]) performs properly again.

If you're looking to try out this change on your own Postgres queries, you can get that visibility set up in minutes by signing up for a free Datadog account.
