postgresq-自定义执行计划(custom plan)与generic plan(通用执行计划)

文章目录


之前写过一篇关于 PostgreSQL prepare sql的文章,但当时没有提到generic plan(通用计划)和custom plan(自定义计划)这两个概念。现在将通过举例介绍这两个概念。

创建测试表:

postgres=# create database demo;
CREATE DATABASE
postgres=# \c demo
You are now connected to database "demo" as user "postgres".
demo=# create table t1(a int,b text);
CREATE TABLE
demo=# insert into t1 select i,'aaa' from generate_series(1,100) i;
INSERT 0 100
demo=# insert into t1 select i,'bbb' from generate_series(101,200) i;
INSERT 0 100
demo=# insert into t1 select i,'ccc' from generate_series(201,300) i;
INSERT 0 100

在有了数据之后,可以创建一个预处理语句(prepared statement),这个语句的结构是固定的,但允许在执行时插入不同的参数值。这样可以实现代码重用,并提高查询的执行效率。

demo=# prepare pre_stmt as select * from t1 where b=$1;
PREPARE

在 PostgreSQL 中,预处理语句(prepared statements)会在当前会话中注册。如果想查看当前会话中有哪些预处理语句已经被创建并可用,可以通过查询系统视图 pg_prepared_statements 来获取这些信息。

   name   |                    statement                     |         prepare_time          | parameter_types |  result_types  | from_sql | generic_plans | custom_plans
----------+--------------------------------------------------+-------------------------------+-----------------+----------------+----------+---------------+--------------
 pre_stmt | prepare pre_stmt as select * from t1 where b=$1; | 2025-01-01 18:32:43.697846+07 | {text}          | {integer,text} | t        |             0 |            0
(1 row)

当我们对一个语句运行 EXPLAIN (ANALYZE) 时,可以看到 PostgreSQL 为该语句生成的实际执行计划以及执行的统计信息。这包括查询是如何被优化和执行的,例如是否使用了索引、查询的执行时间、返回的行数等。

demo=# explain (analyze) execute pre_stmt ('aaa');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.032..0.252 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning Time: 0.568 ms
 Execution Time: 0.287 ms
(5 rows)

在执行计划的“Filter”行中,第一次执行时,执行计划会显示你传入的实际参数值(比如 ‘aaa’),这时是 自定义计划(custom plan)。但是,当你多次执行这个预处理语句时,执行计划中filter行的参数值会变成显示占位符(如 $1),这时是 通用计划(generic plan)。

demo=# explain (analyze) execute pre_stmt ('aaa');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.107..0.255 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning Time: 0.106 ms
 Execution Time: 0.308 ms
(5 rows)

demo=# explain (analyze) execute pre_stmt ('aaa');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.045..0.255 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning Time: 0.144 ms
 Execution Time: 0.297 ms
(5 rows)

demo=# explain (analyze) execute pre_stmt ('aaa');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.049..0.259 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning Time: 0.293 ms
 Execution Time: 0.309 ms
(5 rows)

demo=# explain (analyze) execute pre_stmt ('aaa');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.041..0.248 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning Time: 0.279 ms
 Execution Time: 0.289 ms
(5 rows)

demo=# explain (analyze) execute pre_stmt ('aaa');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.034..0.215 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning Time: 0.101 ms
 Execution Time: 0.245 ms
(5 rows)

在执行查询时,最开始会看到执行计划中显示实际传入的参数值,但随着执行次数的增加,执行计划会变成使用占位符(例如 $1)来表示参数。此时,查询使用的是通用计划。通用计划一旦生成后,无论传入什么不同的参数值,它的执行计划都不会再发生变化,直到该预处理语句结束。

demo=# explain (analyze) execute pre_stmt ( 'bbb' );
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.111..0.367 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning Time: 0.181 ms
 Execution Time: 0.412 ms
(5 rows)

demo=# explain (analyze) execute pre_stmt ( 'ccc' );
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.165..0.249 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning Time: 0.025 ms
 Execution Time: 0.289 ms
(5 rows)

demo=# explain (analyze) execute pre_stmt ( null );
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.75 rows=100 width=8) (actual time=0.232..0.233 rows=0 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 300
 Planning Time: 0.022 ms
 Execution Time: 0.255 ms
(5 rows)

如果你查看 PostgreSQL 的源代码(特别是 src/backend/utils/cache/plancache.c 文件),你将能理解为什么在执行 5 次之后,执行计划会发生变化。
这个文件包含了与查询计划缓存相关的逻辑,解释了为什么预处理语句在执行多次后会从“自定义计划”变为“通用计划”。

/*
 * choose_custom_plan: choose whether to use custom or generic plan
 *
 * This defines the policy followed by GetCachedPlan.
 */
static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
        double          avg_custom_cost;
 
        /* One-shot plans will always be considered custom */
        if (plansource->is_oneshot)
                return true;
 
        /* Otherwise, never any point in a custom plan if there's no parameters */
        if (boundParams == NULL)
                return false;
        /* ... nor for transaction control statements */
        if (IsTransactionStmtPlan(plansource))
                return false;
 
        /* See if caller wants to force the decision */
        if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
                return false;
        if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
                return true;
 
        /* Generate custom plans until we have done at least 5 (arbitrary) */
        if (plansource->num_custom_plans < 5)
                return true;

一旦 PostgreSQL 使用了通用计划,即使数据改变并重新分析表,执行计划也不会再改变。

demo=# insert into t1 select i,'ddd' from generate_series(201,210) i;
INSERT 0 10
demo=# insert into t1 select i,'eee' from generate_series(211,211) i;
INSERT 0 1
demo=# analyze t1;
ANALYZE
demo=# select b,count(*) from t1 group by b order by b;
  b  | count
-----+-------
 aaa |   100
 bbb |   100
 ccc |   100
 ddd |    10
 eee |     1
(5 rows)
demo=# explain (analyze) execute pre_stmt('ddd');
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..5.89 rows=62 width=8) (actual time=0.059..0.070 rows=10 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 301
 Planning Time: 0.021 ms
 Execution Time: 0.184 ms
(5 rows)

当数据量增加、数据分布不均匀,并且在某一列(比如“b”列)上建立了索引时,查询的执行计划可能会发生变化,PostgreSQL 会根据这些因素选择不同的执行策略。

demo=# insert into t1 select i, 'aaa' from generate_series (1,2000000) i;
INSERT 0 1000000
demo=# insert into t1 select i, 'bbb' from generate_series (1000001,3000000) i;
INSERT 0 1000000
demo=# insert into t1 select i, 'ccc' from generate_series (2000001,3000000) i;
INSERT 0 1000000
demo=# insert into t1 select i, 'eee' from generate_series (3000001,3000010) i;
INSERT 0 10
demo=# create index idx_b on t1(b);
CREATE INDEX
demo=#  select b,count(*) from t1 group by b order by b;
  b  |  count
-----+---------
 aaa | 1000000
 bbb | 1000000
 ccc | 1000000
 eee |      10
 
(4 rows)

无论我们执行这个查询多少次(查询的是参数‘eee’),PostgreSQL 都不会使用通用计划,而是会持续使用自定义计划。

demo=# explain (analyze) execute pre_stmt('eee');
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Index Scan using idx_b on t1  (cost=0.43..4.45 rows=1 width=8) (actual time=0.039..0.048 rows=10 loops=1)
   Index Cond: (b = 'eee'::text)
 Planning Time: 0.287 ms
 Execution Time: 0.076 ms
(4 rows)
------>重复执行多次,但至少执行 10 次。
demo=# explain (analyze) execute pre_stmt('eee');
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Index Scan using idx_b on t1  (cost=0.43..4.45 rows=1 width=8) (actual time=0.037..0.045 rows=10 loops=1)
   Index Cond: (b = 'eee'::text)
 Planning Time: 0.137 ms
 Execution Time: 0.066 ms
(4 rows)

在某些情况下,当数据分布不均匀时(比如某列中有很多重复值,只有少数值是稀有的,比如上面例子中出现的b=‘eee’),即使考虑到重新生成执行计划的开销,自定义计划的执行成本仍然比通用计划低。这样,PostgreSQL 会优先选择使用自定义计划,而不会使用通用计划,甚至在重新规划的开销考虑进去之后,通用计划也可能永远不会被使用。

顺便提一下啊,在postgresql 15后,explain增加了generic_plan选型来供我们方便的查看通用执行计划

demo=# EXPLAIN (GENERIC_PLAN) SELECT * FROM t1 WHERE t1 = $1;
                               QUERY PLAN
------------------------------------------------------------------------
 Gather  (cost=1000.00..31400.05 rows=15000 width=8)
   Workers Planned: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..28900.05 rows=6250 width=8)
         Filter: (t1.* = $1)
(4 rows)

总的来说,PostgreSQL 15 引入了 EXPLAIN (GENERIC_PLAN) 选项,使得开发者可以明确地查看通用计划,而不受参数变化的影响。

如果觉得文章有点价值,请帮忙点关注,谢谢

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值