“当复杂的SQL不在需要特别的优化”，邪修研究PolarDB for PG 列式索引加速复杂SQL运行

最新推荐文章于 2025-11-13 08:01:07 发布

原创

最新推荐文章于 2025-11-13 08:01:07 发布 · 889 阅读

29 ·

CC 4.0 BY-SA版权

文章标签：

#sql #数据库

❝
开头还是介绍一下群，如果感兴趣PolarDB ,MongoDB ,MySQL ,PostgreSQL ,Redis, OceanBase, Sql Server等有问题，有需求都可以加群群内有各大数据库行业大咖，可以解决你的问题。加群请联系 liuaustin3 ，（共3300人左右 1 + 2 + 3 + 4 +5 + 6 + 7 + 8 +9）(1 2 3 4 5 6 7群均已爆满，开8群近400 9群 200+，开10群PolarDB专业学习群100+)

PolarDB for PostgreSQL 中包含了了一个Polar_csi的插件，通过在PolarDB for PostgreSQL 上安装插件的方式来使用向量引擎，列式索引。

这里需要提醒使用列式索引的前提条件

1 wal_level 参数必须设置为logical

2 一张表只能有一个列式索引

3 列式索引建立后不能修改，只能重建

4 安装polar_csi插件后，需要在数据库中执行create extension polar_csi 命令

5 是否启用polar_csi 有开关命令可以进行验证

同时要修改参数

polar_csi.enable_query

test=> create extension polar_csi;
CREATE EXTENSION
test=> 
test=> -- 创建 customers 表
test=> CREATE TABLE customers (
test(>     customer_id SERIAL PRIMARY KEY,
test(>     name VARCHAR(100),
test(>     email VARCHAR(100),
test(>     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
test(> );
CREATE TABLE
test=> 
test=> -- 创建 products 表
test=> CREATE TABLE products (
test(>     product_id SERIAL PRIMARY KEY,
test(>     name VARCHAR(100),
test(>     price DECIMAL(10, 2),
test(>     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
test(> );
CREATE TABLE
test=> 
test=> -- 创建 orders 表
test=> CREATE TABLE orders (
test(>     order_id SERIAL PRIMARY KEY,
test(>     customer_id INT REFERENCES customers(customer_id),
test(>     product_id INT REFERENCES products(product_id),
test(>     order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
test(>     quantity INT,
test(>     total_amount DECIMAL(10, 2)
test(> );

针对这三张表我们每张表插入100，200万数据。

test=> \d
                    List of relations
 Schema |           Name            |   Type   |  Owner   
--------+---------------------------+----------+----------
 public | customers                 | table    | dba_test
 public | customers_customer_id_seq | sequence | dba_test
 public | orders                    | table    | dba_test
 public | orders_order_id_seq       | sequence | dba_test
 public | products                  | table    | dba_test
 public | products_product_id_seq   | sequence | dba_test
(6 rows)

test=> select count(*) from customers;
  count  
---------
 2000000
(1 row)

test=> explain analyze select count(*) from customers;
                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=27472.22..27472.23 rows=1 width=8) (actual time=139.527..142.490 rows=1 loops=1)
   ->  Gather  (cost=27472.00..27472.21 rows=2 width=8) (actual time=136.042..142.482 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=26472.00..26472.01 rows=1 width=8) (actual time=125.038..125.038 rows=1 loops=3)
               ->  Parallel Index Only Scan using customers_pkey on customers  (cost=0.43..24388.62 rows=833354 width=0) (actual time=0.202..90.587 rows=666667 loops=3)
                     Heap Fetches: 0
 Planning Time: 0.040 ms
 Execution Time: 142.566 ms
(9 rows)

test=> select count(*) from orders;
  count  
---------
 1000000
(1 row)

test=> select count(*) from products;
  count  
---------
 2000000
(1 row)

test=>

同时在数据库查询中，无法优化的SQL聚合加子查询，且没有数据的过滤,即使建立了索引也无法使用还是要走全表扫描等。

test=> explain analyze SELECT 
test->     c.customer_id,
test->     c.name,
test->     COALESCE(order_counts.total_orders, 0) AS total_orders,
test->     COALESCE(order_totals.total_spent, 0) AS total_spent
test-> FROM 
test->     customers c
test-> LEFT JOIN (
test(>     SELECT 
test(>         customer_id,
test(>         COUNT(order_id) AS total_orders
test(>     FROM 
test(>         orders
test(>     GROUP BY 
test(>         customer_id
test(> ) AS order_counts ON c.customer_id = order_counts.customer_id
test-> LEFT JOIN (
test(>     SELECT 
test(>         customer_id,
test(>         SUM(total_amount) AS total_spent
test(>     FROM 
test(>         orders
test(>     GROUP BY 
test(>         customer_id
test(> ) AS order_totals ON c.customer_id = order_totals.customer_id
test-> ORDER BY 
test->     total_spent DESC;
                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=217678.61..412136.56 rows=1666666 width=77) (actual time=1407.023..2104.550 rows=2000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=216678.59..218761.92 rows=833333 width=77) (actual time=1323.073..1454.480 rows=666667 loops=3)
         Sort Key: (COALESCE(order_totals.total_spent, '0'::numeric)) DESC
         Sort Method: external merge  Disk: 46088kB
         Worker 0:  Sort Method: external merge  Disk: 52096kB
         Worker 1:  Sort Method: external merge  Disk: 21328kB
         ->  Hash Left Join  (cost=47408.07..91348.40 rows=833333 width=77) (actual time=633.505..938.891 rows=666667 loops=3)
               Hash Cond: (c.customer_id = order_totals.customer_id)
               ->  Hash Left Join  (cost=23704.03..65456.86 rows=833333 width=45) (actual time=292.432..479.223 rows=666667 loops=3)
                     Hash Cond: (c.customer_id = order_counts.customer_id)
                     ->  Parallel Seq Scan on customers c  (cost=0.00..39565.33 rows=833333 width=37) (actual time=0.003..65.482 rows=666667 loops=3)
                     ->  Hash  (cost=23704.02..23704.02 rows=1 width=12) (actual time=292.407..292.409 rows=1 loops=3)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Subquery Scan on order_counts  (cost=23704.00..23704.02 rows=1 width=12) (actual time=292.398..292.400 rows=1 loops=3)
                                 ->  HashAggregate  (cost=23704.00..23704.01 rows=1 width=12) (actual time=292.397..292.398 rows=1 loops=3)
                                       Group Key: orders.customer_id
                                       Batches: 1  Memory Usage: 25kB
                                       Worker 0:  Batches: 1  Memory Usage: 25kB
                                       Worker 1:  Batches: 1  Memory Usage: 25kB