Query the cluster status:
[Ruby@gs01 ~]$ cm_ctl query -Cv
[ CMServer State ]
node instance state
------------------------
1 gs01 1 Primary
2 gs02 2 Standby
3 gs03 3 Standby
[ ETCD State ]
node instance state
------------------------------
1 gs01 7001 StateLeader
2 gs02 7002 StateFollower
3 gs03 7003 StateFollower
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Coordinator State ]
node instance state
------------------------
1 gs01 5001 Normal
2 gs02 5002 Normal
3 gs03 5003 Normal
[ Central Coordinator State ]
node instance state
------------------------
2 gs02 5002 Normal
[ GTM State ]
node instance state sync_state
--------------------------------------------------------
1 gs01 1001 P Primary Connection ok Sync
2 gs02 1002 S Standby Connection ok Sync
3 gs03 1003 S Standby Connection ok Sync
[ Datanode State ]
node instance state | node instance state | node instance state
------------------------------------------------------------------------------------------------------------
1 gs01 6001 P Primary Normal | 2 gs02 6002 S Standby Normal | 3 gs03 6003 S Standby Normal
2 gs02 6004 P Primary Normal | 3 gs03 6005 S Standby Normal | 1 gs01 6006 S Standby Normal
3 gs03 6007 P Primary Normal | 1 gs01 6008 S Standby Normal | 2 gs02 6009 S Standby Normal
[Ruby@gs01 ~]$
I. Join Column Is Not the Distribution Column
Build test tables and test data, run the same SQL, and compare how different distribution columns affect the execution plan.
1. Log in to the database.
gsql -d postgres -p 8000 -r
2. Set max_datanode_for_plan so that the detailed execution plan can be viewed later.
SET max_datanode_for_plan = 3;
3. Create the test tables.
DROP TABLE IF EXISTS test_2_1;
CREATE TABLE test_2_1(
cno int,
cname varchar(10),
tno int
)DISTRIBUTE BY hash(cno);
DROP TABLE IF EXISTS test_2_2;
CREATE TABLE test_2_2(
sno int,
cno int,
degree int
) DISTRIBUTE BY hash(sno);
4. Insert the test data (takes about 60 s).
INSERT INTO test_2_1 SELECT generate_series(1,100000), generate_series(1,100000);
INSERT INTO test_2_2 SELECT generate_series(1,1000), generate_series(1,1000);
5. Set the display style to pretty.
set explain_perf_mode=pretty;
6. With the join condition on a non-distribution column, run the test SQL.
EXPLAIN (costs off, ANALYZE on)
SELECT t1.cno, t2.sno
FROM test_2_1 t1, test_2_2 t2
WHERE t1.cno = t2.cno;
As shown in the plan below, a REDISTRIBUTE operator appears. With large data volumes this operator significantly degrades performance; it can be eliminated by changing the distribution key, improving efficiency.
gaussdb=# EXPLAIN (costs off, ANALYZE on)
gaussdb-# SELECT t1.cno, t2.sno
gaussdb-# FROM test_2_1 t1, test_2_2 t2
gaussdb-# WHERE t1.cno = t2.cno;
id | operation | A-time | A-rows | Peak Memory | A-width
----+-----------------------------------------+-----------------+--------+-------------+---------
1 | -> Streaming (type: GATHER) | 59.315 | 1000 | 82KB |
2 | -> Hash Join (3,5) | [12.769,50.412] | 1000 | [8KB,8KB] |
3 | -> Streaming(type: REDISTRIBUTE) | [0.150,38.386] | 1000 | [44KB,44KB] |
4 | -> Seq Scan on test_2_2 t2 | [0.082,0.094] | 1000 | [53KB,53KB] |
5 | -> Hash | [11.482,12.073] | 100000 | [1MB,1MB] | [20,20]
6 | -> Seq Scan on test_2_1 t1 | [5.817,6.285] | 100000 | [52KB,52KB] |
(6 rows)
Predicate Information (identified by plan id)
-----------------------------------------------
2 --Hash Join (3,5)
Hash Cond: (t2.cno = t1.cno)
(2 rows)
Memory Information (identified by plan id)
----------------------------------------------------------------------
Coordinator Query Peak Memory:
Query Peak Memory: 0MB
Datanode:
Max Query Peak Memory: 0MB
Min Query Peak Memory: 0MB
5 --Hash
Max Buckets: 32768 Max Batches: 1 Max Memory Usage: 1181kB
Min Buckets: 32768 Min Batches: 1 Min Memory Usage: 1163kB
(8 rows)
User Define Profiling
---------------------------------------------------------------------------
Segment Id: 2 Track name: Datanode build connection
(actual time=[0.357, 5.080], calls=[1, 1])
Plan Node id: 1 Track name: coordinator get datanode connection
(actual time=[2.641, 2.641], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator check and update node definition
(actual time=[0.003, 0.003], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator serialize plan
(actual time=[0.740, 0.740], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator send query id with sync
(actual time=[0.257, 0.257], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator send begin command
(actual time=[0.002, 0.002], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator start transaction and send query
(actual time=[0.102, 0.102], calls=[1, 1])
Plan Node id: 2 Track name: Datanode start up stream thread
(actual time=[0.087, 0.203], calls=[1, 1])
(16 rows)
====== Query Summary =====
------------------------------------------------------------------------------------------
Datanode executor start time [dn_6001_6002_6003, dn_6007_6008_6009]: [2.671 ms,5.298 ms]
Datanode executor run time [dn_6007_6008_6009, dn_6001_6002_6003]: [12.902 ms,50.542 ms]
Datanode executor end time [dn_6007_6008_6009, dn_6004_6005_6006]: [0.103 ms,0.139 ms]
Coordinator executor start time: 0.371 ms
Coordinator executor run time: 59.377 ms
Coordinator executor end time: 0.022 ms
Planner runtime: 0.552 ms
Plan size: 4157 byte
Query Id: 72339069014829424
Total runtime: 59.783 ms
(10 rows)
When creating a table in a distributed deployment, consider the SQL that will commonly reference the table, and choose the join or aggregation columns of that SQL as the distribution column, thereby reducing the overhead of data redistribution.
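The effect of aligning the distribution key with the join key can be sketched in a few lines of Python. This is an illustrative model only: `dn_for` and `NUM_DN` are hypothetical stand-ins, and a simple modulo replaces the engine's real hash function.

```python
NUM_DN = 3

def dn_for(value):
    # Hypothetical placement function; a modulo stands in for the real hash.
    return value % NUM_DN

# test_2_2 rows modeled as (sno, cno) pairs where cno differs from sno.
rows = [(sno, sno + 1) for sno in range(9)]

# Distributed by sno: each row lives on dn_for(sno), but the join probes by
# cno, so the matching test_2_1 row (placed by cno) may sit on another DN.
moved = sum(1 for sno, cno in rows if dn_for(cno) != dn_for(sno))
print(moved)        # every one of these rows must be shipped between DNs

# Distributed by cno instead: placement now matches the join key.
moved_after = sum(1 for sno, cno in rows if dn_for(cno) != dn_for(cno))
print(moved_after)  # 0 -- the join completes locally on each DN
```

In this toy model, changing the distribution key drops the number of shipped rows to zero, which is exactly what eliminating the REDISTRIBUTE operator buys in the real plan.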
7. Change the distribution key of test table test_2_2 to cno.
DROP TABLE test_2_2;
CREATE TABLE test_2_2(
sno int,
cno int,
degree int
) DISTRIBUTE BY hash(cno);
8. Re-insert the test data (takes about 1 s).
INSERT INTO test_2_2 SELECT generate_series(1,1000), generate_series(1,1000);
9. Run the test SQL.
EXPLAIN ANALYZE
SELECT t1.cno, t2.sno
FROM test_2_1 t1, test_2_2 t2
WHERE t1.cno = t2.cno;
Re-running the test after the change shows that the SQL execution time has improved.
The execution plan and execution time with the new distribution key are as follows:
gaussdb=# EXPLAIN ANALYZE
gaussdb-# SELECT t1.cno, t2.sno
gaussdb-# FROM test_2_1 t1, test_2_2 t2
gaussdb-# WHERE t1.cno = t2.cno;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Data Node Scan (cost=0.00..0.00 rows=0 width=0) (actual time=9.436..9.687 rows=1000 loops=1)
Node/s: All datanodes
Total runtime: 9.821 ms
(3 rows)
10. View the detailed execution plan on the DNs.
EXPLAIN VERBOSE
SELECT t1.cno, t2.sno
FROM test_2_1 t1, test_2_2 t2
WHERE t1.cno = t2.cno;
The detailed execution plan is as follows:
Because the table's distribution column was changed, the join column is now also the distribution column. This means the join operator no longer needs to fetch data from other DN nodes; the computation completes locally on each node, eliminating the extra transfer overhead of the REDISTRIBUTE operator.
gaussdb=# EXPLAIN VERBOSE
gaussdb-# SELECT t1.cno, t2.sno
gaussdb-# FROM test_2_1 t1, test_2_2 t2
gaussdb-# WHERE t1.cno = t2.cno;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Data Node Scan (cost=0.00..0.00 rows=0 width=0)
Output: t1.cno, t2.sno
Node/s: All datanodes
Remote query: SELECT t1.cno, t2.sno FROM public.test_2_1 t1, public.test_2_2 t2 WHERE t1.cno = t2.cno
Remote SQL: SELECT t1.cno, t2.sno FROM public.test_2_1 t1, public.test_2_2 t2 WHERE t1.cno = t2.cno
Datanode Name: dn_6001_6002_6003
Hash Join (cost=53.76..2821.58 rows=209282 distinct=[200, 200] width=8)
Output: t1.cno, t2.sno
Hash Cond: (t1.cno = t2.cno)
-> Seq Scan on public.test_2_1 t1 (cost=0.00..352.20 rows=21520 width=4)
Output: t1.cno, t1.cname, t1.tno
-> Hash (cost=29.45..29.45 rows=1945 width=8)
Output: t2.sno, t2.cno
-> Seq Scan on public.test_2_2 t2 (cost=0.00..29.45 rows=1945 width=8)
Output: t2.sno, t2.cno
Datanode Name: dn_6004_6005_6006
Hash Join (cost=53.76..2821.58 rows=209282 distinct=[200, 200] width=8)
Output: t1.cno, t2.sno
Hash Cond: (t1.cno = t2.cno)
-> Seq Scan on public.test_2_1 t1 (cost=0.00..352.20 rows=21520 width=4)
Output: t1.cno, t1.cname, t1.tno
-> Hash (cost=29.45..29.45 rows=1945 width=8)
Output: t2.sno, t2.cno
-> Seq Scan on public.test_2_2 t2 (cost=0.00..29.45 rows=1945 width=8)
Output: t2.sno, t2.cno
Datanode Name: dn_6007_6008_6009
Hash Join (cost=53.76..2801.26 rows=207745 distinct=[200, 200] width=8)
Output: t1.cno, t2.sno
Hash Cond: (t1.cno = t2.cno)
-> Seq Scan on public.test_2_1 t1 (cost=0.00..349.62 rows=21362 width=4)
Output: t1.cno, t1.cname, t1.tno
-> Hash (cost=29.45..29.45 rows=1945 width=8)
Output: t2.sno, t2.cno
-> Seq Scan on public.test_2_2 t2 (cost=0.00..29.45 rows=1945 width=8)
Output: t2.sno, t2.cno
(39 rows)
What factors should be considered when choosing the distribution column of a hash-distributed table?
Answer:
1. Use a column whose values are relatively discrete (high cardinality) as the distribution key, so that data spreads evenly across the DNs.
2. On top of rule 1, avoid columns that are usually filtered by a constant; otherwise every such query is dispatched to the same single DN.
3. On top of rules 1 and 2, prefer a column that appears in the query's join conditions, so that the rows relevant to a JOIN are co-located on the same DN, reducing the cost of data movement between DNs.
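Rule 2 can be sketched concretely. In this hypothetical model (`dn_for` and `NUM_DN` are stand-ins, modulo replaces the real hash), every query filtering the distribution key by the same constant lands on the same DN:

```python
NUM_DN = 3

def dn_for(value):
    # Hypothetical placement function; modulo stands in for the real hash.
    return value % NUM_DN

# Suppose the distribution key is a status-like column and queries always
# filter it by a constant, e.g. WHERE status = 1.
queries = [1] * 100                       # 100 queries, same constant filter
target_dns = {dn_for(v) for v in queries}
print(len(target_dns))  # 1 -- a single DN absorbs the entire workload
```

However even the data is spread, constant-filter queries on the distribution key cannot be parallelized across DNs, which is why such columns make poor distribution keys.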
II. Use a Global Index to Avoid Data Redistribution
Build test tables and test data, run the same SQL, and compare how a global index affects the execution plan.
1. Log in to the gaussdb database.
gsql -d postgres -p 8000 -r
2. Set max_datanode_for_plan so that the detailed execution plan can be viewed later.
SET max_datanode_for_plan = 3;
3. Create the test tables.
DROP TABLE IF EXISTS test_3_1;
CREATE TABLE test_3_1(c1 int, c2 int) distribute by hash(c2);
DROP TABLE IF EXISTS test_3_2;
CREATE TABLE test_3_2(c3 int, c4 int) distribute by hash(c3);
4. Insert the test data (takes about 10 s).
INSERT INTO test_3_1 select x, x + 500 from generate_series(1,100000) AS x;
INSERT INTO test_3_2 select x, x + 500 from generate_series(1,10000) AS x;
5. Set the display style to pretty.
set explain_perf_mode=pretty;
6. Run the test SQL.
EXPLAIN ANALYZE SELECT c2, c4 FROM test_3_1, test_3_2 WHERE c2 = c4;
The execution plan is as follows:
gaussdb=# EXPLAIN ANALYZE SELECT c2, c4 FROM test_3_1, test_3_2 WHERE c2 = c4;
id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs
----+-----------------------------------------+-----------------+--------+--------+-------------+---------+---------+---------
1 | -> Streaming (type: GATHER) | 24.446 | 10000 | 30 | 82KB | | 8 | 31.21
2 | -> Hash Join (3,5) | [14.222,15.096] | 10000 | 30 | [8KB,8KB] | | 8 | 29.96
3 | -> Streaming(type: REDISTRIBUTE) | [0.654,0.794] | 10000 | 30 | [68KB,68KB] | | 4 | 15.49
4 | -> Seq Scan on test_3_2 | [0.606,0.617] | 10000 | 30 | [53KB,53KB] | | 4 | 14.14
5 | -> Hash | [11.351,11.919] | 100000 | 29 | [1MB,1MB] | [20,20] | 4 | 14.14
6 | -> Seq Scan on test_3_1 | [5.977,6.247] | 100000 | 30 | [52KB,52KB] | | 4 | 14.14
(6 rows)
Predicate Information (identified by plan id)
------------------------------------------------
2 --Hash Join (3,5)
Hash Cond: (test_3_2.c4 = test_3_1.c2)
(2 rows)
Memory Information (identified by plan id)
----------------------------------------------------------------------
Coordinator Query Peak Memory:
Query Peak Memory: 0MB
Datanode:
Max Query Peak Memory: 0MB
Min Query Peak Memory: 0MB
5 --Hash
Max Buckets: 32768 Max Batches: 1 Max Memory Usage: 1181kB
Min Buckets: 32768 Min Batches: 1 Min Memory Usage: 1163kB
(8 rows)
User Define Profiling
---------------------------------------------------------------------------
Segment Id: 2 Track name: Datanode build connection
(actual time=[0.235, 0.401], calls=[1, 1])
Plan Node id: 1 Track name: coordinator get datanode connection
(actual time=[3.378, 3.378], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator check and update node definition
(actual time=[0.004, 0.004], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator serialize plan
(actual time=[0.676, 0.676], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator send query id with sync
(actual time=[0.193, 0.193], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator send begin command
(actual time=[0.001, 0.001], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator start transaction and send query
(actual time=[0.103, 0.103], calls=[1, 1])
Plan Node id: 2 Track name: Datanode start up stream thread
(actual time=[0.074, 0.116], calls=[1, 1])
(16 rows)
====== Query Summary =====
------------------------------------------------------------------------------------------
Datanode executor start time [dn_6007_6008_6009, dn_6001_6002_6003]: [0.585 ms,0.669 ms]
Datanode executor run time [dn_6001_6002_6003, dn_6004_6005_6006]: [15.247 ms,16.553 ms]
Datanode executor end time [dn_6004_6005_6006, dn_6007_6008_6009]: [0.114 ms,0.136 ms]
Coordinator executor start time: 0.128 ms
Coordinator executor run time: 24.989 ms
Coordinator executor end time: 0.021 ms
Planner runtime: 0.419 ms
Plan size: 3999 byte
Query Id: 72339069014842974
Total runtime: 25.154 ms
As before, because the join column is not the distribution column, a Streaming(type: REDISTRIBUTE) operator appears in the plan and data redistribution is triggered.
7. Add a global index to test_3_2.
In a distributed deployment, a global index is declared with its own distribution key, and its index entries are distributed across the DNs by that key. When data is needed, instead of redistributing it with a Stream operator, each DN can fetch it directly from the global index via an Index Only Scan. A normal local index, by contrast, does not distribute its entries by the index key: the entries follow the data placement determined by the table's distribution key, and the index simply covers the specified columns within each DN.
DROP INDEX IF EXISTS idx_1;
CREATE GLOBAL INDEX idx_1 ON test_3_2(c4) DISTRIBUTE BY HASH(c4);
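The placement idea behind the global index can be sketched as follows. This is a hypothetical model: `dn_for` and `NUM_DN` are stand-ins and modulo replaces GaussDB's actual hash function.

```python
NUM_DN = 3

def dn_for(value):
    # Hypothetical placement function; modulo stands in for the real hash.
    return value % NUM_DN

# test_3_2(c3, c4): the base table is distributed by c3, while the global
# index idx_1 is distributed by its own key, c4.
table_rows = [(c3, c3 + 500) for c3 in range(12)]

index_slices = {dn: [] for dn in range(NUM_DN)}
for c3, c4 in table_rows:
    index_slices[dn_for(c4)].append(c4)   # index entry placed by c4, not c3

# A join probing on c4 therefore finds every needed entry in its local slice:
local = all(dn_for(c4) == dn for dn, keys in index_slices.items() for c4 in keys)
print(local)  # True
```

Because the index entries are co-located with the join key's hash bucket, the plan below can replace the REDISTRIBUTE stream with a local Index Only Scan.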
8. Run the test SQL.
EXPLAIN ANALYZE SELECT c2, c4 FROM test_3_1, test_3_2 WHERE c2 = c4;
The execution plan is as follows:
gaussdb=# EXPLAIN ANALYZE SELECT c2, c4 FROM test_3_1, test_3_2 WHERE c2 = c4;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Data Node Scan (cost=0.00..0.00 rows=0 width=0) (actual time=6.303..13.290 rows=10000 loops=1)
Node/s: All datanodes
Total runtime: 13.821 ms
(3 rows)
9. View the detailed execution plan.
EXPLAIN VERBOSE SELECT c2, c4 FROM test_3_1, test_3_2 WHERE c2 = c4;
The detailed execution plan is as follows:
The data exchange between DNs has been eliminated; the c4 values are obtained through an Index Only Scan on the global index.
gaussdb=# EXPLAIN VERBOSE SELECT c2, c4 FROM test_3_1, test_3_2 WHERE c2 = c4;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Data Node Scan (cost=0.00..0.00 rows=0 width=0)
Output: test_3_1.c2, idx_1.c4
Node/s: All datanodes
Remote query: SELECT test_3_1.c2, idx_1.c4 FROM public.test_3_1, ONLY public.idx_1 WHERE test_3_1.c2 = idx_1.c4
Remote SQL: SELECT test_3_1.c2, idx_1.c4 FROM public.test_3_1, ONLY public.idx_1 WHERE test_3_1.c2 = idx_1.c4
Datanode Name: dn_6001_6002_6003
Hash Join (cost=2.48..592.34 rows=1 distinct=[200, 200] width=8)
Output: test_3_1.c2, idx_1.c4
Hash Cond: (test_3_1.c2 = idx_1.c4)
-> Seq Scan on public.test_3_1 (cost=0.00..463.89 rows=33589 width=4)
Output: test_3_1.c1, test_3_1.c2
-> Hash (cost=2.46..2.46 rows=1 width=4)
Output: idx_1.c4
-> Index Only Scan using idx_1 on public.idx_1 (cost=0.00..2.46 rows=1 width=4)
Output: idx_1.c4
Datanode Name: dn_6004_6005_6006
Hash Join (cost=2.48..588.16 rows=1 distinct=[200, 200] width=8)
Output: test_3_1.c2, idx_1.c4
Hash Cond: (test_3_1.c2 = idx_1.c4)
-> Seq Scan on public.test_3_1 (cost=0.00..460.58 rows=33358 width=4)
Output: test_3_1.c1, test_3_1.c2
-> Hash (cost=2.46..2.46 rows=1 width=4)
Output: idx_1.c4
-> Index Only Scan using idx_1 on public.idx_1 (cost=0.00..2.46 rows=1 width=4)
Output: idx_1.c4
Datanode Name: dn_6007_6008_6009
Hash Join (cost=2.48..583.97 rows=1 distinct=[200, 200] width=8)
Output: test_3_1.c2, idx_1.c4
Hash Cond: (test_3_1.c2 = idx_1.c4)
-> Seq Scan on public.test_3_1 (cost=0.00..457.53 rows=33053 width=4)
Output: test_3_1.c1, test_3_1.c2
-> Hash (cost=2.46..2.46 rows=1 width=4)
Output: idx_1.c4
-> Index Only Scan using idx_1 on public.idx_1 (cost=0.00..2.46 rows=1 width=4)
Output: idx_1.c4
(39 rows)
In which scenarios should a global index be considered?
When the distribution column differs from the join or aggregation columns.
III. Optimize the Table Design
Build test tables and test data, run the same SQL, and compare how a hash-distributed table versus a replicated table affects the execution plan.
1. Log in to the gaussdb database.
gsql -d postgres -p 8000 -r
2. Set max_datanode_for_plan so that the detailed execution plan can be viewed later.
SET max_datanode_for_plan = 3;
3. Create the test tables.
DROP TABLE IF EXISTS test_4_1;
CREATE TABLE test_4_1(
id int,
num int
) DISTRIBUTE BY hash(id);
DROP TABLE IF EXISTS test_4_2;
CREATE TABLE test_4_2(
id int,
num int
) DISTRIBUTE BY hash(num);
4. Insert the test data (takes about 30 s).
INSERT INTO test_4_1 select generate_series(1,100000), generate_series(1,100000);
INSERT INTO test_4_2 select generate_series(1,10000), generate_series(1,10000);
5. Set the display style to pretty.
set explain_perf_mode=pretty;
6. Join the hash-distributed tables and run the test.
EXPLAIN (costs off, ANALYZE on)
SELECT *
FROM test_4_1 t1,test_4_2 t2
WHERE t1.id=t2.id;
The SQL execution plan and execution time are as follows:
gaussdb=# EXPLAIN (costs off, ANALYZE on)
gaussdb-# SELECT *
gaussdb-# FROM test_4_1 t1,test_4_2 t2
gaussdb-# WHERE t1.id=t2.id;
id | operation | A-time | A-rows | Peak Memory | A-width
----+-----------------------------------------+-----------------+--------+-------------+---------
1 | -> Streaming (type: GATHER) | 22.581 | 10000 | 85KB |
2 | -> Hash Join (3,5) | [13.796,14.661] | 10000 | [10KB,10KB] |
3 | -> Streaming(type: REDISTRIBUTE) | [0.613,0.681] | 10000 | [52KB,52KB] |
4 | -> Seq Scan on test_4_2 t2 | [0.508,0.521] | 10000 | [53KB,53KB] |
5 | -> Hash | [10.874,11.266] | 100000 | [1MB,1MB] | [24,24]
6 | -> Seq Scan on test_4_1 t1 | [4.880,5.221] | 100000 | [53KB,53KB] |
(6 rows)
Predicate Information (identified by plan id)
-----------------------------------------------
2 --Hash Join (3,5)
Hash Cond: (t2.id = t1.id)
(2 rows)
Memory Information (identified by plan id)
----------------------------------------------------------------------
Coordinator Query Peak Memory:
Query Peak Memory: 0MB
Datanode:
Max Query Peak Memory: 0MB
Min Query Peak Memory: 0MB
5 --Hash
Max Buckets: 32768 Max Batches: 1 Max Memory Usage: 1312kB
Min Buckets: 32768 Min Batches: 1 Min Memory Usage: 1292kB
(8 rows)
User Define Profiling
---------------------------------------------------------------------------
Segment Id: 2 Track name: Datanode build connection
(actual time=[0.295, 0.502], calls=[1, 1])
Plan Node id: 1 Track name: coordinator get datanode connection
(actual time=[0.682, 0.682], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator check and update node definition
(actual time=[0.003, 0.003], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator serialize plan
(actual time=[0.784, 0.784], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator send query id with sync
(actual time=[0.239, 0.239], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator send begin command
(actual time=[0.001, 0.001], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator start transaction and send query
(actual time=[0.084, 0.084], calls=[1, 1])
Plan Node id: 2 Track name: Datanode start up stream thread
(actual time=[0.023, 0.034], calls=[1, 1])
(16 rows)
====== Query Summary =====
------------------------------------------------------------------------------------------
Datanode executor start time [dn_6004_6005_6006, dn_6001_6002_6003]: [0.482 ms,0.611 ms]
Datanode executor run time [dn_6001_6002_6003, dn_6007_6008_6009]: [15.261 ms,17.065 ms]
Datanode executor end time [dn_6004_6005_6006, dn_6007_6008_6009]: [0.096 ms,0.142 ms]
Coordinator executor start time: 0.182 ms
Coordinator executor run time: 23.042 ms
Coordinator executor end time: 0.022 ms
Planner runtime: 0.537 ms
Plan size: 4192 byte
Query Id: 72339069014857668
Total runtime: 23.273 ms
(10 rows)
As shown, joining hash-distributed tables on a column that is not the distribution column produces a REDISTRIBUTE operator. In the previous experiment we eliminated it by changing the distribution key; we can also eliminate it by using a replicated table instead. Note, however, that replicated tables are only suitable for tables with little data that change infrequently.
7. Replace the hash-distributed table with a replicated table.
DROP TABLE IF EXISTS test_4_2;
CREATE TABLE test_4_2(
id int,
num int
) DISTRIBUTE BY replication;
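The trade-off behind replication can be sketched in a few lines; `NUM_DN` and the dictionary-of-copies layout are hypothetical illustrations, not GaussDB internals.

```python
NUM_DN = 3

# A replicated table keeps a complete copy on every DN, so any join against
# it can run locally; the trade-off is NUM_DN copies of the data and a write
# to every DN on each update.
small_table = [(i, i) for i in range(5)]
dns = {dn: list(small_table) for dn in range(NUM_DN)}  # identical copy per DN

fully_replicated = all(dns[dn] == small_table for dn in dns)
print(fully_replicated)  # True -- no DN ever fetches these rows remotely
```

This is why replication suits small, rarely updated tables: the redundant storage and write amplification are cheap, while every join against the table becomes local.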
8. Insert data into the replicated table and update its statistics (takes about 1 s).
INSERT INTO test_4_2 select generate_series(1,10000), generate_series(1,10000);
ANALYZE test_4_2;
9. Run the test SQL.
EXPLAIN (costs off, ANALYZE on)
SELECT *
FROM test_4_1 t1,test_4_2 t2
WHERE t1.id=t2.id;
The SQL execution plan and execution time are as follows:
gaussdb=# EXPLAIN (costs off, ANALYZE on)
gaussdb-# SELECT *
gaussdb-# FROM test_4_1 t1,test_4_2 t2
gaussdb-# WHERE t1.id=t2.id;
QUERY PLAN
----------------------------------------------------------------
Data Node Scan (actual time=13.400..17.736 rows=10000 loops=1)
Node/s: All datanodes
Total runtime: 18.279 ms
(3 rows)
10. View the detailed execution plan on the DNs.
EXPLAIN VERBOSE
SELECT *
FROM test_4_1 t1,test_4_2 t2
WHERE t1.id=t2.id;
The DN execution plan is as follows:
gaussdb=# EXPLAIN VERBOSE
gaussdb-# SELECT *
gaussdb-# FROM test_4_1 t1,test_4_2 t2
gaussdb-# WHERE t1.id=t2.id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Data Node Scan (cost=0.00..0.00 rows=0 width=0)
Output: t1.id, t1.num, t2.id, t2.num
Node/s: All datanodes
Remote query: SELECT t1.id, t1.num, t2.id, t2.num FROM public.test_4_1 t1, public.test_4_2 t2 WHERE t1.id = t2.id
Remote SQL: SELECT t1.id, t1.num, t2.id, t2.num FROM public.test_4_1 t1, public.test_4_2 t2 WHERE t1.id = t2.id
Datanode Name: dn_6001_6002_6003
Hash Join (cost=883.57..1121.67 rows=8383 distinct=[1, 33581] width=16)
Output: t1.id, t1.num, t2.id, t2.num
Hash Cond: (t2.id = t1.id)
-> Seq Scan on public.test_4_2 t2 (cost=0.00..122.83 rows=8383 width=8)
Output: t2.id, t2.num
-> Hash (cost=463.81..463.81 rows=33581 width=8)
Output: t1.id, t1.num
-> Seq Scan on public.test_4_1 t1 (cost=0.00..463.81 rows=33581 width=8)
Output: t1.id, t1.num
Datanode Name: dn_6004_6005_6006
Hash Join (cost=877.53..1115.63 rows=8383 distinct=[1, 33357] width=16)
Output: t1.id, t1.num, t2.id, t2.num
Hash Cond: (t2.id = t1.id)
-> Seq Scan on public.test_4_2 t2 (cost=0.00..122.83 rows=8383 width=8)
Output: t2.id, t2.num
-> Hash (cost=460.57..460.57 rows=33357 width=8)
Output: t1.id, t1.num
-> Seq Scan on public.test_4_1 t1 (cost=0.00..460.57 rows=33357 width=8)
Output: t1.id, t1.num
Datanode Name: dn_6007_6008_6009
Hash Join (cost=869.89..1107.99 rows=8383 distinct=[1, 33062] width=16)
Output: t1.id, t1.num, t2.id, t2.num
Hash Cond: (t2.id = t1.id)
-> Seq Scan on public.test_4_2 t2 (cost=0.00..122.83 rows=8383 width=8)
Output: t2.id, t2.num
-> Hash (cost=456.62..456.62 rows=33062 width=8)
Output: t1.id, t1.num
-> Seq Scan on public.test_4_1 t1 (cost=0.00..456.62 rows=33062 width=8)
Output: t1.id, t1.num
(39 rows)
In which scenarios are replicated tables appropriate?
Tables with small data volumes or very low update frequency.
IV. Operator Pushdown
Build a test table and test data, run the original SQL, change the function's volatility attribute, and compare the execution plans before and after the change.
1. Log in to the gaussdb database.
gsql -d postgres -p 8000 -r
2. Create the test table.
DROP TABLE IF EXISTS test_5;
CREATE TABLE test_5(c1 int,c2 int,c3 int) distribute by hash(c1);
3. Insert the data (takes about 3 s).
insert into test_5 select floor(random() * 5000), floor(random() * 5000), floor(random() * 5000) from generate_series(1,10000);
ANALYZE test_5;
4. Set the display style to pretty.
set explain_perf_mode=pretty;
5. Create the function.
create or replace function
func_t (num1 int, num2 int) return int
as
begin
return num1 + num2;
end;
/
6. Run test SQL 1.
This query uses func_t to sum c2 and c3 of test_5 and orders by the result.
EXPLAIN (VERBOSE ON, ANALYZE) select func_t(c2,c3) as ans from test_5 order by ans;
The execution plan is as follows:
The plan shows _REMOTE_TABLE_QUERY_, meaning func_t is evaluated on the CN.
gaussdb=# EXPLAIN (VERBOSE ON, ANALYZE) select func_t(c2,c3) as ans from test_5 order by ans;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3164.39..3189.39 rows=10000 width=8) (actual time=46.263..47.686 rows=10000 loops=1)
Output: (func_t(test_5.c2, test_5.c3))
Sort Key: (func_t(test_5.c2, test_5.c3))
Sort Method: quicksort Memory: 1010kB
-> Data Node Scan on test_5 "_REMOTE_TABLE_QUERY_" (cost=0.00..2500.00 rows=10000 width=8) (actual time=2.160..41.364 rows=10000 loops=1)
Output: func_t(test_5.c2, test_5.c3)
Node/s: All datanodes
Remote query: SELECT c2, c3 FROM ONLY public.test_5 WHERE true
Total runtime: 48.171 ms
(9 rows)
7. Analyze func_t: it simply returns the sum of its two arguments (num1 + num2), so its result is stable for a given input. Functions created this way are not pushdown-able by default; marking it IMMUTABLE allows func_t to be pushed down.
alter function func_t(int, int) immutable;
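Why immutability matters for pushdown can be sketched outside the database. In this hedged Python model, `func_t` mirrors the SQL function above, while `volatile_t` is a hypothetical counter-example:

```python
import random

def func_t(num1, num2):
    # Immutable: the result depends only on the arguments, so evaluating it
    # on the CN or on any DN yields the same value -- safe to push down.
    return num1 + num2

def volatile_t(num1, num2):
    # Volatile: the result can differ between calls (and between nodes), so
    # pushing it down could change the query's answer.
    return num1 + num2 + random.randint(0, 1000000)

cn_result = func_t(3, 4)
dn_results = [func_t(3, 4) for _ in range(3)]   # "evaluated" on three DNs
print(all(r == cn_result for r in dn_results))  # True
```

Because an immutable function is a pure mapping from inputs to output, the planner may evaluate it wherever the data lives; a volatile function must stay on the CN to guarantee a single consistent result.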
8. Run test SQL 2.
EXPLAIN (VERBOSE ON, ANALYZE) select func_t(c2,c3) as ans from test_5 order by ans;
The execution plan is as follows; func_t has been pushed down, and the operator now runs on the DNs:
gaussdb=# EXPLAIN (VERBOSE ON, ANALYZE) select func_t(c2,c3) as ans from test_5 order by ans;
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | A-width | E-width | E-costs
----+-------------------------------------+-----------------+--------+--------+------------+---------------+---------------+---------+---------
1 | -> Streaming (type: GATHER) | 22.719 | 10000 | 9999 | | 171KB | | 8 | 1499.56
2 | -> Sort | [12.136,13.509] | 10000 | 9999 | | [297KB,318KB] | [63260,70080] | 8 | 1082.94
3 | -> Seq Scan on public.test_5 | [10.392,11.411] | 10000 | 10000 | | [55KB,55KB] | | 8 | 879.58
(3 rows)
Memory Information (identified by plan id)
-------------------------------------------------------
Coordinator Query Peak Memory:
Query Peak Memory: 0MB
Datanode:
Max Query Peak Memory: 0MB
Min Query Peak Memory: 0MB
2 --Sort
Sort Method: quicksort Memory: 294kB ~ 316kB
Sort Method: quicksort Disk: 1024kB ~ 0kB
(8 rows)
Targetlist Information (identified by plan id)
--------------------------------------------------------
1 --Streaming (type: GATHER)
Output: (func_t(c2, c3))
Merge Sort Key: (func_t(test_5.c2, test_5.c3))
Node/s: All datanodes
2 --Sort
Output: (func_t(c2, c3))
Sort Key: (func_t(test_5.c2, test_5.c3))
3 --Seq Scan on public.test_5
Output: func_t(c2, c3)
Distribute Key: c1
(10 rows)
User Define Profiling
---------------------------------------------------------------------------
Plan Node id: 1 Track name: coordinator get datanode connection
(actual time=[3.338, 3.338], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator serialize plan
(actual time=[0.606, 0.606], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator send begin command
(actual time=[0.002, 0.002], calls=[1, 1])
Plan Node id: 1 Track name: Coordinator start transaction and send query
(actual time=[0.102, 0.102], calls=[1, 1])
(8 rows)
====== Query Summary =====
------------------------------------------------------------------------------------------
Datanode executor start time [dn_6001_6002_6003, dn_6004_6005_6006]: [0.047 ms,0.165 ms]
Datanode executor run time [dn_6004_6005_6006, dn_6007_6008_6009]: [12.913 ms,14.493 ms]
Datanode executor end time [dn_6001_6002_6003, dn_6001_6002_6003]: [0.009 ms,0.009 ms]
Coordinator executor start time: 0.048 ms
Coordinator executor run time: 23.215 ms
Coordinator executor end time: 0.039 ms
Planner runtime: 0.166 ms
Plan size: 3189 byte
Query Id: 72339069014886763
Total runtime: 23.314 ms
(10 rows)
Where should one look for performance bottlenecks in a distributed deployment?
The performance advantage of a distributed architecture comes mainly from parallel computation across DNs. When reading an execution plan, check whether the operators have been pushed down to the DNs.
V. Data Skew
Build a test table and test data, and compare SQL performance and elapsed time with evenly distributed versus skewed data.
1. Log in to the gaussdb database.
gsql -d postgres -p 8000 -r
2. Create the test table.
DROP TABLE IF EXISTS test_6;
CREATE TABLE test_6(
a int,
b int
) DISTRIBUTE BY hash(a);
3. Insert the test data (takes about 60 s).
INSERT INTO test_6 select 1, generate_series(1,150000);
4. Set the display style to pretty.
set explain_perf_mode=pretty;
5. Run the test SQL.
EXPLAIN performance
SELECT count(b)
FROM test_6;
Analyzing the execution plan shows that the SQL time is spent in the DN-side scan and aggregation. The per-DN statistics show that the dn_6004_6005_6006 shard scans far more rows and spends far more time in aggregation than the others, suggesting the table's data may be skewed.
gaussdb=# EXPLAIN performance
gaussdb-# SELECT count(b)
gaussdb-# FROM test_6;
WARNING: Statistics in some tables or columns(public.test_6.a) are not collected.
HINT: Do analyze for them in order to generate optimized plan.
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | A-width | E-width | E-costs
----+----------------------------------------+----------------+--------+--------+------------+--------------+---------+---------+---------
1 | -> Aggregate | 35.273 | 1 | 1 | | 11KB | | 12 | 14.31
2 | -> Streaming (type: GATHER) | 35.209 | 3 | 3 | | 81KB | | 12 | 14.31
3 | -> Aggregate | [0.006,32.448] | 3 | 3 | | [11KB, 11KB] | | 12 | 14.19
4 | -> Seq Scan on public.test_6 | [0.002,19.859] | 150000 | 30 | | [11KB, 53KB] | | 4 | 14.14
(4 rows)
Memory Information (identified by plan id)
-------------------------------------------------------------------------------
Coordinator Query Peak Memory:
Query Peak Memory: 0MB
DataNode Query Peak Memory
dn_6001_6002_6003 Query Peak Memory: 0MB
dn_6004_6005_6006 Query Peak Memory: 0MB
dn_6007_6008_6009 Query Peak Memory: 0MB
1 --Aggregate
Peak Memory: 11KB, Estimate Memory: 64MB
2 --Streaming (type: GATHER)
Peak Memory: 81KB, Estimate Memory: 64MB
3 --Aggregate
dn_6001_6002_6003 Peak Memory: 11KB, Estimate Memory: 64MB
dn_6004_6005_6006 Peak Memory: 11KB, Estimate Memory: 64MB
dn_6007_6008_6009 Peak Memory: 11KB, Estimate Memory: 64MB
dn_6001_6002_6003 Stream Send time: 0.000; Data Serialize time: 0.003
dn_6004_6005_6006 Stream Send time: 0.000; Data Serialize time: 0.012
dn_6007_6008_6009 Stream Send time: 0.000; Data Serialize time: 0.003
4 --Seq Scan on public.test_6
dn_6001_6002_6003 Peak Memory: 11KB, Estimate Memory: 64MB
dn_6004_6005_6006 Peak Memory: 53KB, Estimate Memory: 64MB
dn_6007_6008_6009 Peak Memory: 11KB, Estimate Memory: 64MB
(21 rows)
Targetlist Information (identified by plan id)
------------------------------------------------
1 --Aggregate
Output: count((count(b)))
2 --Streaming (type: GATHER)
Output: (count(b))
Node/s: All datanodes
3 --Aggregate
Output: count(b)
4 --Seq Scan on public.test_6
Output: a, b
Distribute Key: a
(10 rows)
Datanode Information (identified by plan id)
-----------------------------------------------------------------------------------------------
1 --Aggregate
(actual time=35.273..35.273 rows=1 loops=1)
(Buffers: 0)
(CPU: ex c/r=44760, ex row=3, ex cyc=134282, inc cyc=74496032)
2 --Streaming (type: GATHER)
(actual time=2.286..35.209 rows=3 loops=1)
(Buffers: 0)
(CPU: ex c/r=24787250, ex row=3, ex cyc=74361750, inc cyc=74361750)
3 --Aggregate
dn_6001_6002_6003 (actual time=0.006..0.006 rows=1 loops=1)
dn_6004_6005_6006 (actual time=32.448..32.448 rows=1 loops=1)
dn_6007_6008_6009 (actual time=0.006..0.006 rows=1 loops=1)
dn_6001_6002_6003 (Buffers: 0)
dn_6004_6005_6006 (Buffers: shared hit=573)
dn_6007_6008_6009 (Buffers: 0)
dn_6001_6002_6003 (CPU: ex c/r=0, ex row=0, ex cyc=8137, inc cyc=11366)
dn_6004_6005_6006 (CPU: ex c/r=239, ex row=150000, ex cyc=35886864, inc cyc=68529573)
dn_6007_6008_6009 (CPU: ex c/r=0, ex row=0, ex cyc=8792, inc cyc=12748)
4 --Seq Scan on public.test_6
dn_6001_6002_6003 (actual time=0.002..0.002 rows=0 loops=1)
dn_6004_6005_6006 (actual time=0.026..19.859 rows=150000 loops=1)
dn_6007_6008_6009 (actual time=0.002..0.002 rows=0 loops=1)
dn_6001_6002_6003 (Buffers: 0)
dn_6004_6005_6006 (Buffers: shared hit=573)
dn_6007_6008_6009 (Buffers: 0)
dn_6001_6002_6003 (CPU: ex c/r=0, ex row=0, ex cyc=3229, inc cyc=3229)
dn_6004_6005_6006 (CPU: ex c/r=217, ex row=150000, ex cyc=32642709, inc cyc=32642709)
dn_6007_6008_6009 (CPU: ex c/r=0, ex row=0, ex cyc=3956, inc cyc=3956)
(28 rows)
User Define Profiling
---------------------------------------------------------------------------
Plan Node id: 2 Track name: coordinator get datanode connection
cn_5001: (time=0.747 total_calls=1 loops=1)
Plan Node id: 2 Track name: Coordinator serialize plan
cn_5001: (time=0.655 total_calls=1 loops=1)
Plan Node id: 2 Track name: Coordinator send begin command
cn_5001: (time=0.002 total_calls=1 loops=1)
Plan Node id: 2 Track name: Coordinator start transaction and send query
cn_5001: (time=0.087 total_calls=1 loops=1)
(8 rows)
====== Query Summary =====
------------------------------------------------------------------------------------------
Datanode executor start time [dn_6001_6002_6003, dn_6004_6005_6006]: [0.183 ms,0.278 ms]
Datanode executor run time [dn_6001_6002_6003, dn_6004_6005_6006]: [0.011 ms,32.467 ms]
Datanode executor end time [dn_6001_6002_6003, dn_6004_6005_6006]: [0.006 ms,0.019 ms]
Remote query poll time: 0.000 ms, Deserialze time: 0.000 ms
Coordinator executor start time: 0.076 ms
Coordinator executor run time: 35.286 ms
Coordinator executor end time: 0.023 ms
Planner runtime: 0.300 ms
Plan size: 3149 byte
Query Id: 72339069014898956
Total runtime: 35.420 ms
(11 rows)
6. Check the table's data skew as follows.
gaussdb=# SELECT table_skewness('test_6');
table_skewness
------------------------------------------
("dn_6004_6005_6006 ",150000,100.000%)
("dn_6001_6002_6003 ",0,0.000%)
("dn_6007_6008_6009 ",0,0.000%)
(3 rows)
The current distribution column is skewed: all data is concentrated on a single DN. To avoid the skew, choose a different distribution column for the table.
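What table_skewness() reports is simply each DN's share of the table's rows. A minimal sketch (the helper name and the dictionary input are illustrative, not GaussDB's API), using the per-DN counts shown in this experiment:

```python
def table_skewness(rows_per_dn):
    # Each DN's share of the table's rows, as a percentage.
    total = sum(rows_per_dn.values())
    return {dn: round(100.0 * n / total, 3) for dn, n in rows_per_dn.items()}

# Before: every row had a = 1, so all 150000 rows hashed to one DN.
before = table_skewness({"dn_6001": 0, "dn_6004": 150000, "dn_6007": 0})
print(before["dn_6004"])  # 100.0

# After distributing by b (the counts from step 9), the shares are near-even.
after = table_skewness({"dn_6001": 50362, "dn_6004": 49959, "dn_6007": 49679})
print(after)
```

A share near 100/NUM_DN percent on every DN indicates an even distribution; any DN far above that is a skew hotspot and will dominate the query's elapsed time.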
7. Re-create the table, choosing an evenly distributed column as the distribution column.
DROP TABLE IF EXISTS test_6;
CREATE TABLE test_6(
a int,
b int
) DISTRIBUTE BY hash(b);
8. Insert the same test data (takes about 60 s).
INSERT INTO test_6 select 1, generate_series(1,150000);
9. Check the table's data skew.
gaussdb=# SELECT table_skewness('test_6');
table_skewness
----------------------------------------
("dn_6001_6002_6003 ",50362,33.575%)
("dn_6004_6005_6006 ",49959,33.306%)
("dn_6007_6008_6009 ",49679,33.119%)
(3 rows)
10. Run the test SQL.
EXPLAIN performance
SELECT count(b)
FROM test_6;
After adjusting the table's distribution column, the SQL execution time is significantly reduced (in the runs shown here, from roughly 35 ms to roughly 16 ms). The execution plan:
gaussdb=# EXPLAIN performance
gaussdb-# SELECT count(b)
gaussdb-# FROM test_6;
WARNING: Statistics in some tables or columns(public.test_6.b) are not collected.
HINT: Do analyze for them in order to generate optimized plan.
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | A-width | E-width | E-costs
----+----------------------------------------+-----------------+--------+--------+------------+--------------+---------+---------+---------
1 | -> Aggregate | 16.325 | 1 | 1 | | 11KB | | 12 | 14.31
2 | -> Streaming (type: GATHER) | 16.285 | 3 | 3 | | 81KB | | 12 | 14.31
3 | -> Aggregate | [11.061,11.218] | 3 | 3 | | [11KB, 11KB] | | 12 | 14.19
4 | -> Seq Scan on public.test_6 | [6.703,7.271] | 150000 | 30 | | [53KB, 53KB] | | 4 | 14.14
(4 rows)
Memory Information (identified by plan id)
-------------------------------------------------------------------------------
Coordinator Query Peak Memory:
Query Peak Memory: 0MB
DataNode Query Peak Memory
dn_6001_6002_6003 Query Peak Memory: 0MB
dn_6004_6005_6006 Query Peak Memory: 0MB
dn_6007_6008_6009 Query Peak Memory: 0MB
1 --Aggregate
Peak Memory: 11KB, Estimate Memory: 64MB
2 --Streaming (type: GATHER)
Peak Memory: 81KB, Estimate Memory: 64MB
3 --Aggregate
dn_6001_6002_6003 Peak Memory: 11KB, Estimate Memory: 64MB
dn_6004_6005_6006 Peak Memory: 11KB, Estimate Memory: 64MB
dn_6007_6008_6009 Peak Memory: 11KB, Estimate Memory: 64MB
dn_6001_6002_6003 Stream Send time: 0.000; Data Serialize time: 0.011
dn_6004_6005_6006 Stream Send time: 0.000; Data Serialize time: 0.011
dn_6007_6008_6009 Stream Send time: 0.000; Data Serialize time: 0.011
4 --Seq Scan on public.test_6
dn_6001_6002_6003 Peak Memory: 53KB, Estimate Memory: 64MB
dn_6004_6005_6006 Peak Memory: 53KB, Estimate Memory: 64MB
dn_6007_6008_6009 Peak Memory: 53KB, Estimate Memory: 64MB
(21 rows)
Targetlist Information (identified by plan id)
------------------------------------------------
1 --Aggregate
Output: count((count(b)))
2 --Streaming (type: GATHER)
Output: (count(b))
Node/s: All datanodes
3 --Aggregate
Output: count(b)
4 --Seq Scan on public.test_6
Output: a, b
Distribute Key: b
(10 rows)
Datanode Information (identified by plan id)
----------------------------------------------------------------------------------------------
1 --Aggregate
(actual time=16.325..16.325 rows=1 loops=1)
(Buffers: 0)
(CPU: ex c/r=28218, ex row=3, ex cyc=84655, inc cyc=34477556)
2 --Streaming (type: GATHER)
(actual time=15.776..16.285 rows=3 loops=1)
(Buffers: 0)
(CPU: ex c/r=11464300, ex row=3, ex cyc=34392901, inc cyc=34392901)
3 --Aggregate
dn_6001_6002_6003 (actual time=11.062..11.063 rows=1 loops=1)
dn_6004_6005_6006 (actual time=11.218..11.218 rows=1 loops=1)
dn_6007_6008_6009 (actual time=11.060..11.061 rows=1 loops=1)
dn_6001_6002_6003 (Buffers: shared hit=192)
dn_6004_6005_6006 (Buffers: shared hit=190)
dn_6007_6008_6009 (Buffers: shared hit=189)
dn_6001_6002_6003 (CPU: ex c/r=243, ex row=50362, ex cyc=12244336, inc cyc=23363045)
dn_6004_6005_6006 (CPU: ex c/r=245, ex row=49959, ex cyc=12258915, inc cyc=23692690)
dn_6007_6008_6009 (CPU: ex c/r=244, ex row=49679, ex cyc=12157254, inc cyc=23358595)
4 --Seq Scan on public.test_6
dn_6001_6002_6003 (actual time=0.026..6.703 rows=50362 loops=1)
dn_6004_6005_6006 (actual time=0.028..7.271 rows=49959 loops=1)
dn_6007_6008_6009 (actual time=0.027..7.070 rows=49679 loops=1)
dn_6001_6002_6003 (Buffers: shared hit=192)
dn_6004_6005_6006 (Buffers: shared hit=190)
dn_6007_6008_6009 (Buffers: shared hit=189)
dn_6001_6002_6003 (CPU: ex c/r=220, ex row=50362, ex cyc=11118709, inc cyc=11118709)
dn_6004_6005_6006 (CPU: ex c/r=228, ex row=49959, ex cyc=11433775, inc cyc=11433775)
dn_6007_6008_6009 (CPU: ex c/r=225, ex row=49679, ex cyc=11201341, inc cyc=11201341)
(28 rows)
User Define Profiling
---------------------------------------------------------------------------
Plan Node id: 2 Track name: coordinator get datanode connection
cn_5001: (time=3.384 total_calls=1 loops=1)
Plan Node id: 2 Track name: Coordinator serialize plan
cn_5001: (time=0.650 total_calls=1 loops=1)
Plan Node id: 2 Track name: Coordinator send begin command
cn_5001: (time=0.002 total_calls=1 loops=1)
Plan Node id: 2 Track name: Coordinator start transaction and send query
cn_5001: (time=0.101 total_calls=1 loops=1)
(8 rows)
====== Query Summary =====
------------------------------------------------------------------------------------------
Datanode executor start time [dn_6001_6002_6003, dn_6007_6008_6009]: [0.083 ms,0.090 ms]
Datanode executor run time [dn_6007_6008_6009, dn_6004_6005_6006]: [11.078 ms,11.235 ms]
Datanode executor end time [dn_6007_6008_6009, dn_6004_6005_6006]: [0.014 ms,0.025 ms]
Remote query poll time: 0.000 ms, Deserialze time: 0.000 ms
Coordinator executor start time: 0.072 ms
Coordinator executor run time: 16.336 ms
Coordinator executor end time: 0.018 ms
Planner runtime: 0.195 ms
Plan size: 3142 byte
Query Id: 72339069014905495
Total runtime: 16.464 ms
(11 rows)
When choosing a distribution key, which kinds of columns should be avoided in order to prevent data skew?
Prefer columns with high cardinality, i.e. many distinct and evenly spread values. Avoid low-cardinality columns (constants, flags, status codes) and columns dominated by a single value or by NULLs, since hashing them concentrates rows on a few DNs.
