参考openGauss官方文档介绍:
查看 Wdr报告 | openGauss文档 | openGauss社区
在Oralce数据库中,遇到性能问题,我们通常会查看有无对应时间段的快照,生成awr报告并进一步分析(AWR是Automatic Workload Repository的简称,中文叫着自动工作量资料档案库。是Oracle数据库用于收集、管理和维护数据库整个运行期间和性能相关统计数据的存储仓库,是Oracle数据库性能调整和优化的基础。awr收集到的数据会被定期保存到磁盘,可以从数据字典查询以及生成性能报告。)。AWR报告整个数据库在运行期间的现状或者说真实状态只有在被完整记录下来,才是可查,可知,可比较,可推测或者说为未来性能优化调整提供支撑建议的基础。
在opengauss数据库中,也有着这样的“awr”,它叫做——wdr。WDR是(Workload Diagnosis Report)负载诊断报告,是openGauss的工作负载诊断报告,常用于判断openGauss长期性能问题。
前提:
生成WDR报告的前提条件是,打开参数enable_wdr_snapshot。确认当前已按照的openGauss数据库是否打开WDR报告的参数,需要通过下图登录数据库进行查询。enable_wdr_snapshot的值为on表示打开,off表示关闭
打开WDR报告功能:
#gs_guc reload -N all -I all -c "enable_wdr_snapshot = on"
#select name,setting from pg_settings where name like '%wdr%';
以下介绍WDR报告的参数:
enable_wdr_snapshot
参数说明: 是否开启数据库监控快照功能。
该参数属于 SIGHUP 类型参数,请参考表GUC 参数分类中对应设置方法进行设置。
取值范围: 布尔型
on: 打开数据库监控快照功能。
off: 关闭数据库监控快照功能。
默认值: off
wdr_snapshot_retention_days
参数说明: 系统中数据库监控快照数据的保留天数,超过设置的值之后,系统每隔 wdr_snapshot_interval 时间间隔,清理 snapshot_id 最小的快照数据。
该参数属于 SIGHUP 类型参数,请参考表GUC 参数分类中对应设置方法进行设置。
取值范围:整型,1 ~ 8。
默认值: 8
wdr_snapshot_query_timeout
参数说明: 系统执行数据库监控快照操作时,设置快照操作相关的 sql 语句的执行超时时间。如果语句超过设置的时间没有执行完并返回结果,则本次快照操作失败。
该参数属于 SIGHUP 类型参数,请参考表GUC 参数分类中对应设置方法进行设置。
取值范围:整型,100 ~ INT_MAX(秒)。
默认值: 100s
wdr_snapshot_interval
参数说明: 后台线程 Snapshot 自动对数据库监控数据执行快照操作的时间间隔。
该参数属于 SIGHUP 类型参数,请参考表GUC 参数分类中对应设置方法进行设置。
取值范围:整型,10 ~ 60(分钟)。
默认值: 1h
快照表:
snapshot.snapshot:记录当前系统中存储的 WDR 快照信息
snapshot_id:WDR快照序列号
start_ts:WDR快照的开始时间
end_ts:WDR快照的结束时间
snapshot.tables_snap_timestamp:记录所有表的 WDR 快照信息
snapshot_id:WDR快照序列号
db_name:WDR snapshot对应的database
tablename:WDR snasphot对应的table
start_ts:WDR快照的开始时间
end_ts:WDR快照的结束时间
WDR 数据表:
说明:WDR 的数据表保存在 snapshot 这个 schema 下以 snap_开头的表,其数据来源于 dbe_perf 这个 schema 内的视图
postgres=# select relname from pg_class where relname like '%snap_%';
----------------------------------------------------------------------------------------------------------------
snapshot.tables_snap_timestamp -- 记录所有存储的WDR快照中数据库、表对象、数据采集的开始、结束时间
snapshot.snapshot -- 记录当前系统中存储的WDR快照数据的索引信息、开始、结束时间
snapshot.snapshot_pkey -- snapshot.snapshot表的primary key
snapshot.snap_seq -- 序列
snapshot.snap_global_os_runtime -- 操作系统运行状态信息
snapshot.snap_global_os_threads -- 线程状态信息
snapshot.snap_global_instance_time -- 各种时间消耗信息(时间类型见instance_time视图)
snapshot.snap_summary_workload_sql_count -- 各数据库主节点的workload上的SQL数量分布
snapshot.snap_summary_workload_sql_elapse_time -- 数据库主节点上workload(业务)负载的SQL耗时信息
snapshot.snap_global_workload_transaction -- 各节点上的workload的负载信息
snapshot.snap_summary_workload_transaction -- 汇总的负载事务信息
snapshot.snap_global_thread_wait_status -- 工作线程以及辅助线程的阻塞等待情况
snapshot.snap_global_memory_node_detail -- 节点的内存使用情况
snapshot.snap_global_shared_memory_detail -- 共享内存上下文的使用情况
snapshot.snap_global_stat_db_cu -- 数据库的CU命中情况,可以通过gs_stat_reset()进行清零
snapshot.snap_global_stat_database -- 数据库的统计信息
snapshot.snap_summary_stat_database -- 汇总的数据库统计信息
snapshot.snap_global_stat_database_conflicts -- 数据库冲突状态的统计信息
snapshot.snap_summary_stat_database_conflicts -- 汇总的数据库冲突状态的统计信息
snapshot.snap_global_stat_bad_block -- 表、索引等文件的读取失败信息
snapshot.snap_summary_stat_bad_block -- 汇总的表、索引等文件的读取失败信息
snapshot.snap_global_file_redo_iostat -- Redo(WAL)相关统计信息
snapshot.snap_summary_file_redo_iostat -- 汇总的Redo(WAL)相关统计信息
snapshot.snap_global_rel_iostat -- 数据对象IO统计信息
snapshot.snap_summary_rel_iostat -- 汇总的数据对象IO统计信息
snapshot.snap_global_file_iostat -- 数据文件IO统计信息
snapshot.snap_summary_file_iostat -- 汇总的数据文件IO统计信息
snapshot.snap_global_replication_slots -- 复制节点的信息
snapshot.snap_global_bgwriter_stat -- 后端写进程活动的统计信息
snapshot.snap_global_replication_stat -- 日志同步状态信息
snapshot.snap_global_transactions_running_xacts -- 各节点运行事务的信息
snapshot.snap_summary_transactions_running_xacts -- 汇总各节点运行事务的信息
snapshot.snap_global_transactions_prepared_xacts -- 当前准备好进行两阶段提交的事务的信息
snapshot.snap_summary_transactions_prepared_xacts -- 汇总的当前准备好进行两阶段提交的事务的信息
snapshot.snap_summary_statement -- SQL语句的全量信息
snapshot.snap_global_statement_count -- 当前时刻执行的DML/DDL/DQL/DCL语句统计信息
snapshot.snap_summary_statement_count -- 汇总的当前时刻执行的DML/DDL/DQL/DCL语句统计信息
snapshot.snap_global_config_settings -- 数据库运行时参数信息
snapshot.snap_global_wait_events -- event等待相关统计信息
snapshot.snap_summary_user_login -- 用户登录和退出次数的相关信息
snapshot.snap_global_ckpt_status -- 实例的检查点信息和各类日志刷页情况
snapshot.snap_global_double_write_status -- 实例的双写文件的情况
snapshot.snap_global_pagewriter_status -- 实例的刷页信息和检查点信息
snapshot.snap_global_redo_status -- 实例的日志回放情况
snapshot.snap_global_rto_status -- 极致RTO状态信息
snapshot.snap_global_recovery_status -- 主机和备机的日志流控信息
snapshot.snap_global_threadpool_status -- 节点上的线程池中工作线程及会话的状态信息
snapshot.snap_statement_responsetime_percentile -- SQL响应时间P80、P95分布信息
snapshot.snap_global_statio_all_indexes -- 数据库中的每个索引行、显示特定索引的I/O的统计
snapshot.snap_summary_statio_all_indexes -- 汇总的数据库中的每个索引行、显示特定索引的I/O的统计
snapshot.snap_global_statio_all_sequences -- 数据库中每个序列的每一行、显示特定序列关于I/O的统计
snapshot.snap_summary_statio_all_sequences -- 汇总的数据库中每个序列的每一行、显示特定序列关于I/O的统计
snapshot.snap_global_statio_all_tables -- 数据库中每个表(包括TOAST表)的I/O的统计
snapshot.snap_summary_statio_all_tables -- 汇总的数据库中每个表(包括TOAST表)的I/O的统计
snapshot.snap_global_stat_all_indexes -- 数据库中的每个索引行,显示访问特定索引的统计
snapshot.snap_summary_stat_all_indexes -- 汇总的数据库中的每个索引行,显示访问特定索引的统计
snapshot.snap_summary_stat_user_functions -- 汇总的数据库节点用户自定义函数的相关统计信息
snapshot.snap_global_stat_user_functions -- 用户所创建的函数的状态的统计信息
snapshot.snap_global_stat_all_tables -- 每个表的一行(包括TOAST表)的统计信息
snapshot.snap_summary_stat_all_tables -- 汇总的每个表的一行(包括TOAST表)的统计信息
snapshot.snap_class_vital_info -- 校验相同的表或者索引的Oid是否一致
snapshot.snap_global_record_reset_time -- 重置(重启,主备倒换,数据库删除)openGauss统计信息时间
snapshot.snap_summary_statio_indexes_name -- 表snap_summary_statio_all_indexes的索引
snapshot.snap_summary_statio_tables_name -- 表snap_summary_statio_all_tables的索引
snapshot.snap_summary_stat_indexes_name -- 表snap_summary_stat_all_indexes的索引
snapshot.snap_class_info_name -- 表snap_class_vital_info的索引
生成WDR报告一般操作步骤:
1.执行以下SQL命令,查询已经生成的快照信息。
#select * from snapshot.snapshot;
2.手工生成快照:
#select create_wdr_snapshot();
3.生成WDR报告:
1)指定生成一份 WDR 报告文件:
# \a \t \o wdr20240201.html
上述命令涉及参数说明如下:
\a: 切换非对齐模式。
\t: 切换输出的字段名的信息和行计数脚注。
\o: 把所有的查询结果发送至服务器文件里。
服务器文件路径:生成性能报告文件存放路径。用户需要拥有此路径的读写权限。
2)指定快照ID生成WDR报告内容
#select generate_wdr_report(begin_snap_id bigint, end_snap_id bigint, report_type cstring, report_scope cstring, node_name cstring);
参数说明:
begin_snap_id:查询时间段开始的 snapshot 的 id(表 snapshot.snaoshot 中的 snapshot_id)。
end_snap_id:查询时间段结束 snapshot 的 id。默认 end_snap_id 大于 begin_snap_id(表 snapshot.snaoshot 中的 snapshot_id)。
report_type 指定生成 report 的类型。例如,summary/detail/all。summary: 汇总数据;detail: 明细数据;all: 包含 summary 和 detail。
report_scope:指定生成 report 的范围,可以为 cluster 或者 node。cluster: 数据库级别的信息;node:节点级别的信息。
node_name:在 report_scope 指定为 node 时,需要把该参数指定为对应节点的名称。(节点名称可以执行 select * from pg_node_env;查询);在 report_scope 为 cluster 时,该值可以指定为省略、空或者为 NULL。
例如:
select generate_wdr_report(31, 32, 'all', 'cluster',null);
select generate_wdr_report(31, 32, 'all', 'node','dn_6001');
3)关闭WDR报告文件
#\a \t \o
完整的例子:
打开生成的WDR报告:
解读WDR报告:
WDR 报告的概况信息
从这一部分我们能得到如下信息:
这一部分是 WDR 报告的概况信息,从这一部分我们能得到如下信息:
信息分类 | 信息描述 |
---|---|
报告采集类型 | Summary + Detail,即汇总数据+明细数据 |
Snapshot 信息 | 使用 snapshot_id 为 31 和 32的快照采集 2025-06-18(19:49 ~ 19:59)的运行信息 |
硬件配置 | 16*16 7G内存 |
节点名 | dn_6001 |
openGauss 版本 | openGauss 6 |
相关代码:
第一部分,Report Type/Report Scope/Report Node内容来源于执行generate_wdr_report函数时输入的参数,详见源码“GenReport::ShowReportType(report_params* params)”
第二部分查询SQL:(变量ld-->snapshot_id)
select snapshot_id as "Snapshot Id",
to_char(start_ts, 'YYYY-MM-DD HH24:MI:SS') as "Start Time",
to_char(end_ts, 'YYYY-MM-DD HH24:MI:SS') as "End Time"
from snapshot.snapshot
where snapshot_id = %ld or snapshot_id = %ld;
第三部分查询SQL:(变量ld-->snapshot_id)
select 'CPUS', x.snap_value
from (select * from pg_node_env) t,
(select * from snapshot.snap_global_os_runtime) x
where x.snap_node_name = t.node_name
and x.snapshot_id = %ld
and (x.snap_name = 'NUM_CPUS');
select 'CPU Cores', x.snap_value
from (select * from pg_node_env) t,
(select * from snapshot.snap_global_os_runtime) x
where x.snap_node_name = t.node_name
and x.snapshot_id = %ld
and x.snap_name = 'NUM_CPU_CORES';
select 'CPU Sockets', x.snap_value
from (select * from pg_node_env) t,
(select * from snapshot.snap_global_os_runtime) x
where x.snap_node_name = t.node_name
and x.snapshot_id = %ld
and x.snap_name = 'NUM_CPU_SOCKETS';
select 'Physical Memory', pg_size_pretty(x.snap_value)
from (select * from pg_node_env) t,
(select * from snapshot.snap_global_os_runtime) x
where x.snap_node_name = t.node_name
and x.snapshot_id = %ld
and x.snap_name = 'PHYSICAL_MEMORY_BYTES';
select node_name as "Host Node Name" from pg_node_env;
select version() as "openGauss Version";
Database Stat
- 数据库维度性能统计信息:事务,读写,行活动,写冲突,死锁等。
- 数据库范围报表,仅cluster模式下可查看此报表。
Database Stat报表主要内容
这一部分描述的是后台写操作的统计信息,数据来源于 snapshot.snap_global_bgwriter_stat 表。
具体内容如下:
列名 | 数据获取相关函数 | 说明 |
---|---|---|
Checkpoints Timed | pg_stat_get_bgwriter_timed_checkpoints() | 执行的定期检查点数 |
Checkpoints Require | pg_stat_get_bgwriter_requested_checkpoints() | 执行的需求检查点数 |
Checkpoint Write Time(ms) | pg_stat_get_checkpoint_write_time() | 检查点操作中,文件写入到磁盘消耗的时间(单位:毫秒) |
Checkpoint Sync Time(ms) | pg_stat_get_checkpoint_sync_time() | 检查点操作中,文件同步到磁盘消耗的时间(单位:毫秒) |
Buffers Checkpoint | pg_stat_get_bgwriter_buf_written_checkpoints() | 检查点写缓冲区数量 |
Buffers Clean | pg_stat_get_bgwriter_buf_written_clean() | 后端写线程写缓冲区数量 |
Maxwritten Clean | pg_stat_get_bgwriter_maxwritten_clean() | 后端写线程停止清理 Buffer 的次数 |
Buffers Backend | pg_stat_get_buf_written_backend() | 通过后端直接写缓冲区数 |
Buffers Backend | Fsync pg_stat_get_buf_fsync_backend() | 后端不得不执行自己的 fsync 调用的时间数(通常后端写进程处理这些即使后端确实自己写) |
Buffers Alloc | pg_stat_get_buf_alloc() 分配的缓冲区数量 | |
Stats Reset | pg_stat_get_bgwriter_stat_reset_time() | 这些统计被重置的时间 |
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
select (snap_2.snap_checkpoints_timed - coalesce(snap_1.snap_checkpoints_timed, 0)) AS "Checkpoints Timed",
(snap_2.snap_checkpoints_req - coalesce(snap_1.snap_checkpoints_req, 0)) AS "Checkpoints Require",
(snap_2.snap_checkpoint_write_time - coalesce(snap_1.snap_checkpoint_write_time, 0)) AS "Checkpoint Write Time(ms)",
(snap_2.snap_checkpoint_sync_time - coalesce(snap_1.snap_checkpoint_sync_time, 0)) AS "Checkpoint Sync Time(ms)",
(snap_2.snap_buffers_checkpoint - coalesce(snap_1.snap_buffers_checkpoint, 0)) AS "Buffers Checkpoint",
(snap_2.snap_buffers_clean - coalesce(snap_1.snap_buffers_clean, 0)) AS "Buffers Clean",
(snap_2.snap_maxwritten_clean - coalesce(snap_1.snap_maxwritten_clean, 0)) AS "Maxwritten Clean",
(snap_2.snap_buffers_backend - coalesce(snap_1.snap_buffers_backend, 0)) AS "Buffers Backend",
(snap_2.snap_buffers_backend_fsync - coalesce(snap_1.snap_buffers_backend_fsync, 0)) AS "Buffers Backend Fsync",
(snap_2.snap_buffers_alloc - coalesce(snap_1.snap_buffers_alloc, 0)) AS "Buffers Alloc",
to_char(snap_2.snap_stats_reset, 'YYYY-MM-DD HH24:MI:SS') AS "Stats Reset"
from
(select * from snapshot.snap_global_bgwriter_stat
where snapshot_id = %ld
and snap_node_name = '%s') snap_2
LEFT JOIN
(select * from snapshot.snap_global_bgwriter_stat
where snapshot_id = %ld
and snap_node_name = '%s') snap_1
on snap_2.snapshot_id = snap_1.snapshot_id --错误点:snap_2.snapshot_id = snap_1.snapshot_id ? 这其实还是同一个snapshot
and snap_2.snap_node_name = snap_1.snap_node_name
and snap_2.snap_stats_reset = snap_1.snap_stats_reset
limit 200;
-- 统计信息应该是2次snapshot之间的数据,而以上SQL并不能正确输出相关数据。个人觉得可以删除LEFT JOIN连接。
-- 建议修改如下:
select (snap_2.snap_checkpoints_timed - coalesce(snap_1.snap_checkpoints_timed, 0)) AS "Checkpoints Timed",
(snap_2.snap_checkpoints_req - coalesce(snap_1.snap_checkpoints_req, 0)) AS "Checkpoints Require",
(snap_2.snap_checkpoint_write_time - coalesce(snap_1.snap_checkpoint_write_time, 0)) AS "Checkpoint Write Time(ms)",
(snap_2.snap_checkpoint_sync_time - coalesce(snap_1.snap_checkpoint_sync_time, 0)) AS "Checkpoint Sync Time(ms)",
(snap_2.snap_buffers_checkpoint - coalesce(snap_1.snap_buffers_checkpoint, 0)) AS "Buffers Checkpoint",
(snap_2.snap_buffers_clean - coalesce(snap_1.snap_buffers_clean, 0)) AS "Buffers Clean",
(snap_2.snap_maxwritten_clean - coalesce(snap_1.snap_maxwritten_clean, 0)) AS "Maxwritten Clean",
(snap_2.snap_buffers_backend - coalesce(snap_1.snap_buffers_backend, 0)) AS "Buffers Backend",
(snap_2.snap_buffers_backend_fsync - coalesce(snap_1.snap_buffers_backend_fsync, 0)) AS "Buffers Backend Fsync",
(snap_2.snap_buffers_alloc - coalesce(snap_1.snap_buffers_alloc, 0)) AS "Buffers Alloc",
to_char(snap_2.snap_stats_reset, 'YYYY-MM-DD HH24:MI:SS') AS "Stats Reset"
from
(select * from snapshot.snap_global_bgwriter_stat
where snapshot_id = %ld
and snap_node_name = '%s') snap_2,
(select * from snapshot.snap_global_bgwriter_stat
where snapshot_id = %ld
and snap_node_name = '%s') snap_1
limit 200;
Load Profile
- 数据库维度的性能统计信息:CPU时间,DB时间,逻辑读/物理读,IO性能,登入登出,负载强度,负载性能表现等。
- 数据库范围报表,仅cluster模式下可查看此报表。
Load Profile报表主要内容
从数据库的 内存缓冲区(Buffer Cache) 中读取数据块(Block) 数据已经在内存中,无需访问磁盘,速度极快。 可以增大 Buffer Cache提高逻辑读。 | |
从 磁盘(数据文件) 中读取数据块到内存缓冲区。 需要磁盘 I/O 操作,速度比逻辑读慢得多。 | |
Instance Efficiency Percentages
- 数据库级或者节点缓冲命中率。
- 数据库、节点范围报表,cluster模式和node模式下均可查看此报表。
Instance Efficiency Percentages报表主要内容
目标值是 100%,即越接近 100%,数据库运行越健康
Buffer Hit:即数据库请求的数据在 buffer 中命中的比例,该指标越高代表 openGauss 在 buffer 中查询到目标数据的概率越高,数据读取性能越好。
Effective CPU:即有效的 CPU 使用比例,该指标偏小则说明 CPU 的有效使用偏低,处于等待状态的比例可能较高。
WalWrite NoWait:即 WAL 日志写入时不等待的比例,该指标接近 100%,说明 buffer 容量充足,可以满足 WAL 写操作的需求,若指标值偏小则可能需要调大 buffer 容量。
Soft Parse:即 SQL 软解析的比例,该指标接近 100%,说明当前执行的 SQL 基本都可以在 Buffer 中找到,若指标值偏小则说明存在大量硬解析,需要分析原因,对 DML 语句进行适度优化。
Non-Parse CPU:即非解析占用的 CPU 比例,该指标接近 100%,说明 SQL 解析并没有占用较多的 CPU 时间。
相关代码:
-- 变量:ld指的是snapshot_id,手动执行以下SQL语句时请自行替换对应的snapshot_id
select
unnest(array['Buffer Hit %%', 'Effective CPU %%', 'WalWrite NoWait %%', 'Soft Parse %%', 'Non-Parse CPU %%']) as "Metric Name",
unnest(array[case when s1.all_reads = 0 then 1 else round(s1.blks_hit * 100 / s1.all_reads) end, s2.cpu_to_elapsd, s3.walwrite_nowait, s4.soft_parse, s5.non_parse]) as "Metric Value"
from
(select (snap_2.all_reads - coalesce(snap_1.all_reads, 0)) as all_reads,
(snap_2.blks_hit - coalesce(snap_1.blks_hit, 0)) as blks_hit
from
(select sum(coalesce(snap_blks_read, 0) + coalesce(snap_blks_hit, 0)) as all_reads,
coalesce(sum(snap_blks_hit), 0) as blks_hit
from snapshot.snap_summary_stat_database
where snapshot_id = %ld) snap_1,
(select sum(coalesce(snap_blks_read, 0) + coalesce(snap_blks_hit, 0)) as all_reads,
coalesce(sum(snap_blks_hit), 0) as blks_hit
from snapshot.snap_summary_stat_database
where snapshot_id = %ld) snap_2
) s1,
(select round(cpu_time.snap_value * 100 / greatest(db_time.snap_value, 1)) as cpu_to_elapsd
from
(select coalesce(snap_2.snap_value, 0) - coalesce(snap_1.snap_value, 0) as snap_value
from
(select snap_stat_name, snap_value from snapshot.snap_global_instance_time
where snapshot_id = %ld and snap_stat_name = 'CPU_TIME') snap_1,
(select snap_stat_name, snap_value from snapshot.snap_global_instance_time
where snapshot_id = %ld and snap_stat_name = 'CPU_TIME') snap_2) cpu_time,
(select coalesce(snap_2.snap_value, 0) - coalesce(snap_1.snap_value, 0) as snap_value
from
(select snap_stat_name, snap_value from snapshot.snap_global_instance_time
where snapshot_id = %ld and snap_stat_name = 'DB_TIME') snap_1,
(select snap_stat_name, snap_value from snapshot.snap_global_instance_time
where snapshot_id = %ld and snap_stat_name = 'DB_TIME') snap_2) db_time
) s2,
(select (bufferAccess.snap_wait - bufferFull.snap_wait) * 100 / greatest(bufferAccess.snap_wait, 1) as walwrite_nowait
from
(select coalesce(snap_2.snap_wait) - coalesce(snap_1.snap_wait, 0) as snap_wait
from
(select snap_wait from snapshot.snap_global_wait_events
where snapshot_id = %ld and snap_event = 'WALBufferFull') snap_1,
(select snap_wait from snapshot.snap_global_wait_events
where snapshot_id = %ld and snap_event = 'WALBufferFull') snap_2) bufferFull,
(select coalesce(snap_2.snap_wait) - coalesce(snap_1.snap_wait, 0) as snap_wait
from
(select snap_wait from snapshot.snap_global_wait_events
where snapshot_id = %ld and snap_event = 'WALBufferAccess') snap_1,
(select snap_wait from snapshot.snap_global_wait_events
where snapshot_id = %ld and snap_event = 'WALBufferAccess') snap_2) bufferAccess
) s3,
(select round((snap_2.soft_parse - snap_1.soft_parse) * 100 / greatest((snap_2.hard_parse + snap_2.soft_parse)-(snap_1.hard_parse + snap_1.soft_parse), 1)) as soft_parse
from
(select sum(snap_n_soft_parse) as soft_parse, sum(snap_n_hard_parse) as hard_parse from snapshot.snap_summary_statement
where snapshot_id = %ld ) snap_1,
(select sum(snap_n_soft_parse) as soft_parse, sum(snap_n_hard_parse) as hard_parse from snapshot.snap_summary_statement
where snapshot_id = %ld ) snap_2
) s4,
(select round((snap_2.elapse_time - snap_1.elapse_time) * 100 /greatest((snap_2.elapse_time + snap_2.parse_time)-(snap_1.elapse_time + snap_1.parse_time), 1)) as non_parse
from
(select sum(snap_total_elapse_time) as elapse_time, sum(snap_parse_time) as parse_time from snapshot.snap_summary_statement
where snapshot_id = %ld ) snap_1,
(select sum(snap_total_elapse_time) as elapse_time, sum(snap_parse_time) as parse_time from snapshot.snap_summary_statement
where snapshot_id = %ld ) snap_2
) s5;
TOP 10 Events by Total Wait Time
数据库 Top 10 的等待事件、等待次数、总等待时间、平均等待时间、等待事件类型
- 最消耗时间的事件。
- 节点范围报表,仅node模式下可查看此报表。
Top 10 Events by Total Wait Time报表主要内容
等待事件主要分为等待状态、等待轻量级锁、等待 IO、等待事务锁这 4 类,详见下表所示:
等待状态列表
当 wait_status 为 acquire lwlock、acquire lock 或者 wait io 时,表示有等待事件。
正在等待获取 wait_event 列对应类型的轻量级锁、事务锁,或者正在进行 IO。
- 轻量级锁等待事件列表
当 wait_status 值为 acquire lwlock(轻量级锁)时对应的 wait_event 等待事件类型即为轻量级锁等待。
wait_event 为 extension 时,表示此时的轻量级锁是动态分配的锁,未被监控。
wait_event类型 | |
---|---|
- IO 等待事件列表
当 wait_status 值为 wait io 时对应的 wait_event 等待事件类型即为 IO 等待事件。
wait_event类型 | |
---|---|
将pg_clog、pg_subtrans和pg_multixact文件持久化到磁盘。主要在执行checkpoint和数据库停机时发生。 | |
将脏页写入文件pg_clog、pg_subtrans和pg_multixact并持久化到磁盘。主要在执行checkpoint和数据库停机时发生。 | |
- 事务锁等待事件列表
当 wait_status 值为 acquire lock(事务锁)时对应的 wait_event 等待事件类型为事务锁等待事件。
wait_event 类型 | 类型描述 |
---|---|
relation | 对表加锁 |
extend | 对表扩展空间时加锁 |
partition | 对分区表加锁 |
partition_seq | 对分区表的分区加锁 |
page | 对表页面加锁 |
tuple | 对页面上的 tuple 加锁 |
transactionid | 对事务 ID 加锁 |
virtualxid | 对虚拟事务 ID 加锁 |
object | 加对象锁 |
cstore_freespace | 对列存空闲空间加锁 |
userlock | 加用户锁 |
advisory | 加 advisory 锁 |
相关代码:
-- 说明:变量ld-->snapshot_id, 变量s-->node_name
select snap_event as "Event",
snap_wait as "Waits",
snap_total_wait_time as "Total Wait Times(us)",
round(snap_total_wait_time/snap_wait) as "Wait Avg(us)",
snap_type as "Wait Class"
from (
select snap_2.snap_event as snap_event,
snap_2.snap_type as snap_type,
snap_2.snap_wait - snap_1.snap_wait as snap_wait,
snap_2.total_time - snap_1.total_time as snap_total_wait_time
from
(select snap_event,
snap_wait,
snap_total_wait_time as total_time,
snap_type
from snapshot.snap_global_wait_events
where snapshot_id = %ld
and snap_event != 'none'
and snap_event != 'wait cmd'
and snap_event != 'unknown_lwlock_event'
and snap_nodename = '%s') snap_1,
(select snap_event,
snap_wait,
snap_total_wait_time as total_time,
snap_type
from snapshot.snap_global_wait_events
where snapshot_id = %ld
and snap_event != 'none'
and snap_event != 'wait cmd'
and snap_event != 'unknown_lwlock_event'
and snap_nodename = '%s') snap_2
where snap_2.snap_event = snap_1.snap_event
order by snap_total_wait_time desc limit 10)
where snap_wait != 0;
Wait Classes by Total Wait Time
按照等待类型(STATUS、IO_EVENT、LWLOCK_EVENT、LOCK_EVENT),分类统计等待次数、总等待时间、平均等待时间。
- 最消耗时间的等待时间分类。
- 节点范围报表,仅node模式下可查看此报表。
Wait Classes by Total Wait Time报表主要内容
| |
相关代码
-- 说明: 变量ld-->snapshot_id, 变量s-->node_name
select
snap_2.type as "Wait Class",
(snap_2.wait - snap_1.wait) as "Waits",
(snap_2.total_wait_time - snap_1.total_wait_time) as "Total Wait Time(us)",
round((snap_2.total_wait_time - snap_1.total_wait_time) / greatest((snap_2.wait - snap_1.wait), 1)) as "Wait Avg(us)"
from
(select
snap_type as type,
sum(snap_total_wait_time) as total_wait_time,
sum(snap_wait) as wait from snapshot.snap_global_wait_events
where snapshot_id = %ld
and snap_nodename = '%s'
and snap_event != 'unknown_lwlock_event'
and snap_event != 'none'
group by snap_type) snap_2
left join
(select
snap_type as type,
sum(snap_total_wait_time) as total_wait_time,
sum(snap_wait) as wait
from snapshot.snap_global_wait_events
where snapshot_id = %ld
and snap_nodename = '%s'
and snap_event != 'unknown_lwlock_event' and snap_event != 'none'
group by snap_type) snap_1
on snap_2.type = snap_1.type
order by "Total Wait Time(us)" desc;
分别从等待时长、等待次数这两个维度对等待事件进行统计。
表格中列的含义即就是列的英文翻译,这里就不再复述了。
具体的等待事件的介绍详见前文:“Top 10 Events by Total Wait Time”部分的内容,这里也不再复述。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id,无论从哪个维度统计,基本的SQL语句差异不大,这里仅列举"by wait time"的SQL示例
select t2.snap_type as "Type",
t2.snap_event as "Event",
(t2.snap_total_wait_time - coalesce(t1.snap_total_wait_time, 0)) as "Total Wait Time (us)",
(t2.snap_wait - coalesce(t1.snap_wait, 0)) as "Waits",
(t2.snap_failed_wait - coalesce(t1.snap_failed_wait, 0)) as "Failed Waits",
(case "Waits"
when 0 then 0
else round("Total Wait Time (us)" / "Waits", 2)
end) as "Avg Wait Time (us)",
t2.snap_max_wait_time as "Max Wait Time (us)"
from
(select * from snapshot.snap_global_wait_events
where snapshot_id = %ld
and snap_nodename = '%s'
and snap_event != 'unknown_lwlock_event'
and snap_event != 'none') t1
right join
(select * from snapshot.snap_global_wait_events
where snapshot_id = %ld
and snap_nodename = '%s'
and snap_event != 'unknown_lwlock_event'
and snap_event != 'none') t2
on t1.snap_event = t2.snap_event
order by "Total Wait Time (us)"
desc limit 200;
Host CPU
- 主机CPU消耗。
- 节点范围报表,仅node模式下可查看此报表。
Host CPU报表主要内容
主机 CPU 的负载情况:
CPU 的平均负载、用户使用占比、系统使用占比、IO 等待占比、空闲占比
相关代码
select
snap_2.cpus as "Cpus",
snap_2.cores as "Cores",
snap_2.sockets as "Sockets",
snap_1.load as "Load Average Begin",
snap_2.load as "Load Average End",
round(coalesce((snap_2.user_time - snap_1.user_time), 0) / greatest(coalesce((snap_2.total_time - snap_1.total_time), 0), 1) * 100, 2) as "%User",
round(coalesce((snap_2.sys_time - snap_1.sys_time), 0) / greatest(coalesce((snap_2.total_time - snap_1.total_time), 0), 1) * 100, 2) as "%System",
round(coalesce((snap_2.iowait_time - snap_1.iowait_time), 0) / greatest(coalesce((snap_2.total_time - snap_1.total_time), 0), 1) * 100, 2) as "%WIO",
round(coalesce((snap_2.idle_time - snap_1.idle_time), 0) / greatest(coalesce((snap_2.total_time - snap_1.total_time), 0), 1) * 100, 2) as "%Idle"
from
(select H.cpus, H.cores, H.sockets, H.idle_time, H.user_time, H.sys_time, H.iowait_time,
(H.idle_time + H.user_time + H.sys_time + H.iowait_time) AS total_time, H.load from
(select C.cpus, E.cores, T.sockets, I.idle_time, U.user_time, S.sys_time, W.iowait_time, L.load from
(select snap_value as cpus from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'NUM_CPUS')) AS C,
(select snap_value as cores from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'NUM_CPU_CORES')) AS E,
(select snap_value as sockets from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'NUM_CPU_SOCKETS')) AS T,
(select snap_value as idle_time from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'IDLE_TIME')) AS I,
(select snap_value as user_time from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'USER_TIME')) AS U,
(select snap_value as sys_time from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'SYS_TIME')) AS S,
(select snap_value as iowait_time from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'IOWAIT_TIME')) AS W,
(select snap_value as load from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'LOAD')) AS L ) as H ) as snap_2,
(select H.cpus, H.cores, H.sockets, H.idle_time, H.user_time, H.sys_time, H.iowait_time,
(H.idle_time + H.user_time + H.sys_time + H.iowait_time) AS total_time, H.load from
(select C.cpus, E.cores, T.sockets, I.idle_time, U.user_time, S.sys_time, W.iowait_time, L.load from
(select snap_value as cpus from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'NUM_CPUS')) AS C,
(select snap_value as cores from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'NUM_CPU_CORES')) AS E,
(select snap_value as sockets from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'NUM_CPU_SOCKETS')) AS T,
(select snap_value as idle_time from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'IDLE_TIME')) AS I,
(select snap_value as user_time from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'USER_TIME')) AS U,
(select snap_value as sys_time from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'SYS_TIME')) AS S,
(select snap_value as iowait_time from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'IOWAIT_TIME')) AS W,
(select snap_value as load from snapshot.snap_global_os_runtime
where (snapshot_id = %ld and snap_node_name = '%s' and snap_name = 'LOAD')) AS L ) as H ) as snap_1 ;
Memory Statistics
- 内核内存使用分布。
- 节点范围报表,仅node模式下可查看此报表。
Memory Statistics报表主要内容
节点内存的变化信息
通过这些变化信息,我们可以了解到在两次快照期间,数据库的内存变化情况,作为数据库性能分析或异常分析的参考。数据来源于 snapshot.snap_global_memory_node_detail。
这部分分别描述了: 内存的类型 以及 对应的起始大小和终止大小。
这里没有采集到数据的原因: 测试环境内存太小,导致启动时将 memory protect 关闭了,从而导致无法查询 dbe_perf.global_memory_node_detail 视图。 而 WDR 的内存统计数据(snapshot.snap_global_memory_node_detail)则来源于该视图。
另外,请确保 disable_memory_protect=off。
关于这部分 Memory Type 常见类型如下:
Memory 类型 | 说明 |
---|---|
max_process_memory | openGauss 实例所占用的内存大小 |
process_used_memory | 进程所使用的内存大小 |
max_dynamic_memory | 最大动态内存 |
dynamic_used_memory | 已使用的动态内存 |
dynamic_peak_memory | 内存的动态峰值 |
dynamic_used_shrctx | 最大动态共享内存上下文 |
dynamic_peak_shrctx | 共享内存上下文的动态峰值 |
max_shared_memory | 最大共享内存 |
shared_used_memory | 已使用的共享内存 |
max_cstore_memory | 列存所允许使用的最大内存 |
cstore_used_memory | 列存已使用的内存大小 |
max_sctpcomm_memory | sctp 通信所允许使用的最大内存 |
sctpcomm_used_memory | sctp 通信已使用的内存大小 |
sctpcomm_peak_memory | sctp 通信的内存峰值 |
other_used_memory | 其他已使用的内存大小 |
gpu_max_dynamic_memory | GPU 最大动态内存 |
gpu_dynamic_used_memory | GPU 已使用的动态内存 |
gpu_dynamic_peak_memory | GPU 内存的动态峰值 |
pooler_conn_memory | 链接池申请内存计数 |
pooler_freeconn_memory | 链接池空闲连接的内存计数 |
storage_compress_memory | 存储模块压缩使用的内存大小 |
udf_reserved_memory | UDF 预留的内存大小 |
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
select
snap_2.snap_memorytype as "Memory Type",
snap_1.snap_memorymbytes as "Begin(MB)",
snap_2.snap_memorymbytes as "End(MB)"
from
(select snap_memorytype, snap_memorymbytes
from
snapshot.snap_global_memory_node_detail
where (snapshot_id = %ld and snap_nodename = '%s')
and
(snap_memorytype = 'max_process_memory'
or snap_memorytype = 'process_used_memory'
or snap_memorytype = 'max_shared_memory'
or snap_memorytype = 'shared_used_memory'))
as snap_2
left join
(select snap_memorytype, snap_memorymbytes
from
snapshot.snap_global_memory_node_detail
where (snapshot_id = %ld and snap_nodename = '%s')
and (snap_memorytype = 'max_process_memory'
or snap_memorytype = 'process_used_memory'
or snap_memorytype = 'max_shared_memory'
or snap_memorytype = 'shared_used_memory'))
as snap_1
on snap_2.snap_memorytype = snap_1.snap_memorytype;
Time Mode
数据库各种状态所消耗的时间
- 节点范围的语句的时间分布信息。
- 节点范围报表,仅node模式下可查看此报表。
Time Model报表主要内容
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
select t2.snap_node_name as "Node Name",
t2.snap_stat_name as "Stat Name",
(t2.snap_value - coalesce(t1.snap_value, 0)) as "Value(us)"
from
(select * from snapshot.snap_global_instance_time
where snapshot_id = %ld
and snap_node_name = '%s') t1
right join
(select * from snapshot.snap_global_instance_time
where snapshot_id = %ld
and snap_node_name = '%s') t2
on t1.snap_stat_name = t2.snap_stat_name
order by "Value(us)"
desc limit 200;
Wait Events
- 节点级别的等待事件的统计信息。
- 节点范围报表,仅node模式下可查看此报表。
Wait Events报表主要内容
| |
IO Profile
- 数据库或者节点维度的IO的使用情况,即IO负载情况。
- 数据库、节点范围报表,cluster模式和node模式下均可查看此报表。
IO Profile指标表主要内容
Database requests: 即每秒 IO 请求次数,包括请求次数总和、读请求次数、写请求次数.
Database(blocks): 即每秒 block 请求数量,包含请求的 block 总和数量、读 block 的数量和写 block 的数量.
Database(MB): 即将 block 换算成容量(MB)[如:blocks * 8/1024],增加数据的可读性。
Redo requests 和 Redo(MB) 分别表示每秒 redo 的写请求次数和 redo 写的数据量。
相关代码:
-- 由于源码中相关SQL融合了C++程序语法,像我这种不做开发的DBA读起来有些难以理解【如:(phyblkwrt * %d) >> 20 这个段没有很好理解】。
-- 但是依旧不影响我们对这些数据采集方法的理解,相关SQL如下:
-- 两个snapshot_id(24和25)期间,数据块的IO统计信息(数值除以3600即换算成以秒为单位的WDR数据)
postgres=# select
(snap_2.phytotal - snap_1.phytotal) as phytotal,
(snap_2.phyblktotal - snap_1.phyblktotal) as phyblktotal,
(snap_2.phyrds - snap_1.phyrds) as phyrds,
(snap_2.phyblkrd - snap_1.phyblkrd) as phyblkrd,
(snap_2.phywrts - snap_1.phywrts) as phywrts,
(snap_2.phyblkwrt - snap_1.phyblkwrt) as phyblkwrt
from
(select (snap_phyrds + snap_phywrts) as phytotal,
(snap_phyblkwrt + snap_phyblkrd) as phyblktotal,
snap_phyrds as phyrds, snap_phyblkrd as phyblkrd,
snap_phywrts as phywrts, snap_phyblkwrt as phyblkwrt
from snapshot.snap_global_rel_iostat
where snapshot_id = 24 and snap_node_name = 'dn_6001') snap_1,
(select (snap_phyrds + snap_phywrts) as phytotal,
(snap_phyblkwrt + snap_phyblkrd) as phyblktotal,
snap_phyrds as phyrds, snap_phyblkrd as phyblkrd,
snap_phywrts as phywrts, snap_phyblkwrt as phyblkwrt
from snapshot.snap_global_rel_iostat
where snapshot_id = 25 and snap_node_name = 'dn_6001') snap_2;
phytotal | phyblktotal | phyrds | phyblkrd | phywrts | phyblkwrt
----------+-------------+---------+----------+---------+-----------
4626892 | 4626892 | 2955639 | 2955639 | 1671253 | 1671253
-- 两个snapshot_id(24和25)期间,redo的统计信息(数值除以3600即换算成以秒为单位的WDR数据)
postgres=# select
(snap_2.phywrts - snap_1.phywrts) as phywrts,
(snap_2.phyblkwrt - snap_1.phyblkwrt) as phyblkwrt
from
(select sum(snap_phywrts) as phywrts, sum(snap_phyblkwrt) as phyblkwrt
from snapshot.snap_global_file_redo_iostat
where snapshot_id = 24 and snap_node_name = 'dn_6001') snap_1,
(select sum(snap_phywrts) as phywrts, sum(snap_phyblkwrt) as phyblkwrt
from snapshot.snap_global_file_redo_iostat
where snapshot_id = 25 and snap_node_name = 'dn_6001') snap_2;
phywrts | phyblkwrt
---------+-----------
132721 | 509414
Report Details
报告明细信息,包含:SQL统计信息,缓存IO状态,对象状态,SQL明细,可以点击连接查看每个细节
SQL Statistics
SQL 执行情况进行统计信息,从 SQL 执行时间、SQL 消耗 CPU 的时间、SQL 返回的行数、SQL 扫描的行数、SQL 执行的次数、SQL 物理读的次数、SQL 逻辑读的次数等多维度对两次快照期间的 SQL 执行情况进行统计
- SQL语句各个维度性能统计,按以下维度排序展示:总时间、平均时间、CPU耗时、返回的行数、扫描的行数、执行次数、逻辑读、物理读。
- 数据库、节点范围报表,cluster模式和node模式下均可查看此报表。
在Node下是SQL ordered by Tuples Read,在Cluster下是SQL ordered by Row Read
SQL Statistics报表主要内容
Tips:Top200 显得有些冗余,多余的 SQL 信息并没有太大用处,反而降低了可读性,希望将来能优化到 Top20。
相关代码:
-- 由于多个SQL统计信息的SQL语句类似,这里仅列举SQL执行时间的统计SQL,其他的类似。
-- 说明:%s代表node_name,%ld代表snapshot_id
select t2.snap_unique_sql_id as "Unique SQL Id",
t2.snap_user_name as "User Name",
(t2.snap_total_elapse_time - coalesce(t1.snap_total_elapse_time, 0)) as "Total Elapse Time(us)",
(t2.snap_n_calls - coalesce(t1.snap_n_calls, 0)) as "Calls",
round("Total Elapse Time(us)"/greatest("Calls", 1), 0) as "Avg Elapse Time(us)",
t2.snap_min_elapse_time as "Min Elapse Time(us)",
t2.snap_max_elapse_time as "Max Elapse Time(us)",
(t2.snap_n_returned_rows - coalesce(t1.snap_n_returned_rows, 0)) as "Returned Rows",
((t2.snap_n_tuples_fetched - coalesce(t1.snap_n_tuples_fetched, 0)) +
(t2.snap_n_tuples_returned - coalesce(t1.snap_n_tuples_returned, 0))) as "Tuples Read",
((t2.snap_n_tuples_inserted - coalesce(t1.snap_n_tuples_inserted, 0)) +
(t2.snap_n_tuples_updated - coalesce(t1.snap_n_tuples_updated, 0)) +
(t2.snap_n_tuples_deleted - coalesce(t1.snap_n_tuples_deleted, 0))) as "Tuples Affected",
(t2.snap_n_blocks_fetched - coalesce(t1.snap_n_blocks_fetched, 0)) as "Logical Read",
((t2.snap_n_blocks_fetched - coalesce(t1.snap_n_blocks_fetched, 0)) -
(t2.snap_n_blocks_hit - coalesce(t1.snap_n_blocks_hit, 0))) as "Physical Read",
(t2.snap_cpu_time - coalesce(t1.snap_cpu_time, 0)) as "CPU Time(us)",
(t2.snap_data_io_time - coalesce(t1.snap_data_io_time, 0)) as "Data IO Time(us)",
(t2.snap_sort_count - coalesce(t1.snap_sort_count, 0)) as "Sort Count",
(t2.snap_sort_time - coalesce(t1.snap_sort_time, 0)) as "Sort Time(us)",
(t2.snap_sort_mem_used - coalesce(t1.snap_sort_mem_used, 0)) as "Sort Mem Used(KB)",
(t2.snap_sort_spill_count - coalesce(t1.snap_sort_spill_count, 0)) as "Sort Spill Count",
(t2.snap_sort_spill_size - coalesce(t1.snap_sort_spill_size, 0)) as "Sort Spill Size(KB)",
(t2.snap_hash_count - coalesce(t1.snap_hash_count, 0)) as "Hash Count",
(t2.snap_hash_time - coalesce(t1.snap_hash_time, 0)) as "Hash Time(us)",
(t2.snap_hash_mem_used - coalesce(t1.snap_hash_mem_used, 0)) as "Hash Mem Used(KB)",
(t2.snap_hash_spill_count - coalesce(t1.snap_hash_spill_count, 0)) as "Hash Spill Count",
(t2.snap_hash_spill_size - coalesce(t1.snap_hash_spill_size, 0)) as "Hash Spill Size(KB)",
LEFT(t2.snap_query, 25) as "SQL Text"
from
(select * from snapshot.snap_summary_statement where snapshot_id = %ld and snap_node_name = '%s') t1
right join
(select * from snapshot.snap_summary_statement where snapshot_id = %ld and snap_node_name = '%s') t2
on t1.snap_unique_sql_id = t2.snap_unique_sql_id
and t1.snap_user_id = t2.snap_user_id
order by "Total Elapse Time(us)"
desc limit 200;
Cache IO Stats
- 用户的表、索引的IO的统计信息。
- 数据库、节点范围报表,cluster模式和node模式下均可查看此报表。
Cache IO Stats包含User table和User index两张表,列名称及描述如下所示。
User table IO activity ordered by heap blks hit ratio
表 1 User table IO activity ordered by heap blks hit ratio报表主要内容
这一部分根据 Heap block 的命中率排序统计用户表的 IO 活动状态。
数据来源于 snapshot.snap_global_statio_all_indexes 表和 snapshot.snap_global_statio_all_tables 表。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
SELECT table_io.db_name as "DB Name",
table_io.snap_schemaname as "Schema Name",
table_io.snap_relname as "Table Name",
table_io.heap_blks_hit_ratio as "%Heap Blks Hit Ratio",
table_io.heap_blks_read as "Heap Blks Read",
table_io.heap_blks_hit as "Heap Blks Hit",
idx_io.idx_blks_read as "Idx Blks Read",
idx_io.idx_blks_hit as "Idx Blks Hit",
table_io.toast_blks_read as "Toast Blks Read",
table_io.toast_blks_hit as "Toast Blks Hit",
table_io.tidx_blks_read as "Tidx Blks Read",
table_io.tidx_blks_hit as "Tidx Blks Hit"
FROM
(select t2.db_name, t2.snap_schemaname , t2.snap_relname ,
(case
when ((t2.snap_heap_blks_read - coalesce(t1.snap_heap_blks_read, 0)) + (t2.snap_heap_blks_hit - coalesce(t1.snap_heap_blks_hit, 0))) = 0
then 0
else round((t2.snap_heap_blks_hit - coalesce(t1.snap_heap_blks_hit, 0))/
((t2.snap_heap_blks_read - coalesce(t1.snap_heap_blks_read, 0)) + (t2.snap_heap_blks_hit - coalesce(t1.snap_heap_blks_hit, 0))) * 100, 2)
end ) as heap_blks_hit_ratio,
(t2.snap_heap_blks_read - coalesce(t1.snap_heap_blks_read, 0)) as heap_blks_read,
(t2.snap_heap_blks_hit - coalesce(t1.snap_heap_blks_hit, 0)) as heap_blks_hit,
(t2.snap_idx_blks_read - coalesce(t1.snap_idx_blks_read, 0)) as idx_blks_read,
(t2.snap_idx_blks_hit - coalesce(t1.snap_idx_blks_hit, 0)) as idx_blks_hit,
(t2.snap_toast_blks_read - coalesce(t1.snap_toast_blks_read, 0)) as toast_blks_read,
(t2.snap_toast_blks_hit - coalesce(t1.snap_toast_blks_hit, 0)) as toast_blks_hit,
(t2.snap_tidx_blks_read - coalesce(t1.snap_tidx_blks_read, 0)) as tidx_blks_read,
(t2.snap_tidx_blks_hit - coalesce(t1.snap_tidx_blks_hit, 0)) as tidx_blks_hit from
(select * from snapshot.snap_global_statio_all_tables
where snapshot_id = %ld and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
and snap_schemaname !~ '^pg_toast') t1
right join
(select * from snapshot.snap_global_statio_all_tables
where snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
and snap_schemaname !~ '^pg_toast') t2
on t1.snap_relid = t2.snap_relid
and t2.db_name = t1.db_name
and t2.snap_schemaname = t1.snap_schemaname ) as table_io
LEFT JOIN
(select t2.db_name , t2.snap_schemaname , t2.snap_relname,
(t2.snap_idx_blks_read - coalesce(t1.snap_idx_blks_read, 0)) as idx_blks_read,
(t2.snap_idx_blks_hit - coalesce(t1.snap_idx_blks_hit, 0)) as idx_blks_hit
from
(select * from snapshot.snap_global_statio_all_indexes
where snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
and snap_schemaname !~ '^pg_toast') t1
right join
(select * from snapshot.snap_global_statio_all_indexes
where snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
AND snap_schemaname !~ '^pg_toast') t2
on t1.snap_relid = t2.snap_relid
and t2.snap_indexrelid = t1.snap_indexrelid
and t2.db_name = t1.db_name and t2.snap_schemaname = t1.snap_schemaname) as idx_io
on table_io.db_name = idx_io.db_name
and table_io.snap_schemaname = idx_io.snap_schemaname
and table_io.snap_relname = idx_io.snap_relname
order by "%%Heap Blks Hit Ratio"
asc limit 200;
User index IO activity ordered by idx blks hit ratio
表 2 User index IO activity ordered by idx blks hit ratio报表主要内容
根据索引缓存命中率,统计用户索引 IO 活动信息
数据来源于 snapshot.snap_global_statio_all_indexes 表。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
select t2.db_name as "DB Name",
t2.snap_schemaname as "Schema Name",
t2.snap_relname as "Table Name",
t2.snap_indexrelname as "Index Name",
(case
when ((t2.snap_idx_blks_read - coalesce(t1.snap_idx_blks_read, 0)) + (t2.snap_idx_blks_hit - coalesce(t1.snap_idx_blks_hit, 0))) = 0 then 0
else
round((t2.snap_idx_blks_hit - coalesce(t1.snap_idx_blks_hit, 0))/((t2.snap_idx_blks_hit - coalesce(t1.snap_idx_blks_hit, 0)) +
(t2.snap_idx_blks_read - coalesce(t1.snap_idx_blks_read, 0))) * 100, 2)
end) as "%Idx Blks Hit Ratio",
(t2.snap_idx_blks_read - coalesce(t1.snap_idx_blks_read, 0)) as "Idx Blks Read",
(t2.snap_idx_blks_hit - coalesce(t1.snap_idx_blks_hit, 0)) as "Idx Blks Hit"
from
(select * from snapshot.snap_global_statio_all_indexes
where snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
and snap_schemaname !~ '^pg_toast') t1
right join
(select * from snapshot.snap_global_statio_all_indexes
where snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
and snap_schemaname !~ '^pg_toast') t2
on t1.snap_relid = t2.snap_relid
and t2.snap_indexrelid = t1.snap_indexrelid
and t2.db_name = t1.db_name
and t2.snap_schemaname = t1.snap_schemaname
order by "%Idx Blks Hit Ratio"
asc limit 200;
Object Stats
- 表、索引维度的性能统计信息。
- 数据库、节点范围报表,cluster模式和node模式下均可查看此报表。
Object stats包含User Tables stats、User index stats和Bad lock stats三张表,列名称及描述如下所示。
User Tables Stats
表 1 User Tables stats报表主要内容
描述用户表状态的统计信息
数据源于 snapshot.snap_global_stat_all_tables 表。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
SELECT snap_2.db_name as "DB Name",
snap_2.snap_schemaname as "Schema",
snap_2.snap_relname as "Relname",
(snap_2.snap_seq_scan - coalesce(snap_1.snap_seq_scan, 0)) as "Seq Scan",
(snap_2.snap_seq_tup_read - coalesce(snap_1.snap_seq_tup_read, 0)) as "Seq Tup Read",
(snap_2.snap_idx_scan - coalesce(snap_1.snap_idx_scan, 0)) as "Index Scan",
(snap_2.snap_idx_tup_fetch - coalesce(snap_1.snap_idx_tup_fetch, 0)) as "Index Tup Fetch",
(snap_2.snap_n_tup_ins - coalesce(snap_1.snap_n_tup_ins, 0)) as "Tuple Insert",
(snap_2.snap_n_tup_upd - coalesce(snap_1.snap_n_tup_upd, 0)) as "Tuple Update",
(snap_2.snap_n_tup_del - coalesce(snap_1.snap_n_tup_del, 0)) as "Tuple Delete",
(snap_2.snap_n_tup_hot_upd - coalesce(snap_1.snap_n_tup_hot_upd, 0)) as "Tuple Hot Update",
snap_2.snap_n_live_tup as "Live Tuple",
snap_2.snap_n_dead_tup as "Dead Tuple",
to_char(snap_2.snap_last_vacuum, 'YYYY-MM-DD HH24:MI:SS') as "Last Vacuum",
to_char(snap_2.snap_last_autovacuum, 'YYYY-MM-DD HH24:MI:SS') as "Last Autovacuum",
to_char(snap_2.snap_last_analyze, 'YYYY-MM-DD HH24:MI:SS') as "Last Analyze",
to_char(snap_2.snap_last_autoanalyze, 'YYYY-MM-DD HH24:MI:SS') as "Last Autoanalyze",
(snap_2.snap_vacuum_count - coalesce(snap_1.snap_vacuum_count, 0)) as "Vacuum Count",
(snap_2.snap_autovacuum_count - coalesce(snap_1.snap_autovacuum_count, 0)) as "Autovacuum Count",
(snap_2.snap_analyze_count - coalesce(snap_1.snap_analyze_count, 0)) as "Analyze Count",
(snap_2.snap_autoanalyze_count - coalesce(snap_1.snap_autoanalyze_count, 0)) as "Autoanalyze Count"
FROM
(SELECT * FROM snapshot.snap_global_stat_all_tables
WHERE snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
AND snap_schemaname !~ '^pg_toast') snap_2
LEFT JOIN
(SELECT * FROM snapshot.snap_global_stat_all_tables
WHERE snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
AND snap_schemaname !~ '^pg_toast') snap_1
ON snap_2.snap_relid = snap_1.snap_relid
AND snap_2.snap_schemaname = snap_1.snap_schemaname
AND snap_2.snap_relname = snap_1.snap_relname
AND snap_2.db_name = snap_1.db_name
order by snap_2.db_name, snap_2.snap_schemaname
limit 200;
User index stats
表 2 User index stats报表主要内容
用户索引状态的统计信息
数据源于 snapshot.snap_global_stat_all_indexes 表。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
SELECT snap_2.db_name as "DB Name",
snap_2.snap_schemaname as "Schema",
snap_2.snap_relname as "Relname",
snap_2.snap_indexrelname as "Index Relname",
(snap_2.snap_idx_scan - coalesce(snap_1.snap_idx_scan, 0)) as "Index Scan",
(snap_2.snap_idx_tup_read - coalesce(snap_1.snap_idx_tup_read, 0)) as "Index Tuple Read",
(snap_2.snap_idx_tup_fetch - coalesce(snap_1.snap_idx_tup_fetch, 0)) as "Index Tuple Fetch"
FROM
(SELECT * FROM snapshot.snap_global_stat_all_indexes
WHERE snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
and snap_schemaname !~ '^pg_toast') snap_2
LEFT JOIN
(SELECT * FROM snapshot.snap_global_stat_all_indexes
WHERE snapshot_id = %ld
and snap_node_name = '%s'
and snap_schemaname NOT IN ('pg_catalog', 'information_schema', 'snapshot')
and snap_schemaname !~ '^pg_toast') snap_1
ON snap_2.snap_relid = snap_1.snap_relid
and snap_2.snap_indexrelid = snap_1.snap_indexrelid
and snap_2.snap_schemaname = snap_1.snap_schemaname
and snap_2.snap_relname = snap_1.snap_relname
and snap_2.snap_indexrelname = snap_1.snap_indexrelname
and snap_2.db_name = snap_1.db_name
order by snap_2.db_name, snap_2.snap_schemaname
limit 200;
Bad lock stats
表 3 Bad lock stats报表主要内容
描述坏块的统计信息
数据源于 snapshot.snap_global_stat_bad_block 表。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
SELECT snap_2.snap_databaseid AS "DB Id",
snap_2.snap_tablespaceid AS "Tablespace Id",
snap_2.snap_relfilenode AS "Relfilenode",
snap_2.snap_forknum AS "Fork Number",
(snap_2.snap_error_count - coalesce(snap_1.snap_error_count, 0)) AS "Error Count",
snap_2.snap_first_time AS "First Time",
snap_2.snap_last_time AS "Last Time"
FROM
(SELECT * FROM snapshot.snap_global_stat_bad_block
WHERE snapshot_id = %ld
and snap_node_name = '%s') snap_2
LEFT JOIN
(SELECT * FROM snapshot.snap_global_stat_bad_block
WHERE snapshot_id = %ld
and snap_node_name = '%s') snap_1
ON snap_2.snap_databaseid = snap_1.snap_databaseid
and snap_2.snap_tablespaceid = snap_1.snap_tablespaceid
and snap_2.snap_relfilenode = snap_1.snap_relfilenode
limit 200;
Utility status
- 复制槽和后台checkpoint的状态信息。
- 节点范围报表,仅node模式下可查看此报表。
Utility status包含Replication slot和Replication stat两张表,列名称及描述如下所示。
Replication slot
表 1 Replication slot报表主要内容
描述的是复制槽的相关信息
解读:[本次实验环境是单机,没有复制槽数据]
这一部分描述的是复制槽的相关信息。数据来源于:snapshot.snap_global_replication_slots 表。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
SELECT snap_slot_name as "Slot Name",
snap_slot_type as "Slot Type",
snap_database as "DB Name",
snap_active as "Active",
snap_x_min as "Xmin",
snap_restart_lsn as "Restart Lsn",
snap_dummy_standby as "Dummy Standby"
FROM snapshot.snap_global_replication_slots
WHERE snapshot_id = %ld
and snap_node_name = '%s'
limit 200;
Replication stat
表 2 Replication stat报表主要内容
描述事务槽详细的状态信息
解读:[本次实验环境是单机,没有复制槽数据]
这一部分描述事务槽详细的状态信息,数据源于 snapshot.snap_global_replication_stat 表。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
SELECT snap_pid as "Thread Id",
snap_usesysid as "Usesys Id",
snap_usename as "Usename",
snap_application_name as "Application Name",
snap_client_addr as "Client Addr",
snap_client_hostname as "Client Hostname",
snap_client_port as "Client Port",
snap_backend_start as "Backend Start",
snap_state as "State",
snap_sender_sent_location as "Sender Sent Location",
snap_receiver_write_location as "Receiver Write Location",
snap_receiver_flush_location as "Receiver Flush Location",
snap_receiver_replay_location as "Receiver Replay Location",
snap_sync_priority as "Sync Priority",
snap_sync_state as "Sync State"
FROM snapshot.snap_global_replication_stat
WHERE snapshot_id = %ld
and snap_node_name = '%s' limit 200;
SQL Detail
- SQL语句文本详情。
- 数据库、节点范围报表,cluster模式和node模式下均可查看此报表。
SQL Detail报表主要内容
数据来源于 snapshot.snap_summary_statement 表。
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
select (t2.snap_unique_sql_id) as "Unique SQL Id",
(t2.snap_query) as "SQL Text"
from
snapshot.snap_summary_statement t2
where snapshot_id = %ld
and snap_node_name = '%s';
Configuration settings
- 节点配置。
- 节点范围报表,仅node模式下可查看此报表。
Configuration settings报表主要内容
数据源于 snapshot.snap_global_config_settings 表。
信息内容如下所示:
相关代码:
-- 说明:%s代表node_name,%ld代表snapshot_id
select snap_name as "Name",
snap_short_desc as "Abstract",
snap_vartype as "Type",
snap_setting as "Curent Value",
snap_min_val as "Min Value",
snap_max_val as "Max Value",
snap_category as "Category",
snap_enumvals as "Enum Values",
snap_boot_val as "Default Value",
snap_reset_val as "Reset Value"
FROM
snapshot.snap_global_config_settings
WHERE snapshot_id = %ld
and snap_node_name = '%s';
注意:
用于生成报告的两个快照应满足以下条件:
- 两次快照之间不能有节点重启。
- 两次快照之间不能有主备倒换。
- 两次快照之间不能对性能指标进行reset操作。
- 两次快照之间不能有drop database操作。
- 生成的WDR中如果存在负数时,说明该指标不能反映数据库的表现。
- 生成报告的时间与性能快照中的性能数据量有关系,一般在分钟级可以完成。如果超过5分钟没有完成,请尝试收集snapshot schema下的表(首先考虑snap_global_statio_all_tables,snap_global_statio_all_indexes)的统计信息ANALYZE | ANALYSE,然后再次运行报告生成。或者设置会话级语句超时时间set statement_timeout=*,主动终止报告生成。
- 生成报告时,尽量设置客户端的字符集与GaussDB数据库的字符集保持一致(可以通过set client_encoding to *去设置客户端字符集)。
参考: