A Collection of Queries for Troubleshooting High CPU in PostgreSQL

I recently watched a video introducing the relevant SQL statements, and recorded them below.

Troubleshooting high CPU utilization in PostgreSQL databases

High CPU usage is often connected to:
- long running times
- poor performance
- unexpected crashes
- bad end-customer experience
- limited scalability of the application
- wrong architecture
- unplanned and uncontrolled increase in cost

Contents

DB Connections

Distribution of non-idle connections per database

Distribution of non-idle connections per database and per query

Non-idle connections detailed

The PostgreSQL parameter statement_timeout

Check connection patterns in the PostgreSQL DB log file(s)

Running frequent SQL queries

Check execution plan of the query/queries

The PostgreSQL parameter random_page_cost

Invalid indexes

Slow reuse of cached prepared statements

PostgreSQL DB Server CPU distribution by queries/databases

PostgreSQL DB statistics

PostgreSQL DB bloat

Proactive PostgreSQL DB Scanner


DB Connections

One pattern that leads to high CPU utilization in PostgreSQL is a high number of active connections.
A "connection storm" is a particularly painful pattern for a database.
The following SQL query lists:
- the total number of connections
- the number of non-idle connections
- the maximum number of available connections
- the connection utilization percentage

select 
  A.total_connections,
  A.non_idle_connections,
  B.max_connections,
  round((100 * A.total_connections::numeric / B.max_connections::numeric),
  2) connections_utilization_pctg 
from 
  (select count(1) as total_connections, sum(case when state!='idle' then 1 
  else 0 end) as non_idle_connections from pg_stat_activity) A,
  (select setting as max_connections from pg_settings where 
  name='max_connections') B;

Here's an example of the SQL output:
total_connections | non_idle_connections | max_connections | connections_utilization_pctg
             3457 |                    3 |            9057 |                        38.17

To tune max_connections
and other key PostgreSQL DB instance parameters,
I recommend using free online tools, such as
the PostgreSQL configuration builder at:
https://www.pgconfig.org

Distribution of non-idle connections per database

The following query checks
the distribution of non-idle connections per database,
sorted in descending order:

select datname as db_name, count(1) as num_non_idle_connections 
from pg_stat_activity 
where state!='idle' 
group by 1 
order by 2 desc;

Here's an example of the SQL output:

db_name | num_non_idle_connections
my_db_1 | 133
my_db_2 |   6
my_db_3 |   3

Distribution of non-idle connections per database and per query

The following SQL checks the distribution of non-idle connections
per database and per query, sorted in descending order,
as shown in the example below:

select datname as db_name, query, count(1) as num_non_idle_connections 
from pg_stat_activity 
where state!='idle' 
group by 1,2 
order by 3 desc;

Here's an example of the SQL output:

db_name | query                 | num_non_idle_connections
db_1    | select * from table_1 | 40
db_1    | select * from table_2 |  1

Non-idle connections detailed

The following query lists non-idle PostgreSQL sessions
that have been running for more than 5 seconds. The result is sorted by runtime
in descending order:

select 
  now()-query_start as runtime,
  pid as process_id,
  datname as db_name, client_addr, client_hostname,
  query 
from pg_stat_activity 
where state!='idle' 
and now() - query_start > '5 seconds'::interval 
order by 1 desc;

Here's an example of the SQL output:
runtime  | process_id | db_name | client_addr    | client_hostname | query
00:12:34 |       7770 | db_1    | 192.168.12.208 |                 | select

For each long-running query, the result set contains the corresponding
runtime, process id, database name, client address, and hostname.

If a query runs too long,
causing a high load on the DB CPU and other resources,
you may want to terminate it explicitly.
To terminate a PostgreSQL DB session by <process_id>,
run the following command:

select pg_terminate_backend(<process_id>);
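
If you only want to stop the current query but keep the session alive, a gentler alternative is pg_cancel_backend:

-- cancels only the running query; the session stays connected
select pg_cancel_backend(<process_id>);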

The PostgreSQL parameter statement_timeout

The statement_timeout parameter in PostgreSQL
sets the maximum allowed duration, in milliseconds (ms),
for any statement.

The default value of statement_timeout is 0,
meaning no timeout.

I recommend setting a default of 30 seconds
at the role/session level to defend the PostgreSQL DB server
from long-running queries.
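
For example, a minimal sketch of setting the timeout at the session and role level (app_user is a hypothetical role name):

-- session level: applies to the current connection only
SET statement_timeout = '30s';

-- role level: applies to new sessions of that role (app_user is hypothetical)
ALTER ROLE app_user SET statement_timeout = '30s';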
 

Check connection patterns in the PostgreSQL DB log file(s)

Let's check whether there is a "connection storming" pattern,
in which many connections arrive at the database at a specific time.
This can be the root cause of high CPU utilization on the PostgreSQL
database server.

Let's consider, for example,
Azure Database for PostgreSQL Server log file(s).

I will demonstrate how
Azure Database for PostgreSQL Server log file(s)
can be parsed using AWK.

AWK is a text-processing utility. It is very powerful
and uses a simple programming language.

How do you list the top DB users by occurrences of the
"connection authorized" text pattern, sorted in descending order?

grep "connection authorized" postgresql-2023-02-01_100000.log | awk '{
printf("%s %s\n", substr($2,1,5), substr($6,index($6,"user=")+5, index($6,
"database=")-6)); } ' | awk '{count[$2]++} END {for (word in count) print 
word, count[word]}'| sort -k 2,2 -n -r

db_user_1 1234
db_user_2 98

From this output we can conclude that the DB user db_user_1 made the most connections to the database.
It is therefore a good candidate when investigating high CPU utilization on the PostgreSQL DB server.
 

Running frequent SQL queries

The root cause of high CPU utilization in PostgreSQL databases
is not necessarily a long-running query.

Quick but very frequent queries, running hundreds of times per
second, can cause high CPU utilization too.

To find the most frequent PostgreSQL queries,
run the following SQL, as shown in the example below:

with 
a as (select dbid, queryid, query, calls s from pg_stat_statements),
b as (select dbid, queryid, query, calls s from pg_stat_statements,
pg_sleep(1)) 
select 
  pd.datname as db_name,
  substr(a.query, 1, 400) as the_query,
  sum(b.s-a.s) as runs_per_second 
from a, b, pg_database pd 
where
  a.dbid=b.dbid and a.queryid=b.queryid and pd.oid=a.dbid 
group by 1, 2
order by 3 desc;

Here's an example of the output:
db_name | the_query       | runs_per_second
db_1    | select ...      | 10
db_1    | insert into ... |  2

This result set includes a list of queries
and their corresponding frequency:
how many times each query runs per second.

Recommendation:
Examine the SQL queries and the corresponding application logic.
Try to improve the application architecture to use caching mechanisms.
This will help prevent SQL queries from running too frequently
on the PostgreSQL database server.
 

Check execution plan of the query/queries

Use EXPLAIN with the BUFFERS and ANALYZE options to inspect the actual
execution plan, runtime, and buffer usage of a suspicious query:

postgres=# explain (buffers, analyze) select count(1) from my_table where abs(a)=22988504;

postgres=# explain (buffers, analyze) select * from some_table;
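
Note that a predicate like abs(a)=... cannot use a plain index on column a. As a sketch, reusing my_table and column a from the example above (the index name is illustrative), an expression index would allow an index scan for this filter:

-- expression index on abs(a); the name idx_my_table_abs_a is hypothetical
create index concurrently idx_my_table_abs_a on my_table (abs(a));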
 

The PostgreSQL parameter random_page_cost

Reducing the parameter random_page_cost value 
relative to the parameter seq_page_cost 
will cause the system to prefer index scans.

Default values:

postgres=# show random_page_cost;
4
postgres=# show seq_page_cost;
1

Recommendation:
If you use SSD storage,
lower random_page_cost to 1.
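
A minimal sketch of changing the parameter (ALTER SYSTEM requires superuser privileges and persists the setting in postgresql.auto.conf):

-- session level, useful to test the effect on a plan
SET random_page_cost = 1;

-- instance level, persisted across restarts
ALTER SYSTEM SET random_page_cost = 1;
SELECT pg_reload_conf();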

Invalid indexes

One of the reasons for high CPU utilization
may be missing or invalid indexes.

postgres=# \d tab
               Table "public.tab"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 col    | integer |           |          |
Indexes:
    "idx" btree (col) INVALID

The following query checks invalid indexes:

select pg_class.relname 
from pg_class, pg_index 
where pg_index.indisvalid = false 
AND pg_index.indexrelid = pg_class.oid;

If invalid indexes are found, they should be rebuilt or recreated.
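
A sketch of both options, assuming the invalid index "idx" on table "tab" from the example above (REINDEX ... CONCURRENTLY requires PostgreSQL 12+):

-- option 1: rebuild the index in place
REINDEX INDEX CONCURRENTLY idx;

-- option 2: drop and recreate it
DROP INDEX idx;
CREATE INDEX CONCURRENTLY idx ON tab (col);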

Slow reuse of cached prepared statements

When accessing PostgreSQL via the JDBC driver
with a prepared-statement cache,
you may see performance degradation and high CPU utilization.

During the first 5 executions of a prepared statement,
the PostgreSQL engine uses a custom execution plan
based on the actual parameter values.

Starting from the 6th execution,
it may switch to a generic execution plan,
which can lead to performance degradation
and high CPU utilization.

Recommendation:
Disable prepared-statement caching.
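
Alternatively, on PostgreSQL 12+ you can keep the cache but force per-execution plans; a minimal sketch:

-- force custom plans instead of the generic plan (PostgreSQL 12+)
SET plan_cache_mode = force_custom_plan;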

PostgreSQL DB Server CPU distribution by queries/databases

The following SQL checks
how much CPU each query in each database
uses.

It provides a result set
sorted by the most CPU-intensive queries
in descending order:

select 
  pss.userid, pss.dbid, pd.datname as db_name,
  round((pss.total_exec_time + pss.total_plan_time)::numeric,2) as total_time,
  pss.calls,
  round((pss.mean_exec_time + pss.mean_plan_time)::numeric,2) as mean,
  round((100 * (pss.total_exec_time + pss.total_plan_time) /
  sum((pss.total_exec_time+pss.total_plan_time)::numeric) OVER ())::numeric,2) 
  as cpu_portion_pctg,pss.query 
from pg_stat_statements pss, pg_database pd 
where pd.oid=pss.dbid 
order by (pss.total_exec_time + pss.total_plan_time) 
desc limit 30;

Here's an example of the SQL output:

db_name | total_time  | calls   | mean    | cpu_portion_pctg | query
db_1    | 27349172.12 | 3905898 |    7.00 | 25.15            | select ... from table_1
db_1    |   391744.00 |     105 | 3731.00 | 16.76            | select ... from table_2

From the output example, you can see that
the first query (i.e., select ... from table_1)
takes the largest share of the CPU
because of its high number of calls.

I recommend looking at how often
the application connected to this PostgreSQL DB
is running these queries.

Also, from the output example, you can see
the high mean time of the second query (i.e., select ... from table_2),
which exceeds three seconds.

PostgreSQL DB statistics

Outdated PostgreSQL statistics 
can be another root cause for high CPU utilization.

When statistical data isn't updated,
the PostgreSQL query planner may generate
inefficient execution plans for queries,
which leads to poor performance
of the entire PostgreSQL DB server.

To check the last date and time the statistics were updated
for each table in a specific database,
connect to the database and run the following query:

select 
  schemaname, relname,
  DATE_TRUNC('minute', last_analyze) last_analyze,
  DATE_TRUNC('minute', last_autoanalyze) last_autoanalyze 
from 
  pg_stat_all_tables 
where 
  schemaname = 'public' 
order by 
  last_analyze desc NULLS FIRST,
  last_autoanalyze desc NULLS FIRST;

Recommendation:
Ensure tables are analyzed regularly.

To collect statistics manually for a specific table
and its associated indexes,
run the command:

ANALYZE <table_name>;

PostgreSQL DB bloat

In cases of intensive data updates,
both with frequent UPDATE and with INSERT/DELETE operations,
PostgreSQL tables and their indexes become bloated.

Bloat refers to disk space 
that was allocated by a table or index 
and is now available for reuse by the database,
but has not been reclaimed.

Because of this bloat, the performance of the PostgreSQL DB Server is 
degraded, which can lead to high CPU utilization scenarios.
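
If more precise evidence is needed, the pgstattuple extension (assuming it is installed or can be created) reports exact dead-tuple and free-space figures for a table; a sketch using table_1 from the example further below:

-- requires: CREATE EXTENSION pgstattuple;
select * from pgstattuple('table_1');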

Under normal PostgreSQL operation,
tuples (rows) that are deleted or made stale by an update
aren't physically removed from the table;
they're kept there until a VACUUM is performed.

VACUUM releases the space occupied by "dead" tuples.
It's necessary to perform a VACUUM periodically,
especially for tables that change often.

To check information about dead tuples,
and when vacuum/autovacuum was last run,
for each table in a specific DB,
connect to the DB and run the following query:

select 
  schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
  n_dead_tup,
  DATE_TRUNC('minute', last_vacuum) last_vacuum,
  DATE_TRUNC('minute', last_autovacuum) last_autovacuum 
from 
  pg_stat_all_tables 
where 
  schemaname = 'public' 
order by 
  n_dead_tup desc;

Here's an example of the SQL output:
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup
table_1 |  30237555 |  41784024 |     26858 |   30184398 |     142226

It shows the number of inserted, updated, deleted, live, and dead tuples per table
since the statistics were last reset.
It also shows the last date and time
the vacuum/autovacuum process was performed.

Recommendation:
Ensure tables are vacuumed regularly.
To run VACUUM (regular, not FULL)
on a specific table and all its associated indexes,
run the command:

VACUUM <table_name>;

VACUUM can be run by an autovacuum process 
together with statistics collection.

Autovacuum is a background process 
that executes VACUUM and ANALYZE commands automatically.
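
To inspect the current autovacuum configuration, you can query pg_settings:

-- lists all autovacuum-related parameters and their current values
select name, setting from pg_settings where name like 'autovacuum%';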

To collect DB statistics and run a vacuum (regular, not FULL)
on all objects of all databases,
run the following command
(-a processes all databases, -z also collects statistics,
-j 4 uses four parallel jobs, -v enables verbose output):

vacuumdb -h <db_host> -p <db_port> -U <db_user> -j 4 -z -v -a

Proactive PostgreSQL DB Scanner

On mission-critical systems, it is important to be proactive
and prevent key system parameters,
including CPU utilization, from reaching maximum values
that directly impact performance, stability, and cost.

Like in medicine, we want to find a cure 
as soon as the first symptoms appear.

I'd like to consider an approach that makes it easy to
- identify queries that use the system inefficiently
- find the root cause of performance issues
- understand typical workload patterns and performance bottlenecks
The patterns and queries found this way can be improved so that the system works efficiently.

The Proactive PostgreSQL DB Performance Scanner 
is a script that connects to a database 
and runs a set of probes that can be extended if desired.

All the probes are queries to a database
that share a unified structure.

Each probe includes:
- a threshold value
- a description of the check
- the issue the check is associated with
- a recommendation on how to troubleshoot the issue
- the SQL query that performs the check
- an additional, optional SQL query,
in case more evidence is needed

Example of how to run the Proactive PostgreSQL DB Performance Scanner:

proactive_pg_db_performance_scanner.sh -h db_host -p 5432 -U postgres -d postgres

The related code is provided on GitHub; the author is truly generous.
The source code of the Proactive PostgreSQL DB Scanner:

https://github.com/dm8ry/proactive-postgresql-db-performance-scanner

Only those who read to the end get to see this tool :)
