I recently watched a video introducing the relevant SQL statements, and took the notes below.
Troubleshooting high cpu utilization in PostgreSQL databases
High CPU usage is often connected to:
- long running times
- poor performance
- unexpected crashes
- bad end-customer experience
- limited application scalability
- wrong architecture
- unplanned and uncontrolled increase in cost
Contents
DB Connections
Distribution of non-idle connections per database
Distribution of non-idle connections per database and per query
Non-idle connections detailed
The PostgreSQL parameter statement_timeout
Check connection patterns in the PostgreSQL DB log file(s)
Running frequent SQL queries
Check execution plan of the query/queries
The PostgreSQL parameter random_page_cost
Invalid indexes
Slow reuse of cached prepared statements
PostgreSQL DB Server CPU distribution by queries/databases
PostgreSQL DB statistics
PostgreSQL DB bloat
Proactive PostgreSQL DB Scanner
DB Connections
One of the PostgreSQL DB patterns leading to high CPU utilization is a high number of active connections.
A "connection storm" is a very painful pattern for a DB.
The following SQL query lists:
- the total number of connections
- the number of non-idle connections
- the maximum number of available connections
- the connection utilization percentage
select
  A.total_connections,
  A.non_idle_connections,
  B.max_connections,
  round((100 * A.total_connections::numeric / B.max_connections::numeric), 2)
    as connections_utilization_pctg
from
  (select count(1) as total_connections,
          sum(case when state != 'idle' then 1 else 0 end) as non_idle_connections
   from pg_stat_activity) A,
  (select setting as max_connections
   from pg_settings
   where name = 'max_connections') B;
Here's an example of the SQL output:
total_connections | non_idle_connections | max_connections | connections_utilization_pctg
3457              | 3                    | 9057            | 38.17
To tune the parameter max_connections,
and other key PostgreSQL DB instance parameters,
I recommend using free online tools, such as
the PostgreSQL configuration builder at:
https://www.pgconfig.org
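If you then decide to change max_connections, note that it cannot be changed at runtime; a minimal sketch (the value 200 is only an example, derive yours from the tooling above):
ALTER SYSTEM SET max_connections = 200;
-- max_connections is a postmaster-level parameter:
-- the change takes effect only after a server restart.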
Distribution of non-idle connections per database
The following query checks
the distribution of non-idle connections per database,
sorted in descending order:
select datname as db_name, count(1) as num_non_idle_connections
from pg_stat_activity
where state!='idle'
group by 1
order by 2 desc;
Here's an example of the SQL output:
db_name | num_non_idle_connections
my_db_1 | 133
my_db_2 | 6
my_db_3 | 3
Distribution of non-idle connections per database and per query
The following SQL checks the distribution of non-idle connections
per database and per query, sorted in descending order,
as seen in the example below:
select datname as db_name, query, count(1) as num_non_idle_connections
from pg_stat_activity
where state!='idle'
group by 1,2
order by 3 desc;
Here's an example of the SQL output:
db_name | query                 | num_non_idle_connections
db_1    | select * from table_1 | 40
db_1    | select * from table_2 | 1
Non-idle connections detailed
The following query lists non-idle PostgreSQL sessions
that have been running for more than 5 seconds. The result is sorted by the runtime
in descending order:
select
now()-query_start as runtime,
pid as process_id,
datname as db_name, client_addr, client_hostname,
query
from pg_stat_activity
where state!='idle'
and now() - query_start > '5 seconds'::interval
order by 1 desc;
Here's an example of the SQL output:
runtime  | process_id | db_name | client_addr    | client_hostname | query
00:12:34 | 7770       | db_1    | 192.168.12.208 |                 | select
For each long-running query, the resultset contains the corresponding
runtime, process ID, database name, client address, and hostname.
In case a query runs too long,
causing a high load on the DB CPU and other resources,
you may want to terminate it explicitly.
To terminate a PostgreSQL DB session by <process_id>,
run the following command:
select pg_terminate_backend(<process_id>);
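You can also combine this with the previous query. For example, a minimal sketch that terminates every non-idle session running for more than 5 minutes (the interval is only an example; adjust it to your workload):
select pg_terminate_backend(pid)
from pg_stat_activity
where state != 'idle'
and pid != pg_backend_pid() -- don't terminate our own session
and now() - query_start > '5 minutes'::interval;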
The PostgreSQL parameter statement_timeout
The statement_timeout parameter in Postgres
sets the maximum allowed duration, in milliseconds (ms),
of any statement.
The statement_timeout parameter's default value is 0,
meaning no timeout.
I recommend setting it to 30 seconds
at the role/session level to defend the PostgreSQL DB Server
from long-running queries.
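For example, to apply the recommended 30-second limit at the session level, or for a specific role (the role name app_user is only a placeholder):
SET statement_timeout = '30s'; -- current session only
ALTER ROLE app_user SET statement_timeout = '30s'; -- all new sessions of this role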
Check connection patterns in the PostgreSQL DB log file(s)
Let's check if there is a "connection storming" pattern,
where a lot of connections arrive at the database at a specific time.
This can be the root cause of high CPU utilization on the PostgreSQL
Database Server.
Let's consider, for example,
Azure Database for PostgreSQL Server log file(s).
I will demonstrate how
Azure Database for PostgreSQL Server log file(s)
can be parsed using AWK.
AWK is a text-processing utility. It is very powerful
and uses a simple programming language.
How do we list the top DB users by the number of "connection authorized"
occurrences, sorted in descending order?
grep "connection authorized" postgresql-2023-02-01_100000.log | awk '{
printf("%s %s\n", substr($2,1,5), substr($6,index($6,"user=")+5, index($6,
"database=")-6)); } ' | awk '{count[$2]++} END {for (word in count) print
word, count[word]}'| sort -k 2,2 -n -r
Here's an example of the output:
db_user_1 1234
db_user_2 98
From this output we can conclude that the DB user db_user_1 made most of the connections to the DB,
which makes it a good candidate for investigating high CPU utilization on the PostgreSQL DB Server.
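To see when the storm happens rather than who causes it, a complementary sketch (assuming the same log format, where the second field carries the timestamp) keeps only the HH:MM part and counts "connection authorized" events per minute:
grep "connection authorized" postgresql-2023-02-01_100000.log \
| awk '{ print substr($2,1,5) }' \
| sort | uniq -c | sort -rn | head
The minutes with the highest counts are the candidate "storm" windows.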
Running frequent SQL queries
The root cause of high CPU utilization in PostgreSQL databases
is not necessarily a long-running query.
Quick but too frequent queries, running hundreds of times per
second, can cause high CPU utilization too.
To find the top frequent PostgreSQL queries,
run the following SQL, as shown in the example below:
with
a as (select dbid, queryid, query, calls s from pg_stat_statements),
b as (select dbid, queryid, query, calls s from pg_stat_statements,
pg_sleep(1))
select
pd.datname as db_name,
substr(a.query, 1, 400) as the_query,
sum(b.s-a.s) as runs_per_second
from a, b, pg_database pd
where
a.dbid=b.dbid and a.queryid=b.queryid and pd.oid=a.dbid
group by 1, 2
order by 3 desc;
Here's an example of the output:
db_name | the_query       | runs_per_second
db_1    | select ...      | 10
db_1    | insert into ... | 2
This resultset includes a list of queries
and their corresponding frequency:
how many times each query runs per second.
Recommendation:
Examine the SQL queries and the corresponding application logic.
Try to improve the application architecture to use caching mechanisms.
This will help prevent SQL queries from running too frequently
on the PostgreSQL Database Server.
Check execution plan of the query/queries
To understand why a particular query loads the CPU, check its execution plan
with EXPLAIN (BUFFERS, ANALYZE): it executes the query and reports the actual
plan, row counts, timings, and buffer usage, so expensive operations such as
sequential scans on large tables become visible:
postgres=# explain (buffers, analyze) select count(1) from my_table where
abs(a)=22988504;
postgres=# explain (buffers, analyze) select * from some_table;
The PostgreSQL parameter random_page_cost
Reducing the parameter random_page_cost value
relative to the parameter seq_page_cost
will cause the system to prefer index scans.
Default values:
postgres=# show random_page_cost;
4
postgres=# show seq_page_cost;
1
Recommendation:
If you use SSD storage,
lower the parameter random_page_cost to 1.
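For example, to apply the recommendation cluster-wide (random_page_cost does not require a restart; reloading the configuration is enough):
ALTER SYSTEM SET random_page_cost = 1;
SELECT pg_reload_conf();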
Invalid indexes
One of the reasons for high CPU utilization
may be missing or invalid indexes.
postgres=# \d tab
              Table "public.tab"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 col    | integer |           |          |
Indexes:
    "idx" btree (col) INVALID
The following query checks invalid indexes:
select pg_class.relname
from pg_class, pg_index
where pg_index.indisvalid = false
AND pg_index.indexrelid = pg_class.oid;
If invalid indexes are found, they should be rebuilt or recreated.
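For example, to rebuild the invalid index "idx" from the example above without blocking writes (REINDEX ... CONCURRENTLY is available since PostgreSQL 12; on older versions, drop the index and recreate it with CREATE INDEX CONCURRENTLY):
REINDEX INDEX CONCURRENTLY idx;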
Slow reuse of cached prepared statements
When accessing PostgreSQL via the JDBC driver
with a prepared statements cache,
you may see performance degradation and high CPU utilization.
During the first 5 executions of a prepared statement,
the PostgreSQL engine uses a custom execution plan.
Starting from the 6th execution,
it may switch to a generic execution plan,
which can lead to performance degradation
and a high CPU utilization effect.
Recommendation:
Disable prepared statements caching.
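For example, with the PostgreSQL JDBC driver, server-side prepared statements can be disabled through the prepareThreshold connection parameter (host and database names below are placeholders):
jdbc:postgresql://db_host:5432/my_db?prepareThreshold=0
Alternatively, on PostgreSQL 12 and later, you can keep the statement cache but force custom plans:
SET plan_cache_mode = force_custom_plan;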
PostgreSQL DB Server CPU distribution by queries/databases
The following SQL checks
how much CPU each query in each database
uses.
It provides a resultset
sorted by the most CPU-intensive queries
in descending order:
select
pss.userid, pss.dbid, pd.datname as db_name,
round((pss.total_exec_time + pss.total_plan_time)::numeric,2) as total_time,
pss.calls,
round((pss.mean_exec_time + pss.mean_plan_time)::numeric,2) as mean,
round((100 * (pss.total_exec_time + pss.total_plan_time) /
sum((pss.total_exec_time+pss.total_plan_time)::numeric) OVER ())::numeric,2)
as cpu_portion_pctg,pss.query
from pg_stat_statements pss, pg_database pd
where pd.oid=pss.dbid
order by (pss.total_exec_time + pss.total_plan_time)
desc limit 30;
Here's an example of the SQL output:
db_name | total_time  | calls   | mean    | cpu_portion_pctg | query
db_1    | 27349172.12 | 3905898 | 7.00    | 25.15            | select ... from table_1
db_1    | 391744.00   | 105     | 3731.00 | 16.76            | select ... from table_2
From the output example, you can see that
the first query (i.e., select ... from table_1)
takes the largest portion of the CPU
because of its high number of calls.
I recommend looking at how often
the application is running the queries
connected to this PostgreSQL DB.
Also, from the output example, you can see
the high mean time of the second query (i.e., select ... from table_2),
which exceeds three seconds.
PostgreSQL DB statistics
Outdated PostgreSQL statistics
can be another root cause for high CPU utilization.
When statistical data isn't updated,
the PostgreSQL query planner may generate
inefficient execution plans for queries,
which will lead to bad performance
of the entire PostgreSQL DB server.
To check the last date and time the statistics were updated
for each table of a specific DB
on the PostgreSQL DB Server,
connect to the DB and run the following query:
select
schemaname, relname,
DATE_TRUNC('minute', last_analyze) last_analyze,
DATE_TRUNC('minute', last_autoanalyze) last_autoanalyze
from
pg_stat_all_tables
where
schemaname = 'public'
order by
last_analyze desc NULLS FIRST,
last_autoanalyze desc NULLS FIRST;
Recommendation:
Ensure tables are analyzed regularly.
To collect statistics manually for a specific table
and its associated indexes,
run the command:
ANALYZE <table_name>;
PostgreSQL DB bloat
In cases of intensive data updates,
both with frequent UPDATE and with INSERT/DELETE operations,
PostgreSQL tables and their indexes become bloated.
Bloat refers to disk space
that was allocated by a table or index
and is now available for reuse by the database,
but has not been reclaimed.
Because of this bloat, the performance of the PostgreSQL DB Server is
degraded, which can lead to high CPU utilization scenarios.
Under normal PostgreSQL operations,
tuples (rows) that are deleted or stale because of an update
aren't physically removed from the table;
they're stored there until the VACUUM command is issued.
VACUUM releases the space occupied by "dead" tuples.
It's necessary to perform a VACUUM periodically,
especially for tables that change often.
To check information about dead tuples,
and when vacuum / autovacuum was run
for each table for a specific DB,
connect to the DB and run the following query:
select
schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup,
DATE_TRUNC('minute', last_vacuum) last_vacuum,
DATE_TRUNC('minute', last_autovacuum) last_autovacuum
from
pg_stat_all_tables
where
schemaname = 'public'
order by
n_dead_tup desc;
Here's an example of the SQL output:
schemaname | relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum
public     | table_1 | 30237555  | 41784024  | 26858     | 30184398   | 142226     |             |
It indicates the number of inserted, updated, deleted, live, and dead
tuples per table,
as well as the last date and time
the vacuum/autovacuum process was performed.
Recommendation:
Ensure tables are vacuumed regularly.
To run VACUUM (regular, not FULL)
for a specific table and all its associated indexes,
run the command:
VACUUM <table_name>;
VACUUM can be run by an autovacuum process
together with statistics collection.
Autovacuum is a background process
that executes VACUUM and ANALYZE commands automatically.
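For tables that change very often, autovacuum can also be tuned per table; a minimal sketch (the table name and the value are only examples; the default autovacuum_vacuum_scale_factor is 0.2):
ALTER TABLE table_1 SET (autovacuum_vacuum_scale_factor = 0.05);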
To collect DB statistics and to do vacuum (regular, not FULL)
on all the objects of all the DBs,
run the following command:
vacuumdb -h <db_host> -p <db_port> -U <db_user> -j 4 -z -v -a
Here, -j 4 runs 4 parallel jobs, -z also collects statistics,
-v enables verbose output, and -a processes all databases.
Proactive PostgreSQL DB Scanner
On mission-critical systems, it is important to be proactive
and prevent situations where the key parameters of the system,
including CPU utilization, reach maximum values
that directly impact performance, stability, and cost.
Like in medicine, we want to find a cure
as soon as the first symptoms appear.
I'd like to consider an approach that makes it easy to
- identify queries that use the system inefficiently
- find a root cause for performance issues
- understand typical workload patterns and performance bottlenecks
Once found, these patterns and queries can be improved, and the system will work efficiently.
The Proactive PostgreSQL DB Performance Scanner
is a script that connects to a database
and runs a set of probes that can be extended if desired.
All the probes are queries to a database,
unified by a common structure.
Each probe includes:
- the threshold value
- a description of the check
- the issue this check is associated with
- a recommendation on how to troubleshoot the issue
- the SQL query to perform the check
- an additional optional SQL query,
in case there is a need for more evidence
Example of how to run the Proactive PostgreSQL DB Performance Scanner:
proactive_pg_db_performance_scanner.sh -h db_host -p 5432 -U postgres -d postgres
The related code is provided on GitHub; the author is a really nice person.
The source code of the Proactive PostgreSQL DB Scanner:
https://github.com/dm8ry/proactive-postgresql-db-performance-scanner
Only those who read to the end get to see this tool :)