环境:redhat 7.4 postgresql 12.3 pg_squeeze 1.2
pg_squeeze下载地址:https://github.com/cybertec-postgresql/pg_squeeze/
介绍:pg_squeeze是一个扩展,它从表中删除未使用的空间,并且可以选择根据特定索引对元组进行排序,一般当一个表膨胀时一般使用vacuum full或者cluster进行表重建,在这一过程中会加排他锁,导致该表无法进行读写,只有等整个过程完成后才可以进行正常使用。
pg_squeeze就是来解决这个问题的,它只会在最后切换filenode的过程中加锁,其他过程中是不影响读写的。
pg_squeeze的优点
相比pg_repack或pg_reorg,pg_squeeze不需要建触发器,所以在重组时对原表的DML几乎没有性能影响。安装
解压文件,安装即可
安装报错:
[postgres@localhost pg_squeeze-REL1_0_PG_10]$ make
make: --pgxs: Command not found
make: *** No targets. Stop.
解决方式:
找到pg_config所在路径
添加到Makefile文件中
PG_CONFIG=/opt/pg12/bin/pg_config
如何查看本机PG_CONFIG可用目录位置:
sudo find / -name "pg_config" -print
[postgres@localhost pg_squeeze-master]$ make
[postgres@localhost pg_squeeze-master]$ make install
配置Posgresql,修改postgresql.conf添加如下配置
wal_level = logical
max_replication_slots = 1 # 大于等于1
shared_preload_libraries = 'pg_squeeze'
添加插件
[postgres@localhost data]$ psql
psql (12.3)
Type "help" for help.
postgres=# create extension pg_squeeze;
CREATE EXTENSION
查看插件安装情况
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+-----------------------------------------------------------
pg_squeeze | 1.2 | squeeze | A tool to remove unused space from a relation.
pg_stat_statements | 1.7 | public | track execution statistics of all SQL statements executed
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(3 rows)
postgres=# \d squeeze.tables;
Table "squeeze.tables"
Column | Type | Collation | Nullable | Default
------------------+-----------------------+-----------+----------+------------------------------------
id | integer | | not null | nextval('tables_id_seq'::regclass)
tabschema | name | | not null |
tabname | name | | not null |
clustering_index | name | | |
rel_tablespace | name | | |
ind_tablespaces | name[] | | |
schedule | time with time zone[] | | not null |
free_space_extra | integer | | not null | 50
min_size | real | | not null | 8
vacuum_max_age | interval | | not null | '01:00:00'::interval
max_retry | integer | | not null | 0
skip_analyze | boolean | | not null | false
Indexes:
"tables_pkey" PRIMARY KEY, btree (id)
"tables_tabschema_tabname_key" UNIQUE CONSTRAINT, btree (tabschema, tabname)
Check constraints:
"tables_free_space_extra_check" CHECK (free_space_extra >= 0 AND free_space_extra < 100)
"tables_min_size_check" CHECK (min_size > 0.0::double precision)
Referenced by:
TABLE "tables_internal" CONSTRAINT "tables_internal_table_id_fkey" FOREIGN KEY (table_id) REFERENCES tables(id) ON DELETE CASCADE
TABLE "tasks" CONSTRAINT "tasks_table_id_fkey" FOREIGN KEY (table_id) REFERENCES tables(id) ON DELETE CASCADE
Triggers:
tables_internal_trig AFTER INSERT ON tables FOR EACH ROW EXECUTE FUNCTION tables_internal_trig_func()
参数解释:
tabschema和tabname分别是架构和表名。
schedule:列告诉您应该在一天的什么时候检查表格。这样的检查可能导致新的处理任务。
注:此参数替代了之前版本的task_interval和first_check两个字段(task_interval :表示检查表膨胀是否超过阀值的时间间隔;first_check :表示第一次检查时间)
free_space_extra:表示空闲空间超过多少时就会对表进行重建,默认是50。
min_size:被处理的最小的表占用的磁盘空间,默认是8。
vacuum_max_age:当进行一次vacuum后,认为fsm是有效的最大时间,默认1小时。
max_retry:当重建表失败时最大的重新尝试的次数,默认是0(即不重试)。
clustering_index:用来在表重构结束后对表元组做物理排序的索引。
rel_tablespace:表示表重建时,移动到哪个表空间中,NULL表示该表应保留在原处。
ind_tablespaces:是一个二维数组,其中每行指定索引的表空间映射。第一和第二列分别表示索引名称和表空间名称。所有未指定映射的索引将保留在原始表空间中。
关于表空间,值得一提的是一种特殊情况:如果表空间是为表而不是索引指定的,则表将移至该表空间,但索引仍保留在原始表空间中(即表的表空间不是索引的默认值)正如人们所期望的那样。
skip_analyze:表示不应在ANALYZE命令之后进行表处理。默认值为“ false”,表示默认情况下执行ANALYZE。
注册语句:
最简单的“注册”看起来像
INSERT INTO squeeze.tables (tabschema, tabname, schedule) VALUES ('public', 'test', '{22:30, 03:00}');
可以选择指定其他列,例如:
INSERT INTO squeeze.tables (tabschema, tabname, schedule,free_space_extra,vacuum_max_age,max_retry) VALUES ('public', 'test', '{22:30, 03:00}', 30, '2 hours', 2);
启动pg_squeeze的进程需要调用
SELECT squeeze.start_worker();
关闭
SELECT squeeze.stop_worker();
注册
postgres=# INSERT INTO squeeze.tables (tabschema, tabname, schedule,free_space_extra,vacuum_max_age,max_retry) VALUES ('public', 'test', '{22:30, 03:00}', 30, '2 hours', 2);
INSERT 0 1
开启pg_squeeze
postgres=# SELECT squeeze.start_worker();
start_worker
--------------
4230
(1 row)
创建表
postgres=# create table test(id int primary key);
CREATE TABLE
要求:表中必须要有一个主键或者是唯一键。
插入数据
postgres=# insert into test select generate_series(1,5000000);
INSERT 0 5000000
查看表大小
postgres=# \dt+ test;
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+--------+-------------
public | test | table | postgres | 173 MB |
(1 row)
删除数据
postgres=# delete from test where id < 2500000;
DELETE 2499999
查看当前表的膨胀情况
postgres=# select * from squeeze.tables_internal;
table_id | class_id | class_id_toast | free_space | last_task_created | last_task_finished
----------+----------+----------------+------------+------------------------------+------------------------------
1 | | | | 2020-06-22 11:26:41.69873+08 | 2020-06-22 11:26:41.82096+08
(1 row)
postgres=# \dt+ test;
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+--------+-------------
public | test | table | postgres | 173 MB |
(1 row)
postgres=# 2020-06-22 11:26:41.825 CST [4230] LOG: logical decoding found consistent point at 0/37487C40
2020-06-22 11:26:41.825 CST [4230] DETAIL: There are no running transactions.
2020-06-22 11:26:41.825 CST [4230] CONTEXT: SQL statement "SELECT squeeze.squeeze_table(v_tabschema, v_tabname,
v_cl_index, v_rel_tbsp, v_ind_tbsps)"
PL/pgSQL function squeeze.process_current_task() line 41 at PERFORM
SQL statement "SELECT squeeze.process_current_task()"
查看当前的事务
postgres=# SELECT squeeze.process_current_task();
查看表空间
postgres=# \dt+ test;
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+-------+-------------
public | test | table | postgres | 86 MB |
(1 row)
表空间已经释放
参考:
https://github.com/cybertec-postgresql/pg_squeeze/
https://blog.youkuaiyun.com/u012551524/article/details/88802570