【pg_rman】备份GIN索引bug

最近有业务反应,每次使用pg_rman备份恢复之后,过段时间数据库会停止失败。当时查看进程相关信息,发现有个对GIN索引的自动清理进程挂死,怀疑和IN索引有关。如是做了如下测试,发现为pg_rman备份恢复GIN索引文件存在bug导致。


测试版本:PG9.2.21 和pg_rman1.3


测试步骤如下:


1、建立表和索引


postgres@X86U149:~$ psql postgres -p 5432
psql (9.2.21)
Type "help" for help.

postgres=# \d+ tsvector_test
                       Table "public.tsvector_test"
   Column   |   Type   | Modifiers | Storage  | Stats target | Description
------------+----------+-----------+----------+--------------+-------------
 id         | bigint   |           | plain    |              |
 phone_list | tsvector |           | extended |              |
Indexes:
    "idx_tsvector_test" gin (phone_list)
Has OIDs: no

postgres=# vacuum tsvector_test;
VACUUM
postgres=# \q 


2、使用pg_rman备份恢复

postgres@X86U149:~$ pg_rman backup -B /home/postgres/backup/ -d postgres -p 5432 -b f
INFO: copying database files
INFO: copying archived WAL files
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
postgres@X86U149:~$ pg_rman validate -B /home/postgres/backup/
INFO: validate: "2017-06-15 23:10:06" backup and archive log files by CRC
INFO: backup "2017-06-15 23:10:06" is valid
postgres@X86U149:~$ pg_ctl stop
waiting for server to shut down....LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
 done
server stopped
postgres@X86U149:~$ pg_rman restore -B /home/postgres/backup/
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of current database cluster as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2017-06-15 23:10:06"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2017-06-15 23:10:06" backup and archive log files by SIZE
INFO: backup "2017-06-15 23:10:06" is valid
INFO: restoring database files from the full mode backup "2017-06-15 23:10:06"
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2017-06-15 23:10:06" is valid
INFO: restoring WAL files from backup "2017-06-15 23:10:06"
INFO: restoring online WAL files and server log files
INFO: generating recovery.conf
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started.
postgres@X86U149:~$ pg_ctl start
server starting
postgres@X86U149:~$ LOG:  database system was interrupted; last known up at 2017-06-15 23:10:07 CST
LOG:  starting archive recovery
LOG:  restored log file "000000010000000000000010" from archive
LOG:  redo starts at 0/10000080
LOG:  consistent recovery state reached at 0/100000A8
LOG:  restored log file "000000010000000000000011" from archive
cp: cannot stat '/home/postgres/archive//000000010000000000000012': No such file or directory
LOG:  record with zero length at 0/12000080
LOG:  redo done at 0/12000020
LOG:  last completed transaction was at log time 2017-06-15 23:10:10.481215+08
cp: cannot stat '/home/postgres/archive//00000002.history': No such file or directory
LOG:  selected new timeline ID: 2
cp: cannot stat '/home/postgres/archive//00000001.history': No such file or directory
LOG:  archive recovery complete
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections 

3、再次启动后对该表执行vacuum清理操作,发现挂死


postgres@X86U149:~$ psql postgres -p 5432
psql (9.2.21)
Type "help" for help.

postgres=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner   | Size  | Description
--------+---------------+-------+----------+-------+-------------
 public | phone         | table | postgres | 36 MB |
 public | tsvector_test | table | postgres | 47 MB |
(2 rows)

postgres=# vacuum tsvector_test;
^Z
[2]+  Stopped                 psql postgres -p 5432

postgres=# select pid,query from pg_stat_activity;
  pid  |                  query                  
-------+-----------------------------------------
 21633 | vacuum tsvector_test;
 22885 | select pid,query from pg_stat_activity;
(2 rows) 

调用栈信息如下


#0  0x00007f955c3e0057 in semop () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000006034d8 in PGSemaphoreLock ()
#2  0x0000000000647ae8 in LWLockAcquire ()
#3  0x00000000004b2c2b in ginInsertCleanup ()
#4  0x00000000004b1ade in ginvacuumcleanup ()
#5  0x00000000007212f6 in FunctionCall2Coll ()
#6  0x00000000005787cb in lazy_scan_heap ()
#7  0x0000000000579148 in lazy_vacuum_rel ()
#8  0x00000000005769ba in vacuum_rel ()
#9  0x00000000005774e9 in vacuum ()
#10 0x0000000000656100 in PortalRunUtility ()
#11 0x0000000000656cf5 in PortalRunMulti ()
#12 0x0000000000657858 in PortalRun ()
#13 0x00000000006546d6 in PostgresMain ()
#14 0x0000000000610d40 in ServerLoop ()
#15 0x0000000000611996 in PostmasterMain ()
#16 0x0000000000458b3e in main () 


使用debug版本测试,出现assert断言失败的情况


Assert(maxoff >= FirstOffsetNumber);


后经反复测试,手工备份文件恢复,发现没有问题,怀疑问题为pg_rman问题


将该问题反馈给pg_rman社区,目前已经修复


https://github.com/ossc-db/pg_rman/commit/a09c68e7de519858c76085cfbbdb4f33db993338


Backup GIN index files using copy_file()

This requires an ugly hack in parse_page() to interpret the
non-PageHeaderData content.  Currently, that only includes checking
the page content to see if it is a GIN metapage.  To determine that,
we need to interpret the page content using GIN index implementation-
private struct definition.  Including the PostgreSQL-side header is
useless because it's not allowed to do within FRONTEND code, which
pg_rman is.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值