PostgreSQL standby replication error : invalid record length at %u

最新推荐文章于 2024-09-15 15:53:39 发布

weixin_33712881

最新推荐文章于 2024-09-15 15:53:39 发布

阅读量1.3k

点赞数

CC 4.0 BY-SA版权

文章标签：数据库运维

原文链接：https://yq.aliyun.com/articles/14700

本文记录了一位网友启动POSTGRES STANDBY数据库遇到的问题及初步排查过程。主要问题是启动时出现invalid record length错误，通过分析日志与配置文件，探讨了可能的原因并尝试了解决方案。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

一位QQ上的网友问我的一个问题，我觉得比较有意思。

记录如下 :

Question :

我在启动POSTGRES的STANDBY数据库时，报

--------------------------

LOG: database system was shut down in recovery at 2011-12-30 23:20:25 CST

LOG: entering standby mode

LOG: restored log file "0000000100000002000000E9" from archive

LOG: invalid record length at 2/E9000108

LOG: invalid record length at 2/E9000108

LOG: streaming replication successfully connected to primary

LOG: invalid record length at 2/E9000108

FATAL: terminating walreceiver process due to administrator command

LOG: restored log file "0000000100000002000000E9" from archive

LOG: invalid record length at 2/E9000108

FATAL: the database system is starting up

LOG: invalid record length at 2/E9000108

LOG: restored log file "0000000100000002000000E9" from archive

LOG: invalid record length at 2/E9000108

--------------------------------------

我是把主库整个做了个TAR 包到备库的

问题分析 :

报错 invalid record length at 2/E9000108

源码中对应的部分如下 :

/*

* NOTE: We disallow len == 0 because it provides a useful bit of extra

* error checking in ReadRecord. This means that all callers of

* XLogInsert must supply at least some not-in-a-buffer data. However, we

* make an exception for XLOG SWITCH records because we don't want them to

* ever cross a segment boundary.

*/

if (len == 0 && !isLogSwitch)

elog(PANIC, "invalid xlog record length %u", len);

START_CRIT_SECTION();

因此报错的原因是standby在恢复过程中从 xlog 0000000100000002000000E9 的 2/E9000108位置读取到了len == 0 并且 !isLogSwitch 的信息。

日志中还提到了 restored log file "0000000100000002000000E9" from archive

因此问他要了recovery.conf 配置文件的内容如下 :

standby_mode = 'on'
primary_conninfo = 'host=隐藏 port=5432 user=隐藏 password=隐藏'
restore_command = 'cp "/home/postgres/data/pg_xlog/%f" "%p"'
trigger_file = '/home/postgres/trigger_activestb'

从这个配置文件来看， restore_command = 'cp "/home/postgres/data/pg_xlog/%f" "%p"' 有问题，如果要配置 restore_command，那么应该是把主库的归档文件拷贝standby的pg_xlog目录。

要么就是不要配置 restore_command , 那么standby会尝试从stream获取xlog的信息（要求primary库的这个xlog没有被覆盖）。

Answer :

经过确认主库的 0000000100000002000000E9 存在pg_xlog中未被覆盖.

注释standby节点recovery.conf的restore_command = 'cp "/home/postgres/data/pg_xlog/%f" "%p"'

重启standby.

未解决，还是报 invalid record length at 2/E9000108

另外需要了解的是，

做STANDBY的时候有没有按照正常流程来做( select pg_start_backup(), backup, select pg_stop_backup() )

拷贝数据文件的过程中有没有错误发生.

这些都不得而知, 因此问题不太好判断。

所以先重做 standby 来解决目前的问题 .

最后.

可以尝试使用pg_filedump来分析一下 0000000100000002000000E9的内容。（由于时间关系，暂时没做）

【参考】

src/backend/access/transam/xlog.c

http://blog.163.com/digoal@126/blog/static/163877040201142610215685/

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。