Repost: Debug and Release

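This article explains in detail how Debug and Release builds differ, including the runtime library each one links, the optimization switches, and the debugging information. It also analyzes why a Release build can fail when the Debug build works, for example because of problems caused by optimization options.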
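I. The essential difference between Debug and Release builds

      A Debug build, usually called the debugging version, contains debugging information and is not optimized, which makes the program easy to debug. A Release build, the version that ships, is usually optimized for code size or execution speed so that it serves end users well.
      The real secret of Debug and Release is a set of compiler options. The options for each are listed below. (There are others, such as /Fd and /Fo, but the differences are unimportant and they normally do not cause Release-only errors, so they are not discussed here.)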
           
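  Debug build:
    /MDd, /MLd or /MTd    Link the debug runtime library
    /Od                   Disable optimization
    /D "_DEBUG"           Equivalent to #define _DEBUG; enables conditionally compiled
                          debugging code (mainly the assertion macros)
    /ZI                   Create the Edit and Continue database, so that source changes made
                          while debugging can be applied without a full rebuild
    /GZ                   Helps catch memory errors
    /Gm                   Enable minimal rebuild, which shortens rebuild time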
                                             
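  Release build:
    /MD, /ML or /MT       Link the release runtime library
    /O1 or /O2            Optimize for minimum size or maximum speed
    /D "NDEBUG"           Disable conditionally compiled debugging code (assert() compiles
                          to nothing)
    /GF                   Pool duplicate strings and place string constants in read-only
                          memory so that they cannot be modified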
   
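      In fact, there is no essential boundary between Debug and Release. Each is just a set of compiler options, and the compiler simply does what the options say. We can even change the options to get an optimized debug build or a release build with trace statements.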
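Since a build is defined by its options, code can only tell the two apart through the macros those options define. The following is a minimal sketch of that idea; the function name `build_flavor` is made up for this illustration and is not part of any library.

```cpp
#include <string>

// Minimal sketch: report which configuration macro the compiler options defined.
// build_flavor is a hypothetical helper name used only for this illustration.
std::string build_flavor() {
#if defined(_DEBUG)
    return "debug";        // /D "_DEBUG" was passed
#elif defined(NDEBUG)
    return "release";      // /D "NDEBUG" was passed
#else
    return "unspecified";  // neither macro was defined
#endif
}
```

A build that defines neither macro falls into the third branch, which matches the point above: the two configurations are conventions, not fixed modes.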
           
  II. When a Release build goes wrong

      With that background, let us go through the options one by one and see how Release-only errors arise.
           
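    1. Runtime library: which runtime library you link normally affects only performance. The debug runtime library contains debugging information and uses some protective mechanisms to help find errors, so it is slower than the release version. The runtime libraries shipped with the compiler are usually very stable and do not cause Release-only errors. On the contrary, because the debug runtime library checks more strictly for errors, in heap allocation for example, you sometimes see a Debug build fail while the Release build appears to work. Note that if the Debug build fails, the program definitely has a bug even when the Release build runs normally; it simply did not show up in that particular Release run.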
     
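    2. Optimization: this is the main cause of errors, because with optimization off the source is translated more or less directly, whereas with optimization on the compiler makes a series of assumptions. The main kinds of error are: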
     
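          (1) Frame pointer omission (FPO): during a function call, all call information (return address, parameters) and the automatic variables live on the stack. If a function's declaration differs from its implementation (parameters, return value, calling convention), an error results. In a Debug build, however, the stack is accessed through the address held in the EBP register, so as long as nothing like an array overrun occurs (or the overrun is small), the function usually runs normally. In a Release build, the optimizer omits the EBP frame pointer and the stack is accessed through a single global pointer (the stack pointer), so the same mismatch corrupts the return address and crashes the program. C++'s strong typing catches most such errors, but not when an explicit cast is involved. You can add the /Oy- option to the Release build to turn frame pointer omission off and confirm whether this is the cause. Typical errors of this kind: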
             
            ●   An MFC message handler with the wrong prototype. The correct form is
              afx_msg LRESULT OnMessageOwn(WPARAM wparam, LPARAM lparam);
              The ON_MESSAGE macro contains a cast. One way to prevent this error is to redefine ON_MESSAGE by adding the following code to stdafx.h (after #include "afxwin.h"); a wrong handler prototype then fails to compile.
              #undef ON_MESSAGE
              #define ON_MESSAGE(message, memberFxn) \
              { message, 0, 0, 0, AfxSig_lwl, \
              (AFX_PMSG)(AFX_PMSGW)(static_cast< LRESULT (AFX_MSG_CALL \
              CWnd::*)(WPARAM, LPARAM) > (&memberFxn)) },
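The reason the redefined macro helps is that `static_cast` refuses to convert between member-function pointers with different signatures, while a C-style cast accepts them silently. The sketch below shows the same idea without MFC; `Window`, `Handler`, and `has_handler_signature` are hypothetical names invented for this example, and the parameter types only stand in for WPARAM and LPARAM.

```cpp
#include <type_traits>

// Hypothetical stand-in for a window class; this is not an MFC type.
struct Window {
    long OnGood(unsigned, long) { return 0; }  // matches the expected prototype
    void OnBad() {}                            // wrong prototype
};

// The prototype every handler is expected to have in this sketch.
using Handler = long (Window::*)(unsigned, long);

// True only when F is exactly the expected member-function pointer type.
template <typename F>
constexpr bool has_handler_signature(F) {
    return std::is_same<F, Handler>::value;
}

// static_cast<Handler>(&Window::OnBad) would be rejected by the compiler,
// whereas the C-style cast (Handler)&Window::OnBad compiles without complaint.
static_assert(has_handler_signature(&Window::OnGood), "prototype matches");
static_assert(!has_handler_signature(&Window::OnBad), "prototype mismatch is detected");
```

Moving the check to compile time means the mismatch is caught in every configuration, not just in the one where the stack layout happens to expose it.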
               
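          (2) volatile variables: volatile tells the compiler that the variable may be modified in ways unknown to the program itself (by the system, or by other processes and threads). To improve performance, the optimizer often keeps variables in registers (much like the register keyword); other code can only modify the variable's memory, so the value held in the register does not change. If your program is multithreaded, or you find that a variable's value is not what you expect although you are sure you set it correctly, you have probably hit this problem. The error sometimes shows up as a failure under speed optimization while size optimization works. Try adding volatile to the variables you suspect. Note that volatile only stops the compiler from caching the value; it does not by itself make access from several threads safe.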
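A minimal sketch of the polling pattern described above, assuming a flag that something outside the function may change. The names `g_ready` and `wait_for_flag` are invented for this example, and the spin limit exists only so the function always terminates.

```cpp
#include <cstddef>

// Hypothetical flag that something outside the normal control flow may change.
volatile int g_ready = 0;

// Poll the flag at most max_spins times and return how many polls saw it unset.
// Because the pointee is volatile, every iteration re-reads it from memory
// instead of reusing a value cached in a register.
std::size_t wait_for_flag(const volatile int* flag, std::size_t max_spins) {
    std::size_t spins = 0;
    while (*flag == 0 && spins < max_spins) {
        ++spins;
    }
    return spins;
}
```

Without the volatile qualifier, an optimizer is free to read the flag once and turn the loop into one that never observes a later change. For communication between threads, standard C++ provides std::atomic, which also gives ordering guarantees that volatile does not.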
           
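          (3) Variable optimization: the optimizer optimizes variables according to how they are used. For example, if a function has an unused variable, in a Debug build that variable may mask an array overrun, while in a Release build it is likely to be optimized away, and the overrun then destroys useful data on the stack. Real cases are of course far more complicated. Related errors include:
            ●   Illegal access, including array overruns and bad pointers. For example
                    void fn(void)
                    {
                        int i;
                        i = 1;
                        int a[4];
                        {
                            int j;
                            j = 1;
                        }
                        a[-1] = 1;  // real bugs are rarely this obvious, e.g. the index is a variable
                        a[4] = 1;
                    }
                Although j is already out of scope when the overrun happens, its storage has not been reclaimed, so i and j mask the overrun. In a Release build, i and j do very little and may be optimized away, so the stack gets corrupted.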
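One way to make such an overrun fail the same way in every configuration is to use a checked access. The following is a sketch under that assumption, using `std::array::at`; the helper name `write_checked` is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: store value at index and report failure instead of
// corrupting the stack when the index lies outside the array.
bool write_checked(std::array<int, 4>& a, int index, int value) {
    try {
        // A negative index converts to a very large std::size_t,
        // so at() rejects it as well.
        a.at(static_cast<std::size_t>(index)) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With this helper, the two bad writes from the example (index -1 and index 4) are both rejected, regardless of which neighbouring variables the optimizer kept.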
   
  3. _DEBUG and NDEBUG: the standard assert() is compiled unless NDEBUG is defined, while the VC++ assertion macros are compiled only when _DEBUG is defined. The assertions available in VC++ are:

          ANSI C assertion          void assert(int expression);
          C runtime assertions      _ASSERT(booleanExpression);
                                    _ASSERTE(booleanExpression);
          MFC assertions            ASSERT(booleanExpression);
                                    VERIFY(booleanExpression);
                                    ASSERT_VALID(pObject);
                                    ASSERT_KINDOF(classname, pobject);
          ATL assertion             ATLASSERT(booleanExpression);
          In addition, compilation of the TRACE() macro is also controlled by _DEBUG.
   
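          All of these assertions are compiled only in a Debug build and are ignored in a Release build. The only exception is VERIFY(). These macros all work much like assert(), with some library-specific debugging code added. If you put any program code in them rather than a pure Boolean expression (an assignment, or a call to a function that changes a variable, for example), the Release build will not perform those operations, and errors follow. Beginners make this mistake easily. Finding it is simple: since the macros are all listed above, use the Find in Files feature of VC++ to locate every use of them in the project and check each one. Some experienced programmers also add conditional compilation such as #ifdef _DEBUG, which deserves the same check.
          VERIFY() deserves a separate mention. It lets you put program code inside the Boolean expression, because the expression is still evaluated in a Release build. It is typically used to check the return values of Windows API calls. Some people overuse VERIFY() for that reason, which is dangerous: VERIFY() goes against the idea of assertions, because it does not keep program code and debugging code fully separate, and it may eventually cause a lot of trouble. Experts therefore recommend using this macro as little as possible.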
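The difference between the two behaviours can be sketched with a pair of stand-in macros. `RELEASE_STYLE_ASSERT` and `RELEASE_STYLE_VERIFY` are hypothetical names that imitate how ASSERT-style and VERIFY-style macros expand when _DEBUG is not defined; they are not the library definitions.

```cpp
// Hypothetical macros that mimic the release-build expansion of an
// ASSERT-style macro and a VERIFY-style macro.
#define RELEASE_STYLE_ASSERT(expr) ((void)0)       // expression is discarded
#define RELEASE_STYLE_VERIFY(expr) ((void)(expr))  // expression is still evaluated

// The increment lives inside the assertion, so it disappears with it.
int counter_after_assert() {
    int n = 0;
    RELEASE_STYLE_ASSERT(++n == 1);
    return n;  // n was never incremented
}

// The increment survives because the expression is still evaluated.
int counter_after_verify() {
    int n = 0;
    RELEASE_STYLE_VERIFY(++n == 1);
    return n;  // n was incremented once
}
```

The safer habit is to perform the operation in an ordinary statement and assert only on its result, which keeps program code and debugging code separate.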
   
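  4. The /GZ option: this option does the following

          (1) Initializes memory and variables. This includes filling all automatic variables with 0xCC, heap-allocated memory (dynamically allocated memory, for example from new) with 0xCD (Cleared Data), freed heap memory (for example after delete) with 0xDD (Dead Data), and guard memory with 0xFD (deFenced Data); the debug build places guard memory before and after each dynamic allocation to detect out-of-bounds access. The words in parentheses are the mnemonics suggested by Microsoft. The advantage is that these values are large and very unlikely to be valid pointers (on 32-bit systems pointers are rarely odd, and on some systems an odd pointer causes a run-time error), they rarely occur as ordinary numbers, and they are easy to recognize, so they help you find in the Debug build errors that would otherwise appear only in the Release build. Note in particular that many people believe the compiler initializes variables to 0; this is wrong (and it would make errors harder to find).
          (2) When a function is called through a function pointer, verifies that the call matches by checking the stack pointer (guards against prototype mismatches).
          (3) Checks the stack pointer before a function returns to confirm that it has not been modified (guards against out-of-bounds access and prototype mismatches; together with item 2 this roughly emulates the effect of frame pointer omission, FPO).

          /GZ usually produces the situation where the Debug build fails and the Release build works, because uninitialized variables in a Release build hold arbitrary values, which may make a pointer point to a valid address and so hide an illegal access.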
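Because the fill values are easy to recognize, a suspicious value seen in the debugger can be matched against them directly. The sketch below does this for 32-bit values; `describe_fill` is a hypothetical helper written for this example, not part of any runtime library.

```cpp
#include <cstdint>
#include <string>

// Classify a 32-bit value against the fill bytes listed above.
// describe_fill is a hypothetical helper used only for illustration.
std::string describe_fill(std::uint32_t value) {
    switch (value) {
        case 0xCCCCCCCCu: return "uninitialized automatic variable";
        case 0xCDCDCDCDu: return "uninitialized heap memory";
        case 0xDDDDDDDDu: return "freed heap memory";
        case 0xFDFDFDFDu: return "guard bytes around a heap block";
        default:          return "no known fill pattern";
    }
}
```

A pointer whose value is 0xCDCDCDCD, for instance, was most likely read from heap memory that was allocated but never written.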
           
  Apart from these, options such as /Gm and /GF rarely cause errors, and their effects are obvious and fairly easy to spot.
    
 