1 The pg_ctl utility
PostgreSQL startup begins with the pg_ctl utility, so to explore how PostgreSQL starts we begin with pg_ctl's main() function. Load pg_ctl into GDB, set a breakpoint on main, and then start it, as shown below.
[postgres@centos7 postgres]$ gdb /postgres/postgresql-17.2-install/bin/pg_ctl
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /postgres/postgresql-17.2-install/bin/pg_ctl...done.
(gdb) b main
Breakpoint 1 at 0x4046b8: file /postgres/postgresql-17.2-build/../postgresql-17.2/src/bin/pg_ctl/pg_ctl.c, line 2210.
(gdb) set args -D /postgres/postgresql-17.2-data -l logfile start -W
(gdb) r
Starting program: /postgres/postgresql-17.2-install/bin/pg_ctl -D /postgres/postgresql-17.2-data -l logfile start
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, main (argc=6, argv=0x7fffffffe2f8) at /postgres/postgresql-17.2-build/../postgresql-17.2/src/bin/pg_ctl/pg_ctl.c:2210
2210 pid_t killproc = 0;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-317.el7.x86_64 glibc-2.17-326.el7_9.3.x86_64
(gdb) p argv[0]
$1 = 0x7fffffffe594 "/postgres/postgresql-17.2-install/bin/pg_ctl"
(gdb) p argv[1]
$2 = 0x7fffffffe5c1 "-D"
(gdb) p argv[2]
$3 = 0x7fffffffe5c4 "/postgres/postgresql-17.2-data"
(gdb) p argv[3]
$4 = 0x7fffffffe5e3 "-l"
(gdb) p argv[4]
$5 = 0x7fffffffe5e6 "logfile"
(gdb) p argv[5]
$6 = 0x7fffffffe5ee "start"
1.1 Overall structure of pg_ctl's main()
The overall structure of pg_ctl's main() is as follows:
int main(int argc, char **argv)   /* pg_ctl's main() */
{
    /* Part 1: process command-line options */
    while ((c = getopt_long(argc, argv, "cD:e:l:m:N:o:p:P:sS:t:U:wW",
                            long_options, &option_index)) != -1)
    {
        ......
    }

    /* Part 2: identify the requested action */
    if (optind < argc)
    {
        if (strcmp(argv[optind], "init") == 0
            || strcmp(argv[optind], "initdb") == 0)
            ctl_command = INIT_COMMAND;
        else if (strcmp(argv[optind], "start") == 0)
            ctl_command = START_COMMAND;
        else if (strcmp(argv[optind], "stop") == 0)
            ctl_command = STOP_COMMAND;
        else if (strcmp(argv[optind], "restart") == 0)
            ctl_command = RESTART_COMMAND;
        else if (strcmp(argv[optind], "reload") == 0)
            ctl_command = RELOAD_COMMAND;
        ......
    }

    /* Part 3: perform the requested action */
    switch (ctl_command)
    {
        case INIT_COMMAND:
            do_init();
            break;
        case STATUS_COMMAND:
            do_status();
            break;
        case START_COMMAND:
            do_start();
            break;
        case STOP_COMMAND:
            do_stop();
            break;
        case RESTART_COMMAND:
            do_restart();
            break;
        case RELOAD_COMMAND:
            do_reload();
            break;
        ......
    }
}
1.2 The do_start() function
As the outline above shows, the start action is handled by do_start().
2459 switch (ctl_command)
(gdb)
2468 do_start();
(gdb)
1.2.1 A mysterious child process
Before pg_ctl forks the child that becomes the postmaster, another child process briefly runs and exits normally. See child process 23087 below.
943 if (exec_path == NULL)
(gdb)
944 exec_path = find_other_exec_or_die(argv0, "postgres", PG_BACKEND_VERSIONSTR);
(gdb)
[Attaching after process 23087 fork to child process 23087]
[New inferior 2 (process 23087)]
[Detaching after fork from parent process 22458]
[Inferior 1 (process 22458) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 23087 is executing new program: /usr/bin/bash
Error in re-setting breakpoint 1: Function "do_start" not defined.
process 23087 is executing new program: /postgres/postgresql-17.2-install/bin/postgres
Missing separate debuginfos, use: debuginfo-install bash-4.2.46-34.el7.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Inferior 2 (process 23087) exited normally]
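This child is actually running postgres -V. find_other_exec_or_die() locates the postgres executable in the same directory as pg_ctl. It then runs that executable with -V through popen() to check that the reported version matches PG_BACKEND_VERSIONSTR. popen() starts /bin/sh -c (bash on CentOS 7), and the shell then runs postgres. That is why GDB reports two "executing new program" lines above, first bash and then postgres: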
379 if ((pipe_cmd = popen(cmd, "r")) == NULL)
(gdb) p cmd
$13 = 0x7fffffffd910 "\"/postgres/postgresql-17.2-install/bin/postgres\" -V"
(gdb) n
[Detaching after fork from child process 64107]
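1.2.2 Debugging the child process
To follow execution into the child, run set follow-fork-mode child in GDB before stepping over fork(). By default GDB stays with the parent.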
start_postmaster () at /postgres/postgresql-17.2-build/../postgresql-17.2/src/bin/pg_ctl/pg_ctl.c:447
447 fflush(NULL);
(gdb) n
453 pm_pid = fork();
(gdb) set follow-fork-mode child
(gdb) n
[Attaching after process 93725 fork to child process 93725]
[New inferior 2 (process 93725)]
[Detaching after fork from parent process 47776]
[Inferior 1 (process 47776) detached]
server starting
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Switching to Thread 0x7ffff7fca740 (LWP 93725)]
454 if (pm_pid < 0)
(gdb) p pm_pid
$23 = 0
The output above shows that the pg_ctl process (PID 47776) forked a pg_ctl child (PID 93725). Before the fork, listing the postgres user's processes from another terminal gave:
postgres 46543 18264 0 15:14 pts/0 00:00:00 gdb /postgres/postgresql-17.2-install/bin/pg_ctl
postgres 47776 46543 0 15:15 pts/0 00:00:00 /postgres/postgresql-17.2-install/bin/pg_ctl -D /postgres/postgresql-17.2-data -l logfile start -W
After the fork, the child process appears as follows:
postgres 93725 1 0 15:43 pts/0 00:00:00 /postgres/postgresql-17.2-install/bin/pg_ctl -D /postgres/postgresql-17.2-data -l logfile start -W
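The parent, 47776, has done its job and exited, leaving child 93725 to carry on. Because its parent is gone, 93725 has been re-parented to PID 1, which is why its PPID column shows 1. At this point, 93725 is still running the pg_ctl image.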
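1.2.3 The transformation from pg_ctl to postmaster
The pg_ctl child then replaces its process image by calling execl(), a C library wrapper around the execve() system call. Strictly speaking, it executes /bin/sh -c "exec postgres ...". The shell's exec then replaces the shell with postgres, keeping the same PID. The resulting process is the postmaster.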
454 if (pm_pid < 0)
(gdb) p pm_pid
$23 = 0
(gdb) n
461 if (pm_pid > 0)
(gdb) n
475 if (setsid() < 0)
(gdb) n
488 if (log_file != NULL)
(gdb)
[tcsetpgrp failed in terminal_inferior: No such process]
489 cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
(gdb) n
496 (void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
(gdb) p cmd
$24 = 0x417080 "exec \"/postgres/postgresql-17.2-install/bin/postgres\" -D \"/postgres/postgresql-17.2-data\" < \"/dev/null\" >> \"logfile\" 2>&1"
(gdb) n
process 93725 is executing new program: /usr/bin/bash
Error in re-setting breakpoint 1: No symbol table is loaded. Use the "file" command.
process 93725 is executing new program: /postgres/postgresql-17.2-install/bin/postgres
Missing separate debuginfos, use: debuginfo-install bash-4.2.46-34.el7.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Attaching after process 128439 fork to child process 128439]
[New inferior 3 (process 128439)]
[Detaching after fork from parent process 93725]
[Inferior 2 (process 93725) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
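1.2.4 The postmaster's launch command
The debugging above shows that the pg_ctl child replaces its process image via execl() and becomes the postgres process, which from here on we call the postmaster. To debug the postmaster's startup, we can therefore skip pg_ctl and debug the following command directly: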
/postgres/postgresql-17.2-install/bin/postgres -D /postgres/postgresql-17.2-data < /dev/null >> logfile 2>&1
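This invocation enters main() in src/backend/main/main.c, the entry point of the postgres server executable. Not every server process passes through main(), however. On Unix-like builds without EXEC_BACKEND, backend processes are fork()ed from the postmaster and never re-enter main(). Only EXEC_BACKEND builds, such as Windows, start each child through main() with --forkchild.
The postmaster can also be debugged starting from pg_ctl as above. In that case, after entering pg_ctl and before the process image is replaced, set a breakpoint in the postgres main() that is about to run. Because main.c is not part of pg_ctl, GDB records it as a pending breakpoint, as shown below.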
[postgres@centos7 postgres]$ gdb /postgres/postgresql-17.2-install/bin/pg_ctl
Reading symbols from /postgres/postgresql-17.2-install/bin/pg_ctl...done.
(gdb) set args -D /postgres/postgresql-17.2-data -l logfile start -W
(gdb) b pg_ctl.c:453
Breakpoint 1 at 0x402c29: file /postgres/postgresql-17.2-build/../postgresql-17.2/src/bin/pg_ctl/pg_ctl.c, line 453.
(gdb) r
Starting program: /postgres/postgresql-17.2-install/bin/pg_ctl -D /postgres/postgresql-17.2-data -l logfile start -W
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 16513]
Breakpoint 1, start_postmaster ()
at /postgres/postgresql-17.2-build/../postgresql-17.2/src/bin/pg_ctl/pg_ctl.c:453
453 pm_pid = fork();
Missing separate debuginfos, use: debuginfo-install glibc-2.17-317.el7.x86_64 glibc-2.17-326.el7_9.3.x86_64
(gdb) set follow-fork-mode child
(gdb) n
[Attaching after process 18245 fork to child process 18245]
[New inferior 2 (process 18245)]
[Detaching after fork from parent process 16509]
[Inferior 1 (process 16509) detached]
server starting
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Switching to Thread 0x7ffff7fca740 (LWP 18245)]
454 if (pm_pid < 0)
(gdb) n
461 if (pm_pid > 0)
(gdb) p pm_pid
$1 = 0
(gdb) n
475 if (setsid() < 0)
(gdb)
488 if (log_file != NULL)
(gdb)
[tcsetpgrp failed in terminal_inferior: No such process]
489 cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
(gdb)
496 (void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
(gdb) p cmd
$2 = 0x417080 "exec \"/postgres/postgresql-17.2-install/bin/postgres\" -D \"/postgres/postgresql-17.2-data\" < \"/dev/null\" >> \"logfile\" 2>&1"
(gdb) b main.c:62
No source file named main.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (main.c:62) pending.
(gdb) n
process 18245 is executing new program: /usr/bin/bash
Error in re-setting breakpoint 1: No symbol table is loaded. Use the "file" command.
process 18245 is executing new program: /postgres/postgresql-17.2-install/bin/postgres
Missing separate debuginfos, use: debuginfo-install bash-4.2.46-34.el7.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 2, main (argc=3, argv=0x7fffffffe288)
at /postgres/postgresql-17.2-build/../postgresql-17.2/src/backend/main/main.c:62
62 reached_main = true;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-317.el7.x86_64 glibc-2.17-326.el7_9.3.x86_64 libicu-50.2-4.el7_7.x86_64 libstdc++-4.8.5-44.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) bt
#0 main (argc=3, argv=0x7fffffffe288)
at /postgres/postgresql-17.2-build/../postgresql-17.2/src/backend/main/main.c:62
(gdb) p argv[0]
$3 = 0x7fffffffe52f "/postgres/postgresql-17.2-install/bin/postgres"
(gdb) p argv[1]
$4 = 0x7fffffffe55e "-D"
(gdb) p argv[2]
$5 = 0x7fffffffe561 "/postgres/postgresql-17.2-data"
(gdb)
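2 Basic logic of the postgres main() function
2.1 Error handling and memory management
MemoryContextInit() starts up the memory-context subsystem and creates TopMemoryContext and ErrorContext. After this call, memory can be allocated in contexts and errors can be reported.
MemoryContextInit();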
2.2 Setting up the locale
/*
* Set up locale information
*/
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("postgres"));
init_locale("LC_COLLATE", LC_COLLATE, "");
init_locale("LC_CTYPE", LC_CTYPE, "");
init_locale("LC_MESSAGES", LC_MESSAGES, "");
init_locale("LC_MONETARY", LC_MONETARY, "C");
init_locale("LC_NUMERIC", LC_NUMERIC, "C");
init_locale("LC_TIME", LC_TIME, "C");
unsetenv("LC_ALL");
2.3 Dispatching to a subprogram based on the first argument
The default, and by far the most important, branch is PostmasterMain(). We will analyze it next time.
if (argc > 1 && strcmp(argv[1], "--check") == 0)
    BootstrapModeMain(argc, argv, true);
else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
    BootstrapModeMain(argc, argv, false);
#ifdef EXEC_BACKEND
else if (argc > 1 && strncmp(argv[1], "--forkchild", 11) == 0)
    SubPostmasterMain(argc, argv);
#endif
else if (argc > 1 && strcmp(argv[1], "--describe-config") == 0)
    GucInfoMain();
else if (argc > 1 && strcmp(argv[1], "--single") == 0)
    PostgresSingleUserMain(argc, argv,
                           strdup(get_user_name_or_exit(progname)));
else
    PostmasterMain(argc, argv);