Understanding postgresql.conf: log*

This article walks through PostgreSQL's logging configuration in detail – log destinations, log formats, log levels, log rotation strategies, and the various logging options – to help you monitor and debug a running database effectively.

After a loooong pause, I'm adding the next (well, second) post to the "series". This time, I'd like to describe how logging works. And I don't mean binary logging (WAL), but the log for us, humans, to read.

Before I go to the postgresql.conf options, let me first describe what happens when PostgreSQL wants to log something.


All examples below are done using PostgreSQL 9.0.4.

When I just create a new $PGDATA with default settings, PostgreSQL is configured to log everything simply by spitting it out to STDERR. Which can be seen when starting it:

14:21:50 depesz@blob ~
=$ pg_ctl -D data start
server starting
 
14:21:54 depesz@blob ~
=$ LOG:  database system was shut down at 2011-04-30 14:21:35 CEST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

(the lines with the timestamp and 'depesz@blob' are my prompt).

As you can see, I got a 2nd prompt, and then 3 LOG lines. If I connect to this Pg instance from another console, and run something which will generate an error:

=$ psql -d postgres -c 'select 1/0'
ERROR:  division by zero

On the first console I will get additional lines saying:

ERROR:  division by zero
STATEMENT:  select 1/0

This is the basic mode for PostgreSQL. Just spit everything to STDERR, and don't bother doing anything about it.

Pg_ctl has support for redirecting the logs, thanks to its -l option:

14:25:08 depesz@blob ~
=$ pg_ctl -D data -l server.log start
server starting
 
14:25:16 depesz@blob ~
=$ psql -d postgres -c 'select 1/0'
ERROR:  division by zero
 
14:25:32 depesz@blob ~
=$ cat server.log 
LOG:  database system was shut down at 2011-04-30 14:25:07 CEST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
ERROR:  division by zero
STATEMENT:  select 1/0

(yes, I stopped Pg first, before starting it again).

As you can see, the log is no longer written to STDERR (aside from the "server starting" message, which we will discuss later).

This is done in the simplest possible way: before actually running postgres, pg_ctl opens the log for writing, and redirects STDOUT and STDERR there. So, it's virtually the same as:

=$ pg_ctl -D data start >> server.log 2>&1

Now that we know what happens by default, we can dig deeper into the configuration, to see what's really happening.

The first option that we need to know/understand is log_destination. It has 3 possible values:

  • stderr – default, logs go to STDERR, unmodified
  • syslog – logs go to syslog (system logging daemon – every Unix has one, as far as I know)
  • csvlog – logs are saved to .csv files, but that's a bit more complex, will come to it later

We already know what stderr does. What about syslog? Well, when using syslog, the start of PostgreSQL and all other messages are sent to the syslog daemon, which can then be configured to do something with them.

Generally, what's important – when logging to syslog there are 2 additional parameters:

  • syslog_facility
  • syslog_ident

syslog_facility can be thought of as the "category of source". Some syslogd configurations choose a different logfile based on facility, so it's pretty common to use it for directing logs to different files. Unfortunately the list of possible facilities is closed, and not really big: LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7 (there are also other facilities, but PostgreSQL can't log with them).

syslog_ident is a free-text description of what program logged a given line.
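
For example, to route PostgreSQL logs through syslog, the relevant postgresql.conf fragment could look like this (LOCAL0 and 'postgres' happen to be the defaults):

log_destination = 'syslog'
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'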

My syslogd setup logs messages to files named after the facility, so I have every facility separately. With default settings, the start of PostgreSQL and the 1/0 division got logged to the local0.log file:

=# cat local0.log 
Apr 30 14:34:19 blob postgres[29901]: [1-1] LOG:  database system was shut down at 2011-04-30 14:26:55 CEST
Apr 30 14:34:19 blob postgres[29900]: [1-1] LOG:  database system is ready to accept connections
Apr 30 14:34:19 blob postgres[29904]: [1-1] LOG:  autovacuum launcher started
Apr 30 14:35:03 blob postgres[30116]: [2-1] ERROR:  division by zero
Apr 30 14:35:03 blob postgres[30116]: [2-2] STATEMENT:  select 1/0

If I change syslog_facility to LOCAL6, and syslog_ident to whatever, the same test (pg start + 1/0 division) will change the logfile, and the logged content will look like:

=# cat local6.log 
Apr 30 14:41:42 blob whatever[30547]: [1-1] LOG:  database system was shut down at 2011-04-30 14:41:18 CEST
Apr 30 14:41:42 blob whatever[30546]: [1-1] LOG:  database system is ready to accept connections
Apr 30 14:41:42 blob whatever[30550]: [1-1] LOG:  autovacuum launcher started
Apr 30 14:41:45 blob whatever[30553]: [2-1] ERROR:  division by zero
Apr 30 14:41:45 blob whatever[30553]: [2-2] STATEMENT:  select 1/0

As you can see, instead of "postgres" I now have "whatever" – which is pretty cool if you're into using syslog and you have more than one PostgreSQL instance on a single server.

Syslog has many cool features (like sending logs to an external machine, over TCP or UDP), but I don't really like it. Reasons:

  • It's overhead. Instead of just doing a write() to a file, the data has to be sent to another process, which applies some rules to find out where to log it, and only then finally does the write.
  • I don't like the syslog log format.

But if you need stuff like sending logs to a remote machine – syslog is the best option (well, short of writing your own software).
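
As a sketch – with a classic syslogd (or rsyslog), sending everything logged with the LOCAL0 facility to some remote host can be a single line in syslog.conf (the IP is of course just an example):

local0.*    @192.168.1.100

(a single @ means UDP forwarding; rsyslog additionally supports @@ for TCP).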

Now, let's come back to csvlog. To use it, as the comments in postgresql.conf say, we have to enable "logging_collector".

So, let's revert log_destination to stderr, and enable logging_collector.
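
In postgresql.conf terms that means (note that changing logging_collector requires a full server restart, not just a reload):

log_destination = 'stderr'
logging_collector = on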

Quick Pg stop/start, and 1/0 division, and we see … so, what do we see?

First of all, we see that there is another child process of the postmaster:

=$ ps f -C postgres
  PID TTY      STAT   TIME COMMAND
31255 pts/0    S      0:00 /opt/pgsql-9.0.4/bin/postgres -D data
31256 ?        Ss     0:00  \_ postgres: logger process             
31258 ?        Ss     0:00  \_ postgres: writer process             
31259 ?        Ss     0:00  \_ postgres: wal writer process         
31260 ?        Ss     0:00  \_ postgres: autovacuum launcher process   
31261 ?        Ss     0:00  \_ postgres: stats collector process

Starting Pg still shows the "server starting" line, but that's all. And the logs are now handled fully by PostgreSQL itself.

With the default configuration, they are stored in the $PGDATA/pg_log directory, and the log file name is based on when Pg was started. So now it is:

=$ cat data/pg_log/postgresql-2011-04-30_145020.log 
LOG:  database system was shut down at 2011-04-30 14:50:19 CEST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
ERROR:  division by zero
STATEMENT:  select 1/0

Of course the name of the file can be changed, but that's not the point for now. This is STDERR logging, with logging_collector turned on. And since we have logging_collector, we can also test csvlog. So: pg_ctl stop, s/stderr/csvlog/, start, and a 1/0 division later we see:

14:54:36 depesz@blob ~
=$ ls -l data/pg_log/
total 4
-rw------- 1 depesz depesz 630 2011-04-30 14:54 postgresql-2011-04-30_145436.csv
-rw------- 1 depesz depesz   0 2011-04-30 14:54 postgresql-2011-04-30_145436.log
 
14:54:39 depesz@blob ~
=$ cat data/pg_log/postgresql-2011-04-30_145436.csv
2011-04-30 14:54:36.296 CEST,,,31579,,4dbffb0c.7b5b,1,,2011-04-30 14:54:36 CEST,,0,LOG,00000,"database system was shut down at 2011-04-30 14:54:34 CEST",,,,,,,,,""
2011-04-30 14:54:36.339 CEST,,,31577,,4dbffb0b.7b59,1,,2011-04-30 14:54:35 CEST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
2011-04-30 14:54:36.339 CEST,,,31582,,4dbffb0c.7b5e,1,,2011-04-30 14:54:36 CEST,,0,LOG,00000,"autovacuum launcher started",,,,,,,,,""
2011-04-30 14:54:36.957 CEST,"depesz","postgres",31586,"[local]",4dbffb0c.7b62,1,"SELECT",2011-04-30 14:54:36 CEST,2/3,0,ERROR,22012,"division by zero",,,,,,"select 1/0",,,"psql"

As you can see, 2 files were created: .log, which is empty, and .csv.

.log is usually empty. There are some cases when something is logged there, but these are usually hard-core critical errors related (as far as I recall) to starting PostgreSQL.

The csv file on the other hand is interesting. It contains a lot of information that wasn't previously visible/logged.

This file format has been designed to contain all necessary information, and to be easily loadable into PostgreSQL itself. The whole procedure and table schema (where to load the log to) is described nicely in the fine manual.
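
The gist of it – sketched here after the table suggested in the 9.0 manual (check the manual for the authoritative definition) – is to create a postgres_log table matching the csv columns, and COPY the file into it:

CREATE TABLE postgres_log (
    log_time timestamp(3) with time zone,
    user_name text,
    database_name text,
    process_id integer,
    connection_from text,
    session_id text,
    session_line_num bigint,
    command_tag text,
    session_start_time timestamp with time zone,
    virtual_transaction_id text,
    transaction_id bigint,
    error_severity text,
    sql_state_code text,
    message text,
    detail text,
    hint text,
    internal_query text,
    internal_query_pos integer,
    context text,
    query text,
    query_pos integer,
    location text,
    application_name text,
    PRIMARY KEY (session_id, session_line_num)
);

COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;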

But since we're humans, and not servers ourselves, we prefer formats a bit easier on our eyes. That's why I generally use the normal STDERR log_destination, but with logging_collector set to true, so I can use some more of PostgreSQL's magic. What magic?

For starters, I can change log_directory, to make it log someplace else.

log_directory, unless it starts with /, is assumed to be inside the datadir. Which is helpful in some cases, and problematic in others. If it's problematic in yours, just do:

log_directory = '/var/log/postgresql'

and be happy :).

Next possibility is to use automatic log rotation – without logrotate, or PostgreSQL restarts/reloads.

PostgreSQL supports 2 modes for rotation:

  • time based
  • size based

Time based log rotation works like this. There is an option log_rotation_age, which takes an integer representing a number of minutes, but it can also handle some common units: ms (milliseconds), s (seconds), min (minutes), h (hours), and d (days). Honestly – I've never tried any unit other than "h" and "d".

This means that whenever the given time passes, a new log file is created, and entries are written there (but check the details on log_filename below).

Size based rotation is also simple – you specify a value (for the log_rotation_size GUC), and once the log grows that big, PostgreSQL will switch to the next log file name.

Default values are '1 day' for time based and '10 MB' for size based, which means that PostgreSQL will change (well, technically: try to change) the log file whenever the date changes or whenever the current logfile goes over 10MB.

Now, I wrote about "trying to change". Why? It's simple, and related to another log_* GUC, namely log_filename.

log_filename is not really a file name – it's more of a template. By default it is: postgresql-%Y-%m-%d_%H%M%S.log.

What do these %* things mean? These are standard escapes known from the strftime function.

So, assuming that today is the 30th of April, and it's 3:40pm and 25 seconds, a logfile created now would be named postgresql-2011-04-30_154025.log.

And there is a potential problem.

For example, if you make your log_filename not contain anything date/time related – then no log rotation will ever happen, because when PostgreSQL tries to rotate the file, it will get the same file name.

The same thing happens in a less obvious case – let's say you got tired of those 6 digits (time) at the end, and you changed your log_filename to postgresql-%Y-%m-%d.log, while keeping log_rotation_age at 1 day and log_rotation_size at 10MB. What does that mean? It means that you effectively turned off size based log rotation, because when the log reaches 10MB and PostgreSQL tries to rotate it, it will get the same filename.

Of course there is also the log_truncate_on_rotation option. When set, PostgreSQL will truncate the existing logfile before switching to it. With this you can still have logfile rotation, even when your log_filename seems to prevent it – at the cost of losing older data.

This option (i.e. log_truncate_on_rotation) is pretty commonly used to keep only a fixed set of log files. For example, if we set up PostgreSQL like this:

log_truncate_on_rotation = on
log_filename = 'postgresql-%a.log'
log_rotation_age = '1d'

(%a is the abbreviated week day name), we will have a situation where PostgreSQL keeps 7 log files:

  • postgresql-Mon.log
  • postgresql-Tue.log
  • postgresql-Wed.log
  • postgresql-Thu.log
  • postgresql-Fri.log
  • postgresql-Sat.log
  • postgresql-Sun.log

And when the day changes, it will clear the week-old log file, and write to it.

A similar setup can be used to keep the last month's worth of logs, or even just the last day, in 24 files, one per hour.
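
For example, a sketch of the "last day, in 24 hourly files" variant:

log_filename = 'postgresql-%H.log'
log_rotation_age = '1h'
log_truncate_on_rotation = on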

Having written about all this, before I move on to describing what gets logged, and how, let me mention one more option.

There is an option called "silent_mode".

This is used to prevent PostgreSQL from printing to STDERR information about errors that prevent it from starting.

Normal errors – even fatal ones – are normally logged to log files, but if I do something that makes Pg unable to start, I will get information about it on the screen regardless of log_destination:

15:27:39 depesz@blob ~
=$ pg_ctl -D data/ start
server starting
 
15:27:49 depesz@blob ~
=$ LOG:  could not bind IPv6 socket: Permission denied
HINT:  Is another postmaster already running on port 80? If not, wait a few seconds and retry.
LOG:  could not bind IPv4 socket: Permission denied
HINT:  Is another postmaster already running on port 80? If not, wait a few seconds and retry.
WARNING:  could not create listen socket for "localhost"
FATAL:  could not create any TCP/IP sockets

In this example I tried to make PostgreSQL listen on port 80, which couldn't work, and caused pg to abort startup.

If I enable silent_mode, those messages will not be sent to the terminal.

You might ask – but why does it send them to the terminal, when I have a perfectly good logging setup?

The reason is very simple – increased visibility. Since Pg didn't start – we should know about it immediately, and not only after some time, when something else breaks.

If you are sure that your setup is OK, you might turn on silent_mode, but on the other hand – the terminal output doesn't break anything, so why touch it?

Before I move to the next segment, let me show my favorite settings:

log_destination = stderr
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = off
log_rotation_age = '1d'
log_rotation_size = '100MB'

Thanks to this, I get all logs (i.e. no previous log gets truncated), I have a full history of logs (which I then compress and archive with a cronjob), and the logs are partitioned into files of 100MB at most, so it's easy to view them with virtually any pager or even editor.
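
The compress-and-archive part is not PostgreSQL's job – a minimal crontab sketch (the path and retention here are just examples) could be:

0 3 * * * find /var/lib/pgsql/data/pg_log -name '*.log' -mtime +2 -exec gzip {} \;

(the -mtime +2 condition makes sure we never touch the file PostgreSQL is currently writing to).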

Now. On to the content of the logs.

The first option that we have is log_min_messages. It's named similarly to client_min_messages, so it's easy to confuse them, but they are slightly different.

This option is basically “debug level". You choose which messages should be logged. All levels:

  • debug5
  • debug4
  • debug3
  • debug2
  • debug1
  • info
  • notice
  • warning
  • error
  • log
  • fatal
  • panic

(client_min_messages has fewer levels to choose from, and gives the "log" level a different priority).

When you set log_min_messages to some level, it means that all messages of this level, and all levels more important will get logged.

This means that setting it to "panic" will get you only panics, but setting it to debug5 will get you everything.

So, let's see it: log_min_messages set to debug5, and a simple query: "select * from pg_class order by relname limit 3". The logs got:

DEBUG:  StartTransactionCommand
DEBUG:  StartTransaction
DEBUG:  name: unnamed; blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children: 
DEBUG:  CommitTransactionCommand
DEBUG:  CommitTransaction
DEBUG:  name: unnamed; blockState:       STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:

Nothing really cool, but just then autovacuum kicked in, and I got:

DEBUG:  InitPostgres
DEBUG:  my backend id is 3
DEBUG:  StartTransaction
DEBUG:  name: unnamed; blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children: 
DEBUG:  CommitTransaction
DEBUG:  name: unnamed; blockState:       STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children: 
DEBUG:  autovacuum: processing database "postgres"
DEBUG:  StartTransaction
DEBUG:  name: unnamed; blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children: 
DEBUG:  pg_statistic: vac: 0 (threshold 118), anl: 0 (threshold 84)
DEBUG:  pg_type: vac: 0 (threshold 107), anl: 0 (threshold 79)
DEBUG:  pg_attribute: vac: 0 (threshold 437), anl: 0 (threshold 244)
DEBUG:  pg_authid: vac: 0 (threshold 50), anl: 0 (threshold 50)
DEBUG:  pg_proc: vac: 0 (threshold 501), anl: 0 (threshold 276)
DEBUG:  pg_class: vac: 0 (threshold 101), anl: 0 (threshold 75)
DEBUG:  pg_index: vac: 0 (threshold 70), anl: 0 (threshold 60)
DEBUG:  pg_operator: vac: 0 (threshold 191), anl: 0 (threshold 120)
DEBUG:  pg_database: vac: 0 (threshold 50), anl: 0 (threshold 50)
DEBUG:  pg_opclass: vac: 0 (threshold 72), anl: 0 (threshold 61)
DEBUG:  pg_am: vac: 0 (threshold 51), anl: 0 (threshold 50)
DEBUG:  pg_amop: vac: 0 (threshold 121), anl: 0 (threshold 86)
DEBUG:  pg_amproc: vac: 0 (threshold 100), anl: 0 (threshold 75)
DEBUG:  pg_aggregate: vac: 0 (threshold 74), anl: 0 (threshold 62)
DEBUG:  pg_rewrite: vac: 0 (threshold 67), anl: 0 (threshold 59)
DEBUG:  pg_namespace: vac: 0 (threshold 51), anl: 0 (threshold 51)
DEBUG:  pg_tablespace: vac: 0 (threshold 50), anl: 0 (threshold 50)
DEBUG:  pg_cast: vac: 0 (threshold 88), anl: 0 (threshold 69)
DEBUG:  CommitTransaction
DEBUG:  name: unnamed; blockState:       STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children: 
DEBUG:  shmem_exit(0): 8 callbacks to make
DEBUG:  proc_exit(0): 2 callbacks to make
DEBUG:  exit(0)
DEBUG:  shmem_exit(-1): 0 callbacks to make
DEBUG:  proc_exit(-1): 0 callbacks to make
DEBUG:  reaping dead processes
DEBUG:  server process (PID 3041) exited with exit code 0


Generally – unless you are a PostgreSQL developer (i.e. working on making it an even better database) – in which case you don't need to read this blogpost – you don't need those DEBUG lines.

As for other levels:

  • INFO – as far as I can tell, PostgreSQL itself uses it only in some cases with notify/listen
  • NOTICE – these are used, for example, when you create a table with a PRIMARY KEY, and it notifies you about the index creation for the table
  • WARNING – all kinds of warnings, the most common one being about using escapes (\ character) in strings
  • ERROR – all cases which cause Pg to abort the transaction. Deadlocks, bad values, constraint violations and so on
  • LOG – used to log information about special events (checkpoints, connections)
  • FATAL – errors which cause the Pg session that triggered them to abort. For example – a bad password given when authenticating
  • PANIC – errors which cause all Pg sessions to be aborted. You don't want this.

Please remember that any level can also be generated from PL functions, with the RAISE (or its equivalent) command, like:

RAISE LOG 'Calculated income for user % is %', user.id, user.income;
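
For completeness, a minimal PL/pgSQL sketch (the function and values are made up) that would emit such a LOG line:

CREATE OR REPLACE FUNCTION log_income(p_user_id int4, p_income numeric) RETURNS void AS $$
BEGIN
    RAISE LOG 'Calculated income for user % is %', p_user_id, p_income;
END;
$$ LANGUAGE plpgsql;

SELECT log_income(42, 1234.56);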

The default setting of "warning" is a pretty good idea, unless you really, truly need more information. I wouldn't suggest changing to a higher log level (like error), because you will be missing information on some things that might be wrong.

The next setting, similar when it comes to accepted values, is log_min_error_statement, with a default of "error".

This setting means that whenever a line is logged with the given log level (or a more severe one), the statement that caused it will be logged too.

Let's see how it works. Since I have it set to error – it will not log statements for NOTICEs, even if I log those. So:

log_min_messages = notice
log_min_error_statement = error

First, I ran:

$ create table x (id serial primary key );
NOTICE:  CREATE TABLE will create implicit sequence "x_id_seq" for serial column "x.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "x_pkey" for table "x"
CREATE TABLE

In logs I got only:

NOTICE:  CREATE TABLE will create implicit sequence "x_id_seq" for serial column "x.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "x_pkey" for table "x"

But when I do something that generates an error:

$ select 1/0;
ERROR:  division by zero

Logs get:

ERROR:  division by zero
STATEMENT:  select 1/0;

Generally I would say that it makes sense to log the statement for all logged messages, so log_min_messages should be the same as log_min_error_statement. Of course – you might have a different opinion, I'm just stating my preference.

Without it, with the settings above, I would get all warnings logged, but without the query that caused them, which makes finding the warning and fixing it much harder.

Now, how about actually logging normal queries?

To set it all up we have 3 settings:

  • log_statement
  • log_duration
  • log_min_duration_statement

The first of them, log_statement, decides whether to log all queries (or some subset of queries, based on their type).

The first possible value is none (the default) – no queries get logged because of log_statement, though log_min_duration_statement can still log them.

On the other end of the scale we can set log_statement to 'all', so that it will log every single query.

In between none and all we have two more settings: "ddl" – which means that it will log all queries that change the database schema (create, drop, alter), and "mod" – which also logs queries that change data (updates, inserts, deletes).

One thing that you have to know/understand – the decision whether a query should be logged is based on analysis of the query text the user sent. So, if I set log_statement to "mod", and then issue:

$ insert into x (id) values (DEFAULT);
INSERT 0 1
 
$ create function q () returns void as $$
>> insert into x (id) values (DEFAULT);
>> $$ language sql;
CREATE FUNCTION
 
$ select q();
   q    
--------
 
(1 row)

In logs I will see only:

LOG:  statement: insert into x (id) values (DEFAULT);
LOG:  statement: create function q () returns void as $$
        insert into x (id) values (DEFAULT);
        $$ language sql;

That is – I didn't get any information about the insert that happened inside the q() function, or even that the function got called.

The next option is log_duration. It's a simple on/off switch which lets you log the time that it took PostgreSQL to process every query. Every. Single. Query.

What's even worse – since it just logs the time, you don't really get much info. With log_duration turned on, after running:

$ select 1;
 ?column? 
----------
        1
(1 row)
 
(depesz@[local]:5900) 12:27:39 [postgres] 
$ select pg_sleep(1);
 pg_sleep 
----------
 
(1 row)
 
(depesz@[local]:5900) 12:27:43 [postgres] 
$ select 2;
 ?column? 
----------
        2
(1 row)

I got in logs:

LOG:  duration: 0.398 ms
LOG:  duration: 1001.620 ms
LOG:  duration: 0.088 ms

Which (given the lack of statements) doesn't really tell me much. Of course I could log all statements, and then pair each statement with its time (using something which I'll describe later). Or just log some statements, pair those with times, and ignore the other times.

Generally, in real life I haven't seen a sensible use-case for log_duration (which doesn't mean that there aren't any, it's just that I haven't stumbled upon them yet).

The last, and (in my opinion) the best option to control printing queries and their time is log_min_duration_statement.

This option takes an integer argument, which is interpreted as time in milliseconds. Every query that runs for this number of milliseconds (or longer) gets logged when it finishes.

Let's see it in action: setting it to 50 (50 ms), and running:

$ select pg_sleep(0.5);
 pg_sleep 
----------
 
(1 row)
 
$ select 1;
 ?column? 
----------
        1
(1 row)
 
$ select pg_sleep(0.25);
 pg_sleep 
----------
 
(1 row)

I got in logs:

LOG:  duration: 501.378 ms  statement: select pg_sleep(0.5);
LOG:  duration: 250.391 ms  statement: select pg_sleep(0.25);

Please note that "select 1" was not logged – it was simply too fast.

One important note though – quite obvious, but perhaps not at first sight.

log_statement logs the query when it starts – i.e. as soon as it hits the db server, it gets logged.

log_min_duration_statement logs the query only after it has finished (because it needs the time it took to run).

So, if you have a long running query now, don't look for it in the logs, unless you have log_statement configured so that it gets logged by that.
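
If you want to see a long running query while it is still running, pg_stat_activity is the place to look, not the logs. A quick sketch (column names as they were up to 9.1):

$ select procpid, now() - query_start as runtime, current_query
  from pg_stat_activity
  where current_query <> '<IDLE>'
  order by runtime desc;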

There is also one more note. Please see that the format of messages logged by log_min_duration_statement is:

duration: "time-in-ms" ms  statement: "query"

It's not always like this. If the query happened to be logged by log_statement before, the query will not be included in the line emitted by log_min_duration_statement:

$ select name, setting from pg_settings where name in ('log_min_duration_statement', 'log_duration', 'log_statement');
            name            | setting 
----------------------------+---------
 log_duration               | off
 log_min_duration_statement | 50
 log_statement              | mod
(3 rows)
 
$ insert into x (id) select generate_series(10,500000);
INSERT 0 499991

Logged:

LOG:  statement: insert into x (id) select generate_series(10,500000);
LOG:  duration: 5134.431 ms

That is – PostgreSQL remembered that the query was already logged, so it didn't pollute the logs with redundant info.

The same thing happens with log_duration – if log_duration notices that log_min_duration_statement already logged the time, it doesn't add another line with the repeated time.

So, with all these settings, one can easily set up Pg so that it logs all modifications (for audit purposes), plus, for performance-analysis reasons, all queries longer than some time. This "some time" will vary from situation to situation, but generally, in the case of web apps, you shouldn't have queries running for over 100ms. On the other hand, in the case of data warehouses, queries running a couple of hours can be seen as normal.
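
A sketch of such a setup for a hypothetical web app, using the 100ms threshold mentioned above:

log_statement = 'mod'
log_min_duration_statement = 100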

Now, you could notice one important thing in all the previous examples. The logs are without any time marks or other meta information – I mean in stderr mode (either with or without logging_collector). That's because PostgreSQL is very flexible about what you can log with all your lines.

There is an option, log_line_prefix, which contains a format string for a prefix that will be added to every logged message.

Possible options:

  • %a – application name
  • %u – user name
  • %d – database name
  • %r – remote host and port
  • %h – remote host
  • %p – process ID
  • %t – timestamp without milliseconds
  • %m – timestamp with milliseconds
  • %i – command tag
  • %e – SQL state
  • %c – session ID
  • %l – session line number
  • %s – session start timestamp
  • %v – virtual transaction ID
  • %x – transaction ID (0 if none)
  • %q – stop here in non-session processes
  • %% – ‘%'

That's a lot of possibilities.

A bit more detailed description: application_name is a new option in 9.0, described in a bit more detail in my previous blogpost.

user name – it will be either the name of the logged in user, or an empty string if there is no logged in user for this line (for example messages generated by connections before authentication, or by autovacuum, or checkpoints).

database name – name of the database that the given session is connected to. Again – can be empty if there is no database selected yet.

remote host and port – it either contains the string [local] if the connection is over a unix socket, or the ip address of the client, together with the port number it used to connect to PostgreSQL (not the port PostgreSQL is listening on, but the port that got opened for the connection on the client end).

process ID – basically PID of backend process.

command tag – name of the command that is being run – ALTER, SELECT, UPDATE and so on. Just the command itself. In the case of non-query lines (like connection information) it contains some other defined strings (for example: "authentication" when authenticating a user).

SQL state – error code (from this list). 00000 means everything is OK.

Session ID – a strange string, like "4dc132c9.5a49", which contains the (encoded) session start time, and the backend pid.

Session line number – a sequential number, for the given session, saying which line it is. Every line gets it incremented by 1, which can easily show if some log lines were removed.

Session start timestamp – a timestamp, with 1-second precision, with the information when the current session started.

Virtual transaction ID – a value like "2/4" – not really useful unless you need to do some debugging of PostgreSQL code itself.

Transaction ID – since PostgreSQL 8.3 (I think), not every command/query gets automatically assigned a transaction ID. If the query doesn't change anything, and it's not in a long transaction – the transaction ID is "0", which just means that no transaction ID was assigned. But when a transaction ID is finally assigned (for example because you're changing stuff in the DB), it is just an integer.

%q is used to make the line prefix different for session backends (i.e. normal ones) and non-session processes – like autovacuum, or the background writer.
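
A sketch of how %q plays out – with:

log_line_prefix = '%m %q%u@%d '

session backends get the timestamp plus user@database, while lines from autovacuum or the background writer stop after the timestamp, instead of showing an empty @.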

%% is used if you'd like to have “%" character somewhere in your line prefix.

A final note on log_line_prefix: usually people add a timestamp at the beginning, with %t. While it looks OK, it makes certain analyses impossible, so please use %m instead – it will show the same timestamp, but with better precision.

My favorite log_line_prefix is '%m %u@%d %p %r '. Please note the trailing space. If I don't add it, the log line will look like this:

2011-05-01 13:18:38.227 CEST depesz@postgres 28202 [local]LOG:  duration: 100.956 ms  statement: select pg_sleep(0.1);

With this space however it will be:

2011-05-01 13:19:20.467 CEST depesz@postgres 28454 [local] LOG:  duration: 100.991 ms  statement: select pg_sleep(0.1);

(note the space separating [local] from "LOG:").

Now that we have logging configured, we can add some more information to it.

First – log_connections – for every connection it logs 2 lines:

2011-05-01 13:27:11.168 CEST [unknown]@[unknown] 31367  LOG:  connection received: host=[local]
2011-05-01 13:27:11.169 CEST depesz@postgres 31367 [local] LOG:  connection authorized: user=depesz database=postgres

The first shows when the connection was received, and the second shows which user logged in, and which database it connected to.

In case I didn't manage to log in (bad password, for example), I'll get:

2011-05-01 13:30:33.227 CEST [unknown]@[unknown] 32604  LOG:  connection received: host=127.0.0.1 port=60868
2011-05-01 13:30:33.227 CEST depesz@postgres 32604 127.0.0.1(60868) FATAL:  password authentication failed for user "depesz"

Next, log_disconnections – after every session disconnects, it shows some summary information, like this:

2011-05-01 13:30:19.826 CEST depesz@postgres 32167 [local] LOG:  disconnection: session time: 0:00:58.917 user=depesz database=postgres host=[local]

log_hostname turns on internal reverse-DNS queries. Which means that instead of having the IP logged, I will have the hostname, like this:

2011-05-01 13:33:34.088 CEST depesz@postgres 1202 blob(60888) LOG:  connection authorized: user=depesz database=postgres

If you're on Linux, and you have update_process_title enabled, it will also show the hostname instead of the IP in the process title:

  PID CMD
 1634 postgres: depesz postgres blob(60895) idle

vs.

  PID CMD
 1705 postgres: depesz postgres 127.0.0.1(37213) idle

While it might look nice, please remember that it adds additional cost to connections (a DNS query), which I personally don't think is worth it.

On the other hand, the following 3 settings should be (in my, not so humble, opinion) turned on by default:

  • log_checkpoints – whenever a checkpoint happens, it will log two lines – when it starts, and when it ends. It looks like this:
    2011-05-01 13:34:07.167 CEST @ 32084  LOG:  checkpoint starting: time
    2011-05-01 13:36:04.422 CEST @ 32084  LOG:  checkpoint complete: wrote 586 buffers (19.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=116.957 s, sync=0.087 s, total=117.254 s

    Of course these lines can be separated by some others, but they provide important insight into how checkpoints are working, and whether they are too big a burden for the other work.

  • log_lock_waits – logs every lock that a backend was waiting too long for. "Too long" is defined as: longer than deadlock_timeout (1 second by default). To make it happen, let's do:
    // Session 1:
    $ begin;
    BEGIN
     
    *$ truncate x;
    TRUNCATE TABLE
     
    // and keep it open
     
    // Session 2:
    $ select count(*) from x;
    // it waits now for a lock
     
    // wait couple of seconds
     
    // Session 1:
    *$ ROLLBACK;
    ROLLBACK
     
    // Session 2:
    // finished, and gave me the count

    After all of this, in logs I saw:

    2011-05-01 13:47:23.900 CEST depesz@postgres 6507 [local] STATEMENT:  select count(*) from x;
    2011-05-01 13:47:27.040 CEST depesz@postgres 6507 [local] LOG:  process 6507 still waiting for AccessShareLock on relation 16386 of database 11874 after 1000.027 ms at character 22
    2011-05-01 13:47:27.040 CEST depesz@postgres 6507 [local] STATEMENT:  select count(*) from x;
    2011-05-01 13:48:09.006 CEST depesz@postgres 6507 [local] LOG:  process 6507 acquired AccessShareLock on relation 16386 of database 11874 after 42966.114 ms at character 22
    2011-05-01 13:48:09.006 CEST depesz@postgres 6507 [local] STATEMENT:  select count(*) from x;
    2011-05-01 13:48:09.612 CEST depesz@postgres 6507 [local] LOG:  duration: 43585.964 ms  statement: select count(*) from x;

    Please note that I got information after the query had been waiting for 1 second, and then another line when it finally got the lock, after 43 seconds.

  • log_temp_files – a numeric value, in kilobytes, meaning: log information about every temporary file that gets created and is of this size or larger. Personally I set it to 0, to get information about all temporary files. Please note that it's not related to temp tables. Temporary files are created by PostgreSQL when it deals with memory intensive operations (sort/hash/group) which do not fit in the memory it is allowed to use – limited by the work_mem GUC.

    For example: my work_mem is 1MB. So when I sort some dataset larger than this, it will use temporary files. Quick test:

    $ select count(*) from (select i from generate_series(1,1000000) i order by i desc limit 500000 offset 0) x;
     count  
    --------
     500000
    (1 row)

    Since it had to sort 1 million integers, it used temporary files. And the logs confirm it:

    LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp6507.3", size 14008320
    STATEMENT:  select count(*) from (select i from generate_series(1,1000000) i order by i desc limit 500000 offset 0) x;
    LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp6507.2", size 14000000
    STATEMENT:  select count(*) from (select i from generate_series(1,1000000) i order by i desc limit 500000 offset 0) x;
    LOG:  duration: 1385.628 ms  statement: select count(*) from (select i from generate_series(1,1000000) i order by i desc limit 500000 offset 0) x;

    (removed prefixes here as they are useless in this case).

    As we can see, PostgreSQL actually had to create 2 temporary files, with a total size of ~28MB. That's not much, but if I set a larger work_mem:

    $ set work_mem = '10MB';
    SET
     
    $ select count(*) from (select i from generate_series(1,1000000) i order by i desc limit 500000 offset 0) x;

    Logs show immediately:

    LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp6507.5", size 13950976
    STATEMENT:  select count(*) from (select i from generate_series(1,1000000) i order by i desc limit 500000 offset 0) x;
    LOG:  duration: 1112.501 ms  statement: select count(*) from (select i from generate_series(1,1000000) i order by i desc limit 500000 offset 0) x;

    Only one temp file, with its size being only ~14MB.

    Sizes of temporary files vary, but generally – when Pg is creating them, you know that you can make the query faster by giving it more memory.

Now, on to the last two (log_*) options in the postgresql.conf "What to log" section.

log_timezone. I never saw anyone using it, but generally it lets you have logs with timestamps logged in a different timezone than the one the server is in.

For example, when I log something now, I get:

2011-05-01 14:02:51.158 CEST [unknown]@[unknown] 12104  LOG:  connection received: host=[local]

But when I set log_timezone to 'America/New_York', I will get:

2011-05-01 08:03:53.938 EDT [unknown]@[unknown] 12476  LOG:  connection received: host=[local]

I think that it could be useful in some cases, but timezones are confusing enough without using non-server timezone in server logs.

The last "what to log" (log_*) option is "log_error_verbosity". It has 3 settings: terse, default and verbose. I think it will be best to just show the difference. So I created a view on table x, and then tried to drop table x, with modified verbosity. Results:

  • log_error_verbosity = ‘terse':
    ERROR:  cannot drop table x because other objects depend on it
    STATEMENT:  drop table x;
  • log_error_verbosity = ‘default':
    ERROR:  cannot drop table x because other objects depend on it
    DETAIL:  view a depends on table x
    HINT:  Use DROP ... CASCADE to drop the dependent objects too.
    STATEMENT:  drop table x;
  • log_error_verbosity = ‘verbose':
    LOG:  00000: statement: drop table x;
    LOCATION:  exec_simple_query, postgres.c:900
    ERROR:  2BP01: cannot drop table x because other objects depend on it
    DETAIL:  view a depends on table x
    HINT:  Use DROP ... CASCADE to drop the dependent objects too.
    LOCATION:  reportDependentObjects, dependency.c:914
    STATEMENT:  drop table x;

Generally, for logs, I think that you might go with default – mostly because it shows the "DETAIL" line, which is quite often very important.

Within the “What to log" section of postgresql.conf, there are also four debug_* options.

One of them – debug_pretty_print – doesn't log anything itself, but changes how the output of the other three is formatted.

The other three (debug_print_parse, debug_print_rewritten and debug_print_plan) log the internal representation of the query tree after parsing, after rewriting (for example if views are used), and then the final plan for the query. But not in a form for normal people.

For example, the data logged for the query "SELECT 1" looks like this:

2011-05-01 08:20:33.236 EDT depesz@postgres 18322 [local] LOG:  parse tree:
2011-05-01 08:20:33.236 EDT depesz@postgres 18322 [local] DETAIL:     {QUERY 
           :commandType 1 
           :querySource 0 
           :canSetTag true 
           :utilityStmt <> 
           :resultRelation 0 
           :intoClause <> 
           :hasAggs false 
           :hasWindowFuncs false 
           :hasSubLinks false 
           :hasDistinctOn false 
           :hasRecursive false 
           :hasForUpdate false 
           :cteList <> 
           :rtable <> 
           :jointree 
              {FROMEXPR 
              :fromlist <> 
              :quals <>
              }
           :targetList (
              {TARGETENTRY 
              :expr 
                 {CONST 
                 :consttype 23 
                 :consttypmod -1 
                 :constlen 4 
                 :constbyval true 
                 :constisnull false 
                 :location 7 
                 :constvalue 4 [ 1 0 0 0 0 0 0 0 ]
                 }
              :resno 1 
              :resname ?column? 
              :ressortgroupref 0 
              :resorigtbl 0 
              :resorigcol 0 
              :resjunk false
              }
           )
           :returningList <> 
           :groupClause <> 
           :havingQual <> 
           :windowClause <> 
           :distinctClause <> 
           :sortClause <> 
           :limitOffset <> 
           :limitCount <> 
           :rowMarks <> 
           :setOperations <>
           }

The rewritten query is in this case the same, and the plan looks like:

2011-05-01 08:20:33.236 EDT depesz@postgres 18322 [local] STATEMENT:  select 1;
2011-05-01 08:20:33.236 EDT depesz@postgres 18322 [local] LOG:  plan:
2011-05-01 08:20:33.236 EDT depesz@postgres 18322 [local] DETAIL:     {PLANNEDSTMT 
           :commandType 1 
           :hasReturning false 
           :canSetTag true 
           :transientPlan false 
           :planTree 
              {RESULT 
              :startup_cost 0.00 
              :total_cost 0.01 
              :plan_rows 1 
              :plan_width 0 
              :targetlist (
                 {TARGETENTRY 
                 :expr 
                    {CONST 
                    :consttype 23 
                    :consttypmod -1 
                    :constlen 4 
                    :constbyval true 
                    :constisnull false 
                    :location 7 
                    :constvalue 4 [ 1 0 0 0 0 0 0 0 ]
                    }
                 :resno 1 
                 :resname ?column? 
                 :ressortgroupref 0 
                 :resorigtbl 0 
                 :resorigcol 0 
                 :resjunk false
                 }
              )
              :qual <> 
              :lefttree <> 
              :righttree <> 
              :initPlan <> 
              :extParam (b)
              :allParam (b)
              :resconstantqual <>
              }
           :rtable <> 
           :resultRelations <> 
           :utilityStmt <> 
           :intoClause <> 
           :subplans <> 
           :rewindPlanIDs (b)
           :rowMarks <> 
           :relationOids <> 
           :invalItems <> 
           :nParamExec 0
           }

This would finish the blogpost, if it weren't for the 4 options in the "Statistics Monitoring" section.

These options, while not useful on a daily basis, prove extremely useful when testing some strangely slow queries.

First of all: log_parser_stats. If I turn it on, and issue a relatively simple query:

SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','v','S','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;

In my logs it added:

2011-05-01 08:32:38.146 EDT depesz@postgres 22944 [local] LOG:  PARSER STATISTICS
2011-05-01 08:32:38.146 EDT depesz@postgres 22944 [local] DETAIL:  ! system usage stats:
        !       0.000230 elapsed 0.000000 user 0.000000 system sec
        !       [0.000000 user 0.000000 sys total]
        !       0/0 [0/0] filesystem blocks in/out
        !       0/94 [0/782] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       0/0 [2/0] voluntary/involuntary context switches
2011-05-01 08:32:38.147 EDT depesz@postgres 22944 [local] LOG:  PARSE ANALYSIS STATISTICS
2011-05-01 08:32:38.147 EDT depesz@postgres 22944 [local] DETAIL:  ! system usage stats:
        !       0.000844 elapsed 0.000000 user 0.000000 system sec
        !       [0.000000 user 0.000000 sys total]
        !       0/0 [0/0] filesystem blocks in/out
        !       0/155 [0/941] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       0/0 [2/0] voluntary/involuntary context switches
2011-05-01 08:32:38.147 EDT depesz@postgres 22944 [local] LOG:  REWRITER STATISTICS
2011-05-01 08:32:38.147 EDT depesz@postgres 22944 [local] DETAIL:  ! system usage stats:
        !       0.000007 elapsed 0.000000 user 0.000000 system sec
        !       [0.000000 user 0.000000 sys total]
        !       0/0 [0/0] filesystem blocks in/out
        !       0/2 [0/943] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       0/0 [2/0] voluntary/involuntary context switches

That's a lot of information. What's worse – it's not really well documented. The best thing I could find was the source.

But even without it, we see information about the time elapsed in the parser itself, in parse analysis, and in the rewriter. We see if there were any IO operations, how many context switches happened, and other information that can be useful for pinpointing problems (I once saw a query whose parsing time was ~50% of the whole query time).

When I enable log_planner_stats and run the same query, I get:

2011-05-01 08:41:22.779 EDT depesz@postgres 26153 [local] LOG:  PLANNER STATISTICS
2011-05-01 08:41:22.779 EDT depesz@postgres 26153 [local] DETAIL:  ! system usage stats:
        !       0.000925 elapsed 0.000000 user 0.000000 system sec
        !       [0.000000 user 0.000000 sys total]
        !       0/0 [0/224] filesystem blocks in/out
        !       0/163 [0/1188] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       0/0 [1/0] voluntary/involuntary context switches

and with log_executor_stats:

2011-05-01 08:42:21.000 EDT depesz@postgres 26476 [local] LOG:  EXECUTOR STATISTICS
2011-05-01 08:42:21.000 EDT depesz@postgres 26476 [local] DETAIL:  ! system usage stats:
        !       0.001363 elapsed 0.000000 user 0.000000 system sec
        !       [0.000000 user 0.000000 sys total]
        !       0/0 [0/224] filesystem blocks in/out
        !       0/86 [0/1293] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       0/0 [1/0] voluntary/involuntary context switches

The last of those options is log_statement_stats. It cannot be enabled together with any of the other log_*_stats (the other 3 can be combined), but when I disable all the others, and enable just log_statement_stats, I get:

2011-05-01 08:45:38.840 EDT depesz@postgres 27689 [local] LOG:  QUERY STATISTICS
2011-05-01 08:45:38.840 EDT depesz@postgres 27689 [local] DETAIL:  ! system usage stats:
        !       0.003925 elapsed 0.000000 user 0.000000 system sec
        !       [0.000000 user 0.000000 sys total]
        !       0/0 [0/224] filesystem blocks in/out
        !       0/526 [0/1293] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       0/0 [1/0] voluntary/involuntary context switches

This is global information per query. We don't know how much time was spent in any of the modules (parser, planner, executor), but we see the global picture.

The cool thing about log_*_stats is that while it's too much information to run by default on any production server, it can be changed on-the-fly, per session (as long as you have a superuser account). So when you see some weirdly slow query, even on production, you can connect with psql, set some log_*_stats to true (it will not change the value of the GUCs for other sessions), and run the query to get more data. And all the other connections/backends will keep working as normal, without flooding the logs with this additional info.
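
For example, a quick session sketch:

$ set log_executor_stats = true;
SET

$ select count(*) from pg_class;

and the EXECUTOR STATISTICS block for this query (and this session only) shows up in the logs.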

And that's all folks.

Of course – remember that a lot of things can be set per user or per database. For example – you might have a general situation where you don't care about queries, but one particular database is very important and should be fast, so you do:

ALTER DATABASE important SET log_min_duration_statement = 30;

And all new connections to this database will log all queries over 30ms.
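
The same works per user. For example, for some hypothetical reporting role:

ALTER ROLE reports SET log_min_duration_statement = 0;

and every query run by this role will be logged, together with its duration.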

If you need more details – please check the fine manual.


Reference:

http://www.depesz.com/2011/05/06/understanding-postgresql-conf-log/

Reposted from: https://www.cnblogs.com/xiaotengyi/p/4658929.html
