· 2015年度PG大象会报名地址:http://postgres2015.eventdove.com/
· PostgreSQL中国社区: http://postgres.cn/index.php/home
· PostgreSQL专业1群: 3336901(已满)
· PostgreSQL专业2群: 100910388
· PostgreSQL专业3群: 150657323
· PostgresChina微信公众号 PostgreSQL用户会微博二维码
一、环境
$ cat /etc/redhat-release
CentOS Linux release 7.0.1406 (Core)
$ uname -a
Linux zhaopin-5-90 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ sudo pcs status
Cluster name: pgcluster
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Tue Oct 20 11:02:08 2015Last change: Tue Oct 20 10:57:37 2015 by root via crm_attribute on zhaopin-5-92
Stack: corosync
Current DC: zhaopin-5-92 (version 1.1.13-a14efad) - partition with quorum
3 nodes and 5 resources configured
Online: [ zhaopin-5-90 zhaopin-5-91 zhaopin-5-92 ]
Full list of resources:
Master/Slave Set: pgsql-master [pgsql]
Masters: [ zhaopin-5-92 ]
Slaves: [ zhaopin-5-90 zhaopin-5-91 ]
Resource Group: master-group
vip-master(ocf::heartbeat:IPaddr2):Started zhaopin-5-92
Resource Group: slave-group
vip-slave(ocf::heartbeat:IPaddr2):Started zhaopin-5-90
PCSD Status:
zhaopin-5-90 (172.17.5.90): Online
zhaopin-5-91 (172.17.5.91): Online
zhaopin-5-92 (172.17.5.92): Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
node1: 172.17.5.90
node2: 172.17.5.91
node3: 172.17.5.92
node4: 172.17.5.93 应急
vip-master: 172.17.5.99
vip-slave: 172.17.5.98
二、PostgreSQL故障
1. 单节点故障
1)master故障
在node3上执行:
$ ps aux | grep postgres | grep -v grep
postgres 29482 0.0 0.2 103216 7900 ? S 10:57 0:00 /usr/bin/postgres -D /data/postgresql/data -c config_file=/data/postgresql/data/postgresql.conf
postgres 29524 0.0 0.0 103216 1980 ? Ss 10:57 0:00 postgres: checkpointer process
postgres 29525 0.0 0.0 103216 1988 ? Ss 10:57 0:00 postgres: writer process
postgres 29526 0.0 0.0 103216 1896 ? Ss 10:57 0:00 postgres: wal writer process
postgres 29527 0.0 0.0 104044 3008 ? Ss 10:57 0:00 postgres: autovacuum launcher process
postgres 29528 0.0 0.0 88912 1792 ? Ss 10:57 0:00 postgres: archiver process last was 00000001000000000000000E
postgres 29529 0.0 0.0 89024 1920 ? Ss 10:57 0:00 postgres: stats collector process
postgres 29790 0.0 0.0 104060 3208 ? Ss 10:57 0:00 postgres: wal sender process postgres 172.17.5.90(35734) streaming 0/F0000E0
postgres 29791 0.0 0.0 104192 3392 ? Ss 10:57 0:00 postgres: wal sender process postgres 172.17.5.91(60722) streaming 0/F0000E0
$ psql -U postgres
psql (9.2.13)
Type "help" for help.
postgres=# select * from pg_stat_replication ;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
29790 | 10 | postgres | zhaopin-5-90 | 172.17.5.90 | | 35734 | 2015-10-20 02:57:25.718047+00 | streaming | 0/F0000E0 | 0/F0000E0 | 0/F0000E0 | 0/F0000E0 | 1 | sync
29791 | 10 | postgres | zhaopin-5-91 | 172.17.5.91 | | 60722 | 2015-10-20 02:57:25.718566+00 | streaming | 0/F0000E0 | 0/F0000E0 | 0/F0000E0 | 0/F0000E0 | 2 | potential
(2 rows)
postgres=# \q
杀掉PostgreSQL的进程
$ sudo killall postgres
$ ps aux | grep postgres | grep -v grep
PostgreSQL的进程已经没有了
$ sudo pcs status
Cluster name: pgcluster
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Tue Oct 20 11:11:54 2015Last change: Tue Oct 20 11:10:31 2015 by root via crm_attribute on zhaopin-5-90
Stack: corosync
Current DC: zhaopin-5-92 (version 1.1.13-a14efad) - partition with quorum
3 nodes and 5 resources configured
Online: [ zhaopin-5-90 zhaopin-5-91 zhaopin-5-92 ]
Full list of resources:
Master/Slave Set: pgsql-master [pgsql]
Masters: [ zhaopin-5-90 ]
Slaves: [ zhaopin-5-91 ]
Stopped: [ zhaopin-5-92 ]
Resource Group: master-group
vip-master(ocf::heartbeat:IPaddr2):Started zhaopin-5-90
Resource Group: slave-group
vip-slave(ocf::heartbeat:IPaddr2):Started zhaopin-5-91
Failed Actions:
* pgsql_start_0 on zhaopin-5-92 'unknown error' (1): call=33, status=complete, exitreason='My data may be inconsistent. You have to remove /var/lib/pgsql/tmp/PGSQL.lock file to force start.',
last-rc-change='Tue Oct 20 11:10:14 2015', queued=0ms, exec=389ms
PCSD Status:
zhaopin-5-90 (172.17.5.90): Online
zhaopin-5-91 (172.17.5.91): Online
zhaopin-5-92 (172.17.5.92): Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
查看集群状态可见,master和vip-master已经切换到node1上了
在node1上执行:
$ psql -U postgres
psql (9.2.13)
Type "help" for help.
postgres=# select * from pg_stat_replication ;
pid