Segment Detection and Failover Mechanism
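The GP Master first checks the state of the Primary. If the Primary cannot be reached, it then checks the state of the Mirror. There are four possible Primary/Mirror state combinations: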
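- Primary up, Mirror up. The GP Master probes the Primary successfully, returns immediately, and moves on to the next Segment.
- Primary up, Mirror down. The GP Master probes the Primary successfully and learns from the status the Primary reports that the Mirror is down. (When the Mirror goes down, the Primary detects this and switches itself into ChangeTracking mode.) The Master then updates its metadata and moves on to the next Segment.
- Primary down, Mirror up. The probe of the Primary fails, so the GP Master probes the Mirror and finds it alive. It updates the metadata on the Master, has the Mirror take over from the Primary (failover), and moves on to the next Segment.
- Primary down, Mirror down. The probe of the Primary fails, so the GP Master probes the Mirror and finds it down as well. It retries until the maximum retry count is reached, then gives up on this Segment without updating the Master's metadata, and moves on to the next Segment.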
Cases 2-4 above require running gprecoverseg to recover the affected segment.
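The four cases boil down to a small decision function. The Python sketch below is a simplified model of the logic described above, not Greenplum's actual fault-detection code. The names `check_segment`, `probe_all`, `PrimaryStatus`, `Action` and the retry limit `MAX_RETRIES` are hypothetical, and real probes would be network requests rather than the stub lambdas used here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple


class Action(Enum):
    NO_CHANGE = 1          # case 1: primary up, mirror up
    MARK_MIRROR_DOWN = 2   # case 2: primary up, mirror down
    FAILOVER = 3           # case 3: primary down, mirror up -> mirror takes over
    GIVE_UP = 4            # case 4: both down, metadata left untouched


@dataclass
class PrimaryStatus:
    reachable: bool
    mirror_alive: bool = True  # only meaningful when the primary answered


# Hypothetical retry limit, chosen for illustration only.
MAX_RETRIES = 3


def check_segment(probe_primary: Callable[[], PrimaryStatus],
                  probe_mirror: Callable[[], bool],
                  max_retries: int = MAX_RETRIES) -> Action:
    """Classify one primary/mirror pair according to the four cases."""
    status = probe_primary()
    if status.reachable:
        # The primary answered and reports its mirror's state; a primary that
        # has lost its mirror is assumed to be in ChangeTracking mode already.
        return Action.NO_CHANGE if status.mirror_alive else Action.MARK_MIRROR_DOWN
    # Primary unreachable: probe the mirror, retrying up to max_retries times.
    for _ in range(max_retries):
        if probe_mirror():
            return Action.FAILOVER
    return Action.GIVE_UP


def probe_all(
    segments: Dict[str, Tuple[Callable[[], PrimaryStatus], Callable[[], bool]]]
) -> Dict[str, Action]:
    """Check each segment in turn; every segment gets a result, then the next is checked."""
    return {name: check_segment(p, m) for name, (p, m) in segments.items()}


if __name__ == "__main__":
    demo = {
        "seg0": (lambda: PrimaryStatus(True), lambda: True),
        "seg1": (lambda: PrimaryStatus(True, mirror_alive=False), lambda: False),
        "seg2": (lambda: PrimaryStatus(False), lambda: True),
        "seg3": (lambda: PrimaryStatus(False), lambda: False),
    }
    for name, action in probe_all(demo).items():
        print(name, action.name)
```

Running the demo prints one line per stub segment: `seg0 NO_CHANGE`, `seg1 MARK_MIRROR_DOWN`, `seg2 FAILOVER`, `seg3 GIVE_UP`. In this sketch retries apply only to the mirror probe. Exactly where the real implementation retries is an assumption here.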
At startup, segment instances that are marked as failed are simply skipped, as in the following gpstart output:
[gpadmin@mdw ~]$ gpstart
gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args:
gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 5.0.0 build 1'
...
gpstart:mdw:gpadmin-[INFO]:-Master Started...
gpstart:mdw:gpadmin-[INFO]:-Shutting down master
gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /data/gpdata/gpdatam/gpseg0 <<<<<
gpstart:mdw:gpadmin-[INFO]:---------------------------
gpstart:mdw:gpadmin-[INFO]:-Master instance parameters
gpstart:mdw:gpadmin-[INFO]:---------------------------
gpstart:mdw:gpadmin-[INFO]:-Database = template1
gpstart:mdw:gpadmin-[INFO]:-Master Port = 1921
gpstart:mdw:gpadmin-[INFO]:-Master directory = /data/gpdata/pgmaster/gpseg-1
gpstart:mdw:gpadmin-[INFO]:-Timeout = 600 seconds
gpstart:mdw:gpadmin-[INFO]:-Master standby = Off
gpstart:mdw:gpadmin-[INFO]:---------------------------------------
gpstart:mdw:gpadmin-[INFO]:-Segment instances that will be started
gpstart:mdw:gpadmin-[INFO]:---------------------------------------
gpstart:mdw:gpadmin-[INFO]:-   Host   Datadir                          Port    Role
gpstart:mdw:gpadmin-[INFO]:-   sdw1   /data/gpdata/gpdatap/gpseg0      40000   Primary
gpstart:mdw:gpadmin-[INFO]:-   sdw2   /data/gpdat