HA Cluster: heartbeat v1 Basics

I. What Is an HA Cluster

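A high-availability cluster (HA Cluster) is, put simply, a group of computers that together present a set of network resources to users as a single system. Each individual computer in the group is a node of the cluster.

HA clusters exist to keep the cluster's overall service available as much as possible, reducing the losses caused by hardware and software failures. If a node fails, its standby node takes over its duties within a few seconds. The main job of HA cluster software is to automate failure detection and service failover.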

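Before configuring an HA cluster, a few basic terms need to be understood:

ha_aware: an application that can itself use the underlying heartbeat messaging layer to make cluster decisions; in other words, the program has HA capability built in. Anything else is non-ha_aware.

An application without this capability has to rely on other programs or components to provide the cluster functions:

DC (Designated Coordinator): the coordinator elected by the nodes. It collects information from every node in the cluster to determine the current number of nodes and services, which node each service runs on, and so on, and it links the layers above and below it.
CRM (Cluster Resource Manager): for every resource it configures, a corresponding script or program carries out the actual operation.
RA (Resource Agent): manages a resource; it accepts four arguments, {start|stop|restart|status}.
LRM (Local Resource Manager): performs local tasks such as resource monitoring.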
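
To make the RA idea concrete, here is a minimal sketch of a script that handles the four actions listed above. The resource name demo, the function name demo_ra, and the state file path are hypothetical placeholders and not part of heartbeat; a real agent would start and stop an actual service instead of touching a file.

```shell
#!/bin/sh
# Hypothetical resource agent skeleton. "demo" is an invented resource whose
# "running" state is represented by the existence of a state file.
STATE_FILE="${STATE_FILE:-/tmp/demo_ra.state}"

demo_ra() {
    case "$1" in
        start)
            touch "$STATE_FILE"
            ;;
        stop)
            rm -f "$STATE_FILE"
            ;;
        restart)
            demo_ra stop && demo_ra start
            ;;
        status)
            if [ -e "$STATE_FILE" ]; then
                echo "running"
            else
                echo "stopped"
                return 3    # LSB init-script convention: 3 means "not running"
            fi
            ;;
        *)
            echo "Usage: demo_ra {start|stop|restart|status}" >&2
            return 2
            ;;
    esac
}
```

Calling demo_ra start followed by demo_ra status prints "running"; after demo_ra stop, the status action prints "stopped" and returns 3.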

[Figure: how the DC, CRM, LRM, and RA relate to one another]

CRM (Cluster Resource Manager) implementations:

heartbeat

heartbeat v1: haresources (configuration interface: a configuration file named haresources)

heartbeat v2: crm (a crmd process runs on every node, tcp/5560; clients: crmsh, heartbeat-GUI)

heartbeat v3 = heartbeat + pacemaker + cluster-glue

cman + rgmanager:

resource group manager: Failover Domain

RHCS (RedHat Cluster Suite): Conga (a full-lifecycle configuration interface)

II. Installing and Configuring heartbeat v1 for a Web Cluster Service

Rough plan:

node1: 172.16.251.85

node2: 172.16.251.86

node3: 172.16.251.87 (NFS)

VIP: 172.16.251.111

node3 is not a cluster member here; it only serves as the NFS server.

OS version: CentOS 6.5 x86_64

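A few points to note:

Node names	Every node's name must be resolvable by all the other nodes; edit /etc/hosts, and make sure each name matches the output of "uname -n"
Time synchronization	Sync against a network time server
Key authentication	SSH key authentication between the nodes makes management easier, but is not required
Multi-node management	Ansible can be used later if needed; it also relies on SSH key authentication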

 
    [root@node1 heartbeat2]# uname -n
    node1.soul.com
    [root@node1 heartbeat2]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    172.16.251.85   node1.soul.com  node1    # node1 is an alias; every node needs these entries
    172.16.251.86   node2.soul.com  node2
    172.16.251.87   node3.soul.com  node3
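
The name requirement can be checked mechanically. The sketch below is one possible way to do it; hosts_has_name is a hypothetical helper, not a heartbeat tool, and it only inspects a hosts-format file (it does not query DNS).

```shell
#!/bin/sh
# hosts_has_name FILE NAME
# Succeeds if NAME appears as a hostname or alias in a hosts-format FILE.
hosts_has_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # strip trailing comments before matching
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example use on a node (commented out so the block has no side effects):
# hosts_has_name /etc/hosts "$(uname -n)" || echo "uname -n is missing from /etc/hosts" >&2
```

Running the commented check on every node catches a mismatch between "uname -n" and /etc/hosts before heartbeat is started.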

 
   
    # Set up SSH key-based authentication
    [root@node1 ~]# ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):     # press Enter each time to leave it empty
    Enter same passphrase again:
    Your identification has been saved in /root/.ssh/id_rsa.
    ...
    [root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2
    # The password is required for this first copy; afterwards it is no longer needed
    # Repeat the same steps on node2 so that both hosts trust each other

    # Test
    [root@node1 ~]# ssh node2 'date';date
    Fri Apr 18 19:17:30 CST 2014
    Fri Apr 18 19:17:30 CST 2014
    # The clocks on the nodes must agree
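
Comparing two date outputs by eye works for two nodes; a numeric comparison is easier to script. The following is a small sketch under the assumption that both nodes can print epoch seconds with "date +%s"; clock_skew is a hypothetical helper name.

```shell
#!/bin/sh
# clock_skew LOCAL_EPOCH REMOTE_EPOCH
# Prints the absolute difference between two epoch timestamps, in seconds.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example use from node1 (commented out; needs SSH access to node2):
# skew=$(clock_skew "$(date +%s)" "$(ssh node2 date +%s)")
# [ "$skew" -le 1 ] || echo "clock skew is ${skew}s; synchronize the clocks first" >&2
```

The one-second tolerance in the commented example is an arbitrary choice that allows for SSH round-trip time.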
        

  

Installing the basic heartbeat packages

 
    
    [root@node1 heartbeat2]# ls
    heartbeat-2.1.4-12.el6.x86_64.rpm              # core program
    heartbeat-ldirectord-2.1.4-12.el6.x86_64.rpm   # provides HA for an LVS director
    heartbeat-pils-2.1.4-12.el6.x86_64.rpm         # libraries
    heartbeat-devel-2.1.4-12.el6.x86_64.rpm        # development components
    heartbeat-stonith-2.1.4-12.el6.x86_64.rpm      # STONITH (node fencing)
    heartbeat-gui-2.1.4-12.el6.x86_64.rpm          # graphical configuration interface

    # On CentOS 6.5 the cluster-glue-libs package conflicts with pils, yet heartbeat depends on pils,
    # so resolve the dependencies with yum first and then install with rpm.
    # Only pils, stonith, and the core package are installed for now.
    # yum install perl-TimeDate net-snmp-libs libnet PyXML
    # rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm ...

    [root@node1 ~]# rpm -ql heartbeat    # list the files installed by the package
    /etc/ha.d
    /etc/ha.d/README.config
    /etc/ha.d/harc
    /etc/ha.d/rc.d
    /etc/ha.d/rc.d/ask_resources
    /etc/ha.d/rc.d/hb_takeover
    /etc/ha.d/rc.d/ip-request
    /etc/ha.d/rc.d/ip-request-resp
    /etc/ha.d/rc.d/status
        

  

Configuring the heartbeat authentication key

 
    # Copy the sample configuration files to /etc/ha.d
    [root@node1 ~]# cp /usr/share/doc/heartbeat-2.1.4/{haresources,authkeys,ha.cf} /etc/ha.d/
    # haresources    resource manager configuration
    # authkeys       authentication file
    # ha.cf          main configuration file
    [root@node1 ~]# cd /etc/ha.d/
    [root@node1 ha.d]# vim authkeys
    auth 2                               # use entry 2 (sha1)
    #1 crc
    2 sha1 a443995f092077eca1b4          # shared secret
    #3 md5 Hello!
    [root@node1 ha.d]# chmod 600 authkeys     # restrict the permissions; this step is important
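
Rather than typing a secret by hand, the key can be generated. This is a sketch only: write_authkeys is a hypothetical helper, the 20-hex-character key length simply mirrors the example above, and the output path is whatever you pass in (point it at /etc/ha.d/authkeys only once you are happy with the result).

```shell
#!/bin/sh
# write_authkeys FILE
# Writes an authkeys file that selects entry 2 (sha1) with a random
# 20-hex-character secret, readable and writable by the owner only.
write_authkeys() {
    # 10 random bytes printed as hex give 20 characters
    key=$(od -An -tx1 -N10 /dev/urandom | tr -d ' \n')
    ( umask 077; printf 'auth 2\n2 sha1 %s\n' "$key" > "$1" )
    chmod 600 "$1"
}

# Example (commented out so the block has no side effects):
# write_authkeys /tmp/authkeys.new && cat /tmp/authkeys.new
```

The same file must then be copied to every node, since all nodes have to share one secret.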

The main configuration file

 
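    [root@node1 ha.d]# vim ha.cf
    #       File to write other messages to
    #
    logfile /var/log/ha-log          # log file; enabled
    keepalive 1                      # interval between heartbeats, in seconds
    deadtime  5                      # declare the peer dead after this many seconds without a heartbeat
    warntime  3                      # seconds without a heartbeat before a warning is logged
    initdead 60                      # dead time applied right after startup, in seconds
    udpport  694                     # UDP port used for heartbeats
    #bcast                           # broadcast
    mcast eth0 225.0.40.1 694 1 0    # multicast address
    #ucast                           # unicast
    auto_failback on                 # fail back to the preferred node once it recovers
    node    node1.soul.com           # cluster nodes
    node    node2.soul.com
    ping    172.16.0.1               # ping node used as a tie-breaker, since there are only two nodes
    compression    bz2               # compress heartbeat messages
    compression_threshold 2          # only compress messages larger than 2 KB
    # This is the configuration used here; the file itself documents every option in detail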
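
The timing directives only make sense in a certain order: a warning should fire before a node is declared dead, and the startup grace period should be generous. The sketch below checks those relationships; check_ha_timing is a hypothetical helper, it assumes the values are plain seconds (no "ms" suffix), and the "initdead at least twice deadtime" rule follows the advice in the sample ha.cf comments.

```shell
#!/bin/sh
# check_ha_timing FILE
# Reads keepalive/warntime/deadtime/initdead from an ha.cf-style FILE and
# fails (non-zero exit) if they are not ordered sensibly.
check_ha_timing() {
    awk '
        $1 == "keepalive" { k = $2 }
        $1 == "warntime"  { w = $2 }
        $1 == "deadtime"  { d = $2 }
        $1 == "initdead"  { i = $2 }
        END {
            if (!(k < w && w < d)) {
                print "expected keepalive < warntime < deadtime"
                bad = 1
            }
            if (i < 2 * d) {
                print "initdead should be at least twice deadtime"
                bad = 1
            }
            exit bad + 0
        }
    ' "$1"
}

# Example (commented out): check_ha_timing /etc/ha.d/ha.cf
```

With the values used above (1, 3, 5, 60) the check passes silently.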

Defining the cluster service

 
    
    [root@node1 ha.d]# vim haresources
    # The definition is simple; append it to the end of the file
    node1.soul.com  172.16.251.111/16/eth0  httpd
    # node          VIP/netmask/NIC         service
    # The file itself contains examples for reference
    [root@node1 ha.d]# scp -p haresources ha.cf authkeys node2:/etc/ha.d/
    # Copy the files to node2, preserving their attributes
    haresources                                   100% 5949     5.8KB/s   00:00
    ha.cf                                         100%   10KB  10.3KB/s   00:00
    authkeys                                      100%  660     0.6KB/s   00:00
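
The second field of the haresources line packs three values into one token. As an illustration of how that token breaks down, here is a small sketch; parse_vip is a hypothetical helper written for this article, not something heartbeat ships.

```shell
#!/bin/sh
# parse_vip SPEC
# Splits an "ip/prefix/nic" token from haresources into labelled parts.
parse_vip() {
    old_ifs=$IFS
    IFS=/
    set -- $1          # word-split the token on "/"
    IFS=$old_ifs
    echo "ip=$1 prefix=$2 nic=$3"
}

parse_vip 172.16.251.111/16/eth0
# prints: ip=172.16.251.111 prefix=16 nic=eth0
```

The prefix 16 corresponds to the netmask 255.255.0.0 used on this network.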
        

  

Configuring httpd on each node

 
    [root@node1 ~]# vim /var/www/html/index.html
    <h1>Node1.soul.com</h1>
    # For this experiment the two index files differ on purpose. httpd must be stopped
    # and removed from automatic startup, because the cluster starts and stops the service itself.
    [root@node1 ~]# chkconfig httpd off

    # node2 configuration
    [root@node2 ~]# vim /var/www/html/index.html
    <h1>Node2.soul.com</h1>
    [root@node2 ~]# chkconfig httpd off
    # After configuring, test httpd on each node; the test is not shown here

III. Starting and Testing heartbeat

Startup test

 
    
    [root@node1 ~]# service heartbeat start
    Starting High-Availability services:
    2014/04/18_21:01:59 INFO:  Resource is stopped
    Done.
    # Start heartbeat on node1
    [root@node1 ~]# ssh node2 'service heartbeat start'
    Starting High-Availability services:
    2014/04/18_21:02:38 INFO:  Resource is stopped
    Done.
    # Start heartbeat on node2; the alias is used here, but the full name node2.soul.com also works
        
 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
        
 
    [root@node1 ~]# tail -40 /var/log/ha-log
    IPaddr[12178]:  2014/04/18_21:02:54 INFO: Using calculated netmask for 172.16.251.111: 255.255.0.0
    IPaddr[12178]:  2014/04/18_21:02:54 INFO: eval ifconfig eth0:0 172.16.251.111 netmask 255.255.0.0 broadcast 172.16.255.255
    IPaddr[12149]:  2014/04/18_21:02:55 INFO:  Success
    ResourceManager[12055]: 2014/04/18_21:02:55 info: Running /etc/init.d/httpd  start
    # The log shows the VIP being configured and httpd being started
    [root@node1 ~]# ss -tunl | grep 80
    tcp    LISTEN     0      128                   :::80                   :::*
    tcp    LISTEN     0      128                   :::34580                :::*
    [root@node1 ~]# ifconfig | grep "172.16.251.111"
              inet addr:172.16.251.111  Bcast:172.16.255.255  Mask:255.255.0.0
    # Everything started normally
        

  

Because haresources names node1 as the preferred node, the resources start on node1 and the page served is node1's.

Failover test

 
   
    # Put node1 into standby so that node2 takes over its resources
    [root@node1 ~]# /usr/share/heartbeat/hb_standby local
    2014/04/18_21:51:04 Going standby [local].

    [root@node2 ~]# ifconfig
    eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:DF:70:B6
              inet addr:172.16.251.111  Bcast:172.16.255.255  Mask:255.255.0.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    # The service and the VIP are now running on node2
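
The takeover is not instantaneous, so a scripted test should poll rather than check once. The helper below is a generic sketch; wait_until is a hypothetical name, and the VIP check in the commented example assumes it is run on the node that is expected to take over.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds or TIMEOUT_SECONDS elapse.
# Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$(( elapsed + 1 ))
    done
    return 0
}

# Example on node2 (commented out; needs the cluster from this article):
# wait_until 30 sh -c 'ifconfig | grep -q "172.16.251.111"' \
#     && echo "VIP is up on $(uname -n)"
```

The 30-second limit in the example is arbitrary; it only needs to exceed the deadtime configured in ha.cf plus the time the resources take to start.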
        

  

IV. Mounting NFS for Shared Storage

Exporting the NFS share

 
    # Set up node3 as the NFS server; the NFS configuration itself is not explained here
    [root@node3 ~]# mkdir /www/htdocs -pv
    mkdir: created directory `/www'
    mkdir: created directory `/www/htdocs'
    [root@node3 ~]# vim /etc/exports
    /www/htdocs     172.16.0.0/16(rw)
    [root@node3 ~]# setfacl -m u:48:rwx /www/htdocs/    # uid 48 is the apache user
    [root@node3 ~]# getfacl /www/htdocs/
    getfacl: Removing leading '/' from absolute path names
    # file: www/htdocs/
    # owner: root
    # group: root
    user::rwx
    user:apache:rwx
    [root@node3 ~]# vim /www/htdocs/index.html
    <h1>Page in NFS Server</h1>
    [root@node3 ~]# service nfs restart
    Shutting down NFS daemon:                                  [  OK  ]
    Shutting down NFS mountd:                                  [  OK  ]
    Shutting down NFS quotas:                                  [  OK  ]
    Shutting down NFS services:                                [  OK  ]

Updating the cluster resource configuration

 
   
    # Stop the heartbeat service before editing
    [root@node1 ~]# vim /etc/ha.d/haresources
    node1.soul.com  172.16.251.111/16/eth0  Filesystem::172.16.251.87:/www/htdocs::/var/www/html::nfs  httpd
    [root@node1 ~]# scp /etc/ha.d/haresources node2:/etc/ha.d/
    # Copy the file to node2
    [root@node1 ~]# service heartbeat start
    Starting High-Availability services:
    2014/04/18_21:26:50 INFO:  Resource is stopped
    Done.
    [root@node1 ~]# ssh node2 'service heartbeat start'
    Starting High-Availability services:
    2014/04/18_21:26:59 INFO:  Resource is stopped
    Done.
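
In haresources, a resource written as name::arg::arg names a resource script followed by its arguments, separated by double colons. The sketch below splits such a token to show which part is which; parse_resource is a hypothetical helper, and the labels script/arg1/arg2/arg3 are just for display.

```shell
#!/bin/sh
# parse_resource SPEC
# Splits a "script::arg1::arg2..." haresources token on the "::" separator.
# Single colons (as in an NFS "host:/path" device) are left untouched.
parse_resource() {
    printf '%s\n' "$1" | awk -F'::' '{
        printf("script=%s", $1)
        for (i = 2; i <= NF; i++)
            printf(" arg%d=%s", i - 1, $i)
        printf("\n")
    }'
}

parse_resource 'Filesystem::172.16.251.87:/www/htdocs::/var/www/html::nfs'
# prints: script=Filesystem arg1=172.16.251.87:/www/htdocs arg2=/var/www/html arg3=nfs
```

For the Filesystem resource used here, the three arguments are the device (the NFS export), the mount point, and the filesystem type.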
        

  

Test

 
    [root@node1 ~]# /usr/share/heartbeat/hb_standby local
    2014/04/18_21:45:23 Going standby [local].

    # On node2, port 80 is now listening and the test page loads correctly
    [root@node2 ~]# ss -tunl | grep 80
    tcp    LISTEN     0      128                   :::80                   :::*