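Troubleshooting a Keepalived Failure to Bind the VIP

I. Fault description

I deployed an HAProxy + Keepalived high-availability load-balancing setup on two virtual machines provided by a partner in Taiwan. After I finished configuring Keepalived and restarted it, Keepalived did not bind the VIP.

The Keepalived executable is /data/app_platform/keepalived/sbin/keepalived

The configuration file is /data/app_platform/keepalived/conf/keepalived.conf

The Keepalived init script is /etc/init.d/keepalived

Contents of keepalived.conf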

LB1 Master

! Configuration File for keepalived

global_defs {

  notification_email {

    admin@example.com

  }

  notification_email_from lb1@example.com

  smtp_server 127.0.0.1

  smtp_connect_timeout 30

  router_id LB1_MASTER

}

  vrrp_script chk_haproxy {

        script "killall -0 haproxy"

        interval 2

        weight 2

    }

vrrp_instance VI_1 {

    state MASTER

    interface eth1

    virtual_router_id 51

    priority 100

    advert_int 1

    authentication {

        auth_type PASS

        auth_pass 1111

    }

    virtual_ipaddress {

    10.1.1.200/24      brd 10.1.1.255      dev eth1 label eth1:vip

    }

    track_script {

        chk_haproxy

                }

}

Restart Keepalived and check the log

Mar  3 18:09:00 cv00300005248-1 Keepalived[20138]: Stopping Keepalived v1.2.15 (02/28,2015)

Mar  3 18:09:00 cv00300005248-1 Keepalived[20259]: Starting Keepalived v1.2.15 (02/28,2015)

Mar  3 18:09:00 cv00300005248-1 Keepalived[20260]: Starting Healthcheck child process, pid=20261

Mar  3 18:09:00 cv00300005248-1 Keepalived[20260]: Starting VRRP child process, pid=20262

Mar  3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Registering Kernel netlink reflector

Mar  3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Registering Kernel netlink command channel

Mar  3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Registering gratuitous ARP shared channel

Mar  3 18:09:00 cv00300005248-1 Keepalived_healthcheckers[20261]: Registering Kernel netlink reflector

Mar  3 18:09:00 cv00300005248-1 Keepalived_healthcheckers[20261]: Registering Kernel netlink command channel

Mar  3 18:09:00 cv00300005248-1 Keepalived_healthcheckers[20261]: Configuration is using : 3924 Bytes

Mar  3 18:09:00 cv00300005248-1 Keepalived_healthcheckers[20261]: Using LinkWatch kernel netlink reflector...

Mar  3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Configuration is using : 55712 Bytes

Mar  3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Using LinkWatch kernel netlink reflector...

Mar  3 18:09:18 cv00300005248-1 kernel: __ratelimit: 1964 callbacks suppressed

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Mar  3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.

Check whether the VIP is bound

$ ifconfig eth1:vip

eth1:vip  Link encap:Ethernet  HWaddr 00:16:3E:F2:37:6B 

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          Interrupt:13

No VIP is bound.

II. Troubleshooting process

1) Check the VIP configuration

Confirm the details of the VIP with the partner

IPADDR  10.1.1.200

NETMASK  255.255.255.0

GATEWAY  10.1.1.1

Broadcast  10.1.1.255

The setting used here is

10.1.1.200/24      brd 10.1.1.255      dev eth1 label eth1:vip

2) Check the iptables and SELinux settings

$ sudo service iptables stop

$ sudo setenforce 0

setenforce: SELinux is disabled

If iptables has to stay enabled, a rule like the following is needed

iptables -I INPUT -i eth1 -d 224.0.0.0/8 -j ACCEPT

service iptables save

Keepalived uses the multicast address 224.0.0.18 for the VRRP advertisements through which Master and Backup check each other's health
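
The rule above accepts everything sent to 224.0.0.0/8. A narrower variant, sketched below, matches only IP protocol 112 (the protocol number shown for VRRP in the logs and packet captures in this article) sent to 224.0.0.18. The helper name vrrp_rule is made up for illustration, and it only prints the command so that it can be reviewed before being run as root.

```shell
# Hypothetical helper: print (do not execute) a narrower iptables rule
# that accepts only VRRP (IP protocol 112) multicast on one interface.
vrrp_rule() {
    # $1: interface that carries the VRRP traffic
    printf 'iptables -I INPUT -i %s -p 112 -d 224.0.0.18/32 -j ACCEPT\n' "$1"
}

vrrp_rule eth1
```

After checking the printed command, run it with sudo and save the rules with service iptables save as above.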

3) Check the relevant kernel parameters

Kernel parameters to watch in an HAProxy + Keepalived architecture:

# Controls IP packet forwarding

net.ipv4.ip_forward = 1

Enables IP forwarding

net.ipv4.ip_nonlocal_bind = 1

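Allows binding to an IP address that is not local to the host

If LVS in DR or TUN mode is combined with Keepalived, two ARP-related parameters must be set on the back-end real servers. I set them here as well.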

net.ipv4.conf.lo.arp_ignore = 1

net.ipv4.conf.lo.arp_announce = 2

net.ipv4.conf.all.arp_ignore = 1

net.ipv4.conf.all.arp_announce = 2
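
To see what the running kernel actually uses, the values can be read back from /proc/sys. This is a minimal sketch; the helper names sysctl_path and show_param are my own, and the simple dot-to-slash mapping assumes that no interface name in the key contains a dot.

```shell
# Hypothetical helper: map a sysctl key to its file under /proc/sys.
# Assumes no component of the key (e.g. an interface name) contains a dot.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Hypothetical helper: print "key = value", or a note if it cannot be read.
show_param() {
    p=$(sysctl_path "$1")
    if [ -r "$p" ]; then
        printf '%s = %s\n' "$1" "$(cat "$p")"
    else
        printf '%s: not readable on this host\n' "$1"
    fi
}

for key in net.ipv4.ip_forward net.ipv4.ip_nonlocal_bind \
           net.ipv4.conf.all.arp_ignore net.ipv4.conf.all.arp_announce; do
    show_param "$key"
done
```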

4) Check the VRRP settings

LB1 Master

state MASTER

    interface eth1

    virtual_router_id 51

    priority 100

LB2 Backup

state BACKUP

    interface eth1

    virtual_router_id 51

    priority 99

virtual_router_id must be the same on Master and Backup, while priority must differ; the larger the number, the higher the priority.
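
A quick way to compare the two nodes is to pull these fields out of each configuration file. The sketch below assumes one vrrp_instance per file and one directive per line, as in the configuration above; vrrp_summary is a hypothetical helper name.

```shell
# Hypothetical helper: read a keepalived.conf on stdin and print
# "state virtual_router_id priority" for the first vrrp_instance found.
vrrp_summary() {
    awk '
        $1 == "state"             && s == "" { s = $2 }
        $1 == "virtual_router_id" && v == "" { v = $2 }
        $1 == "priority"          && p == "" { p = $2 }
        END { print s, v, p }
    '
}

# Sample input copied from the LB1 settings above.
vrrp_summary <<'EOF'
vrrp_instance VI_1 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 100
}
EOF
```

On a real node, run vrrp_summary < /data/app_platform/keepalived/conf/keepalived.conf on both LB1 and LB2. The middle field should match and the last field should differ.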

5) Suspecting a problem with the compiled Keepalived build

I downloaded and compiled version 1.2.13 and restarted Keepalived, but the VIP was still not bound.

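Another platform in production runs a Keepalived installed through yum, so I decided to install Keepalived with yum, copy the configuration file over, and see whether the VIP would be bound.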

rpm -ivh http://ftp.linux.ncsu.edu/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

yum -y install keepalived

cp /data/app_platform/keepalived/conf/keepalived.conf  /etc/keepalived/keepalived.conf

Restart Keepalived

Then check the log

Mar  4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Registering Kernel netlink reflector

Mar  4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Registering Kernel netlink command channel

Mar  4 16:42:46 xxxxx Keepalived_vrrp[17333]: Opening file '/etc/keepalived/keepalived.conf'.

Mar  4 16:42:46 xxxxx Keepalived_vrrp[17333]: Configuration is using : 65250 Bytes

Mar  4 16:42:46 xxxxx Keepalived_vrrp[17333]: Using LinkWatch kernel netlink reflector...

Mar  4 16:42:46 xxxxx Keepalived_vrrp[17333]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]

Mar  4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Opening file '/etc/keepalived/keepalived.conf'.

Mar  4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Configuration is using : 7557 Bytes

Mar  4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Using LinkWatch kernel netlink reflector...

Mar  4 16:42:46 xxxxx Keepalived_vrrp[17333]: VRRP_Script(chk_haproxy) succeeded

Mar  4 16:42:47 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) Transition to MASTER STATE

Mar  4 16:42:48 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) Entering MASTER STATE

Mar  4 16:42:48 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) setting protocol VIPs.

Mar  4 16:42:48 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200

Mar  4 16:42:48 xxxxx Keepalived_healthcheckers[17332]: Netlink reflector reports IP 10.1.1.200 added

Mar  4 16:42:53 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200

Check the IP binding again

$ ifconfig eth1:vip

eth1:vip  Link encap:Ethernet  HWaddr 00:16:3E:F2:37:6B 

          inet addr:10.1.1.200  Bcast:10.1.1.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          Interrupt:13

Then uninstall Keepalived through yum

yum remove keepalived

Restore the original init script /etc/init.d/keepalived

After restarting Keepalived, the VIP still could not be bound.

This pointed to the Keepalived init script /etc/init.d/keepalived

Inspect /etc/init.d/keepalived

# Source function library.

. /etc/rc.d/init.d/functions

exec="/data/app_platform/keepalived/sbin/keepalived"

prog="keepalived"

config="/data/app_platform/keepalived/conf/keepalived.conf"

[ -e /etc/sysconfig/$prog ] && . /etc/sysconfig/$prog

lockfile=/var/lock/subsys/keepalived

start() {

    [ -x $exec ] || exit 5

    [ -e $config ] || exit 6

    echo -n $"Starting $prog: "

    daemon $exec $KEEPALIVED_OPTIONS

    retval=$?

    echo

    [ $retval -eq 0 ] && touch $lockfile

    return $retval

}

The key is this line

daemon $exec $KEEPALIVED_OPTIONS

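Because /etc/sysconfig/keepalived had not been copied over, $KEEPALIVED_OPTIONS is empty and the script simply runs daemon /data/app_platform/keepalived/sbin/keepalived

Keepalived reads /etc/keepalived/keepalived.conf by default, but a different configuration file is used here, so the line has to be changed to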

daemon $exec -D -f $config
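
Hard-coding the options works, but it also ignores anything set in /etc/sysconfig/keepalived. One possible variant, sketched here under the assumption that the sysconfig file may or may not exist, falls back to -D -f $config only when KEEPALIVED_OPTIONS is empty. The function name keepalived_options is hypothetical and not part of the stock init script.

```shell
# Hypothetical helper: choose the options passed to the keepalived binary.
keepalived_options() {
    # $1: configuration file path
    # $2: options loaded from /etc/sysconfig/keepalived (may be empty)
    if [ -n "$2" ]; then
        printf '%s\n' "$2"
    else
        # Fall back to detailed logging plus the non-default config file.
        printf '%s\n' "-D -f $1"
    fi
}

config="/data/app_platform/keepalived/conf/keepalived.conf"
keepalived_options "$config" "${KEEPALIVED_OPTIONS:-}"
```

Inside start(), the daemon line could then read daemon $exec $(keepalived_options "$config" "$KEEPALIVED_OPTIONS").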

Restart Keepalived, then check the log and the VIP binding

$ ifconfig eth1:vip

eth1:vip  Link encap:Ethernet  HWaddr 00:16:3E:F2:37:6B 

          inet addr:10.1.1.200  Bcast:10.1.1.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          Interrupt:13

6) Apply the same change to the Keepalived init script on LB2 Backup and observe VIP takeover

Check LB1 Master

$ ifconfig eth1:vip

eth1:vip  Link encap:Ethernet  HWaddr 00:16:3E:F2:37:6B 

          inet addr:10.1.1.200  Bcast:10.1.1.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          Interrupt:13

Check LB2 Backup

$ ifconfig eth1:vip

eth1:vip  Link encap:Ethernet  HWaddr 00:16:3E:F2:37:6B 

          inet addr:10.1.1.200  Bcast:10.1.1.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          Interrupt:13

Now a problem appears: both LB1 Master and LB2 Backup have bound the VIP 10.1.1.200, which is not normal!

Log in to 10.1.1.200 from LB1 and from LB2

[lb1 ~]$ ssh 10.1.1.200 

Last login: Wed Mar  4 17:31:33 2015 from 10.1.1.200

[lb1 ~]$

[lb2 ~]$ ssh 10.1.1.200

Last login: Wed Mar  4 17:54:57 2015 from 101.95.153.246

[lb2 ~]$

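After stopping Keepalived on LB1, pinging 10.1.1.200 from LB1 fails.

After stopping Keepalived on LB2, pinging 10.1.1.200 from LB2 also fails.

After starting Keepalived on LB1 again, 10.1.1.200 can be pinged from LB1 but not from LB2.

After starting Keepalived on LB2, 10.1.1.200 can be pinged from LB2.

The conclusion is that LB1 and LB2 each bind the VIP 10.1.1.200 to their own eth1 interface. The two hosts are not exchanging VRRP messages, so no VRRP priority comparison takes place.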
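
A check like this is easy to script on each node. The sketch below reads the output of ip -o -4 addr show on standard input and reports whether the VIP is present; has_vip is a made-up helper name, and I am assuming the iproute ip command is available alongside ifconfig.

```shell
# Hypothetical helper: succeed if the given VIP appears in
# `ip -o -4 addr show` output supplied on stdin.
has_vip() {
    # $1: the VIP without its prefix length
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")   # strip the /prefix part
                    if (a[1] == vip) found = 1
                }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Sample line in the format printed by `ip -o -4 addr show dev eth1`.
sample='3: eth1    inet 10.1.1.200/24 brd 10.1.1.255 scope global secondary eth1:vip'
if printf '%s\n' "$sample" | has_vip 10.1.1.200; then
    echo "VIP bound"
else
    echo "VIP not bound"
fi
```

Running ip -o -4 addr show dev eth1 | has_vip 10.1.1.200 on both nodes at the same time shows the split: if it succeeds on both, each node is holding the VIP.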

7) Find out what is blocking VRRP communication

Restart Keepalived on LB1 Master and check the log

Mar  5 15:45:36 gintama-taiwan-lb1 Keepalived_vrrp[32303]: Configuration is using : 65410 Bytes

Mar  5 15:45:36 gintama-taiwan-lb1 Keepalived_vrrp[32303]: Using LinkWatch kernel netlink reflector...

Mar  5 15:45:36 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]

Mar  5 15:45:36 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Script(chk_haproxy) succeeded

Mar  5 15:45:37 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) Transition to MASTER STATE

Mar  5 15:45:38 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) Entering MASTER STATE

Mar  5 15:45:38 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) setting protocol VIPs.

Mar  5 15:45:38 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200

Mar  5 15:45:38 gintama-taiwan-lb1 Keepalived_healthcheckers[32302]: Netlink reflector reports IP 10.1.1.200 added

Mar  5 15:45:43 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200

Keepalived on LB1 Master goes straight into the MASTER state and takes over the VIP.

Then restart Keepalived on LB2 Backup and check the log

Mar  5 15:47:42 gintama-taiwan-lb2 Keepalived_vrrp[30619]: Configuration is using : 65408 Bytes

Mar  5 15:47:42 gintama-taiwan-lb2 Keepalived_vrrp[30619]: Using LinkWatch kernel netlink reflector...

Mar  5 15:47:42 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Entering BACKUP STATE

Mar  5 15:47:42 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]

Mar  5 15:47:46 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Transition to MASTER STATE

Mar  5 15:47:47 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Entering MASTER STATE

Mar  5 15:47:47 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) setting protocol VIPs.

Mar  5 15:47:47 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200

Mar  5 15:47:47 gintama-taiwan-lb2 Keepalived_healthcheckers[30618]: Netlink reflector reports IP 10.1.1.200 added

Mar  5 15:47:52 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200

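Keepalived on LB2 first enters the BACKUP state, then transitions to MASTER and takes over the VIP.

This indicates a problem with VRRP multicast: LB2 is not receiving the advertisements sent by LB1.

Since VRRP multicast is not working, I tried sending the VRRP packets by unicast instead. Modify the configuration on LB1 and LB2.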

LB1

Add the following configuration

unicast_src_ip  10.1.1.12

    unicast_peer {

                  10.1.1.17

     }

LB2

Add the following configuration

unicast_src_ip  10.1.1.17

 unicast_peer {

                  10.1.1.12

  }

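unicast_src_ip is the source IP address used to send VRRP unicast packets

unicast_peer is the IP address of the peer that receives the VRRP unicast packets

Then reload Keepalived on each node and watch the logs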

LB1

Mar  5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) setting protocol VIPs.

Mar  5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Script(chk_haproxy) considered successful on reload

Mar  5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: Configuration is using : 65579 Bytes

Mar  5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: Using LinkWatch kernel netlink reflector...

Mar  5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP sockpool: [ifindex(3), proto(112), unicast(1), fd(10,11)]

Mar  5 16:13:36 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Transition to MASTER STATE

Mar  5 16:13:48 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Received lower prio advert, forcing new election

Mar  5 16:13:48 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200

Mar  5 16:13:48 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Received lower prio advert, forcing new election

Mar  5 16:13:48 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200

LB2

Mar  5 16:13:48 gintama-taiwan-lb2 Keepalived_vrrp[453]: VRRP_Instance(VI_1) Received higher prio advert

Mar  5 16:13:48 gintama-taiwan-lb2 Keepalived_vrrp[453]: VRRP_Instance(VI_1) Entering BACKUP STATE

Mar  5 16:13:48 gintama-taiwan-lb2 Keepalived_vrrp[453]: VRRP_Instance(VI_1) removing protocol VIPs.

Mar  5 16:13:48 gintama-taiwan-lb2 Keepalived_healthcheckers[452]: Netlink reflector reports IP 10.1.1.200 removed

Checking the VIP binding shows that the VIP has been removed from LB2.

Ping the VIP 10.1.1.200 from LB1 and from LB2

[lb1 ~]$ ping -c 5 10.1.1.200

PING 10.1.1.200 (10.1.1.200) 56(84) bytes of data.

64 bytes from 10.1.1.200: icmp_seq=1 ttl=64 time=0.028 ms

64 bytes from 10.1.1.200: icmp_seq=2 ttl=64 time=0.020 ms

64 bytes from 10.1.1.200: icmp_seq=3 ttl=64 time=0.020 ms

64 bytes from 10.1.1.200: icmp_seq=4 ttl=64 time=0.021 ms

64 bytes from 10.1.1.200: icmp_seq=5 ttl=64 time=0.027 ms

--- 10.1.1.200 ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 3999ms

rtt min/avg/max/mdev = 0.020/0.023/0.028/0.004 ms

[lb2 ~]$ ping -c 5 10.1.1.200 

PING 10.1.1.200 (10.1.1.200) 56(84) bytes of data.

--- 10.1.1.200 ping statistics ---

5 packets transmitted, 0 received, 100% packet loss, time 14000ms

While LB1 holds the VIP, LB2 cannot ping it. Likewise, when Keepalived on LB1 is stopped, LB2 takes over the VIP, but LB1 cannot ping it.

Capture packets on LB1 and LB2

[lb1 ~]$ sudo tcpdump -vvv  -i eth1 host 10.1.1.17

tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

16:46:04.827357 IP (tos 0xc0, ttl 255, id 328, offset 0, flags [none], proto VRRP (112), length 40)

    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"

16:46:05.827459 IP (tos 0xc0, ttl 255, id 329, offset 0, flags [none], proto VRRP (112), length 40)

    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"

16:46:06.828234 IP (tos 0xc0, ttl 255, id 330, offset 0, flags [none], proto VRRP (112), length 40)

    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"

16:46:07.828338 IP (tos 0xc0, ttl 255, id 331, offset 0, flags [none], proto VRRP (112), length 40)

    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"

[lb2 ~]$ sudo tcpdump -vvv -i eth1 host 10.1.1.12

tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

16:48:07.000029 IP (tos 0xc0, ttl 255, id 450, offset 0, flags [none], proto VRRP (112), length 40)

    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"

16:48:07.999539 IP (tos 0xc0, ttl 255, id 451, offset 0, flags [none], proto VRRP (112), length 40)

    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"

16:48:08.999252 IP (tos 0xc0, ttl 255, id 452, offset 0, flags [none], proto VRRP (112), length 40)

    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"

16:48:09.999560 IP (tos 0xc0, ttl 255, id 453, offset 0, flags [none], proto VRRP (112), length 40)

    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
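
The captures are easier to compare when reduced to who is sending advertisements and with what priority. The following sketch does that for saved tcpdump text output in the format shown above; vrrp_senders is a hypothetical helper name. In a healthy pair only the current master sends advertisements for a given vrid, so more than one sender would mean both nodes consider themselves master.

```shell
# Hypothetical helper: read tcpdump -vvv text output on stdin and print
# each distinct "source-IP priority" pair seen in VRRP advertisements.
vrrp_senders() {
    awk '/VRRPv[0-9]+, Advertisement/ {
        prio = ""
        for (i = 1; i <= NF; i++)
            if ($i == "prio") {
                prio = $(i + 1)
                sub(/,$/, "", prio)   # drop the trailing comma
            }
        print $1, prio
    }' | sort -u
}

# Sample lines copied from the capture above.
vrrp_senders <<'EOF'
16:46:04.827357 IP (tos 0xc0, ttl 255, id 328, offset 0, flags [none], proto VRRP (112), length 40)
    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
16:46:05.827459 IP (tos 0xc0, ttl 255, id 329, offset 0, flags [none], proto VRRP (112), length 40)
    10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
EOF
```

For the sample it prints a single line, 10.1.1.12 102, which matches LB1's priority of 100 plus the weight of 2.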

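I ran the same VIP ping test from other virtual machines on the physical hosts of LB1 and LB2. The result was the same: when the VIP is bound on LB1, only virtual machines on LB1's physical host can ping it, and virtual machines on LB2's physical host cannot. The reverse also holds.

A colleague suggested that VRRP might be related to DHCP. On checking, the IP addresses of the VMs provided by the partner turned out to be assigned by DHCP, but testing showed that the problem had nothing to do with DHCP. Even so, it is best not to obtain IP addresses through DHCP in a production environment.

8) Ask the partner's engineers to help investigate why the VIP cannot be pinged

It finally turned out that the partner's internal network is a virtual network in which the gateway has no real function, so some virtual machines could not reach the VIP through the gateway 10.1.1.1.

I had the VM provider's engineers debug HAProxy + Keepalived on the servers, and through network settings they made the VIP 10.1.1.200 reachable over the internal network. But when I tested, I found that Keepalived did not switch the VIP when HAProxy went down.

9) Fix the problem that Keepalived does not switch the VIP when HAProxy goes down

Repeated testing showed that the VIP switches when Keepalived goes down but does not switch when HAProxy goes down.

After carefully checking the configuration file and consulting related material, I determined that the cause was the relative sizes of Keepalived's weight and priority parameters.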

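In my original configuration, LB1 had weight 2 and priority 100, and LB2 had weight 2 and priority 99.

While debugging, the partner changed LB1's priority to 160. With that value, in repeated tests the VIP never migrated to LB2 after HAProxy on LB1 went down. Changing LB1's priority back to 100 fixed it.

Points to note here:

The master's priority minus the vrrp_script weight must be smaller than the backup's priority. With priority 100, 100 - 2 = 98 is below 99, so the VIP can move; with priority 160, 160 - 2 = 158 is not.

A vrrp_script check is considered successful when the script returns 0; any other return value is treated as a failure.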

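* When weight is positive, it is added to priority when the script check succeeds and is not added when the check fails.

Master check fails:

            A switchover happens when master priority < backup priority + weight.

Master check succeeds:

            The master stays master when master priority + weight > backup priority + weight.

* When weight is negative, it does not affect priority when the script check succeeds; when the check fails, priority becomes priority - abs(weight).

Master check fails:

          A switchover happens when master priority - abs(weight) < backup priority.

Master check succeeds:

          The master stays master when master priority > backup priority.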
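
The rules above can be checked with a little arithmetic. This sketch models only those rules (it is not Keepalived's own code), and the function names effective_priority and will_fail_over are invented for the example. It assumes HAProxy is down on the master and healthy on the backup.

```shell
# Hypothetical helper: priority after applying the vrrp_script weight.
effective_priority() {
    # $1: configured priority, $2: vrrp_script weight,
    # $3: script exit status (0 means the check succeeded)
    if [ "$2" -gt 0 ] && [ "$3" -eq 0 ]; then
        echo $(( $1 + $2 ))     # positive weight is added on success
    elif [ "$2" -lt 0 ] && [ "$3" -ne 0 ]; then
        echo $(( $1 + $2 ))     # negative weight is subtracted on failure
    else
        echo "$1"
    fi
}

# Hypothetical helper: succeed if the backup would outrank the master.
will_fail_over() {
    # $1: master priority, $2: backup priority, $3: weight
    master=$(effective_priority "$1" "$3" 1)   # check failing on master
    backup=$(effective_priority "$2" "$3" 0)   # check passing on backup
    [ "$backup" -gt "$master" ]
}

# The two LB1 priorities tried in this article, with LB2 at 99 and weight 2.
for prio in 100 160; do
    if will_fail_over "$prio" 99 2; then
        echo "LB1 priority $prio: VIP moves to LB2"
    else
        echo "LB1 priority $prio: VIP stays on LB1"
    fi
done
```

With priority 100 the backup reaches 101 and takes over; with 160 it cannot, which matches what I saw in testing.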

The final configuration file is:

! Configuration File for keepalived

global_defs {

  notification_email {

    admin@example.com

  }

  notification_email_from lb1@example.com

  smtp_server 127.0.0.1

  smtp_connect_timeout 30

  router_id LB1_MASTER

}

 vrrp_script chk_haproxy {

    script "killall -0 haproxy"

    interval 2

    weight 2

    }

# VIP for the external network

vrrp_instance eth0_VIP {

    state MASTER

    interface eth0

    virtual_router_id 51

    priority 100

    advert_int 1

    authentication {

        auth_type PASS

        auth_pass 1111

    }

    unicast_src_ip  8.8.8.6        # use VRRP unicast

    unicast_peer {

                  8.8.8.7

                }

    virtual_ipaddress {

    8.8.8.8/25  brd 8.8.8.255  dev eth0 label eth0:vip

    }

    track_script {

        chk_haproxy

                }

}

# VIP for the internal network

vrrp_instance eth1_VIP {

    state MASTER

    interface eth1

    virtual_router_id 52

    priority 100

    advert_int 1

    authentication {

        auth_type PASS

        auth_pass 1111

    }

    unicast_src_ip  10.1.1.12

    unicast_peer {

                  10.1.1.17

                }

    virtual_ipaddress {

    10.1.1.200/24    brd 10.1.1.255      dev eth1 label eth1:vip

    }

    track_script {

        chk_haproxy

                }

}

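III. Summary

When configuring Keepalived, pay attention to the following points:

A. Enable IP forwarding and non-local IP binding in the kernel; if LVS DR mode is used, also set the two ARP-related parameters.

B. If the network Keepalived runs on does not allow multicast, VRRP unicast can be used.

C. Watch the weight and priority values on master and backup; unreasonable values may prevent the VIP from switching.

D. If the configuration file is not the default one, pass -f when starting Keepalived to specify it.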

