试想下如果前面配置的HAProxy主机192.168.0.9突然宕机或者网卡失效,那么虽然RabbitMQ集群没有任何故障,但是对于外界的客户端来说所有的连接都会被断开,结果将是灾难性的。确保负载均衡服务的可靠性同样显得十分的重要。这里就引入Keepalived工具,它能够通过自身健康检查、资源接管功能做高可用(双机热备),实现故障转移。
Keepalived工作在OSI模型中的第3层、第4层和第7层。
工作在第3层是指Keepalived会定期向热备组中的服务器发送一个ICMP数据包来判断某台服务器是否故障,如果故障则将这台服务器从热备组移除。
工作在第4层是指Keepalived以TCP端口的状态判断服务器是否故障,比如检测RabbitMQ的5672端口,如果故障则将这台服务器从热备组中移除。
工作在第7层是指Keepalived根据用户设定的策略(通常是一个自定义的检测脚本)判断服务器上的程序是否正常运行,如果故障将这台服务器从热备组移除。
Keepalived的安装
[root@node1 ~]
[root@node1 ~]
[root@node1 keepalived-1.3.5]
[root@node1 keepalived-1.3.5]
[root@node1 keepalived-1.3.5]
之后将安装过后的Keepalived加入系统服务中,详细步骤如下(注意千万不要输错命令):
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
执行完之后就可以使用如下:
service keepalived restart
service keepalived start
service keepalived stop
service keepalived status
这4个命令来重启、启动、关闭和查看keepalived状态。
配置
接下来我们要修改/etc/keepalived/keepalived.conf文件,在Keepalived的Master上配置详情如下:
global_defs {
router_id NodeA
}
vrrp_script chk_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 5
weight 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 1
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass root123
}
track_script {
chk_haproxy
}
virtual_ipaddress {
192.168.0.10
}
}
Backup中的配置大致和Master中的相同,不过需要修改global_defs{}的router_id,比如置为NodeB;其次要修改vrrp_instance VI_1{}中的state为BACKUP;最后要将priority设置为小于100的值。注意Master和Backup中的virtual_router_id要保持一致。下面简要的展示下Backup的配置:
global_defs {
router_id NodeB
}
vrrp_script chk_haproxy {
...
}
vrrp_instance VI_1 {
state BACKUP
...
priority 50
...
}
为了防止HAProxy服务挂了,但是Keepalived却还在正常工作而没有切换到Backup上,所以这里需要编写一个脚本来检测HAProxy服务的状态。当HAProxy服务挂掉之后该脚本会自动重启HAProxy的服务,如果不成功则关闭Keepalived服务,如此便可以切换到Backup继续工作。这个脚本就对应了上面配置中vrrp_script chk_haproxy{}的script对应的值,/etc/keepalived/check_haproxy.sh的内容如代码清单所示。
#!/bin/bash
if [ $(ps -C haproxy --no-header | wc -l) -eq 0 ];then
haproxy -f /opt/haproxy-1.7.8/haproxy.cfg
fi
sleep 2
if [ $(ps -C haproxy --no-header | wc -l) -eq 0 ];then
service keepalived stop
fi
查看Keepalived的运行情况
可以通过tail -f /var/log/messages -n 200命令查看相应的Keepalived日志输出。Master启动日志如下:
Oct 4 23:01:51 node1 Keepalived[30553]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Oct 4 23:01:51 node1 Keepalived[30553]: Unable to resolve default script username
Oct 4 23:01:51 node1 Keepalived[30553]: Opening file
Oct 4 23:01:51 node1 Keepalived[30554]: Starting Healthcheck child process, pid=30555
Oct 4 23:01:51 node1 Keepalived[30554]: Starting VRRP child process, pid=30556
Oct 4 23:01:51 node1 Keepalived_healthcheckers[30555]: Opening file
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: Registering Kernel netlink reflector
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: Registering Kernel netlink command channel
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: Registering gratuitous ARP shared channel
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: Opening file
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: VRRP_Instance(VI_1) removing protocol VIPs.
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: Using LinkWatch kernel netlink reflector...
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Oct 4 23:01:51 node1 Keepalived_vrrp[30556]: VRRP_Instance(VI_1) Transition to MASTER STATE
Oct 4 23:01:52 node1 Keepalived_vrrp[30556]: VRRP_Instance(VI_1) Entering MASTER STATE
Oct 4 23:01:52 node1 Keepalived_vrrp[30556]: VRRP_Instance(VI_1) setting protocol VIPs.
Master启动之后可以通过ip add show命令查看添加的VIP(加粗部分,Backup节点是没有VIP的):
[root@node1 ~]
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:5e:7a:f7 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.8/18 brd 10.198.255.255 scope global eth0
inet 192.168.0.10/32 scope global eth0
inet6 fe80::f816:3eff:fe5e:7af7/ scope link
valid_lft forever preferred_lft forever
在Master节点执行 service keepalived stop模拟异常关闭的情况,观察Master的日志:
Oct 4 22:58:32 node1 Keepalived[27609]: Stopping
Oct 4 22:58:32 node1 Keepalived_vrrp[27611]: VRRP_Instance(VI_1) sent 0 priority
Oct 4 22:58:32 node1 Keepalived_vrrp[27611]: VRRP_Instance(VI_1) removing protocol VIPs.
Oct 4 22:58:32 node1 Keepalived_healthcheckers[27610]: Stopped
Oct 4 22:58:33 node1 Keepalived_vrrp[27611]: Stopped
Oct 4 22:58:33 node1 Keepalived[27609]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Oct 4 22:58:34 node1 ntpd[1313]: Deleting interface #13 eth0, 192.168.0.10#123, interface stats: received=0, sent=0, dropped=0, active_time=532 secs
Oct 4 22:58:34 node1 ntpd[1313]: peers refreshed
对应的Master上的VIP也会消失:
[root@node1 ~]
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:5e:7a:f7 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.8/18 brd 10.198.255.255 scope global eth0
inet6 fe80::f816:3eff:fe5e:7af7/ scope link
valid_lft forever preferred_lft forever
Master关闭后,Backup会提升为新的Master,对应的日志为:
Oct 4 22:58:15 node2 Keepalived_vrrp[2352]: VRRP_Instance(VI_1) Transition to MASTER STATE
Oct 4 22:58:16 node2 Keepalived_vrrp[2352]: VRRP_Instance(VI_1) Entering MASTER STATE
Oct 4 22:58:16 node2 Keepalived_vrrp[2352]: VRRP_Instance(VI_1) setting protocol VIPs.
可以看到新的Master节点上虚拟出了VIP如下所示:
[root@node2 ~]
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:23:ac:ec brd ff:ff:ff:ff:ff:ff
inet 192.168.0.9/18 brd 10.198.255.255 scope global eth0
inet 192.168.0.10/32 scope global eth0
inet6 fe80::f816:3eff:fe23:acec/ scope link
valid_lft forever preferred_lft forever