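Reference: deploying a highly available keepalived cluster
Requirement: service A (called A from here on) must be highly available. When the primary fails, traffic has to fail over to a standby. The current setup is one primary and one standby; it will later grow to one primary and several standbys.
Requirement breakdown: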
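- Deploy keepalived for health checking and failover.
- With multiple standbys, and given how VRRP elections work, give all standby nodes the same priority and the same state.
- Write a script that checks service A periodically and starts or stops keepalived accordingly.
- Because frequent starts and stops could disrupt normal service, run every keepalived node in non-preemptive mode, so a node that rejoins the cluster does not take the VIP back from a healthy master.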
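Giving every node the same priority means ties are expected. VRRP breaks a tie between two nodes claiming MASTER by comparing their primary IP addresses: the higher address wins (RFC 3768 / RFC 5798). The sketch below is a hypothetical `vrrp_winner` shell function, not part of keepalived, that models only that contention rule; with `nopreempt`, an already running master is not displaced by a later arrival.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: read "priority ipv4" lines on
# stdin and print the address that wins when several nodes contend for MASTER.
# Rule modelled here: higher priority wins; on equal priority the higher
# primary IPv4 address wins (the VRRP tie-break).
vrrp_winner() {
    awk '{
        split($2, o, ".")
        # dotted quad -> integer, so 10.0.0.12 ranks above 10.0.0.5
        ipnum = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
        printf "%d %.0f %s\n", $1, ipnum, $2
    }' | sort -k1,1nr -k2,2nr | head -n 1 | awk '{ print $3 }'
}
```

For example, `printf '100 10.0.0.5\n100 10.0.0.12\n' | vrrp_winner` picks 10.0.0.12: the addresses are compared numerically, not as strings.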
Installing keepalived
- Download the keepalived source tarball:
wget https://www.keepalived.org/software/keepalived-2.0.10.tar.gz
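- Extract, build, and install:
tar zxvf keepalived-2.0.10.tar.gz
cd keepalived-2.0.10
./configure
make && make install
# ./configure or make may fail because of missing dependencies; install the missing development packages first, then build again
# verify the installation
keepalived --version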
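- Edit the configuration (keepalived.conf):
! Configuration File for keepalived
global_defs {
    notification_email {
        acassen@firewall.loc
        failover@firewall.loc
        sysadmin@firewall.loc
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 192.168.200.1
    smtp_connect_timeout 30
    # router ID; must be unique for each node in the cluster
    router_id LVS_DEVEL_2
}

vrrp_instance VI_1 {
    # initial state
    state BACKUP
    # interface the VIP is bound to
    interface eth0
    # virtual router ID; must be unique on the LAN (all nodes of this cluster share it)
    virtual_router_id 131
    # election priority
    priority 100
    # non-preemptive mode (only works with state BACKUP)
    nopreempt
    # advertisement (heartbeat) interval, in seconds
    advert_int 1
    # cluster authentication
    authentication {
        auth_type PASS
        auth_pass 1993
    }
    # VIPs; more than one may be listed
    virtual_ipaddress {
        10.25.74.124
    }
}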
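- Start, stop, and check status:
# start / stop / restart
systemctl start keepalived
systemctl stop keepalived
systemctl restart keepalived
# status check 1: the VIP shows up on eth0 of the current master
ip addr | grep eth0
# status check 2: read the logs (/var/log/messages on RHEL/CentOS; journalctl -u keepalived also works under systemd)
tail -n 200 /var/log/messages | grep Keepalived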
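Every node uses the same vrrp_instance settings and only router_id changes, so the configuration from the steps above can be generated per node. The sketch below is a hypothetical `render_keepalived_conf` shell function (not part of keepalived) that reproduces the example's values, leaving out the notification e-mail settings; router_id is its only required argument, and interface and VIP default to eth0 and 10.25.74.124.

```shell
#!/bin/bash
# Hypothetical helper: print a keepalived.conf for one node of the
# one-master/many-backup cluster. Only router_id differs per node; state,
# priority, nopreempt, virtual_router_id and auth are identical everywhere.
# Values mirror the example config above; adjust them for your environment.
render_keepalived_conf() {
    local router_id="$1"            # unique per node
    local iface="${2:-eth0}"        # interface that carries the VIP
    local vip="${3:-10.25.74.124}"  # virtual IP
    cat <<EOF
! Configuration File for keepalived
global_defs {
    router_id ${router_id}
}

vrrp_instance VI_1 {
    state BACKUP
    interface ${iface}
    virtual_router_id 131
    priority 100
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1993
    }
    virtual_ipaddress {
        ${vip}
    }
}
EOF
}
```

Usage, assuming your build reads /etc/keepalived/keepalived.conf (otherwise point keepalived at the file with `-f`): `render_keepalived_conf LVS_DEVEL_3 > /etc/keepalived/keepalived.conf`.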
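Status check 1 (looking for the VIP in `ip addr` output) can be turned into a yes/no test for scripts. The sketch below uses two hypothetical function names: `output_has_vip` parses `ip addr` text on stdin, so it can be tried offline, and `node_has_vip` runs `ip` against a live interface. It compares the address exactly, so 10.25.74.12 does not match 10.25.74.124.

```shell
#!/bin/bash
# Hypothetical helpers: decide whether this node currently holds the VIP.

# usage: ip -4 addr show dev eth0 | output_has_vip 10.25.74.124
# Exit status 0 if an "inet <vip>/<prefix>" entry is present, 1 otherwise.
output_has_vip() {
    awk -v vip="$1" '
        {
            # look for "inet <addr>/<prefix>" anywhere on the line
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")
                    if (a[1] == vip) found = 1
                }
        }
        END { exit !found }'
}

# usage: node_has_vip 10.25.74.124 eth0 && echo "this node holds the VIP"
node_has_vip() {
    ip -4 addr show dev "$2" | output_has_vip "$1"
}
```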
Check script
Deploy the following script on every keepalived node:
vim /etc/keepalived/check_port.sh
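#!/bin/bash
# shell script to check whether service A is alive and keep keepalived in step with it
# author: joe
#um:xianweijian323

PORT=9999

while true; do
    # number of listening TCP sockets on port 9999 (0 means service A is down)
    count_port=$(netstat -lnt | grep -c ":${PORT} ")
    # ask systemd instead of `ps -ef | grep keepalived`, which can also match
    # unrelated processes whose command line contains "keepalived"
    if systemctl is-active --quiet keepalived; then
        kpa_active=1
    else
        kpa_active=0
    fi

    if [ "$count_port" -gt 0 ] && [ "$kpa_active" -eq 0 ]; then
        systemctl start keepalived
        echo "$(date +'%Y-%m-%d %H:%M:%S') [WARN] service A (port ${PORT}) is alive, but keepalived is dead. Starting keepalived"
    elif [ "$count_port" -eq 0 ] && [ "$kpa_active" -eq 1 ]; then
        systemctl stop keepalived
        echo "$(date +'%Y-%m-%d %H:%M:%S') [WARN] service A (port ${PORT}) is dead, but keepalived is active. Stopping keepalived"
    # else
    #     echo "$(date +'%Y-%m-%d %H:%M:%S') [INFO] all services are active. Nothing to do"
    fi
    sleep 1
done

Start it in the background from /etc/keepalived:
nohup bash check_port.sh > ./check_port.log 2>&1 &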
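A script started with nohup is not restarted if it dies and does not come back after a reboot, which matters for scenario 2 below. One alternative, sketched here as an assumption rather than part of the original setup, is to run the check script as a systemd service. The unit name `keepalived-check-port.service` and the `write_check_unit` function are hypothetical; the unit uses standard `Restart=always` and `WantedBy=multi-user.target` settings.

```shell
#!/bin/bash
# Hypothetical alternative to nohup: install check_port.sh as a systemd
# service so it is restarted if it exits and started again after a reboot.
# Unit name and paths are assumptions; run as root on the real target.
write_check_unit() {
    local unit_dir="${1:-/etc/systemd/system}"
    local script="${2:-/etc/keepalived/check_port.sh}"
    cat > "${unit_dir}/keepalived-check-port.service" <<EOF
[Unit]
Description=Port 9999 watchdog that starts/stops keepalived
After=network.target

[Service]
ExecStart=/bin/bash ${script}
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
EOF
}
```

After `write_check_unit`, run `systemctl daemon-reload` and `systemctl enable --now keepalived-check-port.service`; the script's echo lines then go to `journalctl -u keepalived-check-port` instead of check_port.log.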
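Testing
Scenario 1: service A on the master node is missing or unhealthy, while keepalived is running normally.
Result: the check script runs systemctl stop keepalived, keepalived on the master stops, and the VIP moves to the remaining node with the highest VRRP priority.
Scenario 2: service A on a node is running normally again, while keepalived is not running (or the machine had failed).
Result: the check script runs systemctl start keepalived; keepalived on that node starts, joins the cluster and takes part in the master election. Whether or not its priority is the highest, it does not preempt the keepalived node that is currently serving normally.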
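To observe scenario 1 from a standby, it helps to poll until the VIP arrives and note how long it took. The sketch below is a hypothetical generic `wait_until` function: it runs a command once per second until the command succeeds or a timeout (in seconds) expires, and prints the number of seconds waited on success.

```shell
#!/bin/bash
# Hypothetical test helper: run "$@" once per second until it succeeds.
# Prints the seconds waited and returns 0 on success; returns 1 on timeout.
wait_until() {
    local timeout="$1"; shift
    local waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
}
```

For example, on a standby while service A is stopped on the master: `wait_until 10 sh -c 'ip -4 addr show dev eth0 | grep -q "inet 10.25.74.124/"'`.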
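Addendum
keepalived on its own does not solve split brain caused by network failures: if VRRP advertisements between nodes are lost, more than one node can claim the VIP at the same time.
Solutions:
- Use third-party arbitration to prevent split brain.
- Use vrrp_script and track_script: write a script that pings the upstream gateway periodically and, if N consecutive packets get no reply, shuts down keepalived.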
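The second option can be sketched as a one-shot check that keepalived runs through vrrp_script, with `fall` providing the "N consecutive failures" part. This is a hedged sketch: the file name check_gw.sh, the gateway 10.25.74.1, and the timings are assumptions. With no `weight` set, a failing tracked script should put VI_1 into the FAULT state, so the node releases the VIP without the keepalived process being stopped. That also avoids a conflict with check_port.sh, which would otherwise restart a stopped keepalived as long as port 9999 is still listening.

```shell
#!/bin/bash
# check_gw.sh (hypothetical name): one-shot gateway reachability check meant
# to be called by keepalived's vrrp_script. Exit 0 = healthy, non-zero = KO.
#
# Assumed keepalived.conf wiring (gateway IP and timings are examples):
#   vrrp_script chk_gw {
#       script "/etc/keepalived/check_gw.sh 10.25.74.1"
#       interval 2   # run every 2 seconds
#       fall 3       # 3 consecutive failures -> KO (the "N packets" above)
#       rise 2       # 2 consecutive successes -> OK again
#   }
#   vrrp_instance VI_1 {
#       ...
#       track_script {
#           chk_gw
#       }
#   }

# Overridable so the check can be exercised without network access.
PING_CMD="${PING_CMD:-ping}"

check_gateway() {
    local gw="$1"
    [ -n "$gw" ] || return 2   # no gateway given: report failure
    # one echo request, wait at most 1 second for the reply (Linux iputils ping)
    "$PING_CMD" -c 1 -W 1 "$gw" >/dev/null 2>&1
}

# When invoked with a gateway argument (as keepalived does), run the check.
if [ "$#" -gt 0 ]; then
    check_gateway "$1"
    exit $?
fi
```

The script must be executable (`chmod +x /etc/keepalived/check_gw.sh`) because keepalived executes it directly.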