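Chapter 1 Keepalived High Availability: Single-Instance Practice
1.1 Configuring Keepalived for single-instance, single-IP automatic failover
In practice, the application service is started on both machines of the high-availability pair, but only the server that holds the VIP serves traffic. If the master goes down, the VIP drifts automatically to the backup server, and user requests go straight to the backup with no need to start the service at failover time (it is already running).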
1.2 Configuring the Keepalived master server
1.2.1 Writing the keepalived.conf configuration file
    [root@lb01 ~]# cp /etc/keepalived/keepalived.conf{,.bak}
    [root@lb01 ~]# vim /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived

    global_defs {
        notification_email {
            [email protected]
        }
        notification_email_from [email protected]
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id lb01            # ID is lb01; must be unique across the keepalived.conf files
    }

    vrrp_instance VI_1 {          # instance name VI_1; the backup node of this instance must use the same name
        state MASTER              # state is MASTER; the backup node must be BACKUP
        interface eth0            # communication interface eth0; same on the backup node
        virtual_router_id 55      # virtual router ID 55; unique within keepalived.conf, identical on both nodes
        priority 150              # priority 150; the backup node's priority must be lower
        advert_int 1              # advertisement (check) interval: 1 second
        authentication {
            auth_type PASS        # PASS authentication; same on the backup node
            auth_pass 1111        # password 1111; same on the backup node
        }
        virtual_ipaddress {
            10.0.0.12/24 dev eth0 label eth0:1
            # Virtual IP (VIP) 10.0.0.12 with a 24-bit mask, bound to eth0
            # under the alias eth0:1; same on the backup node
        }
    }
1.2.2 Starting the Keepalived service
    [root@lb01 ~]# service keepalived start
    Starting keepalived: [ OK ]
1.2.3 Checking that the virtual IP 10.0.0.12 is present
    [root@lb01 ~]# ip address show | grep 10.0.0.12
    inet 10.0.0.12/24 scope global secondary eth0:1
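The `grep 10.0.0.12` check is a substring match, so it would also match an address such as 10.0.0.120. A minimal sketch of an exact-match check follows; `has_vip` is a hypothetical helper name (not part of keepalived or iproute2), and it assumes the one-line-per-address format of `ip -o -4 addr show`, where the fourth field is ADDRESS/PREFIX.

```shell
#!/bin/sh
# has_vip VIP: read `ip -o -4 addr show` output on stdin and succeed
# only if VIP is bound exactly (hypothetical helper, for illustration).
has_vip() {
    vip="$1"
    # Field 4 looks like 10.0.0.12/24; compare the address part exactly.
    awk -v vip="$vip" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Usage on a live node (reads the real interface list):
#   ip -o -4 addr show | has_vip 10.0.0.12 && echo "VIP is bound here"
```

Reading from stdin keeps the function easy to try against saved output from either node.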
1.3 Configuring the Keepalived backup server
1.3.1 Writing the keepalived.conf configuration file
    [root@lb02 ~]# cp /etc/keepalived/keepalived.conf{,.bak}
    [root@lb02 ~]# vim /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived

    global_defs {
        notification_email {
            [email protected]
        }
        notification_email_from [email protected]
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id lb02            # differs from the master
    }

    vrrp_instance VI_1 {
        state BACKUP              # differs from the master
        interface eth0
        virtual_router_id 55
        priority 100              # differs from the master
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.12/24 dev eth0 label eth0:1
        }
    }
1.3.2 Starting the Keepalived service
    [root@lb02 ~]# /etc/init.d/keepalived start
    Starting keepalived: [ OK ]
1.3.3 Checking whether the virtual IP 10.0.0.12 is present
    [root@lb02 ~]# ip addr | grep 10.0.0.12
    [root@lb02 ~]# # No output here: lb02 is BACKUP and does not take over VIP 10.0.0.12 while the master is alive
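Note: if the BACKUP node shows 10.0.0.12 while the master is still up, the high-availability pair has split-brained. Split brain is what happens when both servers contend for the same resource.
- If split brain occurs, check two things first:
  - Whether the master and backup servers can communicate normally and, if not, whether an iptables firewall is blocking the traffic.
  - Whether the conf files on the two servers contain errors, for example a virtual_router_id that differs between the two nodes of the same instance.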
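The second check can be scripted. The sketch below compares the instance name and virtual_router_id pairs found in two configuration files; `get_vrids` and `same_vrids` are hypothetical helper names, and the parsing assumes one directive per line as in the files shown in this chapter.

```shell
#!/bin/sh
# get_vrids FILE: print "instance_name virtual_router_id" pairs, sorted.
get_vrids() {
    awk '
        $1 == "vrrp_instance"     { inst = $2 }
        $1 == "virtual_router_id" { print inst, $2 }
    ' "$1" | sort
}

# same_vrids FILE_A FILE_B: succeed only if both files declare
# identical instance/virtual_router_id pairs.
same_vrids() {
    [ "$(get_vrids "$1")" = "$(get_vrids "$2")" ]
}

# Usage, assuming copies of both nodes' files are at hand
# (the file names are placeholders):
#   same_vrids keepalived.conf.lb01 keepalived.conf.lb02 || echo "virtual_router_id mismatch"
```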
1.4 Failover experiment
1.4.1 Stopping the keepalived service on the MASTER server
    [root@lb01 ~]# ip addr | grep 10.0.0.12
    inet 10.0.0.12/24 scope global secondary eth0:1
    [root@lb01 ~]# service keepalived stop
    Stopping keepalived: [ OK ]
    [root@lb01 ~]# ip addr | grep 10.0.0.12
    [root@lb01 ~]# # VIP 10.0.0.12 is gone
1.4.2 Checking the VIP on the BACKUP server
    [root@lb02 ~]# ip addr | grep 10.0.0.12
    [root@lb02 ~]# ip addr | grep 10.0.0.12
    inet 10.0.0.12/24 scope global secondary eth0:1    # the BACKUP server has bound the VIP
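How long the backup waits before taking over can be estimated from the configuration. The sketch below assumes the VRRPv2 timing rule from RFC 3768 (master-down interval = 3 × advert_int + skew, with skew = (256 − priority) / 256 seconds); actual keepalived behaviour may differ by version and VRRP mode, and `master_down_interval` is a hypothetical helper name.

```shell
#!/bin/sh
# master_down_interval ADVERT_INT PRIORITY
# Seconds a backup waits without advertisements before declaring the
# master dead, under the RFC 3768 (VRRPv2) rule assumed above.
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

# Backup node lb02 from this chapter: advert_int 1, priority 100.
master_down_interval 1 100    # prints 3.609
```

This figure applies to an abrupt master failure. Under the same RFC, a master that shuts down cleanly announces priority 0, which lets the backup take over after only the skew time.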
1.4.3 Starting the keepalived service on the MASTER server again
    [root@lb01 ~]# service keepalived start
    Starting keepalived: [ OK ]
    [root@lb01 ~]# ip addr | grep 10.0.0.12
    inet 10.0.0.12/24 scope global secondary eth0:1    # MASTER has taken the VIP back
1.4.4 Checking the VIP on the BACKUP server
    [root@lb02 ~]# ip addr | grep 10.0.0.12
    [root@lb02 ~]# # BACKUP has released the VIP
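Note: this setup only implements automatic VIP drift, so it suits only scenarios in which the service is kept running on both servers. It is also a commonly used high-availability solution in production.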
1.5 Single-instance master/backup mode: configuration file comparison
Keepalived parameter | MASTER node value | BACKUP node value |
router_id (unique identifier) | router_id lb01 | router_id lb02 |
state (role) | state MASTER | state BACKUP |
priority (election priority) | priority 150 | priority 100 |
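Chapter 2 The "Split-Brain" Problem in Keepalived High-Availability Servers
2.1 Introduction to split brain
Split brain occurs when, for some reason, the two servers of a high-availability pair cannot detect each other's heartbeat within the configured time, and each takes ownership of the resources and services while both are still alive and running normally. The same IP or service then exists on both sides and conflicts. In the worst case both hosts hold the same VIP, and user writes may land on either side, leaving the two servers with inconsistent data or causing data loss.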
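2.2 Causes of split brain
- The heartbeat link between the high-availability servers fails, so they cannot communicate normally:
  - The heartbeat cable is faulty (broken or aged).
  - The NIC or its driver is faulty, or there is an IP misconfiguration or conflict (directly connected NICs).
  - A device on the heartbeat path fails (NIC or switch).
  - The arbitration machine fails (when an arbitration scheme is used).
- An iptables firewall on a high-availability server blocks heartbeat traffic.
- The heartbeat NIC address or related settings are misconfigured on a server, so heartbeats cannot be sent.
- The virtual_router_id of the same VRRP instance differs between the two nodes in the Keepalived configuration.
- Other misconfiguration, such as different heartbeat methods, heartbeat broadcast conflicts, or software bugs.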
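2.3 Common ways to prevent split brain
- Connect the nodes with both a serial cable and an Ethernet cable, so that two heartbeat links exist and one still carries heartbeats if the other fails.
- When split brain is detected, forcibly shut down one node (this needs special hardware such as STONITH or fence devices): when the backup node stops receiving heartbeats, it sends a power-off command over a separate line to cut the master node's power.
- Monitor and alert on split brain, so that someone can step in and arbitrate as soon as it happens and limit the damage.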
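2.4 Common ways to handle Keepalived split brain
For front-end web load balancers, the impact of split brain on ordinary business is tolerable. For database or storage services, split brain is usually very serious.
- If a firewall is enabled, make sure heartbeat traffic is allowed through, typically by permitting the relevant IP range.
- Run an extra Ethernet or serial cable as a redundant heartbeat link between the master and backup nodes.
- Write a detection program and use monitoring software (for example Nagios) to detect split brain.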
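For the firewall item, the sketch below only prints a candidate iptables rule so it can be reviewed before anyone runs it as root. It assumes VRRP advertisements use IP protocol 112 and the multicast group 224.0.0.18 (the VRRP defaults); `vrrp_allow_rule` is a hypothetical helper name, and the interface and source range are taken from this document's lab network.

```shell
#!/bin/sh
# vrrp_allow_rule PEER_NET [IFACE]: print (do not execute) an iptables
# rule that admits VRRP advertisements from the peer network.
vrrp_allow_rule() {
    peer_net="$1"
    iface="${2:-eth0}"
    # -p 112 selects the VRRP protocol; 224.0.0.18 is its multicast group.
    echo "iptables -I INPUT -i $iface -s $peer_net -d 224.0.0.18 -p 112 -j ACCEPT"
}

vrrp_allow_rule 10.0.0.0/24
# prints: iptables -I INPUT -i eth0 -s 10.0.0.0/24 -d 224.0.0.18 -p 112 -j ACCEPT
```

After applying such a rule on both nodes, running `tcpdump -i eth0 -nn vrrp` on the backup is one way to confirm that advertisements from the master are arriving.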
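2.5 How to decide that split brain has occurred
2.5.1 Simple approach
Alert whenever the VIP appears on the backup node. The alert covers two cases: either the master is down and the backup has taken over, or the master is still up and split brain has occurred. In both cases raise the alert and let a person investigate and resolve it.
2.5.2 Stricter approach
If the VIP appears on the backup node while the master node and its service are still alive, split brain has occurred.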
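The two approaches reduce to a small decision table. The sketch below encodes it as a pure function so the logic can be checked separately from the probes; `classify_state` and its yes/no arguments are hypothetical, and how "master alive" is established (ping, an HTTP check against the service, and so on) is left to the caller.

```shell
#!/bin/sh
# classify_state MASTER_ALIVE VIP_ON_BACKUP   (each argument: yes | no)
# Prints one of: split-brain, failover, normal.
classify_state() {
    master_alive="$1"
    vip_on_backup="$2"
    if [ "$vip_on_backup" = "yes" ] && [ "$master_alive" = "yes" ]; then
        echo "split-brain"    # strict rule: VIP on backup while master is alive
    elif [ "$vip_on_backup" = "yes" ]; then
        echo "failover"       # master is down and the backup took over
    else
        echo "normal"         # backup does not hold the VIP
    fi
}

# Example wiring on the backup node (the probes are assumptions):
#   alive=$(ping -c 2 -W 3 10.0.0.5 >/dev/null 2>&1 && echo yes || echo no)
#   vip=$(ip addr | grep -q "10.0.0.12/" && echo yes || echo no)
#   classify_state "$alive" "$vip"
```

Under the simple approach, both "split-brain" and "failover" would raise an alert; under the stricter one, only "split-brain" would.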
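2.6 Example split-brain monitoring script
2.6.1 Writing the script
    #!/bin/bash
    lb_IP=10.0.0.5       # real IP of the master (lb01)
    lb_VIP=10.0.0.12     # VIP that should live on the master
    while true
    do
        # Check whether the master host is alive
        ping -c 2 -W 3 ${lb_IP} &> /dev/null
        # Split brain: the master answers ping AND this backup node also holds the VIP
        #if [ $? -eq 0 -a `ip address show | grep -c "${lb_VIP}"` -ne 0 ];then
        if [ $? -eq 0 -a `ip address show | grep "${lb_VIP}" | wc -l` -ne 0 ];then
            echo "keepalived is error !!!"
        else
            echo "keepalived is ok !!!"
        fi
        sleep 3
    done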
2.6.2 Testing on the backup node
- Under normal conditions:
    [root@lb02 ~]# sh /server/scripts/check_keepalived.sh
    keepalived is ok !!!
    keepalived is ok !!!
    keepalived is ok !!!
    ...output omitted...
- With keepalived stopped on the master (the host still answers ping, and the VIP has moved to the backup):
    [root@lb02 ~]# sh /server/scripts/check_keepalived.sh
    keepalived is ok !!!
    keepalived is error !!!
    keepalived is error !!!
    keepalived is error !!!
    ...output omitted...
- With the master host then shut down (ping fails, so the script treats it as a normal takeover):
    [root@lb02 ~]# sh /server/scripts/check_keepalived.sh
    keepalived is error !!!
    keepalived is error !!!
    keepalived is ok !!!
    keepalived is ok !!!
    keepalived is ok !!!
    ...output omitted...
- After the keepalived service on the master has been restarted:
    [root@lb02 ~]# sh /server/scripts/check_keepalived.sh
    keepalived is ok !!!
    keepalived is ok !!!
    keepalived is ok !!!
    keepalived is ok !!!
    ...output omitted...
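Chapter 3 Keepalived Dual-Instance, Dual-Master Configuration
Multi-instance, multi-service bidirectional master/backup mode: service A runs as master on lb01 and as backup on lb02, while service B runs as backup on lb01 and as master on lb02.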
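Which node should hold each VIP follows from the per-instance priorities. As a minimal sketch, the function below picks the node with the highest priority from a list of node:priority pairs; `expected_master` is a hypothetical helper, the priorities are the 150/100 values used throughout this document, and ties (which VRRP resolves by IP address) are not handled.

```shell
#!/bin/sh
# expected_master NODE:PRIORITY ...
# Print the node with the highest priority for one VRRP instance.
expected_master() {
    printf '%s\n' "$@" | sort -t: -k2,2nr | head -n 1 | cut -d: -f1
}

expected_master lb01:150 lb02:100    # instance for service A, prints lb01
expected_master lb01:100 lb02:150    # instance for service B, prints lb02
```

When one node is down, calling the function with only the surviving node's pair gives the expected owner of both VIPs.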
3.1 Host architecture plan
Hostname | IP address | Description |
lb01 | 10.0.0.5 | VIP: 10.0.0.12 (bound to service A, domain www.leon.com) |
lb02 | 10.0.0.6 | VIP: 10.0.0.13 (bound to service B, domain bbs.leon.com) |
3.3 lb01 host configuration
    [root@lb01 ~]# cp /etc/keepalived/keepalived.conf{,.bak-single}
    [root@lb01 ~]# vim /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived

    global_defs {
        notification_email {
            [email protected]
        }
        notification_email_from [email protected]
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id lb01
    }

    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 55
        priority 150
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.12/24 dev eth0 label eth0:1
        }
    }

    vrrp_instance VI_2 {
        state BACKUP
        interface eth0
        virtual_router_id 56
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.13/24 dev eth0 label eth0:2
        }
    }
3.4 lb02 host configuration
    [root@lb02 ~]# cp /etc/keepalived/keepalived.conf{,.bak-single}
    [root@lb02 ~]# vim /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived

    global_defs {
        notification_email {
            [email protected]
        }
        notification_email_from [email protected]
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id lb02
    }

    vrrp_instance VI_1 {
        state BACKUP
        interface eth0
        virtual_router_id 55
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.12/24 dev eth0 label eth0:1
        }
    }

    vrrp_instance VI_2 {
        state MASTER
        interface eth0
        virtual_router_id 56
        priority 150
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.13/24 dev eth0 label eth0:2
        }
    }
3.5 Restarting the Keepalived service
3.5.1 On lb01
    # Graceful reload, the recommended way
    [root@lb01 ~]# service keepalived reload
    Reloading keepalived: [ OK ]
    [root@lb01 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    inet 10.0.0.12/24 scope global secondary eth0:1
3.5.2 On lb02
    # Full restart; prefer a graceful reload in production
    [root@lb02 ~]# service keepalived restart
    Stopping keepalived: [ OK ]
    Starting keepalived: [ OK ]
    [root@lb02 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    inet 10.0.0.13/24 scope global secondary eth0:2
3.6 Failover experiment
3.6.1 Stopping the keepalived service on lb01
    [root@lb01 ~]# service keepalived stop
    Stopping keepalived: [ OK ]
    [root@lb01 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    [root@lb01 ~]#
3.6.2 Checking the VIPs on lb02
    [root@lb02 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    inet 10.0.0.13/24 scope global secondary eth0:2
    inet 10.0.0.12/24 scope global secondary eth0:1
3.6.3 Starting the keepalived service on lb01 again
    [root@lb01 ~]# service keepalived start
    Starting keepalived: [ OK ]
    [root@lb01 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    inet 10.0.0.12/24 scope global secondary eth0:1
3.6.4 Checking the VIPs on lb02
    [root@lb02 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    inet 10.0.0.13/24 scope global secondary eth0:2
3.6.5 Stopping the keepalived service on lb02
    [root@lb02 ~]# service keepalived stop
    Stopping keepalived: [ OK ]
    [root@lb02 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    [root@lb02 ~]#
3.6.6 Checking the VIPs on lb01
    [root@lb01 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    inet 10.0.0.12/24 scope global secondary eth0:1
    inet 10.0.0.13/24 scope global secondary eth0:2
3.6.7 Starting the keepalived service on lb02 again
    [root@lb02 ~]# service keepalived start
    Starting keepalived: [ OK ]
    [root@lb02 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    inet 10.0.0.13/24 scope global secondary eth0:2
3.6.8 Checking the VIPs on lb01
    [root@lb01 ~]# ip addr | egrep "10.0.0.12|10.0.0.13"
    inet 10.0.0.12/24 scope global secondary eth0:1
3.7 Dual-instance, dual-master mode: configuration file comparison
    [root@lb01 ~]# diff keepalived.conf.lb01 keepalived.conf.lb02
    10c10
    < router_id lb01
    ---
    > router_id lb02
    14c14
    < state MASTER
    ---
    > state BACKUP
    17c17
    < priority 150
    ---
    > priority 100
    29c29
    < state BACKUP
    ---
    > state MASTER
    32c32
    < priority 100
    ---
    > priority 150
Chapter 4 Nginx Load Balancing Combined with Keepalived in Practice
4.1 Host architecture diagram
4.2 Configuring load balancing on lb01 and lb02
    [root@lb01 ~]# vim /usr/local/nginx/conf/nginx.conf
    worker_processes 1;
    events {
        worker_connections 1024;
    }
    http {
        include mime.types;
        default_type application/octet-stream;
        sendfile on;
        keepalive_timeout 65;

        upstream server_pools {
            server 10.0.0.7:80;
            server 10.0.0.8:80;
            server 10.0.0.9:80;
        }

        server {
            listen 10.0.0.12:80;
            server_name www.leon.com;
            location / {
                proxy_pass http://server_pools;
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-For $remote_addr;
            }
        }

        server {
            listen 10.0.0.13:80;
            server_name bbs.leon.com;
            location / {
                proxy_pass http://server_pools;
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-For $remote_addr;
            }
        }
    }

    # Restart nginx
    [root@lb01 ~]# nginx -s stop
    [root@lb01 ~]# nginx
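Because each server block listens on a specific VIP, nginx on the node that does not currently hold that VIP may fail to start with a bind error unless the kernel allows binding to non-local addresses. The sketch below only checks the `net.ipv4.ip_nonlocal_bind` setting; `nonlocal_bind_enabled` is a hypothetical helper, and whether the setting is needed on a given system is an assumption to verify there.

```shell
#!/bin/sh
# nonlocal_bind_enabled [FILE]: succeed if the setting reads 1.
# FILE defaults to the kernel's procfs entry; the optional argument
# exists only so the check can be tried against a sample file.
nonlocal_bind_enabled() {
    f="${1:-/proc/sys/net/ipv4/ip_nonlocal_bind}"
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

if nonlocal_bind_enabled; then
    echo "non-local bind is enabled"
else
    echo "non-local bind is disabled or unavailable"
fi
```

If it is disabled, `sysctl -w net.ipv4.ip_nonlocal_bind=1` (run as root, and persisted in /etc/sysctl.conf) is the usual way to enable it on both load balancers.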
4.3 Configuring the Keepalived service on lb01
    [root@lb01 ~]# vim /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived

    global_defs {
        notification_email {
            [email protected]
        }
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id lb01
    }

    vrrp_script chk_nginx_proxy {
        script "/server/scripts/chk_nginx_proxy.sh"
        interval 2
        weight 2
    }

    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 55
        priority 150
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.12/24 dev eth0 label eth0:1
        }
        track_script {
            chk_nginx_proxy
        }
    }

    vrrp_instance VI_2 {
        state BACKUP
        interface eth0
        virtual_router_id 56
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.13/24 dev eth0 label eth0:2
        }
    }
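The contents of /server/scripts/chk_nginx_proxy.sh are not shown in this document. Purely as an illustration of what such a check might do, the sketch below reports through its exit status whether the process recorded in nginx's pid file is alive, which is how keepalived reads a vrrp_script result (0 means healthy). The function name `nginx_alive` and the default pid file path (the usual location for an install under /usr/local/nginx) are assumptions.

```shell
#!/bin/sh
# nginx_alive [PIDFILE]: exit status 0 if the recorded process exists.
# Hypothetical stand-in for the undisclosed chk_nginx_proxy.sh.
nginx_alive() {
    pidfile="${1:-/usr/local/nginx/logs/nginx.pid}"
    [ -r "$pidfile" ] || return 1             # no pid file: treat as down
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests that the process exists.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# As a standalone check script the last lines would be:
#   nginx_alive
#   exit $?
```

One point worth checking against the keepalived documentation: a positive `weight` raises the instance priority while the script succeeds, so with `weight 2` and priorities of 150 versus 100, a failed check alone would leave lb01 at 150, still above lb02. On that reading, the VIP moves only if the check script itself stops keepalived, or if a negative weight larger than the priority gap is used.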
4.4 Configuring the Keepalived service on lb02
    [root@lb02 ~]# vim /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived

    global_defs {
        notification_email {
            [email protected]
        }
        notification_email_from [email protected]
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        router_id lb02
    }

    vrrp_instance VI_1 {
        state BACKUP
        interface eth0
        virtual_router_id 55
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.12/24 dev eth0 label eth0:1
        }
    }

    vrrp_instance VI_2 {
        state MASTER
        interface eth0
        virtual_router_id 56
        priority 150
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.0.0.13/24 dev eth0 label eth0:2
        }
    }
4.5 Editing the client hosts file
    10.0.0.12 www.leon.com
    10.0.0.13 bbs.leon.com
4.6 Access test
4.6.1 When both load balancers are working normally:
4.6.2 When load balancer lb01 is down:
    [root@lb01 ~]# service keepalived stop
    Stopping keepalived: [ OK ]
    [root@lb01 scripts]# ip addr | grep 10.0.0
    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0
    [root@lb02 ~]# ip addr | grep 10.0.0    # load balancer lb02 has taken over the service
    inet 10.0.0.6/24 brd 10.0.0.255 scope global eth0
    inet 10.0.0.13/24 scope global secondary eth0:2
    inet 10.0.0.12/24 scope global secondary eth0:1
- Traffic has switched to load balancer lb02:
4.6.3 After load balancer lb01 is started again:
    [root@lb01 ~]# service keepalived start
    Starting keepalived: [ OK ]
    [root@lb01 scripts]# ip addr | grep 10.0.0
    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0
    inet 10.0.0.12/24 scope global secondary eth0:1    # lb01 has taken the service back
    [root@lb02 ~]# ip addr | grep 10.0.0
    inet 10.0.0.6/24 brd 10.0.0.255 scope global eth0
    inet 10.0.0.13/24 scope global secondary eth0:2
4.6.4 When load balancer lb02 is down:
    [root@lb02 ~]# service keepalived stop
    Stopping keepalived: [ OK ]
    [root@lb02 ~]# ip addr | grep 10.0.0
    inet 10.0.0.6/24 brd 10.0.0.255 scope global eth0
    [root@lb01 scripts]# ip addr | grep 10.0.0    # lb01 has taken over the service
    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0
    inet 10.0.0.12/24 scope global secondary eth0:1
    inet 10.0.0.13/24 scope global secondary eth0:2
4.6.5 After load balancer lb02 is started again:
    [root@lb02 ~]# service keepalived start
    Starting keepalived: [ OK ]
    [root@lb02 ~]# ip addr | grep 10.0.0
    inet 10.0.0.6/24 brd 10.0.0.255 scope global eth0
    inet 10.0.0.13/24 scope global secondary eth0:2    # lb02 has taken the service back
    [root@lb01 scripts]# ip addr | grep 10.0.0
    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0
    inet 10.0.0.12/24 scope global secondary eth0:1
