第1章 MHA介绍
1.1 软件简介
MHA能够在较短的时间内实现自动故障检测和故障转移,通常在10-30秒以内,在复制架构中MHA能够很好地解决复制过程中的数据一致性问题,由于不需要在现有的replication中添加额外的服务器,仅需要一个manager节点,而一个Manager能管理多套复制,所以能大大地节约服务器的数量,另外MHA安装简单,无性能损耗,以及不需要修改现有的复制部署也是它的优势之处。
MHA还提供在线主库切换的功能,能够安全地切换当前运行的主库到一个新的主库中(通过将从库提升为主库),大概0.5-2秒内即可完成。
MHA由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点),MHA Manager可以独立部署在一台独立的机器上管理多个Master-Slave集群,也可以部署在一台Slave上。当Master出现故障时,它可以自动将最新数据的Slave提升为新的Master,然后将所有其他的Slave重新指向新的Master,整个故障转移过程对应用程序是完全透明的。
1.2 工作流程
- 把宕机的master二进制日志保存下来
- 找到binlog位置点最新的slave
- 在binlog位置点最新的slave上用relay-log(差异日志)修复其他slave
- 将宕机的master上保存下来的二进制日志恢复到含有最新位置点的slave上
- 将含有最新位置点binlog所在的slave提升为master
- 将其他slave重新指向新提升的master,并开启主从复制
1.3 MHA工具介绍
MHA软件由两部分组成,Manager工具包和Node工具包具体说明如下:
Manager工具包主要包括以下几个工具:
masterha_check_ssh # 检查MHA的SSH-key 配置状况 masterha_check_repl # 检查MySQL主从复制情况 masterha_manger # 启动MHA masterha_check_status # 检测当前MHA运行状态 masterha_master_monitor # 检测master是否宕机 masterha_master_switch # 控制故障转移(自动或者手动) masterha_conf_host # 添加或删除配置的server信息 masterha_secondary_check # 从远程服务器建立TCP连接 masterha_stop # 停止MHA
Node工具包(这些工具通常由MHA Manager的脚本触发,无需人为操作)主要包括以下几个工具:
save_binary_logs # 保存宕机的master的binlog apply_diff_relay_logs # 识别relay log的差异 filter_mysqlbinlog # 防止回滚事件 purge_relay_logs # 清除中继日志(不会阻塞SQL线程)
1.4 MHA优点总结
- 自动故障转移块
- 主库崩溃不存在数据一致性问题
- 不需要对当前mysql环境做重大修改
- 不需要添加额外的服务器(仅一台manager就可管理上百个replication)
- 性能优秀,可工作在半同步复制和异步复制,当监控mysql状态时,仅需要每隔N秒向master发送ping包(默认3秒),所以对性能无影响,你可以理解为MHA的性能和简单的主从复制框架性能一样
- 只要replication支持的存储引擎,MHA都支持,不会局限于innodb
第2章 MySQL环境准备
2.1 环境检查
[[email protected] ~]# cat /etc/redhat-release CentOS release 6.9 (Final) [[email protected] ~]# uname -r 2.6.32-696.el6.x86_64 [[email protected] ~]# hostname -I 10.0.0.41 172.16.1.41
2.2 安装MySQL
2.3 配置主从复制
参照《MySQL主从同步》。
2.3.1 修改配置文件开启GTID
2.3.1.1 主库配置文件
[[email protected] ~]# vim /etc/my.cnf [mysqld] server-id = 41 basedir = /usr/local/mysql datadir = /usr/local/mysql/data socket = /tmp/mysql.sock port = 3306 log-bin = /usr/local/mysql/data/mysql-bin gtid_mod=ON log_slave_updates enforce_gtid_consistency [mysqld_safe] log-error= /tmp/mysql.err pid-file= /usr/local/mysql/data/mysql.pid [client] port = 3306 socket = /tmp/mysql.sock [[email protected] ~]# /etc/init.d/mysqld restart
2.3.1.2 从库db02配置文件
[[email protected] ~]# vim /etc/my.cnf [mysqld] server-id = 42 basedir = /usr/local/mysql datadir = /usr/local/mysql/data socket = /tmp/mysql.sock port = 3306 log-bin = /usr/local/mysql/data/mysql-bin gtid_mod=ON # 开启GTID log_slave_updates enforce_gtid_consistency relay_log_purge = 0 # 禁用relay-log read_only = 1 # 设置从库只读 [mysqld_safe] log-error= /tmp/mysql.err pid-file= /usr/local/mysql/data/mysql.pid [client] port = 3306 socket = /tmp/mysql.sock [[email protected] ~]# /etc/init.d/mysqld restart
2.3.1.3 从库db03配置文件
[[email protected] ~]# vim /etc/my.cnf [mysqld] server-id = 43 basedir = /usr/local/mysql datadir = /usr/local/mysql/data socket = /tmp/mysql.sock port = 3306 log-bin = /usr/local/mysql/data/mysql-bin gtid_mod=ON log_slave_updates enforce_gtid_consistency relay_log_purge = 0 [mysqld_safe] log-error= /tmp/mysql.err pid-file= /usr/local/mysql/data/mysql.pid [client] port = 3306 socket = /tmp/mysql.sock [[email protected] ~]# /etc/init.d/mysqld restar
2.3.2 查看GTID的状态
mysql> show global variables like '%gtid%'; +---------------------------------+----------------------------------------+ | Variable_name | Value | +---------------------------------+----------------------------------------+ | binlog_gtid_simple_recovery | OFF | | enforce_gtid_consistency | ON | | gtid_executed | 0b1a934f-a67f-11e7-95e2-000c2982d598:1 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | | | simplified_binlog_gtid_recovery | OFF | +---------------------------------+----------------------------------------+ 7 rows in set (0.11 sec)
2.3.3 从库连接主库命令
CHANGE MASTER TO MASTER_HOST='10.0.0.41', MASTER_USER='repl', MASTER_PASSWORD='123456', MASTER_PORT=3306, MASTER_AUTO_POSITION=1;
2.3.4 问题汇总
【错误信息】:
ERROR 1776 (HY000): Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.
【解决办法】:
不要使用MASTER_LOG_FILE和MASTER_LOG_POS参数,使用以下参数代替: MASTER_AUTO_POSITION=1;
第3章 部署MHA
3.1 环境准备(所有节点)
3.1.1 安装依赖包
[[email protected] ~]# yum install -y perl-DBD-MySQL
3.1.2 安装node包
[[email protected] ~]# rpm -ivh /server/tools/mha4mysql-node-0.56-0.el6.noarch.rpm
3.1.3 【仅主库】上创建mha用户
[[email protected] ~]# mysql -uroot -p123456 mysql> grant all privileges on *.* to [email protected]'10.0.0.%' identified by '123456';
3.1.4 创建软链接(防止MHA复制时报错)
[[email protected] ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog [[email protected] ~]# ln -s /usr/local/mysql/bin/mysql /usr/local/mysql
3.2 部署管理节点
3.2.1 安装依赖包
[[email protected] ~]# yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
3.2.2 安装管理包
[[email protected] ~]# rpm -ivh /server/tools/mha4mysql-manager-0.56-0.el6.noarch.rpm
3.2.3 创建MHA目录
[[email protected] ~]# mkdir -p /etc/mha [[email protected] ~]# mkdir -p /var/log/mha/mysql
3.2.4 编辑配置文件
[[email protected] ~]# vim /etc/mha/mysql.cnf [server default] manager_log=/var/log/mha/mysql/manager manager_workdir=/var/log/mha/mysql master_binlog_dir=/usr/local/mysql/data user=mha password=123456 ping_interval=2 repl_password=123456 repl_user=repl ssh_user=root [server1] hostname=10.0.0.41 port=3306 [server2] candidate_master=1 check_repl_delay=0 hostname=10.0.0.42 port=3306 [server3] hostname=10.0.0.43 port=3306
- 配置文件解析:
[server default] # 设置manager的日志 manager_log=/var/log/mha/mysql/manager # 设置manager的工作目录 manager_workdir=/var/log/mha/mysql # 设置master保存binlog的位置,以便MHA可以找到master的日志,这里设置为mysql的数据目录 master_binlog_dir=/usr/local/mysql/data # 设置自动failover时候的切换脚本 master_ip_failover_script=/server/scripts/master_ip_failover.sh # 设置手动切换时候的切换脚本 master_ip_online_change_script=/server/scripts/master_ip_online_change.sh # 设置监控用户 user=mha # 设置mysql中root用户的密码,这个密码是前文中创建监控用户的那个密码 password=123456 # 设置监控主库发送ping包的时间间隔,尝试三次没有回应的时候自动进行failover ping_interval=2 # 设置远程mysql发生切换时binlog的保存位置 remote_workdir=/tmp # 设置复制环境中的复制用户名 repl_user=repl # 设置复制用户的密码 repl_password=123456 # 设置发生切换后发送的报警的脚本 report_script=/server/scripts/send_report.sh # 一旦MHA到server2的监控之间出现问题,MHA Manager将会尝试从server3登录到server2 secondary_check_script=/usr/local/bin/masterha_secondary_check -s server3 -s server2 --user=mha --master_host=server2 --master_ip=10.0.0.41 --master_port=3306 # 设置故障发生后关闭故障主机的脚本(该脚本的作用是关闭主机防止发生脑裂) shutdown_script=/server/scripts/shutdown_bad_hosts.sh # 设置ssh的登录用户名 ssh_user=root [server1] hostname=10.0.0.41 port=3306 [server2] hostname=10.0.0.42 port=3306 # 设置为候选master,如果设置该参数以后发生主从切换以后将会将此从库提升为主库 # 即使这个主库不是集群中event最新的slave candidate_master=1 # 默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的 # master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择 # 一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为 # 这个候选主在切换的过程中一定是新的master check_repl_delay=0 [server3] hostname=10.0.0.43 port=3306
3.3 配置SSH信任(所有节点)
3.3.1 开启XShell的【发送键输入到所有会话】
3.3.2 创建密码
[[email protected] ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
3.3.3 分发密钥
[[email protected] ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub [email protected] [[email protected] ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub [email protected] [[email protected] ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub [email protected]
注意:以上命令每台主机都要操作一次,否则会连接失败。
3.4 启动前测试
[[email protected] ~]# masterha_check_ssh --conf=/etc/mha/mysql.cnf Thu Nov 30 18:06:24 2017 - [info] All SSH connection tests passed successfully. # 此提示表示成功 [[email protected] ~]# masterha_check_repl --conf=/etc/mha/mysql.cnf MySQL Replication Health is OK. #出现此提示表示测试成功
3.5 启动MHA
[[email protected] ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &
3.6 切换master测试
3.6.1 切换之前
# 查看db02数据库信息 [[email protected] ~]# mysql -uroot -p123456 mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.0.41 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 590 Relay_Log_File: db02-relay-bin.000008 Relay_Log_Pos: 603 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes # 查看db03数据库信息 [[email protected] ~]# mysql -uroot -p123456 mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.0.41 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 440 Relay_Log_File: db03-relay-bin.000002 Relay_Log_Pos: 408 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes
3.6.2 关闭主库
[[email protected] ~]# /etc/init.d/mysqld stop Shutting down MySQL...... SUCCESS!
3.6.3 切换之后
# 查看db02数据库信息 [[email protected] ~]# mysql -uroot -p123456 mysql> show slave status\G Empty set (0.00 sec) # 此时从库线程已经为空 # 查看db03数据库信息 [[email protected] ~]# mysql -uroot -p123456 mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.0.42 # 主库已经切换为42 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000004 Read_Master_Log_Pos: 440 Relay_Log_File: db03-relay-bin.000002 Relay_Log_Pos: 408 Relay_Master_Log_File: mysql-bin.000004 Slave_IO_Running: Yes Slave_SQL_Running: Yes
3.7 查看日志文件
[[email protected] ~]# tail -f /var/log/mha/mysql/manager Thu Nov 30 19:57:58 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; # 主库恢复后使用此命令(手动修改密码)将主库变为从库,再重新运行MHA启动命令即可恢复高可用 ----- Failover Report ----- # 可以通过以下信息进行监控 mysql: MySQL Master failover 10.0.0.41(10.0.0.41:3306) to 10.0.0.42(10.0.0.42:3306) succeeded Master 10.0.0.41(10.0.0.41:3306) is down! Check MHA Manager logs at db03:/var/log/mha/mysql/manager for details. Started automated(non-interactive) failover. Selected 10.0.0.42(10.0.0.42:3306) as a new master. 10.0.0.42(10.0.0.42:3306): OK: Applying all logs succeeded. 10.0.0.43(10.0.0.43:3306): OK: Slave started, replicating from 10.0.0.42(10.0.0.42:3306) 10.0.0.42(10.0.0.42:3306): Resetting slave info succeeded. Master failover to 10.0.0.42(10.0.0.42:3306) completed successfully.
3.8 切换后恢复
3.8.1 查看日志获取恢复命令
[[email protected] ~]# grep -i "change master to" /var/log/mha/mysql/manager Thu Nov 30 22:09:22 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
3.8.2 启动原主库及恢复
[[email protected] ~]# /etc/init.d/mysqld start [[email protected] ~]# mysql -uroot -p123456 # 执行从上面日志文件中获取的命令 mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123456'; Query OK, 0 rows affected, 2 warnings (0.22 sec) mysql> start slave; Query OK, 0 rows affected (0.04 sec)
3.8.3 编辑HMA服务端配置文件
[[email protected] ~]# vim /etc/mha/mysql.cnf [server default] manager_log=/var/log/mha/mysql/manager manager_workdir=/var/log/mha/mysql master_binlog_dir=/usr/local/mysql/data master_ip_failover_script=/server/scripts/master_ip_failover.sh password=123456 ping_interval=2 repl_password=123456 repl_user=repl ssh_user=root user=mha [server1] hostname=10.0.0.41 port=3306 [server2] hostname=10.0.0.42 port=3306 [server3] hostname=10.0.0.43 port=3306
3.8.4 验证并启动MHA
[[email protected] ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &
第4章 配置VIP漂移
4.1 IP漂移的两种方式
- 通过keepalived的方式
- 通过MHA自带脚本方式
4.2 MHA脚本方式
4.2.1 修改配置文件
[[email protected] ~]# vim /etc/mha/mysql.cnf [server default] master_ip_failover_script=/server/scripts/master_ip_failover.sh
4.2.2 编辑脚本
[[email protected] ~]# vim /server/scripts/master_ip_failover.sh #!/usr/bin/env perl use strict; use warnings FATAL => 'all'; use Getopt::Long; my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port ); my $vip = '10.0.0.101/24'; # 编辑此处VIP信息 my $key = '0'; # 编辑此处网卡标签 my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip"; my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down"; GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, ); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1; eval { print "Disabling the VIP on old master: $orig_master_host \n"; &stop_vip(); $exit_code = 0; }; if ([email protected]) { warn "Got Error: [email protected]\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ([email protected]) { warn [email protected]; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; exit 0; } else { &usage(); exit 1; } } sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; } sub stop_vip() { return 0 unless ($ssh_user); `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print"Usage: master_ip_failover.sh --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; }
4.2.3 脚本增加可执行权限
[[email protected] ~]# chmod +x /server/scripts/master_ip_failover.sh [[email protected] ~]# /etc/init.d/mysqld restart
4.2.4 在主库上手动绑定VIP
[[email protected] ~]# ifconfig eth0:0 10.0.0.101/24 [[email protected] ~]# ip a | grep eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 inet 10.0.0.41/24 brd 10.0.0.255 scope global eth0 inet 10.0.0.101/24 brd 10.0.0.255 scope global secondary eth0:0
4.2.5 测试ip漂移
# 关闭主库 [[email protected] ~]# service mysqld stop Shutting down MySQL...... SUCCESS! # 查看从库VIP信息 [[email protected] ~]# ip a | grep eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 inet 10.0.0.42/24 brd 10.0.0.255 scope global eth0 inet 10.0.0.101/24 brd 10.0.0.255 scope global secondary eth0:0 # 查看主库VIP信息 [[email protected] ~]# ip a | grep eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 inet 10.0.0.41/24 brd 10.0.0.255 scope global eth0
第5章 配置binlog-server
5.1 修改mha配置文件
[[email protected] ~]# vim /etc/mha/mysql.cnf [binlog] no_master=1 hostname=10.0.0.43 master_binlog_dir=/usr/local/mysql/data/mysql/binlog/
5.2 备份binlog
# 创建备份binglog目录 [[email protected] ~]# mkdir -p /usr/local/mysql/data/mysql/binlog/ [[email protected] ~]# cd /usr/local/mysql/data/mysql/binlog/ # 备份binlog [[email protected] binlog]# mysqlbinlog -R --host=10.0.0.41 --user=mha --password=123456 --raw --stop-never mysql-bin.000001 &
5.3 测试binglog备份
# 查看备份的binglog [[email protected] binlog]# ll total 16 -rw-rw---- 1 root root 372 Nov 30 22:30 mysql-bin.000001 -rw-rw---- 1 root root 143 Nov 30 22:30 mysql-bin.000002 -rw-rw---- 1 root root 613 Nov 30 22:30 mysql-bin.000003 -rw-rw---- 1 root root 120 Nov 30 22:30 mysql-bin.000004 # 登录主库刷新binglog [[email protected] ~]# mysql -uroot -p123456 mysql> flush logs; # 再次查看binglog备份 [[email protected] binlog]# ll total 20 -rw-rw---- 1 root root 372 Nov 30 22:30 mysql-bin.000001 -rw-rw---- 1 root root 143 Nov 30 22:30 mysql-bin.000002 -rw-rw---- 1 root root 613 Nov 30 22:30 mysql-bin.000003 -rw-rw---- 1 root root 238 Nov 30 22:33 mysql-bin.000004 -rw-rw---- 1 root root 120 Nov 30 22:33 mysql-bin.000005

我的微信
如果有技术上的问题可以扫一扫我的微信