MySQL高可用MHA(十三)

第1章 MHA介绍

1.1 软件简介

MHA能够在较短的时间内实现自动故障检测和故障转移,通常在10-30秒以内,在复制架构中MHA能够很好地解决复制过程中的数据一致性问题,由于不需要在现有的replication中添加额外的服务器,仅需要一个manager节点,而一个Manager能管理多套复制,所以能大大地节约服务器的数量,另外MHA安装简单,无性能损耗,以及不需要修改现有的复制部署也是它的优势之处。

MHA还提供在线主库切换的功能,能够安全地切换当前运行的主库到一个新的主库中(通过将从库提升为主库),大概0.5-2秒内即可完成。

MHA由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点),MHA Manager可以独立部署在一台独立的机器上管理多个Master-Slave集群,也可以部署在一台Slave上。当Master出现故障时,它可以自动将最新数据的Slave提升为新的Master,然后将所有其他的Slave重新指向新的Master,整个故障转移过程对应用程序是完全透明的。

1.2 工作流程

  • 把宕机的master二进制日志保存下来
  • 找到binlog位置点最新的slave
  • 在binlog位置点最新的slave上用relay-log(差异日志)修复其他slave
  • 将宕机的master上保存下来的二进制日志恢复到含有最新位置点的slave上
  • 将含有最新位置点binlog所在的slave提升为master
  • 将其他slave重新指向新提升的master,并开启主从复制

1.3 MHA工具介绍

MHA软件由两部分组成,Manager工具包和Node工具包具体说明如下:

Manager工具包主要包括以下几个工具:

masterha_check_ssh              # 检查MHA的SSH-key 配置状况
masterha_check_repl             # 检查MySQL主从复制情况
masterha_manger                 # 启动MHA
masterha_check_status           # 检测当前MHA运行状态
masterha_master_monitor         # 检测master是否宕机
masterha_master_switch          # 控制故障转移(自动或者手动)
masterha_conf_host              # 添加或删除配置的server信息
masterha_secondary_check        # 从远程服务器建立TCP连接
masterha_stop                   # 停止MHA

Node工具包(这些工具通常由MHA Manager的脚本触发,无需人为操作)主要包括以下几个工具:

save_binary_logs                # 保存宕机的master的binlog
apply_diff_relay_logs           # 识别relay log的差异
filter_mysqlbinlog              # 防止回滚事件
purge_relay_logs                # 清除中继日志(不会阻塞SQL线程)

1.4 MHA优点总结

  • 自动故障转移块
  • 主库崩溃不存在数据一致性问题
  • 不需要对当前mysql环境做重大修改
  • 不需要添加额外的服务器(仅一台manager就可管理上百个replication)
  • 性能优秀,可工作在半同步复制和异步复制,当监控mysql状态时,仅需要每隔N秒向master发送ping包(默认3秒),所以对性能无影响,你可以理解为MHA的性能和简单的主从复制框架性能一样
  • 只要replication支持的存储引擎,MHA都支持,不会局限于innodb

第2章 MySQL环境准备

2.1 环境检查

[root@db01 ~]# cat /etc/redhat-release
CentOS release 6.9 (Final)
[root@db01 ~]# uname -r
2.6.32-696.el6.x86_64
[root@db01 ~]# hostname -I
10.0.0.41 172.16.1.41

2.2 安装MySQL

参照《MySQL数据库多种安装方法介绍》

2.3 配置主从复制

参照《MySQL主从同步》

2.3.1 修改配置文件开启GTID

2.3.1.1 主库配置文件

[root@db01 ~]# vim /etc/my.cnf
[mysqld]
server-id = 41

basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
socket = /tmp/mysql.sock
port = 3306

log-bin = /usr/local/mysql/data/mysql-bin

gtid_mod=ON
log_slave_updates
enforce_gtid_consistency

[mysqld_safe]
log-error= /tmp/mysql.err
pid-file= /usr/local/mysql/data/mysql.pid

[client]
port = 3306
socket = /tmp/mysql.sock

[root@db01 ~]# /etc/init.d/mysqld restart

2.3.1.2 从库db02配置文件

[root@db02 ~]# vim /etc/my.cnf
[mysqld]
server-id = 42

basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
socket = /tmp/mysql.sock
port = 3306

log-bin = /usr/local/mysql/data/mysql-bin

gtid_mod=ON                     # 开启GTID
log_slave_updates
enforce_gtid_consistency

relay_log_purge = 0             # 禁用relay-log
read_only = 1                   # 设置从库只读

[mysqld_safe]
log-error= /tmp/mysql.err
pid-file= /usr/local/mysql/data/mysql.pid

[client]
port = 3306
socket = /tmp/mysql.sock

[root@db02 ~]# /etc/init.d/mysqld restart

2.3.1.3 从库db03配置文件

[root@db03 ~]# vim /etc/my.cnf
[mysqld]
server-id = 43

basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
socket = /tmp/mysql.sock
port = 3306

log-bin = /usr/local/mysql/data/mysql-bin

gtid_mod=ON
log_slave_updates
enforce_gtid_consistency

relay_log_purge = 0

[mysqld_safe]
log-error= /tmp/mysql.err
pid-file= /usr/local/mysql/data/mysql.pid

[client]
port = 3306
socket = /tmp/mysql.sock

[root@db03 ~]# /etc/init.d/mysqld restar

2.3.2 查看GTID的状态

mysql> show global variables like '%gtid%';
+---------------------------------+----------------------------------------+
| Variable_name                   | Value                                  |
+---------------------------------+----------------------------------------+
| binlog_gtid_simple_recovery     | OFF                                    |
| enforce_gtid_consistency        | ON                                     |
| gtid_executed                   | 0b1a934f-a67f-11e7-95e2-000c2982d598:1 |
| gtid_mode                       | ON                                     |
| gtid_owned                      |                                        |
| gtid_purged                     |                                        |
| simplified_binlog_gtid_recovery | OFF                                    |
+---------------------------------+----------------------------------------+
7 rows in set (0.11 sec)

2.3.3 从库连接主库命令

CHANGE MASTER TO
  MASTER_HOST='10.0.0.41',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_PORT=3306,
  MASTER_AUTO_POSITION=1;

2.3.4 问题汇总

【错误信息】:

ERROR 1776 (HY000): Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.

【解决办法】:

不要使用MASTER_LOG_FILE和MASTER_LOG_POS参数,使用以下参数代替:
MASTER_AUTO_POSITION=1;

第3章 部署MHA

3.1 环境准备(所有节点)

3.1.1 安装依赖包

[root@db01 ~]# yum install -y perl-DBD-MySQL

3.1.2 安装node包

[root@db01 ~]# rpm -ivh /server/tools/mha4mysql-node-0.56-0.el6.noarch.rpm

3.1.3 【仅主库】上创建mha用户

[root@db01 ~]# mysql -uroot -p123456
mysql> grant all privileges on *.* to mha@'10.0.0.%' identified by '123456';

3.1.4 创建软链接(防止MHA复制时报错)

[root@db01 ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
[root@db01 ~]# ln -s /usr/local/mysql/bin/mysql /usr/local/mysql

3.2 部署管理节点

3.2.1 安装依赖包

[root@db03 ~]# yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes

3.2.2 安装管理包

[root@db03 ~]# rpm -ivh /server/tools/mha4mysql-manager-0.56-0.el6.noarch.rpm

3.2.3 创建MHA目录

[root@db03 ~]# mkdir -p /etc/mha
[root@db03 ~]# mkdir -p /var/log/mha/mysql

3.2.4 编辑配置文件

[root@db03 ~]# vim /etc/mha/mysql.cnf
[server default]
manager_log=/var/log/mha/mysql/manager
manager_workdir=/var/log/mha/mysql
master_binlog_dir=/usr/local/mysql/data
user=mha
password=123456
ping_interval=2
repl_password=123456
repl_user=repl
ssh_user=root

[server1]
hostname=10.0.0.41
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=10.0.0.42
port=3306

[server3]
hostname=10.0.0.43
port=3306
  • 配置文件解析:
[server default]
# 设置manager的日志
manager_log=/var/log/mha/mysql/manager
# 设置manager的工作目录
manager_workdir=/var/log/mha/mysql
# 设置master保存binlog的位置,以便MHA可以找到master的日志,这里设置为mysql的数据目录
master_binlog_dir=/usr/local/mysql/data
# 设置自动failover时候的切换脚本
master_ip_failover_script=/server/scripts/master_ip_failover.sh
# 设置手动切换时候的切换脚本
master_ip_online_change_script=/server/scripts/master_ip_online_change.sh
# 设置监控用户
user=mha
# 设置mysql中root用户的密码,这个密码是前文中创建监控用户的那个密码
password=123456
# 设置监控主库发送ping包的时间间隔,尝试三次没有回应的时候自动进行failover
ping_interval=2
# 设置远程mysql发生切换时binlog的保存位置
remote_workdir=/tmp
# 设置复制环境中的复制用户名
repl_user=repl
# 设置复制用户的密码
repl_password=123456
# 设置发生切换后发送的报警的脚本
report_script=/server/scripts/send_report.sh
# 一旦MHA到server2的监控之间出现问题,MHA Manager将会尝试从server3登录到server2
secondary_check_script=/usr/local/bin/masterha_secondary_check -s server3 -s server2 --user=mha --master_host=server2 --master_ip=10.0.0.41 --master_port=3306
# 设置故障发生后关闭故障主机的脚本(该脚本的作用是关闭主机防止发生脑裂)
shutdown_script=/server/scripts/shutdown_bad_hosts.sh
# 设置ssh的登录用户名
ssh_user=root

[server1]
hostname=10.0.0.41
port=3306

[server2]
hostname=10.0.0.42
port=3306
# 设置为候选master,如果设置该参数以后发生主从切换以后将会将此从库提升为主库
# 即使这个主库不是集群中event最新的slave
candidate_master=1
# 默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的
# master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择
# 一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为
# 这个候选主在切换的过程中一定是新的master
check_repl_delay=0

[server3]
hostname=10.0.0.43
port=3306

3.3 配置SSH信任(所有节点)

3.3.1 开启XShell的【发送键输入到所有会话】

3.3.2 创建密码

[root@db01 ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

3.3.3 分发密钥

[root@db01 ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@10.0.0.41
[root@db01 ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@10.0.0.42
[root@db01 ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@10.0.0.43

注意:以上命令每台主机都要操作一次,否则会连接失败。

3.4 启动前测试

[root@db03 ~]# masterha_check_ssh --conf=/etc/mha/mysql.cnf
Thu Nov 30 18:06:24 2017 - [info] All SSH connection tests passed successfully. # 此提示表示成功
[root@db03 ~]# masterha_check_repl --conf=/etc/mha/mysql.cnf
MySQL Replication Health is OK.                     #出现此提示表示测试成功

3.5 启动MHA

[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &

3.6 切换master测试

3.6.1 切换之前

# 查看db02数据库信息
[root@db02 ~]# mysql -uroot -p123456
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.41
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 590
               Relay_Log_File: db02-relay-bin.000008
                Relay_Log_Pos: 603
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
# 查看db03数据库信息
[root@db03 ~]# mysql -uroot -p123456
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.41
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 440
               Relay_Log_File: db03-relay-bin.000002
                Relay_Log_Pos: 408
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

3.6.2 关闭主库

[root@db01 ~]# /etc/init.d/mysqld stop
Shutting down MySQL...... SUCCESS!

3.6.3 切换之后

# 查看db02数据库信息
[root@db02 ~]# mysql -uroot -p123456
mysql> show slave status\G
Empty set (0.00 sec)                            # 此时从库线程已经为空
# 查看db03数据库信息
[root@db03 ~]# mysql -uroot -p123456
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.42        # 主库已经切换为42
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 440
               Relay_Log_File: db03-relay-bin.000002
                Relay_Log_Pos: 408
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

3.7 查看日志文件

[root@db03 ~]# tail -f /var/log/mha/mysql/manager
Thu Nov 30 19:57:58 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
# 主库恢复后使用此命令(手动修改密码)将主库变为从库,再重新运行MHA启动命令即可恢复高可用
----- Failover Report -----         # 可以通过以下信息进行监控

mysql: MySQL Master failover 10.0.0.41(10.0.0.41:3306) to 10.0.0.42(10.0.0.42:3306) succeeded

Master 10.0.0.41(10.0.0.41:3306) is down!

Check MHA Manager logs at db03:/var/log/mha/mysql/manager for details.

Started automated(non-interactive) failover.
Selected 10.0.0.42(10.0.0.42:3306) as a new master.
10.0.0.42(10.0.0.42:3306): OK: Applying all logs succeeded.
10.0.0.43(10.0.0.43:3306): OK: Slave started, replicating from 10.0.0.42(10.0.0.42:3306)
10.0.0.42(10.0.0.42:3306): Resetting slave info succeeded.
Master failover to 10.0.0.42(10.0.0.42:3306) completed successfully.

3.8 切换后恢复

3.8.1 查看日志获取恢复命令

[root@db03 ~]# grep -i "change master to" /var/log/mha/mysql/manager
Thu Nov 30 22:09:22 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';

3.8.2 启动原主库及恢复

[root@db01 ~]# /etc/init.d/mysqld start
[root@db01 ~]# mysql -uroot -p123456
# 执行从上面日志文件中获取的命令
mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123456';
Query OK, 0 rows affected, 2 warnings (0.22 sec)

mysql> start slave;
Query OK, 0 rows affected (0.04 sec)

3.8.3 编辑HMA服务端配置文件

[root@db03 ~]# vim /etc/mha/mysql.cnf
[server default]
manager_log=/var/log/mha/mysql/manager
manager_workdir=/var/log/mha/mysql
master_binlog_dir=/usr/local/mysql/data
master_ip_failover_script=/server/scripts/master_ip_failover.sh
password=123456
ping_interval=2
repl_password=123456
repl_user=repl
ssh_user=root
user=mha

[server1]
hostname=10.0.0.41
port=3306

[server2]
hostname=10.0.0.42
port=3306

[server3]
hostname=10.0.0.43
port=3306

3.8.4 验证并启动MHA

[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &

第4章 配置VIP漂移

4.1 IP漂移的两种方式

  • 通过keepalived的方式
  • 通过MHA自带脚本方式

4.2 MHA脚本方式

4.2.1 修改配置文件

[root@db03 ~]# vim /etc/mha/mysql.cnf
[server default]
master_ip_failover_script=/server/scripts/master_ip_failover.sh

4.2.2 编辑脚本

[root@db03 ~]# vim /server/scripts/master_ip_failover.sh
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

my $vip = '10.0.0.101/24';              # 编辑此处VIP信息
my $key = '0';                          # 编辑此处网卡标签
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

GetOptions(
    'command=s' => \$command,
    'ssh_user=s' => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s' => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s' => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
             print "Enabling the VIP - $vip on the new master - $new_master_host \n";
             &start_vip();
             $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
    print"Usage: master_ip_failover.sh --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

4.2.3 脚本增加可执行权限

[root@db03 ~]# chmod +x /server/scripts/master_ip_failover.sh
[root@db03 ~]# /etc/init.d/mysqld restart

4.2.4 在主库上手动绑定VIP

[root@db01 ~]# ifconfig eth0:0 10.0.0.101/24
[root@db01 ~]# ip a | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.0.0.41/24 brd 10.0.0.255 scope global eth0
    inet 10.0.0.101/24 brd 10.0.0.255 scope global secondary eth0:0

4.2.5 测试ip漂移

# 关闭主库
[root@db01 ~]# service mysqld stop
Shutting down MySQL...... SUCCESS!
# 查看从库VIP信息
[root@db02 ~]# ip a | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.0.0.42/24 brd 10.0.0.255 scope global eth0
inet 10.0.0.101/24 brd 10.0.0.255 scope global secondary eth0:0
# 查看主库VIP信息
[root@db01 ~]# ip a | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 10.0.0.41/24 brd 10.0.0.255 scope global eth0

第5章 配置binlog-server

5.1 修改mha配置文件

[root@db03 ~]# vim /etc/mha/mysql.cnf
[binlog]
no_master=1
hostname=10.0.0.43
master_binlog_dir=/usr/local/mysql/data/mysql/binlog/

5.2 备份binlog

# 创建备份binglog目录
[root@db03 ~]# mkdir -p /usr/local/mysql/data/mysql/binlog/
[root@db03 ~]# cd /usr/local/mysql/data/mysql/binlog/
# 备份binlog
[root@db03 binlog]# mysqlbinlog -R --host=10.0.0.41 --user=mha --password=123456 --raw --stop-never mysql-bin.000001 &

5.3 测试binglog备份

# 查看备份的binglog
[root@db03 binlog]# ll
total 16
-rw-rw---- 1 root root 372 Nov 30 22:30 mysql-bin.000001
-rw-rw---- 1 root root 143 Nov 30 22:30 mysql-bin.000002
-rw-rw---- 1 root root 613 Nov 30 22:30 mysql-bin.000003
-rw-rw---- 1 root root 120 Nov 30 22:30 mysql-bin.000004
# 登录主库刷新binglog
[root@db01 ~]# mysql -uroot -p123456
mysql> flush logs;
# 再次查看binglog备份
[root@db03 binlog]# ll
total 20
-rw-rw---- 1 root root 372 Nov 30 22:30 mysql-bin.000001
-rw-rw---- 1 root root 143 Nov 30 22:30 mysql-bin.000002
-rw-rw---- 1 root root 613 Nov 30 22:30 mysql-bin.000003
-rw-rw---- 1 root root 238 Nov 30 22:33 mysql-bin.000004
-rw-rw---- 1 root root 120 Nov 30 22:33 mysql-bin.000005
温馨提示:本文最后更新于2022-12-20 20:57:41,已超过431天没有更新。某些文章具有时效性,若文章内容或图片资源有错误或已失效,请联系站长。谢谢!
转载请注明本文链接:https://blog.leonshadow.cn/763482/1939.html
© 版权声明
THE END
喜欢就支持一下吧
点赞0 分享