MHA高可用方案实行
1.1MHA简介
MHA 在监控到 master 节点故障时,会提升其中拥有最新数据的 slave 节点成为新的master 节点,在此期间,MHA 会通过于其它从节点获取额外信息来避免一致性方面的问题。MHA 还提供了 master 节点的在线切换功能,即按需切换 master/slave 节点。
1.2MHA 服务的两种角色
MHA Manager(管理节点)和 MHA Node(数据节点):
MHA Manager:
通常单独部署在一台独立机器上管理多个 master/slave 集群(组),每个 master/slave 集群称作一个 application,用来管理统筹整个集群。
MHA node:
运行在每台 MySQL 服务器上(master/slave/manager),它**通过监控具备解析和清理 logs 功能的脚本来加快故障转移。
主要是接收管理节点所发出指令的代理,代理需要运行在每一个 mysql 节点上。简单讲 node 就是用来收集从节点服务器上所生成的 bin-log 。对比打算提升为新的主节点之上的从节点的是否拥有并完成操作,如果没有发给新主节点在本地应用后提升为主节点。
1.3MHA工作原理
MHA是在我们的主从数据库之间,主库出现问题的时候我们可以快速的在主从同步的基础上将主服务器的二进制日志进行保存,将中继日志应用到别的库中保存。从而达到我们重新进行选举master的工作,保证我们的数据不被破坏。然后将之前的主库和其他从库向心选举的主进行复制。从而再次达到正常主从同步。(但是之前的主库恢复过后不会再成为主库,只能作为从库继续运行)
MHA高可用实现过程
2.1相关配置
manager | 192.168.177.129 |
master | 192.168.177.130 |
slave1 | 192.168.177.131 |
slave2 | 192.168.177.132 |
用master与slave1和slave2之间作为主从同步,用manager配置MHA
master配置
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.socklog-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pidserver_id=1
log-bin=mysql-bin
log-slave-updates=1
skip-slave-start=1
skip_name_resolve
gtid-mode = on
enforce-gtid-consistency = true
relay-log=relay-log[mysql]
prompt = (\\u@\\h) [\d] >\\
no_auto_rehash[root@master ~]# systemctl restart mysqld
slave配置
###slave1
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.socklog-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pidserver_id=2
log-bin=mysql-bin
log-slave-updates=1
skip_name_resolve
gtid-mode = on
enforce-gtid-consistency = true
relay-log=relay-log
read_only=ON
relay_log_purge = 0[mysql]
prompt = (\\u@\\h) [\d] >\\
no_auto_rehash###slave2
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.socklog-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pidserver_id=3
log-bin=mysql-bin
log-slave-updates=1
skip_name_resolve
gtid-mode = on
enforce-gtid-consistency = true
relay-log=relay-log
read_only=ON
relay_log_purge = 0[mysql]
prompt = (\\u@\\h) [\d] >\\
no_auto_rehash
主从同步配置完之后,再次开机如果不能同步,我们可以使用先stop slave,然后reset slave最后在重新开启slave再次查看我们主从同步状态。
2.2配置用户授权
在master上进行
create user rep@'%' IDENTIFIED WITH MYSQL_NATIVE_PASSWORD BY 'Mhn@1234';grant replication slave,replication client on *.* to rep@'%';
2.3配置互相免密钥
使用xshell工具同时对多个主机配置
[root@master ~]# vim /etc/hosts
192.168.177.129 manager
192.168.177.130 master
192.168.177.131 slave1
192.168.177.132 slave2master上
[root@master ~]# ssh-keygen -f ~/.ssh/id_rsa -P '' -q
[root@master ~]# ssh-copy-id managerslave1上
[root@slave1 ~]# ssh-keygen -f ~/.ssh/id_rsa -P '' -q
[root@slave1 ~]# ssh-copy-id managerslave2上
[root@slave2 ~]# ssh-keygen -f ~/.ssh/id_rsa -P '' -q
[root@slave2 ~]# ssh-copy-id managermanager上
[root@manager ~]# ssh-keygen -f ~/.ssh/id_rsa -P '' -q
[root@manager ~]# ssh-copy-id master
[root@manager ~]# ssh-copy-id slave1
[root@manager ~]# ssh-copy-id slave2
[root@manager ~]# ssh-copy-id manager[root@manager ~]# scp ~/.ssh/authorized_keys master:/root/.ssh/authorized_keys
authorized_keys 100% 1573 3.7MB/s 00:00
[root@manager ~]# scp ~/.ssh/authorized_keys slave1:/root/.ssh/authorized_keys
authorized_keys 100% 1573 1.3MB/s 00:00
[root@manager ~]# scp ~/.ssh/authorized_keys slave2:/root/.ssh/authorized_keys
authorized_keys 100% 1573 2.4MB/s 00:00
现在可以在manager主机上看到我们四台主机相互免密钥。
2.4安装MHA安装包
现在我们就可以下载MHA的两个安装包(manager和node)。
在manager上需要安装
在master、slave1、slave2上只需要安装node即可
[root@manager ~]# scp mha4mysql-node-0.58-0.el7.centos.noarch.rpm master:/root/
mha4mysql-node-0.58-0.el7.centos.noarch.rpm 100% 35KB 11.8MB/s 00:00
[root@manager ~]# scp mha4mysql-node-0.58-0.el7.centos.noarch.rpm slave1:/root/
mha4mysql-node-0.58-0.el7.centos.noarch.rpm 100% 35KB 17.3MB/s 00:00
[root@manager ~]# scp mha4mysql-node-0.58-0.el7.centos.noarch.rpm slave2:/root/
mha4mysql-node-0.58-0.el7.centos.noarch.rpm 100% 35KB 18.9MB/s 00:00
在master、slave1、slave2上直接yum install mha*.rpm安装
第一次报错:
第二次报错:
我们直接安装MySQL的icu-data-files包和MySQL的libs-compat包即可安装成功node安装包
在manager我们直接安装yum install mha*.rpm -y
报错:
查了一下需要安装yum install epel-release -y这个依赖
现在就将安装包安装完毕了。
2.5配置MHA文件
接下来可以给我们的manager用户写入配置文件,在master上进行授权
[root@manager ~]# mkdir /etc/mha
[root@manager ~]# mkdir -p /var/log/mha/app1
[root@manager ~]# vim /etc/mha/app1.cnf[server default] //适用于slave1,2,3个
user=mhaadmin //MHA管理用户
password=Mha@2001 //MHA管理密码
manager_workdir=/var/log/mha/app1 //MHA_master的工作路径
manager_log=/var/log/mha/app1/manager.log //MHA_master的日志文件
ssh_user=root //基于ssh的密钥验证
repl_user=rep //数据库用户名
repl_password=Mhn@2001 //数据库密码
ping_interval=1 //ping时间间隔
[server1] //节点2
hostname=192.168.177.130 //节点2的主机地址
ssh_port=22 //节点2的ssh端口
candidate_master=1 //等于1代表如果坏可以成为master
[server2]
hostname=192.168.177.131
ssh_port=22
candidate_master=1
[server3]
hostname=192.168.177.132
ssh_port=22
candidate_master=1
接下来根据我们配置文件中写的用户配置在我们的master主机上
(root@localhost) [(none)] >create user mhaadmin@'%' identified with mysql_native_password by 'Mha@2001';
Query OK, 0 rows affected (0.01 sec)(root@localhost) [(none)] >grant all on *.* to mhaadmin@'%';
Query OK, 0 rows affected (0.01 sec)
2.6检查远程连接和主从同步
我们接下来就需要检查一下远程链接和主从同步是否正确。
[root@manager ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
root@192.168.177.131(192.168.177.131:22) to root@192.168.177.130(192.168.177.130:22)..
Warning: Permanently added '192.168.177.130' (ECDSA) to the list of known hosts.
Thu Mar 7 14:39:14 2024 - [debug] ok.
Thu Mar 7 14:39:14 2024 - [debug] Connecting via SSH from root@192.168.177.131(192.168.177.131:22) to root@192.168.177.132(192.168.177.132:22)..
Warning: Permanently added '192.168.177.132' (ECDSA) to the list of known hosts.
Thu Mar 7 14:39:14 2024 - [debug] ok.
Thu Mar 7 14:39:15 2024 - [debug]
Thu Mar 7 14:39:14 2024 - [debug] Connecting via SSH from root@192.168.177.132(192.168.177.132:22) to root@192.168.177.131(192.168.177.131:22)..
Warning: Permanently added '192.168.177.131' (ECDSA) to the list of known hosts.
Thu Mar 7 14:39:15 2024 - [debug] ok.
Thu Mar 7 14:39:15 2024 - [info] All SSH connection tests passed successfully.[root@manager ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Thu Mar 7 14:43:23 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Mar 7 14:43:23 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu Mar 7 14:43:23 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu Mar 7 14:43:23 2024 - [info] MHA::MasterMonitor version 0.58.
Thu Mar 7 14:43:24 2024 - [info] GTID failover mode = 1
Thu Mar 7 14:43:24 2024 - [info] Dead Servers:
Thu Mar 7 14:43:24 2024 - [info] Alive Servers:
Thu Mar 7 14:43:24 2024 - [info] 192.168.177.130(192.168.177.130:3306)
Thu Mar 7 14:43:24 2024 - [info] 192.168.177.131(192.168.177.131:3306)
Thu Mar 7 14:43:24 2024 - [info] 192.168.177.132(192.168.177.132:3306)
Thu Mar 7 14:43:24 2024 - [info] Alive Slaves:
Thu Mar 7 14:43:24 2024 - [info] 192.168.177.131(192.168.177.131:3306) Version=8.0.35 (oldest major version between slaves) log-bin:enabled
Thu Mar 7 14:43:24 2024 - [info] GTID ON
Thu Mar 7 14:43:24 2024 - [info] Replicating from 192.168.177.130(192.168.177.130:3306)
Thu Mar 7 14:43:24 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Mar 7 14:43:24 2024 - [info] 192.168.177.132(192.168.177.132:3306) Version=8.0.35 (oldest major version between slaves) log-bin:enabled
Thu Mar 7 14:43:24 2024 - [info] GTID ON
Thu Mar 7 14:43:24 2024 - [info] Replicating from 192.168.177.130(192.168.177.130:3306)
Thu Mar 7 14:43:24 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Mar 7 14:43:24 2024 - [info] Current Alive Master: 192.168.177.130(192.168.177.130:3306)
Thu Mar 7 14:43:24 2024 - [info] Checking slave configurations..
Thu Mar 7 14:43:24 2024 - [info] Checking replication filtering settings..
Thu Mar 7 14:43:24 2024 - [info] binlog_do_db= , binlog_ignore_db=
Thu Mar 7 14:43:24 2024 - [info] Replication filtering check ok.
Thu Mar 7 14:43:24 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Mar 7 14:43:24 2024 - [info] Checking SSH publickey authentication settings on the current master..
Thu Mar 7 14:43:24 2024 - [info] HealthCheck: SSH to 192.168.177.130 is reachable.
Thu Mar 7 14:43:24 2024 - [info]
192.168.177.130(192.168.177.130:3306) (current master)+--192.168.177.131(192.168.177.131:3306)+--192.168.177.132(192.168.177.132:3306)Thu Mar 7 14:43:24 2024 - [info] Checking replication health on 192.168.177.131..
Thu Mar 7 14:43:24 2024 - [info] ok.
Thu Mar 7 14:43:24 2024 - [info] Checking replication health on 192.168.177.132..
Thu Mar 7 14:43:24 2024 - [info] ok.
Thu Mar 7 14:43:24 2024 - [warning] master_ip_failover_script is not defined.
Thu Mar 7 14:43:24 2024 - [warning] shutdown_script is not defined.
Thu Mar 7 14:43:24 2024 - [info] Got exit code 0 (Not master dead).MySQL Replication Health is OK.
3启动MHA
命令行启动
[root@node1 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
使用脚本来启动
#!/bin/bash
# chkconfig: 35 80 20
# description: MHA management script.STARTEXEC="/usr/bin/masterha_manager --conf"
STOPEXEC="/usr/bin/masterha_stop --conf"
CONF="/etc/mha/app1.cnf"
process_count=`ps -ef |grep -w masterha_manager|grep -v grep|wc -l`
PARAMS="--ignore_last_failover"case "$1" instart)if [ $process_count -gt 1 ]thenecho "masterha_manager exists, process is already running"elseecho "Starting Masterha Manager"$STARTEXEC $CONF $PARAMS < /dev/null > /var/log/mha/app1/manager.log 2>&1 &fi;;stop)if [ $process_count -eq 0 ]thenecho "Masterha Manager does not exist, process is not running"elseecho "Stopping ..."$STOPEXEC $CONFwhile(true)doprocess_count=`ps -ef |grep -w masterha_manager|grep -v grep|wc -l`if [ $process_count -gt 0 ]thensleep 1elsebreakfidoneecho "Master Manager stopped"fi;;*)echo "Please use start or stop as first argument";;
esac
[root@manager ~]# chmod +x /etc/init.d/masterha_managerd
[root@manager ~]# chkconfig --add masterha_managerd
[root@manager ~]# chkconfig masterha_managerd on
[root@manager ~]# systemctl start masterha_managerd
[root@manager ~]# systemctl status masterha_managerd
● masterha_managerd.service - SYSV: MHA management script.Loaded: loaded (/etc/rc.d/init.d/masterha_managerd; bad; vendor preset: disabled)Active: active (running) since 四 2024-03-07 14:54:25 CST; 17s agoDocs: man:systemd-sysv-generator(8)Process: 10423 ExecStart=/etc/rc.d/init.d/masterha_managerd start (code=exited, status=0/SUCCESS)Tasks: 1CGroup: /system.slice/masterha_managerd.service└─10430 perl /usr/bin/masterha_manager --conf /etc/mha/app1.cnf --ignore_last...3月 07 14:54:25 manager systemd[1]: Starting SYSV: MHA management script....
3月 07 14:54:25 manager masterha_managerd[10423]: Starting Masterha Manager
3月 07 14:54:25 manager systemd[1]: Started SYSV: MHA management script..[root@manager ~]# ps -ef | grep -w masterha_managerd
root 10530 9389 0 14:55 pts/1 00:00:00 grep --color=auto -w masterha_managerd[root@manager ~]# tail /var/log/mha/app1/manager.log
192.168.177.130(192.168.177.130:3306) (current master)+--192.168.177.131(192.168.177.131:3306)+--192.168.177.132(192.168.177.132:3306)Thu Mar 7 14:54:27 2024 - [warning] master_ip_failover_script is not defined.
Thu Mar 7 14:54:27 2024 - [warning] shutdown_script is not defined.
Thu Mar 7 14:54:27 2024 - [info] Set master ping interval 1 seconds.
Thu Mar 7 14:54:27 2024 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Mar 7 14:54:27 2024 - [info] Starting ping health check on 192.168.177.130(192.168.177.130:3306)..
Thu Mar 7 14:54:27 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..#查看master节点的状态
[root@manager ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:10430) is running(0:PING_OK), master:192.168.177.130
3.1配置VIP
脑裂
脑裂是指在主备切换时,由于切换不彻底或其他原因,导致客户端和Slave误以为出现两个active master,最终使得整个集群处于混乱状态。
vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;另外一种通过脚本方式启动虚拟ip的方式 (即不需要keepalived或者heartbeat类似的软件).
为了防止脑裂发生,推荐生产环境采用脚本的方式来管理虚拟ip,而不是使用keepalived来完成。
[root@manager ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perluse strict;
use warnings FATAL => 'all';use Getopt::Long;my ($command, $ssh_user, $orig_master_host, $orig_master_ip,$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);my $vip = '192.168.177.210/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";GetOptions('command=s' => \$command,'ssh_user=s' => \$ssh_user,'orig_master_host=s' => \$orig_master_host,'orig_master_ip=s' => \$orig_master_ip,'orig_master_port=i' => \$orig_master_port,'new_master_host=s' => \$new_master_host,'new_master_ip=s' => \$new_master_ip,'new_master_port=i' => \$new_master_port,
);exit &main();sub main {print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";if ( $command eq "stop" || $command eq "stopssh" ) {my $exit_code = 1;eval {print "Disabling the VIP on old master: $orig_master_host \n";&stop_vip();$exit_code = 0;};if ($@) {warn "Got Error: $@\n";exit $exit_code;}exit $exit_code;}elsif ( $command eq "start" ) {my $exit_code = 10;eval {print "Enabling the VIP - $vip on the new master - $new_master_host \n";&start_vip();$exit_code = 0;};if ($@) {warn $@;exit $exit_code;}exit $exit_code;}elsif ( $command eq "status" ) {print "Checking the Status of the script.. OK \n";exit 0;}else {&usage();exit 1;}
}sub start_vip() {`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {return 0 unless ($ssh_user);`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}sub usage {print"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}[root@manager ~]# chmod +x /usr/local/bin/master_ip_failover [root@manager ~]# vim /etc/mha/app1.cnf
[server default]
添加:
master_ip_failover_script=/usr/local/bin/master_ip_failover
添加一个IP
[root@manager ~]# ifconfig ens33:1 192.168.177.210/24
[root@manager ~]# ifconfig -a | grep -A 2 "ens33:1"
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500inet 192.168.177.210 netmask 255.255.255.0 broadcast 192.168.177.255ether 00:0c:29:dc:28:da txqueuelen 1000 (Ethernet)
配置邮箱服务
[root@manager ~]# cat /usr/local/bin/send_report
#!/usr/bin/perl# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA## Note: This is a sample script and is not complete. Modify the script based on your environment.use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );my $smtp='smtp.qq.com'; # 这里的smtp可以查看要使用的邮箱的smtp值, 或者百度
my $mail_from='from@qq.com'; # 填写邮箱
my $mail_user='from@qq.com'; # 填写邮箱
my $mail_pass='password'; # 注意这里的密码是邮箱开启smtp服务时设定的密码, 不是邮箱的登陆密码
#my $mail_to=['to1@qq.com','to2@qq.com'];
my $mail_to='to@qq.com'; # 接受邮件的邮箱GetOptions('orig_master_host=s' => \$dead_master_host,'new_master_host=s' => \$new_master_host,'new_slave_hosts=s' => \$new_slave_hosts,'subject=s' => \$subject,'body=s' => \$body,
);# Do whatever you want here
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);sub mailToContacts {my ($smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $msg ) = @_;open my $DEBUG, ">/var/log/mha/app1/mail.log" # 这里的路径需要修改,改成一个真是存在的路径即可or die "Can't open the debug file:$!\n";my $sender = new Mail::Sender {ctype => 'text/plain;charset=utf-8',encoding => 'utf-8',smtp => $smtp,from => $mail_from,auth => 'LOGIN',TLS_allowed => '0',authid => $mail_user,authpwd => $mail_pass,to => $mail_to,subject => $subject,debug => $DEBUG};$sender->MailMsg({msg => $msg,debug => $DEBUG}) or print $Mail::Sender::Error;return 1;
}exit 0;[root@node1 ~]# chmod +x /usr/local/bin/send_report
[root@node1 ~]# touch /var/log/mha/app1/mail.log
更改manager配置文件
[root@node1 ~]# vim /etc/mha/app1.cnf
[server default]
report_script=/usr/local/bin/send_report重启mha
[root@node1 ~]# systemctl restart masterha_managerd
测试邮箱功能
在manager上我们跟踪日志
tail -f /var/log/mha/app1/manager.log在master上我们停止MySQL服务
systemctl stop mysqld
邮箱功能正常
模拟master恢复正常
[root@master ~]# systemctl start mysqld
[root@master ~]# mysql -uroot -pMysql@123(root@localhost) [(none)] >change master to-> master_host='192.168.177.131',-> master_user='rep',-> master_password='Mysql@123',-> master_auto_position=1;
Query OK, 0 rows affected, 7 warnings (0.01 sec)(root@localhost) [(none)] >start slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)
我们查看同步状态
发现主从没有成功
这个原因是我们没有设置正确的主从关系,更改后就可以了
这个就是我们刚才在master重新向新master配主从时密码与在最开始的master上的不正确
主从就恢复了
在manager上查看我们的三个节点有没有被删除,如果没有删除只需要重启服务就可以
可以看到我们现在的master是192.168.177.131