I. What Is High Availability?
1. The enterprise high-availability yardstick: annual uptime
Annual uptime | Annual downtime | Downtime per year | Typical solution
99.9% | 0.1% | 525.6 min | keepalived + dual master (switchover requires manual intervention)
99.99% | 0.01% | 52.56 min | MHA (semi-automated)
99.999% | 0.001% | 5.256 min | PXC, MGR, MGC (automated)
99.9999% | 0.0001% | 0.5256 min | automated, cloud-based, platform-based
II. MHA Overview
1. How MHA works
1. Monitoring
masterha_master_monitor probes the master's heartbeat every ping_interval seconds. If the heartbeat is lost, the master gets four chances in total before failover is triggered.
2. Master election
When the master goes down, who takes over?
1. If every slave has identical logs, the new master is picked by default in the order the servers appear in the configuration file.
2. If the slaves' logs differ, the slave closest to the master is picked automatically.
3. If a node is given priority (candidate_master=1), that node is preferred.
However, if the candidate's logs lag the master by more than 100 MB, it is still skipped. Combine it with check_repl_delay=0 to disable the lag check and force the candidate to be chosen.
(1) ping_interval=1
# Interval, in seconds, between ping probes sent to the master; after three unanswered attempts, failover is triggered automatically.
(2) candidate_master=1
# Marks the node as a candidate master. During a switchover this slave is promoted to master even if it is not the slave with the newest events in the cluster.
(3) check_repl_delay=0
# By default MHA will not pick a slave as the new master if it lags the master by more than 100 MB of relay logs, because recovering such a slave takes a long time. With check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful together with candidate_master=1, since it guarantees the candidate becomes the new master during failover. (The fragment below shows where these parameters live in the configuration file.)
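For orientation, a sketch of where these parameters sit in the MHA configuration file built later in this walkthrough (/etc/mha/app1.cnf); treating db02's [server2] block as the preferred candidate is an assumption, not part of the actual setup below:
[server default]
ping_interval=1
[server2]
hostname=192.168.20.120
port=3306
candidate_master=1
check_repl_delay=0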
3. Data/log compensation
1) If SSH is reachable
Each slave immediately saves the missing portion of the binlog to /var/tmp/xxxx via save_binary_logs.
2) If SSH is not reachable
The slaves call apply_diff_relay_logs to compute and apply the relay-log difference between nodes.
4. Failover
Tear down the original replication relationships and build the new master-slave topology.
5. Automatically remove the failed node from the configuration file
6. MHA terminates itself (the Manager exits after one failover)
7. Application transparency: VIP
8. Supplementary data compensation: binlog server
9. Failure notification: e-mail, DingTalk
(A sketch of items 7 and 8 follows.)
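Items 7 and 8 are easiest to picture with a sketch. MHA itself does not move a VIP or stream binlogs to a backup host; the VIP is handled by the master_ip_failover / master_ip_online_change scripts, and the binlog server is simply another machine pulling the master's binlogs. A rough illustration only — the VIP 192.168.20.200, the interface ens33, and the starting binlog file are assumptions, not values from this setup:
# what a master_ip_failover script typically does on the new master
ip addr add 192.168.20.200/24 dev ens33
arping -c 3 -A -I ens33 192.168.20.200          # refresh ARP caches so clients follow the VIP
# and on the old master, when it is reachable: release the VIP
ip addr del 192.168.20.200/24 dev ens33
# binlog server: keep a live copy of the master's binlogs on a separate host
mysqlbinlog --read-from-remote-server --raw --stop-never --host=192.168.20.132 --user=mha -p binlog.000001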
2. MHA architecture
One master, two slaves (master: db01; slaves: db02, db03):
The MHA high-availability solution consists of two pieces of software:
Manager software: installed on one of the slave nodes
Node software: installed on every node
3. MHA software components
The Manager toolkit mainly includes the following tools:
masterha_manager          start MHA
masterha_check_ssh        check MHA's SSH configuration
masterha_check_repl       check MySQL replication health
masterha_master_monitor   detect whether the master is down
masterha_check_status     check the current MHA run status
masterha_master_switch    control failover (automatic or manual)
masterha_conf_host        add or remove a configured server entry
The Node toolkit mainly includes the following tools:
(These are normally invoked by the MHA Manager's scripts and need no manual operation.)
save_binary_logs          save and copy the master's binary logs
apply_diff_relay_logs     identify differing relay-log events and apply the difference to other slaves
purge_relay_logs          purge relay logs without blocking the SQL thread (see the example below)
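Of the Node tools, purge_relay_logs is the one often run by hand or from cron on each slave, because MHA expects relay logs to be kept (relay_log_purge=0) and they have to be cleaned up out of band. A hedged sketch — the credentials and working directory are placeholders:
purge_relay_logs --user=root --password=xxx --port=3306 --disable_relay_log_purge --workdir=/data/tmp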
4. MHA cluster
III. Building the MHA Environment
1. Prepare a one-master, two-slave environment (a replication sketch follows)
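Setting up the GTID-based replication itself is not covered here; for reference, a sketch of what each slave (db02/db03) would run, reusing the master address and the replication account that appear later in this walkthrough (treating replmha/'ok' as the replication credentials is an assumption):
mysql -uroot -p -e "change master to master_host='192.168.20.132', master_port=3306, master_user='replmha', master_password='ok', master_auto_position=1; start slave;"
mysql -uroot -p -e "show slave status\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running'   # both should say Yes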
2. Create symlinks on all nodes
[root@localhost bin]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
[root@localhost bin]# ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
[root@localhost bin]# ls -l /usr/bin/mysql*
lrwxrwxrwx 1 root root 26 5月 30 15:27 /usr/bin/mysql -> /usr/local/mysql/bin/mysql
lrwxrwxrwx 1 root root 32 5月 30 15:27 /usr/bin/mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog
3. Configure mutual trust between nodes (passwordless SSH between all nodes)
db01:
rm -rf /root/.ssh
ssh-keygen
cd /root/.ssh
mv id_rsa.pub authorized_keys
scp -r /root/.ssh 192.168.20.120:/root
scp -r /root/.ssh 192.168.20.231:/root
Verify from each node:
db01:
ssh 192.168.20.132 date
ssh 192.168.20.120 date
ssh 192.168.20.231 date
db02:
ssh 192.168.20.132 date
ssh 192.168.20.120 date
ssh 192.168.20.231 date
db03:
ssh 192.168.20.132 date
ssh 192.168.20.120 date
ssh 192.168.20.231 date
4. Download the MHA software
MHA project page: https://code.google.com/archive/p/mysql-master-ha/
GitHub downloads (usable for MySQL 5.7 and earlier): https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
For MySQL 8.0: https://pan.baidu.com/s/1-yo1KjZZUvbHxrcI9Yl7-Q  (extraction code: fr50)
5. Install the MHA Node package on all nodes
[root@localhost opt]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
[root@localhost opt]#
6. Create the users MHA needs: mha and replmha
mysql> create user mha@'%' identified with mysql_native_password by 'ok';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'mha'@'%' WITH GRANT OPTION;
Query OK, 0 rows affected (0.01 sec)
mysql> select user,host,authentication_string,plugin from mysql.user;
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| mha | % | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
| replmha | % | $A$005$)fcB"IXZmo{qr%)71hR72t5sUOD3H27kNo8uGnWX8/mkwadbMlpTdSyw9B | caching_sha2_password |
| mysql.infoschema | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.session | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.sys | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| repl | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
| root | localhost | | caching_sha2_password |
| yizuo | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
8 rows in set (0.00 sec)
The replication user (repl/replmha) must be created on every node, and the same goes for the administrative user. Before doing any of this on a slave, remember to run set sql_log_bin=0 first to disable binary logging for the session.
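Repeating this on another node is just the same statements with binary logging switched off for the session; a sketch wrapping them in a single call (the user names and the throw-away password 'ok' are the ones used above):
mysql -uroot -p -e "set sql_log_bin=0;
create user mha@'%' identified with mysql_native_password by 'ok';
grant all privileges on *.* to 'mha'@'%' with grant option;
create user replmha@'%' identified with mysql_native_password by 'ok';
grant all privileges on *.* to 'replmha'@'%' with grant option;
set sql_log_bin=1;"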
7. Install the Manager software (db03)
# Install dependencies on the manager node
yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
# Install mha4mysql-manager on the manager node
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
8. Prepare the configuration file (db03)
Create the configuration directory: mkdir -p /etc/mha
Create the log directory: mkdir -p /var/log/mha/app1
Edit the MHA configuration file:
vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/3306/binlog
user=replmha
password=ok
ping_interval=2
repl_password=ok
repl_user=mha
ssh_user=root
[server1]
hostname=192.168.20.132
port=3306
[server2]
hostname=192.168.20.120
port=3306
[server3]
hostname=192.168.20.231
port=3306
Key MHA configuration parameters
manager_workdir=/var/log/masterha/app1.log: the manager's working directory
manager_log=/var/log/masterha/app1/manager.log: the manager's log file; this is the main log to check when something goes wrong
master_binlog_dir=/data/mysql: where the master stores its binlogs, so MHA can find them (the master's binary-log directory)
master_ip_failover_script=/usr/local/bin/master_ip_failover: script run during automatic failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change: script run during a manual (online) switchover
user=root: MySQL user used for monitoring
password=dayi123: password of the monitoring user, which must be granted remote login from the manager node
ping_interval=1: interval between ping probes sent to the master (default 3 seconds); failover is triggered automatically after three unanswered attempts
remote_workdir=/tmp: where the remote MySQL hosts keep binlogs during a switchover
repl_user=repl: MySQL user used for replication
repl_password=replication: password of the replication user
report_script=/usr/local/send_report: alert script run after a switchover
shutdown_script="": script to shut down the failed host after a failure (its main purpose is to prevent split brain; not used here)
ssh_user=root: SSH login user
candidate_master=1: set under a [serverN] section; marks that node as a candidate master
check_repl_delay=0: set under a [serverN] section; by default MHA will not choose a slave that lags the master by more than 100 MB of relay logs as the new master, and this option disables that check. It is very useful for hosts with candidate_master=1.
(Parameter reference adapted from: https://blog.csdn.net/dayi_123/article/details/83690608)
9. SSH mutual-trust check (db03)
[root@localhost opt]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Thu May 30 16:25:13 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 30 16:25:13 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu May 30 16:25:13 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu May 30 16:25:13 2024 - [info] Starting SSH connection tests..
Thu May 30 16:25:14 2024 - [debug]
Thu May 30 16:25:13 2024 - [debug] Connecting via SSH from root@192.168.20.132(192.168.20.132:22) to root@192.168.20.120(192.168.20.120:22)..
Thu May 30 16:25:14 2024 - [debug] ok.
Thu May 30 16:25:14 2024 - [debug] Connecting via SSH from root@192.168.20.132(192.168.20.132:22) to root@192.168.20.231(192.168.20.231:22)..
Thu May 30 16:25:14 2024 - [debug] ok.
Thu May 30 16:25:15 2024 - [debug]
Thu May 30 16:25:14 2024 - [debug] Connecting via SSH from root@192.168.20.120(192.168.20.120:22) to root@192.168.20.132(192.168.20.132:22)..
Thu May 30 16:25:14 2024 - [debug] ok.
Thu May 30 16:25:14 2024 - [debug] Connecting via SSH from root@192.168.20.120(192.168.20.120:22) to root@192.168.20.231(192.168.20.231:22)..
Thu May 30 16:25:15 2024 - [debug] ok.
Thu May 30 16:25:16 2024 - [debug]
Thu May 30 16:25:14 2024 - [debug] Connecting via SSH from root@192.168.20.231(192.168.20.231:22) to root@192.168.20.132(192.168.20.132:22)..
Thu May 30 16:25:15 2024 - [debug] ok.
Thu May 30 16:25:15 2024 - [debug] Connecting via SSH from root@192.168.20.231(192.168.20.231:22) to root@192.168.20.120(192.168.20.120:22)..
Thu May 30 16:25:15 2024 - [debug] ok.
Thu May 30 16:25:16 2024 - [info] All SSH connection tests passed successfully.
10. Replication check (db03)
masterha_check_repl --conf=/etc/mha/app1.cnf
[root@localhost app1]# masterha_check_repl --conf=/etc/mha/app1.cnf
Thu May 30 19:09:30 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 30 19:09:30 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu May 30 19:09:30 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu May 30 19:09:30 2024 - [info] MHA::MasterMonitor version 0.58.
Thu May 30 19:09:31 2024 - [info] GTID failover mode = 1
Thu May 30 19:09:31 2024 - [info] Dead Servers:
Thu May 30 19:09:31 2024 - [info] Alive Servers:
Thu May 30 19:09:31 2024 - [info] 192.168.20.132(192.168.20.132:3306)
Thu May 30 19:09:31 2024 - [info] 192.168.20.120(192.168.20.120:3306)
Thu May 30 19:09:31 2024 - [info] 192.168.20.231(192.168.20.231:3306)
Thu May 30 19:09:31 2024 - [info] Alive Slaves:
Thu May 30 19:09:31 2024 - [info] 192.168.20.120(192.168.20.120:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Thu May 30 19:09:31 2024 - [info] GTID ON
Thu May 30 19:09:31 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Thu May 30 19:09:31 2024 - [info] 192.168.20.231(192.168.20.231:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Thu May 30 19:09:31 2024 - [info] GTID ON
Thu May 30 19:09:31 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Thu May 30 19:09:31 2024 - [info] Current Alive Master: 192.168.20.132(192.168.20.132:3306)
Thu May 30 19:09:31 2024 - [info] Checking slave configurations..
Thu May 30 19:09:31 2024 - [info] read_only=1 is not set on slave 192.168.20.120(192.168.20.120:3306).
Thu May 30 19:09:31 2024 - [info] read_only=1 is not set on slave 192.168.20.231(192.168.20.231:3306).
Thu May 30 19:09:31 2024 - [info] Checking replication filtering settings..
Thu May 30 19:09:31 2024 - [info] binlog_do_db= , binlog_ignore_db=
Thu May 30 19:09:31 2024 - [info] Replication filtering check ok.
Thu May 30 19:09:31 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu May 30 19:09:31 2024 - [info] Checking SSH publickey authentication settings on the current master..
Thu May 30 19:09:32 2024 - [info] HealthCheck: SSH to 192.168.20.132 is reachable.
Thu May 30 19:09:32 2024 - [info]
192.168.20.132(192.168.20.132:3306) (current master)
 +--192.168.20.120(192.168.20.120:3306)
 +--192.168.20.231(192.168.20.231:3306)
Thu May 30 19:09:32 2024 - [info] Checking replication health on 192.168.20.120..
Thu May 30 19:09:32 2024 - [info] ok.
Thu May 30 19:09:32 2024 - [info] Checking replication health on 192.168.20.231..
Thu May 30 19:09:32 2024 - [info] ok.
Thu May 30 19:09:32 2024 - [warning] master_ip_failover_script is not defined.
Thu May 30 19:09:32 2024 - [warning] shutdown_script is not defined.
Thu May 30 19:09:32 2024 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
1) Replication-check pitfall: Attempt to reload DBD/mysql.pm aborted
[root@localhost data]# masterha_check_repl --conf=/etc/mha/app1.cnf
Thu May 30 17:43:25 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 30 17:43:25 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu May 30 17:43:25 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu May 30 17:43:25 2024 - [info] MHA::MasterMonitor version 0.58.
Thu May 30 17:43:25 2024 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 37) line 3.
 at /usr/share/perl5/vendor_perl/MHA/DBHelper.pm line 208.
 at /usr/share/perl5/vendor_perl/MHA/Server.pm line 166.
Thu May 30 17:43:25 2024 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 37) line 3.
 at /usr/share/perl5/vendor_perl/MHA/DBHelper.pm line 208.
 at /usr/share/perl5/vendor_perl/MHA/Server.pm line 166.
Thu May 30 17:43:25 2024 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 37) line 3.
 at /usr/share/perl5/vendor_perl/MHA/DBHelper.pm line 208.
 at /usr/share/perl5/vendor_perl/MHA/Server.pm line 166.
Thu May 30 17:43:26 2024 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Thu May 30 17:43:26 2024 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Thu May 30 17:43:26 2024 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu May 30 17:43:26 2024 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
Fix:
Followed a CSDN post on "install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted", whose gist is to reinstall the Perl DBD::mysql driver.
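That reinstall comes down to building DBD::mysql from CPAN; a sketch of the kind of command involved, assuming the cpan client and the MySQL client library/headers are already present:
cpan -i DBD::mysql        # builds DBD::mysql against the local MySQL client library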
Following it surfaced a new problem:
chmod 755 blib/arch/auto/DBD/mysql/mysql.so
Manifying 2 pod documents
  DVEEDEN/DBD-mysql-5.005.tar.gz
  /usr/bin/make -- OK
Running make test
"/usr/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- mysql.bs blib/arch/auto/DBD/mysql/mysql.bs 644
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00base.t .............................. 1/6 # Driver version is 5.005
t/00base.t .............................. ok
t/01caching_sha2_prime.t ................ ok
t/05dbcreate.t .......................... 1/2 # Database 'test' accessible
t/05dbcreate.t .......................... ok
t/10connect.t ........................... 1/? # mysql_clientinfo is: 8.0.20
# mysql_clientversion is: 80020
# mysql_serverversion is: 80020
# mysql_hostinfo is: Localhost via UNIX socket
# mysql_serverinfo is: 8.0.20
# mysql_stat is: Uptime: 64536 Threads: 4 Questions: 1664 Slow queries: 200 Opens: 667 Flush tables: 4 Open tables: 408 Queries per second avg: 0.025
# mysql_protoinfo is: 10
# SQL_DBMS_VER is 8.0.20
# Default storage engine is: InnoDB
t/10connect.t ........................... ok
t/15reconnect.t ......................... ok
t/16dbi-get_info.t ...................... ok
t/17quote.t ............................. ok
t/20createdrop.t ........................ ok
t/25lockunlock.t ........................ ok
t/29warnings.t .......................... ok
t/30insertfetch.t ....................... ok
t/31insertid.t .......................... ok
t/32insert_error.t ...................... ok
t/35limit.t ............................. ok
t/35prepare.t ........................... ok
t/40bindparam.t ......................... ok
t/40bindparam2.t ........................ ok
t/40bit.t ............................... ok
t/40blobs.t ............................. ok
t/40catalog.t ........................... ok
t/40keyinfo.t ........................... ok
t/40listfields.t ........................ ok
t/40nulls.t ............................. ok
t/40nulls_prepare.t ..................... ok
t/40numrows.t ........................... ok
t/40server_prepare.t .................... ok
t/40server_prepare_crash.t .............. ok
t/40server_prepare_error.t .............. ok
t/40types.t ............................. ok
t/41bindparam.t ......................... ok
t/41blobs_prepare.t ..................... ok
t/41int_min_max.t ....................... ok
t/42bindparam.t ......................... ok
t/43count_params.t ...................... ok
t/50chopblanks.t ........................ ok
t/50commit.t ............................ ok
t/51bind_type_guessing.t ................ 1/98 DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
DBD::mysql::st execute failed: Data truncated for column 'nn' at row 1 at t/51bind_type_guessing.t line 114.
DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
DBD::mysql::st execute failed: Incorrect integer value: '+' for column 'nn' at row 1 at t/51bind_type_guessing.t line 114.
DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
DBD::mysql::st execute failed: Incorrect integer value: '.' for column 'nn' at row 1 at t/51bind_type_guessing.t line 114.
DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
DBD::mysql::st execute failed: Incorrect integer value: 'e5' for column 'nn' at row 1 at t/51bind_type_guessing.t line 114.
DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
t/51bind_type_guessing.t ................ ok
t/52comment.t ........................... ok
t/53comment.t ........................... ok
t/55utf8.t .............................. ok
t/55utf8mb4.t ........................... ok
t/56connattr.t .......................... ok
t/57trackgtid.t ......................... skipped: GTID tracking not enabled
t/60leaks.t ............................. skipped: Skip $ENV{EXTENDED_TESTING} is not set
t/65segfault.t .......................... ok
t/65types.t ............................. ok
t/70takeimp.t ........................... ok
t/71impdata.t ........................... ok
t/75supported_sql.t ..................... ok
t/76multi_statement.t ................... ok
t/80procs.t ............................. ok
t/81procs.t ............................. ok
t/85init_command.t ...................... ok
t/86_bug_36972.t ........................ ok
t/87async.t ............................. ok
t/88async-multi-stmts.t ................. ok
t/89async-method-check.t ................ ok
t/91errcheck.t .......................... ok
t/92ssl_backronym_vulnerability.t ....... skipped: Server supports SSL connections, cannot test false-positive enforcement
t/92ssl_optional.t ...................... skipped: Server supports SSL connections, cannot test fallback to plain text
t/92ssl_riddle_vulnerability.t .......... skipped: Server supports SSL connections, cannot test false-positive enforcement
t/99_bug_server_prepare_blob_null.t ..... ok
t/99compression.t ....................... ok
t/gh352.t ............................... 1/2 DBD::mysql::db prepare failed: Statement not active at t/gh352.t line 27.
t/gh352.t ............................... ok
t/gh360.t ............................... ok
t/manifest.t ............................ skipped: these tests are for release testing
t/pod.t ................................. 1/3
# Failed test 'POD test for blib/lib/DBD/mysql.pm'
# at /usr/share/perl5/vendor_perl/Test/Pod.pm line 186.
# blib/lib/DBD/mysql.pm (1350): You forgot a '=back' before '=head1'
# Looks like you failed 1 test of 3.
t/pod.t ................................. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests
t/rt110983-valid-mysqlfd.t .............. ok
t/rt118977-zerofill.t ................... ok
t/rt25389-bin-case.t .................... ok
t/rt50304-column_info_parentheses.t ..... ok
t/rt61849-bind-param-buffer-overflow.t .. ok
t/rt75353-innodb-lock-timeout.t ......... ok
t/rt83494-quotes-comments.t ............. ok
t/rt85919-fetch-lost-connection.t ....... ok
t/rt86153-reconnect-fail-memory.t ....... skipped: $ENV{EXTENDED_TESTING} is not set
t/rt88006-bit-prepare.t ................. ok
t/rt91715.t ............................. ok
t/version.t ............................. 1/? # mysql_get_client_version: 80020
t/version.t ............................. ok
Test Summary Report
-------------------
t/pod.t                               (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
Files=79, Tests=2449, 41 wallclock secs ( 0.36 usr 0.12 sys + 4.32 cusr 0.98 csys = 5.78 CPU)
Result: FAIL
Failed 1/79 test programs. 1/2449 subtests failed.
make: *** [test_dynamic] Error 255
  DVEEDEN/DBD-mysql-5.005.tar.gz
  /usr/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports DVEEDEN/DBD-mysql-5.005.tar.gz
Running make install
  make test had returned bad status, won't install without force
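Since the only failure here is the POD formatting test, one common way past "won't install without force" is to force the install from the cpan shell. That is a judgment call rather than a recommendation, and the route actually taken below was different:
# inside the interactive cpan shell
cpan> force install DBD::mysql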
To get past this, I followed another CSDN post on the same "install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted" error and checked the driver's shared-library dependencies:
[root@localhost bin]# ldd /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql.so
        linux-vdso.so.1 =>  (0x00007fffa75b4000)
        libmysqlclient.so.18 => not found
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1a492b5000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f1a4909f000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f1a48d9d000)
        libssl.so.10 => /lib64/libssl.so.10 (0x00007f1a48b2b000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f1a486ca000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1a484c6000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1a480f9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1a496ed000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f1a47eac000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f1a47bc4000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f1a479c0000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f1a4778d000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f1a4757f000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f1a4737b000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f1a47162000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f1a46f3b000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f1a46cd9000)
This showed libmysqlclient.so.18 => not found, so that had to be fixed next.
Look for the file on another server that already has MySQL installed:
[root@localhost opt]# find / -name libmysqlclient.so.18
find: '/run/user/1000/gvfs': Permission denied
/usr/lib64/mysql/libmysqlclient.so.18
[root@localhost opt]# sz /usr/lib64/mysql/libmysqlclient.so.18
Transfer the file to the affected server, copy it into the right directories, and rerun ldd; libmysqlclient.so.18 is no longer reported as not found:
[root@localhost opt]# cp libmysqlclient.so.18 /usr/lib/
[root@localhost opt]#
[root@localhost opt]# cp libmysqlclient.so.18 /usr/lib64/
[root@localhost opt]# ldd /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql
ldd: /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql: No such file or directory
[root@localhost opt]# ldd /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql.so
        linux-vdso.so.1 =>  (0x00007fff47db4000)
        libmysqlclient.so.18 => /lib64/libmysqlclient.so.18 (0x00007f3697cce000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f3697ab2000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f369789c000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f369759a000)
        libssl.so.10 => /lib64/libssl.so.10 (0x00007f3697328000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f3696ec7000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f3696cc3000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f36968f6000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f36965ee000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f36963d8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f369844e000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f369618b000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f3695ea3000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f3695c9f000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f3695a6c000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f369585e000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f369565a000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f3695441000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f369521a000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f3694fb8000)
Run the replication check again — good, that problem is gone, and the next one shows up:
[root@localhost opt]# masterha_check_repl --conf=/etc/mha/app1.cnf
Thu May 30 17:56:26 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 30 17:56:26 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu May 30 17:56:26 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu May 30 17:56:26 2024 - [info] MHA::MasterMonitor version 0.58.
Thu May 30 17:56:27 2024 - [info] GTID failover mode = 1
Thu May 30 17:56:27 2024 - [info] Dead Servers:
Thu May 30 17:56:27 2024 - [info] 192.168.20.231(192.168.20.231:3306)
Thu May 30 17:56:27 2024 - [info] Alive Servers:
Thu May 30 17:56:27 2024 - [info] 192.168.20.132(192.168.20.132:3306)
Thu May 30 17:56:27 2024 - [info] 192.168.20.120(192.168.20.120:3306)
Thu May 30 17:56:27 2024 - [info] Alive Slaves:
Thu May 30 17:56:27 2024 - [info] 192.168.20.120(192.168.20.120:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Thu May 30 17:56:27 2024 - [info] GTID ON
Thu May 30 17:56:27 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Thu May 30 17:56:27 2024 - [info] Current Alive Master: 192.168.20.132(192.168.20.132:3306)
Thu May 30 17:56:27 2024 - [info] Checking slave configurations..
Thu May 30 17:56:27 2024 - [info] read_only=1 is not set on slave 192.168.20.120(192.168.20.120:3306).
Thu May 30 17:56:27 2024 - [info] Checking replication filtering settings..
Thu May 30 17:56:27 2024 - [info] binlog_do_db= , binlog_ignore_db=
Thu May 30 17:56:27 2024 - [info] Replication filtering check ok.
Thu May 30 17:56:27 2024 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln398] 192.168.20.120(192.168.20.120:3306): User replmha does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
Thu May 30 17:56:27 2024 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 1403.
Thu May 30 17:56:27 2024 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu May 30 17:56:27 2024 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
2) Replication-check pitfall: User replmha does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
Fix: the replication user must be created on every node, and likewise the administrative user. Before doing any of this on a slave, remember to run set sql_log_bin=0 first to disable binary logging for the session.
3) Replication-check pitfall: Server 192.168.20.231(192.168.20.231:3306) is dead, but must be alive! Check server settings.
The port in the manager machine's (db03's) my.cnf had been set wrong; changing 3307 back to 3306 fixed it.
Other problems you may run into: https://www.cnblogs.com/xuliuzai/p/11980273.html
See also a CSDN post, "MySQL high-availability cluster setup with MHA (MHA download)".
4) Replication-check pitfall: Fri May 31 18:25:52 2024 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln180] Got MySQL error when connecting 192.168.20.231(192.168.20.231:3306) :1130:Host '192.168.20.231' is not allowed to connect to this MySQL server, but this is not a MySQL crash. Check MySQL server settings.
This error means the client host (IP 192.168.20.231) tried to connect to the MySQL server, but the server's privilege settings do not allow connections from that host. It usually indicates the MySQL user privileges are configured incorrectly, so the user cannot connect from that host.
On inspection, the whole datadir on this node had been restored from a backup, and the replmha user carried over in the backed-up privilege tables was unusable. Dropping it on the slave and recreating the replmha user with fresh grants resolved the problem.
Before doing anything on a slave, remember to run set sql_log_bin=0 first to disable binary logging for the session.
mysql> drop user replmha@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host,authentication_string,plugin from mysql.user;
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| mha | % | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
| yiyi | % | $A$005$^oaN;+gtM.v?}dzN9ur30WU8M8ZKEMmqPx00qANDdp3WuzcAvu4DbDz6 | caching_sha2_password |
| mysql.infoschema | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.session | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.sys | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| repl | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
| root | localhost | | caching_sha2_password |
| yizuo | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
8 rows in set (0.00 sec)
mysql> create user replmha@'%' identified with mysql_native_password by 'ok';
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host,authentication_string,plugin from mysql.user;
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| mha | % | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
| replmha | % | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
| yiyi | % | $A$005$^oaN;+gtM.v?}dzN9ur30WU8M8ZKEMmqPx00qANDdp3WuzcAvu4DbDz6 | caching_sha2_password |
| mysql.infoschema | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.session | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.sys | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| repl | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
| root | localhost | | caching_sha2_password |
| yizuo | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F | mysql_native_password |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
9 rows in set (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'replmha'@'%' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)
mysql> show master status;
+---------------+----------+--------------+------------------+-------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------------------------------+
| binlog.000002 | 156 | | | 93909ace-1b58-11ef-81d8-000c2912a662:1-31 |
+---------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)
mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)
mysql> show master status;
+---------------+----------+--------------+------------------+-------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------------------------------+
| binlog.000002 | 156 | | | 93909ace-1b58-11ef-81d8-000c2912a662:1-31 |
+---------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)
11. Start MHA (db03)
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
[root@localhost /]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
[1] 88762    # no error output here means the start succeeded
[root@localhost /]#
12. Check MHA status (db03)
[root@localhost /]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:88762) is running(0:PING_OK), master:192.168.20.132
13. After the servers are shut down and rebooted (a consolidated command sketch follows)
1) Check that the database started correctly on every machine
2) Check the one-master, two-slave replication status
3) Check the MHA configuration file
4) Run the SSH mutual-trust check and the replication check
5) Start MHA
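A consolidated sketch of those five checks, reusing the commands from earlier in this walkthrough (how mysqld is started depends on how MySQL was installed, so that first line is an assumption):
/etc/init.d/mysqld start                                         # or however mysqld is managed on each node
mysql -e "show slave status\G" | grep -E 'Running|Master_Host'   # on db02/db03: IO and SQL threads should both be Yes
cat /etc/mha/app1.cnf                                            # confirm the [server default] and [serverN] blocks are intact
masterha_check_ssh  --conf=/etc/mha/app1.cnf
masterha_check_repl --conf=/etc/mha/app1.cnf
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &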
IV. Simulating a Failure and Recovering with MHA
1. Check MHA status
[root@localhost app1]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:88762) is running(0:PING_OK), master:192.168.20.132
2. Master goes down
mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)
mysql> quit;
# On db03, check the MHA status:
[root@localhost app1]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 master is down and failover is running(50:FAILOVER_RUNNING). master:192.168.20.132
Check /var/log/mha/app1/manager for details.
3. Check the log
cat /var/log/mha/app1/manager
----- Failover Report -----
app1: MySQL Master failover 192.168.20.132(192.168.20.132:3306) to 192.168.20.120(192.168.20.120:3306)
Master 192.168.20.132(192.168.20.132:3306) is down!
Check MHA Manager logs at localhost.localdomain:/var/log/mha/app1/manager for details.
Started automated(non-interactive) failover.
Selected 192.168.20.120(192.168.20.120:3306) as a new master.
192.168.20.120(192.168.20.120:3306): OK: Applying all logs succeeded.
192.168.20.231(192.168.20.231:3306): ERROR: Starting slave failed.
Master failover to 192.168.20.120(192.168.20.120:3306) done, but recovery on slave partially failed.
Fri May 31 20:04:25 2024 - [info] MHA::MasterMonitor version 0.58.
Fri May 31 20:04:26 2024 - [info] GTID failover mode = 1
Fri May 31 20:04:26 2024 - [info] Dead Servers:
Fri May 31 20:04:26 2024 - [info] Alive Servers:
Fri May 31 20:04:26 2024 - [info] 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:04:26 2024 - [info] 192.168.20.120(192.168.20.120:3306)
Fri May 31 20:04:26 2024 - [info] 192.168.20.231(192.168.20.231:3306)
Fri May 31 20:04:26 2024 - [info] Alive Slaves:
Fri May 31 20:04:26 2024 - [info] 192.168.20.120(192.168.20.120:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:04:26 2024 - [info] GTID ON
Fri May 31 20:04:26 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:04:26 2024 - [info] 192.168.20.231(192.168.20.231:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:04:26 2024 - [info] GTID ON
Fri May 31 20:04:26 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:04:26 2024 - [info] Current Alive Master: 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:04:26 2024 - [info] Checking slave configurations..
Fri May 31 20:04:26 2024 - [info] read_only=1 is not set on slave 192.168.20.120(192.168.20.120:3306).
Fri May 31 20:04:26 2024 - [info] read_only=1 is not set on slave 192.168.20.231(192.168.20.231:3306).
Fri May 31 20:04:26 2024 - [info] Checking replication filtering settings..
Fri May 31 20:04:26 2024 - [info] binlog_do_db= , binlog_ignore_db=
Fri May 31 20:04:26 2024 - [info] Replication filtering check ok.
Fri May 31 20:04:26 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri May 31 20:04:26 2024 - [info] Checking SSH publickey authentication settings on the current master..
Fri May 31 20:04:26 2024 - [info] HealthCheck: SSH to 192.168.20.132 is reachable.
Fri May 31 20:04:26 2024 - [info]
192.168.20.132(192.168.20.132:3306) (current master)
 +--192.168.20.120(192.168.20.120:3306)
 +--192.168.20.231(192.168.20.231:3306)
Fri May 31 20:04:26 2024 - [warning] master_ip_failover_script is not defined.
Fri May 31 20:04:26 2024 - [warning] shutdown_script is not defined.
Fri May 31 20:04:26 2024 - [info] Set master ping interval 2 seconds.
Fri May 31 20:04:26 2024 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri May 31 20:04:26 2024 - [info] Starting ping health check on 192.168.20.132(192.168.20.132:3306)..
Fri May 31 20:04:26 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri May 31 20:05:04 2024 - [warning] Got error on MySQL select ping: 1053 (Server shutdown in progress)
Fri May 31 20:05:04 2024 - [info] Executing SSH check script: exit 0
Fri May 31 20:05:04 2024 - [info] HealthCheck: SSH to 192.168.20.132 is reachable.
Fri May 31 20:05:06 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.132' (111))
Fri May 31 20:05:06 2024 - [warning] Connection failed 2 time(s)..
Fri May 31 20:05:08 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.132' (111))
Fri May 31 20:05:08 2024 - [warning] Connection failed 3 time(s)..
Fri May 31 20:05:10 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.132' (111))
Fri May 31 20:05:10 2024 - [warning] Connection failed 4 time(s)..
Fri May 31 20:05:10 2024 - [warning] Master is not reachable from health checker!
Fri May 31 20:05:10 2024 - [warning] Master 192.168.20.132(192.168.20.132:3306) is not reachable!
Fri May 31 20:05:10 2024 - [warning] SSH is reachable.
Fri May 31 20:05:10 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Fri May 31 20:05:10 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri May 31 20:05:10 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri May 31 20:05:10 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri May 31 20:05:11 2024 - [info] GTID failover mode = 1
Fri May 31 20:05:11 2024 - [info] Dead Servers:
Fri May 31 20:05:11 2024 - [info] 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:11 2024 - [info] Alive Servers:
Fri May 31 20:05:11 2024 - [info] 192.168.20.120(192.168.20.120:3306)
Fri May 31 20:05:11 2024 - [info] 192.168.20.231(192.168.20.231:3306)
Fri May 31 20:05:11 2024 - [info] Alive Slaves:
Fri May 31 20:05:11 2024 - [info] 192.168.20.120(192.168.20.120:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:11 2024 - [info] GTID ON
Fri May 31 20:05:11 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:11 2024 - [info] 192.168.20.231(192.168.20.231:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:11 2024 - [info] GTID ON
Fri May 31 20:05:11 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:11 2024 - [info] Checking slave configurations..
Fri May 31 20:05:11 2024 - [info] read_only=1 is not set on slave 192.168.20.120(192.168.20.120:3306).
Fri May 31 20:05:11 2024 - [info] read_only=1 is not set on slave 192.168.20.231(192.168.20.231:3306).
Fri May 31 20:05:11 2024 - [info] Checking replication filtering settings..
Fri May 31 20:05:11 2024 - [info] Replication filtering check ok.
Fri May 31 20:05:11 2024 - [info] Master is down!
Fri May 31 20:05:11 2024 - [info] Terminating monitoring script.
Fri May 31 20:05:11 2024 - [info] Got exit code 20 (Master dead).
Fri May 31 20:05:11 2024 - [info] MHA::MasterFailover version 0.58.
Fri May 31 20:05:11 2024 - [info] Starting master failover.
Fri May 31 20:05:11 2024 - [info]
Fri May 31 20:05:11 2024 - [info] * Phase 1: Configuration Check Phase..
Fri May 31 20:05:11 2024 - [info]
Fri May 31 20:05:12 2024 - [info] GTID failover mode = 1
Fri May 31 20:05:12 2024 - [info] Dead Servers:
Fri May 31 20:05:12 2024 - [info] 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:12 2024 - [info] Checking master reachability via MySQL(double check)...
Fri May 31 20:05:12 2024 - [info] ok.
Fri May 31 20:05:12 2024 - [info] Alive Servers:
Fri May 31 20:05:12 2024 - [info] 192.168.20.120(192.168.20.120:3306)
Fri May 31 20:05:12 2024 - [info] 192.168.20.231(192.168.20.231:3306)
Fri May 31 20:05:12 2024 - [info] Alive Slaves:
Fri May 31 20:05:12 2024 - [info] 192.168.20.120(192.168.20.120:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:12 2024 - [info] GTID ON
Fri May 31 20:05:12 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:12 2024 - [info] 192.168.20.231(192.168.20.231:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:12 2024 - [info] GTID ON
Fri May 31 20:05:12 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:12 2024 - [info] Starting GTID based failover.
Fri May 31 20:05:12 2024 - [info]
Fri May 31 20:05:12 2024 - [info] ** Phase 1: Configuration Check Phase completed.
Fri May 31 20:05:12 2024 - [info]
Fri May 31 20:05:12 2024 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri May 31 20:05:12 2024 - [info]
Fri May 31 20:05:12 2024 - [info] Forcing shutdown so that applications never connect to the current master..
Fri May 31 20:05:12 2024 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Fri May 31 20:05:12 2024 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri May 31 20:05:13 2024 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] * Phase 3: Master Recovery Phase..
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] The latest binary log file/position on all slaves is binlog.000011:196
Fri May 31 20:05:13 2024 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri May 31 20:05:13 2024 - [info] 192.168.20.120(192.168.20.120:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:13 2024 - [info] GTID ON
Fri May 31 20:05:13 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:13 2024 - [info] 192.168.20.231(192.168.20.231:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:13 2024 - [info] GTID ON
Fri May 31 20:05:13 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:13 2024 - [info] The oldest binary log file/position on all slaves is binlog.000011:196
Fri May 31 20:05:13 2024 - [info] Oldest slaves:
Fri May 31 20:05:13 2024 - [info] 192.168.20.120(192.168.20.120:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:13 2024 - [info] GTID ON
Fri May 31 20:05:13 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:13 2024 - [info] 192.168.20.231(192.168.20.231:3306) Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:13 2024 - [info] GTID ON
Fri May 31 20:05:13 2024 - [info] Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] * Phase 3.3: Determining New Master Phase..
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] Searching new master from slaves..
Fri May 31 20:05:13 2024 - [info] Candidate masters from the configuration file:
Fri May 31 20:05:13 2024 - [info] Non-candidate masters:
Fri May 31 20:05:13 2024 - [info] New master is 192.168.20.120(192.168.20.120:3306)
Fri May 31 20:05:13 2024 - [info] Starting master failover..
Fri May 31 20:05:13 2024 - [info]
From:
192.168.20.132(192.168.20.132:3306) (current master)
 +--192.168.20.120(192.168.20.120:3306)
 +--192.168.20.231(192.168.20.231:3306)
To:
192.168.20.120(192.168.20.120:3306) (new master)
 +--192.168.20.231(192.168.20.231:3306)
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] * Phase 3.3: New Master Recovery Phase..
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] Waiting all logs to be applied..
Fri May 31 20:05:13 2024 - [info] done.
Fri May 31 20:05:13 2024 - [info] Getting new master's binlog name and position..
Fri May 31 20:05:13 2024 - [info] binlog.000003:196
Fri May 31 20:05:13 2024 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.20.120', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replmha', MASTER_PASSWORD='xxx';
Fri May 31 20:05:13 2024 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: binlog.000003, 196, 93909ace-1b58-11ef-81d8-000c2912a662:1-7
Fri May 31 20:05:13 2024 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Fri May 31 20:05:13 2024 - [info] ** Finished master recovery successfully.
Fri May 31 20:05:13 2024 - [info] * Phase 3: Master Recovery Phase completed.
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] * Phase 4: Slaves Recovery Phase..
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri May 31 20:05:13 2024 - [info]
Fri May 31 20:05:13 2024 - [info] -- Slave recovery on host 192.168.20.231(192.168.20.231:3306) started, pid: 38527. Check tmp log /var/log/mha/app1/192.168.20.231_3306_20240531200511.log if it takes time..
Fri May 31 20:05:15 2024 - [info]
Fri May 31 20:05:15 2024 - [info] Log messages from 192.168.20.231 ...
Fri May 31 20:05:15 2024 - [info]
Fri May 31 20:05:13 2024 - [info] Resetting slave 192.168.20.231(192.168.20.231:3306) and starting replication from the new master 192.168.20.120(192.168.20.120:3306)..
Fri May 31 20:05:13 2024 - [info] Executed CHANGE MASTER.
Fri May 31 20:05:14 2024 - [info] Slave started.
Fri May 31 20:05:14 2024 - [info] gtid_wait(93909ace-1b58-11ef-81d8-000c2912a662:1-7) completed on 192.168.20.231(192.168.20.231:3306). Executed 0 events.
Fri May 31 20:05:15 2024 - [info] End of log messages from 192.168.20.231.
Fri May 31 20:05:15 2024 - [info] -- Slave on host 192.168.20.231(192.168.20.231:3306) started.
Fri May 31 20:05:15 2024 - [info] All new slave servers recovered successfully.
Fri May 31 20:05:15 2024 - [info]
Fri May 31 20:05:15 2024 - [info] * Phase 5: New master cleanup phase..
Fri May 31 20:05:15 2024 - [info]
Fri May 31 20:05:15 2024 - [info] Resetting slave info on the new master..
Fri May 31 20:05:15 2024 - [info] 192.168.20.120: Resetting slave info succeeded.
Fri May 31 20:05:15 2024 - [info] Master failover to 192.168.20.120(192.168.20.120:3306) completed successfully.
Fri May 31 20:05:15 2024 - [info] Deleted server1 entry from /etc/mha/app1.cnf .
Fri May 31 20:05:15 2024 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.20.132(192.168.20.132:3306) to 192.168.20.120(192.168.20.120:3306) succeeded
Master 192.168.20.132(192.168.20.132:3306) is down!
Check MHA Manager logs at localhost.localdomain:/var/log/mha/app1/manager for details.
Started automated(non-interactive) failover.
Selected 192.168.20.120(192.168.20.120:3306) as a new master.
192.168.20.120(192.168.20.120:3306): OK: Applying all logs succeeded.
192.168.20.231(192.168.20.231:3306): OK: Slave started, replicating from 192.168.20.120(192.168.20.120:3306)
192.168.20.120(192.168.20.120:3306): Resetting slave info succeeded.
Master failover to 192.168.20.120(192.168.20.120:3306) completed successfully.
4. Repair the old master
Simply restart the MySQL service on the old master.
What about real production? Judge whether the instance is recoverable; if it is not, reinitialize and rebuild it.
5. Restore replication
At this point db02 is the master, db03 is its slave (and also hosts the MHA Manager), and db01 has dropped out of the replication topology.
db01 needs to be re-configured as a slave of db02.
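A sketch of re-attaching db01 once its mysqld is back up, based on the CHANGE MASTER statement MHA printed in the failover log above (the password is masked as xxx there, so substitute the real replication password; run this on db01):
mysql -e "CHANGE MASTER TO MASTER_HOST='192.168.20.120', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replmha', MASTER_PASSWORD='xxx'; START SLAVE;"
mysql -e "show slave status\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running'   # both should say Yes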
6. Edit the configuration file
vim /etc/mha/app1.cnf
Opening the file shows that MHA already removed db01's entry automatically; just add it back.
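For reference, the entry to put back is the original [server1] block; a sketch that appends it (make sure it does not end up duplicated):
cat >> /etc/mha/app1.cnf <<'EOF'
[server1]
hostname=192.168.20.132
port=3306
EOF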
7. Restart MHA
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
8. Check MHA status
[root@localhost /]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:100728) is running(0:PING_OK), master:192.168.20.120