本章节基于:MySQL基于MHA高可用部署篇(二)
通过上一个章节实验,我们知道mha可以正常切换主从,但是当mysql正常切换之后,应用中是无法正常自动切换的,这时就需要vip了。VIP配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;另外一种通过脚本方式启动虚拟ip的方式(即不需要keepalived或者heartbeat类似的软件)。
第一种:通过keepalived配置VIP
keepalived方式管理虚拟ip,keepalived配置方法如下:
1)下载软件进行并进行安装(两台master,准确的说一台是master,另外一台是备选master,在没有切换以前是slave)
1 |
$ yum install keepalived |
2)配置keepalived的配置文件,在master上配置(10.99.73.9)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
[root@mysql01 ~]# cat /etc/keepalived/keepalived.conf Configuration File for keepalived global_defs { notification_email { 18612100045@163.com } notification_email_from root smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id MYSQL_HA } vrrp_instance VI_1 { state BACKUP interface eth0 virtual_router_id 51 priority 100 advert_int 1 nopreempt 1 authentication { auth_type PASS auth_pass 1234 } virtual_ipaddress { 10.99.73.100 } } |
其中router_id MySQL HA表示设定keepalived组的名称,将192.168.0.88这个虚拟ip绑定到该主机的eth1网卡上,并且设置了状态为backup模式,将keepalived的模式设置为非抢占模式(nopreempt),priority 150表示设置的优先级为150。下面的配置略有不同,但是都是一个意思。
在候选master上配置(10.99.73.10)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
[root@mysql02 ~]# cat /etc/keepalived/keepalived.conf ! Configuration File for keepalived global_defs { notification_email { 18612100045@163.com } notification_email_from root smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id MYSQL_HA } vrrp_instance VI_1 { state BACKUP interface eth0 virtual_router_id 51 priority 50 advert_int 1 authentication { auth_type PASS auth_pass 1234 } virtual_ipaddress { 10.99.73.100 } } |
3)启动keepalived服务,在master上启动并查看日志.
1 2 3 4 5 6 7 8 |
[root@mysql01 ~]# systemctl start keepalived.service [root@mysql01 ~]# tailf /var/log/messages Feb 9 10:48:14 mysql01 Keepalived_vrrp[20790]: VRRP_Instance(VI_1) Transition to MASTER STATE Feb 9 10:48:15 mysql01 Keepalived_vrrp[20790]: VRRP_Instance(VI_1) Entering MASTER STATE Feb 9 10:48:15 mysql01 Keepalived_vrrp[20790]: VRRP_Instance(VI_1) setting protocol VIPs. Feb 9 10:48:15 mysql01 Keepalived_vrrp[20790]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.99.73.100 Feb 9 10:48:15 mysql01 Keepalived_healthcheckers[20789]: Netlink reflector reports IP 10.99.73.100 added Feb 9 10:48:20 mysql01 Keepalived_vrrp[20790]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.99.73.100 |
1 2 3 4 5 6 7 |
[root@mysql01 ~]# ip addr show eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:2e:bb:f0 brd ff:ff:ff:ff:ff:ff inet 10.99.73.9/32 scope global eth0 valid_lft forever preferred_lft forever inet 10.99.73.100/32 scope global eth0 valid_lft forever preferred_lft forever |
发现已经将虚拟ip 10.99.73.100绑定了网卡eth0上。
在另外一台服务器,候选master上启动keepalived服务,并观察
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
[root@mysql02 ~]# systemctl start keepalived.service [root@mysql02 ~]# tail -f /var/log/messages Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Netlink reflector reports IP fe80::c880:d7ff:fe95:f666 added Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Netlink reflector reports IP fe80::84e9:d1ff:fed4:99f4 added Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Netlink reflector reports IP fe80::dcf7:32ff:fe52:1ed1 added Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Registering Kernel netlink reflector Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Registering Kernel netlink command channel Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Registering gratuitous ARP shared channel Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Opening file '/etc/keepalived/keepalived.conf'. Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Configuration is using : 63839 Bytes Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: Using LinkWatch kernel netlink reflector... Feb 9 10:53:42 mysql02 Keepalived_healthcheckers[26597]: Using LinkWatch kernel netlink reflector... Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: VRRP_Instance(VI_1) Entering BACKUP STATE Feb 9 10:53:42 mysql02 Keepalived_vrrp[26598]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)] Feb 9 10:53:42 mysql02 systemd: Started LVS and VRRP High Availability Monitor. |
从上面的信息可以看到keepalived已经配置成功。
现在我们关掉master keepalived,来测试主从切换。
1 2 3 4 5 6 7 8 |
[root@mysql01 ~]# systemctl stop keepalived.service [root@mysql02 ~]# tailf /var/log/messages Feb 9 10:57:02 mysql02 Keepalived_vrrp[26598]: VRRP_Instance(VI_1) Transition to MASTER STATE Feb 9 10:57:03 mysql02 Keepalived_vrrp[26598]: VRRP_Instance(VI_1) Entering MASTER STATE Feb 9 10:57:03 mysql02 Keepalived_vrrp[26598]: VRRP_Instance(VI_1) setting protocol VIPs. Feb 9 10:57:03 mysql02 Keepalived_vrrp[26598]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.99.73.100 Feb 9 10:57:03 mysql02 Keepalived_healthcheckers[26597]: Netlink reflector reports IP 10.99.73.100 added Feb 9 10:57:08 mysql02 Keepalived_vrrp[26598]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.99.73.100 |
1 2 3 4 5 6 7 |
[root@mysql02 ~]# ip addr show eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:2a:63:90 brd ff:ff:ff:ff:ff:ff inet 10.99.73.10/24 brd 10.99.73.255 scope global eth0 valid_lft forever preferred_lft forever inet 10.99.73.100/32 scope global eth0 valid_lft forever preferred_lft forever |
可以看到正常切换过来了,下面要做的就是使用keepalived检查mysql是否存活。
注意:上面两台服务器的keepalived都设置为了BACKUP模式,在keepalived中2种模式,分别是master->backup模式和backup->backup模式。这两种模式有很大区别。在master->backup模式下,一旦主库宕机,虚拟ip会自动漂移到从库,当主库修复后,keepalived启动后,还会把虚拟ip抢占过来,即使设置了非抢占模式(nopreempt)抢占ip的动作也会发生。在backup->backup模式下,当主库宕机后虚拟ip会自动漂移到从库上,当原主库恢复和keepalived服务启动后,并不会抢占新主的虚拟ip,即使是优先级高于从库的优先级别,也不会发生抢占。为了减少ip漂移次数,通常是把修复好的主库当做新的备库。
4)MHA引入keepalived(MySQL服务进程挂掉时通过MHA停止keepalived)
要想把keepalived服务引入MHA,我们只需要修改切换是触发的脚本文件master_ip_failover即可,在该脚本中添加在master发生宕机时对keepalived的处理。
编辑脚本/usr/local/bin/master_ip_failover,修改后如下,我对perl不熟悉,所以我这里完整贴出该脚本。
在MHA Manager修改脚本,修改后的内容如下(参考资料比较少):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 |
#!/usr/bin/env perl use strict; use warnings FATAL => 'all'; use Getopt::Long; my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port $ssh_user ); my $ssh_start_vip = "systemctl start keepalived.service"; my $ssh_stop_vip = "systemctl stop keepalived.service"; GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, ); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1; eval { print "Disabling the VIP on old master: $orig_master_host \n"; &stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`; exit 0; } else { &usage(); exit 1; } } # A simple system call that enable the VIP on the new master sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; } # A simple system call that disable the VIP on the old_master sub stop_vip() { return 0 unless ($ssh_user); `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; } |
/usr/local/bin/master_ip_failover添加或者修改的内容意思是当主库数据库发生故障时,会触发MHA切换,MHA Manager会停掉主库上的keepalived服务,触发虚拟ip漂移到备选从库,从而完成切换。当然可以在keepalived里面引入脚本,这个脚本监控mysql是否正常运行,如果不正常,则调用该脚本杀掉keepalived进程。
MHA manager配置文件调用方式如下,自动故障切换和在线切换都是使用这个脚本即可。
1 2 |
master_ip_failover_script=/usr/local/bin/master_ip_failover --ssh_user=root master_ip_online_change_script=/usr/local/bin/master_ip_failover --ssh_user=root |
比如有如下环境都已配置好,如下:
mysql01为mysql master+keepalived master+vip(10.99.73.100)
mysql02为mysql slave+keepalived slave+vip(10.99.73.100)
mysql03为mysql slave
mha为monitor监控整个复制组,然后你可以停掉mysql01的mysql master实例,通过mha监控加master_ip_failover脚本达到vip转移到mysql02主机,同时mysql02成为新的mysql master,而mysql03成为mysql02的从。
下面就是整个mha切换的日志:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 |
[root@mha ~]# cat /var/log/masterha/app1/manager.log Thu Feb 9 11:51:22 2017 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query) Thu Feb 9 11:51:22 2017 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s mysql02 -s mysql03 --user=root --master_host=mysql01 --master_ip=10.99.73.9 --master_port=3306 --user=root --master_host=10.99.73.9 --master_ip=10.99.73.9 --master_port=3306 --master_user=mha --master_password=123456 --ping_type=SELECT Thu Feb 9 11:51:22 2017 - [info] Executing SSH check script: exit 0 Thu Feb 9 11:51:22 2017 - [info] HealthCheck: SSH to 10.99.73.9 is reachable. Monitoring server mysql02 is reachable, Master is not reachable from mysql02. OK. Monitoring server mysql03 is reachable, Master is not reachable from mysql03. OK. Thu Feb 9 11:51:22 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start. Thu Feb 9 11:51:25 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111)) Thu Feb 9 11:51:25 2017 - [warning] Connection failed 2 time(s).. Thu Feb 9 11:51:28 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111)) Thu Feb 9 11:51:28 2017 - [warning] Connection failed 3 time(s).. Thu Feb 9 11:51:31 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111)) Thu Feb 9 11:51:31 2017 - [warning] Connection failed 4 time(s).. Thu Feb 9 11:51:31 2017 - [warning] Master is not reachable from health checker! Thu Feb 9 11:51:31 2017 - [warning] Master 10.99.73.9(10.99.73.9:3306) is not reachable! Thu Feb 9 11:51:31 2017 - [warning] SSH is reachable. Thu Feb 9 11:51:31 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status.. Thu Feb 9 11:51:31 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Thu Feb 9 11:51:31 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Thu Feb 9 11:51:31 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Thu Feb 9 11:51:31 2017 - [info] GTID failover mode = 1 Thu Feb 9 11:51:31 2017 - [info] Dead Servers: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Alive Servers: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.10(10.99.73.10:3306) Thu Feb 9 11:51:31 2017 - [info] 10.99.73.11(10.99.73.11:3306) Thu Feb 9 11:51:31 2017 - [info] Alive Slaves: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Feb 9 11:51:31 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Not candidate for the new Master (no_master is set) Thu Feb 9 11:51:31 2017 - [info] Checking slave configurations.. Thu Feb 9 11:51:31 2017 - [info] Checking replication filtering settings.. Thu Feb 9 11:51:31 2017 - [info] Replication filtering check ok. Thu Feb 9 11:51:31 2017 - [info] Master is down! Thu Feb 9 11:51:31 2017 - [info] Terminating monitoring script. Thu Feb 9 11:51:31 2017 - [info] Got exit code 20 (Master dead). Thu Feb 9 11:51:31 2017 - [info] MHA::MasterFailover version 0.57. Thu Feb 9 11:51:31 2017 - [info] Starting master failover. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] * Phase 1: Configuration Check Phase.. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] GTID failover mode = 1 Thu Feb 9 11:51:31 2017 - [info] Dead Servers: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Checking master reachability via MySQL(double check)... Thu Feb 9 11:51:31 2017 - [info] ok. Thu Feb 9 11:51:31 2017 - [info] Alive Servers: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.10(10.99.73.10:3306) Thu Feb 9 11:51:31 2017 - [info] 10.99.73.11(10.99.73.11:3306) Thu Feb 9 11:51:31 2017 - [info] Alive Slaves: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Feb 9 11:51:31 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Not candidate for the new Master (no_master is set) Thu Feb 9 11:51:31 2017 - [info] Starting GTID based failover. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] ** Phase 1: Configuration Check Phase completed. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Thu Feb 9 11:51:31 2017 - [info] Executing master IP deactivation script: Thu Feb 9 11:51:31 2017 - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.99.73.9 --orig_master_ip=10.99.73.9 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====systemctl stop keepalived.service==systemctl start keepalived.service=== Disabling the VIP on old master: 10.99.73.9 Thu Feb 9 11:51:31 2017 - [info] done. Thu Feb 9 11:51:31 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Thu Feb 9 11:51:31 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] * Phase 3: Master Recovery Phase.. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:5825639 Thu Feb 9 11:51:31 2017 - [info] Retrieved Gtid Set: 0f7c2151-edc0-11e6-ab9e-fa163e2ebbf0:9695-28014 Thu Feb 9 11:51:31 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Thu Feb 9 11:51:31 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Feb 9 11:51:31 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Not candidate for the new Master (no_master is set) Thu Feb 9 11:51:31 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:5825639 Thu Feb 9 11:51:31 2017 - [info] Retrieved Gtid Set: 0f7c2151-edc0-11e6-ab9e-fa163e2ebbf0:9695-28014 Thu Feb 9 11:51:31 2017 - [info] Oldest slaves: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Feb 9 11:51:31 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Not candidate for the new Master (no_master is set) Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] * Phase 3.3: Determining New Master Phase.. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] Searching new master from slaves.. Thu Feb 9 11:51:31 2017 - [info] Candidate masters from the configuration file: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Feb 9 11:51:31 2017 - [info] Non-candidate masters: Thu Feb 9 11:51:31 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled Thu Feb 9 11:51:31 2017 - [info] GTID ON Thu Feb 9 11:51:31 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306) Thu Feb 9 11:51:31 2017 - [info] Not candidate for the new Master (no_master is set) Thu Feb 9 11:51:31 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Thu Feb 9 11:51:31 2017 - [info] New master is 10.99.73.10(10.99.73.10:3306) Thu Feb 9 11:51:31 2017 - [info] Starting master failover.. Thu Feb 9 11:51:31 2017 - [info] From: 10.99.73.9(10.99.73.9:3306) (current master) +--10.99.73.10(10.99.73.10:3306) +--10.99.73.11(10.99.73.11:3306) To: 10.99.73.10(10.99.73.10:3306) (new master) +--10.99.73.11(10.99.73.11:3306) Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] * Phase 3.3: New Master Recovery Phase.. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] Waiting all logs to be applied.. Thu Feb 9 11:51:31 2017 - [info] done. Thu Feb 9 11:51:31 2017 - [info] Getting new master's binlog name and position.. Thu Feb 9 11:51:31 2017 - [info] mysql-bin.000002:6975266 Thu Feb 9 11:51:31 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.99.73.10', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='mysql_slave', MASTER_PASSWORD='xxx'; Thu Feb 9 11:51:31 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000002, 6975266, 0f7c2151-edc0-11e6-ab9e-fa163e2ebbf0:1-28014 Thu Feb 9 11:51:31 2017 - [info] Executing master IP activate script: Thu Feb 9 11:51:31 2017 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.99.73.9 --orig_master_ip=10.99.73.9 --orig_master_port=3306 --new_master_host=10.99.73.10 --new_master_ip=10.99.73.10 --new_master_port=3306 --new_master_user='mha' --new_master_password=xxx Unknown option: new_master_user Unknown option: new_master_password IN SCRIPT TEST====systemctl stop keepalived.service==systemctl start keepalived.service=== Enabling the VIP - 10.99.73.100 on the new master - 10.99.73.10 Thu Feb 9 11:51:31 2017 - [info] OK. Thu Feb 9 11:51:31 2017 - [info] Setting read_only=0 on 10.99.73.10(10.99.73.10:3306).. Thu Feb 9 11:51:31 2017 - [info] ok. Thu Feb 9 11:51:31 2017 - [info] ** Finished master recovery successfully. Thu Feb 9 11:51:31 2017 - [info] * Phase 3: Master Recovery Phase completed. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] * Phase 4: Slaves Recovery Phase.. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] * Phase 4.1: Starting Slaves in parallel.. Thu Feb 9 11:51:31 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) started, pid: 1902. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170209115131.log if it takes time.. Thu Feb 9 11:51:32 2017 - [info] Thu Feb 9 11:51:32 2017 - [info] Log messages from 10.99.73.11 ... Thu Feb 9 11:51:32 2017 - [info] Thu Feb 9 11:51:31 2017 - [info] Resetting slave 10.99.73.11(10.99.73.11:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306).. Thu Feb 9 11:51:32 2017 - [info] Executed CHANGE MASTER. Thu Feb 9 11:51:32 2017 - [info] Slave started. Thu Feb 9 11:51:32 2017 - [info] gtid_wait(0f7c2151-edc0-11e6-ab9e-fa163e2ebbf0:1-28014) completed on 10.99.73.11(10.99.73.11:3306). Executed 0 events. Thu Feb 9 11:51:32 2017 - [info] End of log messages from 10.99.73.11. Thu Feb 9 11:51:32 2017 - [info] -- Slave on host 10.99.73.11(10.99.73.11:3306) started. Thu Feb 9 11:51:32 2017 - [info] All new slave servers recovered successfully. Thu Feb 9 11:51:32 2017 - [info] Thu Feb 9 11:51:32 2017 - [info] * Phase 5: New master cleanup phase.. Thu Feb 9 11:51:32 2017 - [info] Thu Feb 9 11:51:32 2017 - [info] Resetting slave info on the new master.. Thu Feb 9 11:51:32 2017 - [info] 10.99.73.10: Resetting slave info succeeded. Thu Feb 9 11:51:32 2017 - [info] Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully. Thu Feb 9 11:51:32 2017 - [info] Deleted server1 entry from /etc/masterha/app1.cnf . Thu Feb 9 11:51:32 2017 - [info] ----- Failover Report ----- app1: MySQL Master failover 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306) succeeded Master 10.99.73.9(10.99.73.9:3306) is down! Check MHA Manager logs at mha:/var/log/masterha/app1/manager.log for details. Started automated(non-interactive) failover. Invalidated master IP address on 10.99.73.9(10.99.73.9:3306) Selected 10.99.73.10(10.99.73.10:3306) as a new master. 10.99.73.10(10.99.73.10:3306): OK: Applying all logs succeeded. 10.99.73.10(10.99.73.10:3306): OK: Activated master IP address. 10.99.73.11(10.99.73.11:3306): OK: Slave started, replicating from 10.99.73.10(10.99.73.10:3306) 10.99.73.10(10.99.73.10:3306): Resetting slave info succeeded. Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully. Thu Feb 9 11:51:32 2017 - [info] Sending mail.. |
结果表明vip+mysql master都已经正常切换到了mysql02主机,具体内容细看,这里就不做详细说明了。在使用keepalived时,就不需要arping工具了,高版本keepalived自带了arp处理。
第二种:通过脚本的方式管理VIP
这里是修改/usr/local/bin/master_ip_failover,也可以使用其他的语言完成,比如Python语言。使用Python脚本编写的failover这里就不介绍了。修改完成后内容如下,而且如果使用脚本管理vip的话,需要手动在master服务器上绑定一个vip。
1 |
[root@mysql01 ~]# /sbin/ifconfig eth0:1 10.99.73.100/24 |
通过脚本来维护vip的测试我这里就不说明了,自行测试,脚本如下(测试通过)。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 |
#!/usr/bin/env perl use strict; use warnings FATAL => 'all'; use Getopt::Long; my ( $orig_master_port, $orig_master_host, $orig_master_ip, $orig_master_user, $orig_master_ssh_user, $orig_master_password, $orig_master_is_new_slave, $new_master_password, $new_master_ssh_user, $new_master_user, $new_master_host, $new_master_ip, $new_master_port, $command, $vip, $interface ,$gateway ); my $orig_master_ssh_port = 22; my $new_master_ssh_port = 22; GetOptions( 'command=s' => \$command, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, 'new_master_ssh_user=s' => \$new_master_ssh_user, 'new_master_ssh_port:i' => \$new_master_ssh_port, 'new_master_password=s' => \$new_master_password, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'orig_master_user=s' => \$orig_master_user, 'orig_master_ssh_user=s' => \$orig_master_ssh_user, 'orig_master_ssh_port:i' => \$orig_master_ssh_port, 'orig_master_password=s' => \$orig_master_password, 'orig_master_is_new_slave' => \$orig_master_is_new_slave, 'vip=s' => \$vip, 'gateway=s' => \$gateway, 'interface=s' => \$interface, ); my $key = '1'; my @vipnomask=(split(/\//,$vip))[0]; my $ssh_start_vip = "sudo /sbin/ifconfig $interface:$key $vip && sudo /sbin/arping -I $interface -c 1 -s @vipnomask $gateway"; my $ssh_stop_vip = "sudo /sbin/ifconfig $interface:$key down"; exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1; eval { print "Disabling the VIP on old master: $orig_master_host \n"; &stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; exit 0; } else { &usage(); exit 1; } } sub start_vip() { `ssh -p $new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`; } sub stop_vip() { `ssh -p $orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; } |
使用方式如下,自动切换和在线切换都是使用这个脚本即可。
1 2 |
master_ip_failover_script=/usr/local/bin/master_ip_failover --interface=eth0 --vip=10.99.73.100/32 --gateway=10.99.73.1 master_ip_online_change_script=/usr/local/bin/master_ip_failover --interface=eth0 --vip=10.99.73.100/32 --gateway=10.99.73.1 |
为了防止脑裂发生,推荐生产环境采用脚本的方式来管理虚拟 IP,而不是使用 keepalived 来完成。但是当我们有了 VIP 之后,在切换时还有一个问题,涉及到网络知识,MySQL 服务器对应的上层路由设备缓存有 ARP 表(MAC 和 IP 对应关系),路由设备在收到请求包之后进行解包,会根据包中的目标 IP 地址在 ARP 表中找 IP 对应的 MAC 地址(找不到就会进行 ARP 广播查找 IP 对应的 MAC 地址),找到后就会进行封包,把包的目标 MAC 地址替换成这里找到的 MAC 地址,然后继续往交换机进行发送。
所以当我们使用 VIP 时,在上层路由设备中缓存有 VIP 对应的服务器 IP,当 VIP 切换后,路由设备并不会立马刷新自己的 ARP 缓存表,所以会出现客户端请求找不到目标主机的情况,这就需要我们人工干预了。
其实就是在我们进行切换时,可以通过使用一个 arping 命令实现。ARP 协议是“Address Resolution Protocol”(地址解析协议)的缩写,工作于 OSI 模型中的第二层,ARP 协议的数据包无法通过路由器。在同一以太网中,通过地址解析协议,源主机可以通过目的主机的 IP 地址获得目的主机的 MAC 地址。arping 就是完成上述过程的程序。
其命令语法如下:
1 |
Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination |
-b:用于发送以太网广播帧(FFFFFFFFFFFF)。arping一开始使用广播地址,在收到响应后就使用unicast地址。
-q:quiet output不显示任何信息。
-f:表示在收到第一个响应报文后就退出。
-w timeout:设定一个超时时间,单位是秒。如果到了指定时间,arping还没到完全收到响应则退出。
-c count:表示发送指定数量的ARP请求数据包后就停止。如果指定了deadline选项,则arping会等待相同数量的arp响应包,直到超时为止。
-s source:设定arping发送的arp数据包中的SPA字段的值。如果为空,则按下面处理,
- 如果是DAD模式(冲突地址探测),则设置为0.0.0.0。
- 如果是Unsolicited ARP模式(Gratutious ARP)则设置为目标地址。
- 否则从路由表得出。
-I interface:设置ping使用的网络接口。
到此为止,基本MHA集群已经配置完毕。