[server1]
hostname=db42
candidate_master=1
check_repl_delay=0
port=3307
[server2]
hostname=dbo_100_241
candidate_master=1
check_repl_delay=0
port=3307
[server3]
hostname=db121
port=3307
#ignore_fail=1
no_master=1
情况1:
主、备主正常,从IO进程down,主down了之后 Slave_IO_Running: No Slave_SQL_Running: Yes可以正常切换至From:db42(192.168.100.42:3307) (current master) +--dbo_100_241(192.168.100.241:3307) +--db121(192.168.100.121:3307)To:dbo_100_241(192.168.100.241:3307) (new master) +--db121(192.168.100.121:3307)情况2:
主、备主正常,从IO和SQL进程down,主down了之后 Slave_IO_Running: No Slave_SQL_Running: No可以正常切换至From:dbo_100_241(192.168.100.241:3307) (current master) +--db42(192.168.100.42:3307) +--db121(192.168.100.121:3307)To:db42(192.168.100.42:3307) (new master) +--db121(192.168.100.121:3307)情况3:
主、备主正常,从down了,主down了之后Server db121(192.168.100.121:3307) is dead, but must be alive! Check server settings.MHA不进行切换
情况4:
对从增加参数ignore_fail=1主、备主正常,从down了,主down了之后From:db42(192.168.100.42:3307) (current master) +--dbo_100_241(192.168.100.241:3307)To:dbo_100_241(192.168.100.241:3307) (new master)正常切换情况5:
主、备主、从正常,此时备主SQL进程有延迟(>100M),主down[warning] Slave dbo_100_241(192.168.100.241:3307) SQL Thread delays too much. Latest log file:mysql-bin.000027:268227005, Current log file:mysql-bin.000027:107. This server is not selected as a new master because recovery will take long time.(文档上说100M,因为备主延迟大于100M,从加了参数no_master=1,所以找不到可用的主,切换失败)----- Failover Report -----app1: MySQL Master failover db42(192.168.100.42:3307)Master db42(192.168.100.42:3307) is down!Check MHA Manager logs at dbo_100_241:/etc/mha/data/app1.log for details.Started automated(non-interactive) failover.Invalidated master IP address on db42(192.168.100.42:3307)The latest slave db121(192.168.100.121:3307) has all relay logs for recovery.None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too far behind.Got Error so couldn't continue failover from here.
情况6:
主、备主、从正常,此时备主SQL进程有延迟(<100M),主down----- Failover Report -----app1: MySQL Master failover db42(192.168.100.42:3307) to dbo_100_241(192.168.100.241:3307) succeededMaster db42(192.168.100.42:3307) is down!Check MHA Manager logs at dbo_100_241:/etc/mha/data/app1.log for details.Started automated(non-interactive) failover.Invalidated master IP address on db42(192.168.100.42:3307)The latest slave dbo_100_241(192.168.100.241:3307) has all relay logs for recovery.Selected dbo_100_241(192.168.100.241:3307) as a new master.dbo_100_241(192.168.100.241:3307): OK: Applying all logs succeeded.dbo_100_241(192.168.100.241:3307): OK: Activated master IP address.db121(192.168.100.121:3307): This host has the latest relay log events.Generating relay diff files from the latest slave succeeded.db121(192.168.100.121:3307): OK: Applying all logs succeeded. Slave started, replicating from dbo_100_241(192.168.100.241:3307)dbo_100_241(192.168.100.241:3307): Resetting slave info succeeded.Master failover to dbo_100_241(192.168.100.241:3307) completed successfully.Fri Feb 26 14:22:10 2016 - [info] Sending mail..正常切换
情况7:
主、备主、从正常,主有压力情况下,手动维护切主masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=db42 --new_master_port=3307 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
情况8:
后台突然出现网络波动Sat Feb 27 10:58:30 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 431.Sat Feb 27 10:58:30 2016 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s dbo_100_241 -s db121 -s db42 --user=root --master_host=db42 --master_ip=192.168.100.42 --master_port=3307 --user=root --master_host=db42 --master_ip=192.168.100.42 --master_port=3307 --master_user=root --master_password=Qe091cs74 --ping_type=SELECTSat Feb 27 10:58:30 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/oracle/mysqllog/mysql3307 --output_file=/etc/mha/data/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-binSat Feb 27 10:58:31 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.42' (4))Sat Feb 27 10:58:31 2016 - [warning] Connection failed 2 time(s)..Sat Feb 27 10:58:32 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.42' (4))Sat Feb 27 10:58:32 2016 - [warning] Connection failed 3 time(s)..Sat Feb 27 10:58:33 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.42' (4))Sat Feb 27 10:58:33 2016 - [warning] Connection failed 4 time(s)..Sat Feb 27 10:58:35 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to db42! at /usr/local/share/perl5/MHA/HealthCheck.pm line 342.Monitoring server dbo_100_241 is reachable, Master is not reachable from dbo_100_241. OK.ssh: connect to host db121 port 22: No route to hostMonitoring server db121 is NOT reachable!Sat Feb 27 10:58:38 2016 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.Sat Feb 27 10:58:38 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details....Sat Feb 27 10:59:06 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.Sat Feb 27 10:59:07 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.42' (4))Sat Feb 27 10:59:07 2016 - [warning] Connection failed 1 time(s)..Sat Feb 27 10:59:07 2016 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s dbo_100_241 -s db121 -s db42 --user=root --master_host=db42 --master_ip=192.168.100.42 --master_port=3307 --user=root --master_host=db42 --master_ip=192.168.100.42 --master_port=3307 --master_user=root --master_password=Qe091cs74 --ping_type=SELECTSat Feb 27 10:59:07 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/oracle/mysqllog/mysql3307 --output_file=/etc/mha/data/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-binSat Feb 27 10:59:08 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.42' (4))Sat Feb 27 10:59:08 2016 - [warning] Connection failed 2 time(s)..Sat Feb 27 10:59:09 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.42' (4))Sat Feb 27 10:59:09 2016 - [warning] Connection failed 3 time(s)..Sat Feb 27 10:59:10 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 113)Sat Feb 27 10:59:10 2016 - [warning] Connection failed 4 time(s)..Sat Feb 27 10:59:12 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to db42! at /usr/local/share/perl5/MHA/HealthCheck.pm line 342.Master is reachable from dbo_100_241!Sat Feb 27 10:59:12 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.Sat Feb 27 10:59:12 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.Sat Feb 27 10:59:12 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..问题总结:
Keepalived在主和备主只能起一个。切换完注意检查,从的read_only,relay_log_purge
切换步骤总结:
Phase 1: Configuration Check Phase..Phase 2: Dead Master Shutdown Phase.. Forcing shutdown so that applications never connect to the current master.. /etc/mha/scripts/master_ip_failover --orig_master_host=db42 --orig_master_ip=192.168.100.42 --orig_master_port=3307 --command=stopssh --ssh_user=root Disabling the VIP on old master: db42Phase 3: Master Recovery Phase..Phase 3.1: Getting Latest Slaves Phase.. The latest binary log file/position on all slaves is mysql-bin.000038:940 Latest slaves (Slaves that received relay log files to the latest): ... The oldest binary log file/position on all slaves is mysql-bin.000038:848 Oldest slaves: ...Phase 3.2: Saving Dead Master's Binlog Phase.. Executing command on the dead master save_binary_logs --command=save... scp from dead master...Phase 3.3: Determining New Master Phase.. Finding the latest slave that has all relay logs for recovering other slaves.. Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000038 --latest_rmlp=940Phase 3.3: New Master Diff Log Generation Phase.. Sending binlog.. scp from local:/etc/mha/data/...Phase 3.4: Master Log Apply Phase.. Applying differential binary/relay Enabling the VIP - on the new master - Phase 4: Slaves Recovery Phase.. Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Generating relay diff files from the latest slave succeeded.Phase 4.2: Starting Parallel Slave Log Apply Phase.. All new slave servers recovered successfully.Phase 5: New master cleanup phase.. Resetting slave info on the new master.
命令杂:
masterha_manager -conf=/etc/mha/app1.cnf --ignore_last_failover > /tmp/app.log 2>&1 &/usr/local/mysql55/bin/mysql -uroot -pQe091cs74 -h127.0.0.1 -P3307 CHANGE MASTER TO MASTER_HOST='192.168.100.241', MASTER_PORT=3307, MASTER_LOG_FILE='mysql-bin.000012', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='repl'; start slave;show slave status\G/usr/local/mysql55/bin/mysql -uroot -pQe091cs74 -h127.0.0.1 -P3307 CHANGE MASTER TO MASTER_HOST='192.168.100.42', MASTER_PORT=3307, MASTER_LOG_FILE='mysql-bin.000025', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='repl'; start slave;show slave status\Gmasterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=db42 --new_master_port=3307 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0time sysbench --test=oltp --mysql-user=root --mysql-host=192.168.100.199 --mysql-password=Qe091cs74 --mysql-port=3307 --mysql-db=test\--oltp-test-mode=complex --mysql-table-engine=innodb --oltp-table-size=100000000 --mysql-db=test \--oltp-table-name=INNODB_T7 --max-time=600 prepare &