Asynchronous Replication Automatic Failover in MySQL 8.0.22 (2024)

Oracle recently released MySQL 8.0.22, and this new version came with a new asynchronous connection failover mechanism. It allows a replica to automatically establish an asynchronous replication connection to a new source, in case its existing one fails.

In this blog, we’ll look at this connection failover mechanism.

Overview

The asynchronous failover mechanism can be used to keep a replica synchronized with a group of servers that share data (Multisource slave). It will move the replication connection to a new source when the existing source connection fails.

Working Principle

When the existing connection source fails, the replica first retries the same connection for the number of times specified by the MASTER_RETRY_COUNT. The interval between attempts is set by the MASTER_CONNECT_RETRY option. When these attempts are exhausted, the asynchronous connection failover mechanism takes over the failover process.

Note that by default the MASTER_RETRY_COUNT is 86400 (1 day –> 24 hours) and the MASTER_CONNECT_RETRY default value is 60.

To ensure that the asynchronous connection failover mechanism can be activated promptly, set MASTER_RETRY_COUNT to a minimal number that just allows a few retry attempts with the same source, in case the connection failure is caused by a transient network outage.

How to Activate Asynchronous Connection Failover

  • To activate asynchronous connection failover for a replication channel, set SOURCE_CONNECTION_AUTO_FAILOVER=1 on the CHANGE MASTER TO statement for the channel.
  • We have two new functions, which will help to add and delete the server entries from the source list.
    • asynchronous_connection_failover_add_source (add the server entries from the source list)
    • asynchronous_connection_failover_delete_source (delete the server entries from the source list)

While using these functions, you need to specify the arguments like (‘channel’,’host’,port,’network_namespace’,weight).

Example

mysql> select asynchronous_connection_failover_add_source('testing', '192.168.33.12', 3306, '', 100);+----------------------------------------------------------------------------------------+| asynchronous_connection_failover_add_source('testing', '192.168.33.12', 3306, '', 100) |+----------------------------------------------------------------------------------------+| The UDF asynchronous_connection_failover_add_source() executed successfully. |+----------------------------------------------------------------------------------------+1 row in set (0.00 sec)

The source servers need to be configured in the table “mysql.replication_asynchronous_connection_failover”. We can also use the table “performance_schema.replication_asynchronous_connection_failover” to view the available servers in the source list.

Note: If you are not using any channel based replication, this failover mechanism will work. While running the change master statement, there is no need to mention any channel name. But make sure GTID is enabled on all the servers.

Production Use Cases

Say you have three PXC-5.7 node with production data, running behind ProxySQL. Now, we will going to configure the channel based asynchronous replication under node 1 (192.168.33.12).

  • node 1 – 192.168.33.12
  • node 2 – 192.168.33.13
  • node 3 – 192.168.33.14
  • Read Replica – 192.168.33.15
mysql> change master to master_user='repl',master_password='Repl@123',master_host='192.168.33.12',master_auto_position=1,source_connection_auto_failover=1,master_retry_count=3,master_connect_retry=6 for channel "prod_replica";Query OK, 0 rows affected, 2 warnings (0.01 sec)mysql> start replica for channel 'prod_replica';Query OK, 0 rows affected (0.00 sec)

Architecture Diagram

Test Case 1

We will going to add the failover settings:

 mysql> select asynchronous_connection_failover_add_source('prod_replica', '192.168.33.12', 3306, '', 100);+---------------------------------------------------------------------------------------------+| asynchronous_connection_failover_add_source('prod_replica', '192.168.33.12', 3306, '', 100) |+---------------------------------------------------------------------------------------------+| The UDF asynchronous_connection_failover_add_source() executed successfully. |+---------------------------------------------------------------------------------------------+1 row in set (0.00 sec)mysql> select asynchronous_connection_failover_add_source('prod_replica', '192.168.33.13', 3306, '', 80);+--------------------------------------------------------------------------------------------+| asynchronous_connection_failover_add_source('prod_replica', '192.168.33.13', 3306, '', 80) |+--------------------------------------------------------------------------------------------+| The UDF asynchronous_connection_failover_add_source() executed successfully. |+--------------------------------------------------------------------------------------------+1 row in set (0.01 sec)mysql> select asynchronous_connection_failover_add_source('prod_replica', '192.168.33.14', 3306, '', 60);+--------------------------------------------------------------------------------------------+| asynchronous_connection_failover_add_source('prod_replica', '192.168.33.14', 3306, '', 60) |+--------------------------------------------------------------------------------------------+| The UDF asynchronous_connection_failover_add_source() executed successfully. |+--------------------------------------------------------------------------------------------+1 row in set (0.00 sec)mysql> select * from mysql.replication_asynchronous_connection_failover;+-------------------+---------------+------+-------------------+--------+| Channel_name | Host | Port | Network_namespace | Weight |+-------------------+---------------+------+-------------------+--------+| prod_replica | 192.168.33.12 | 3306 | | 100 || prod_replica | 192.168.33.13 | 3306 | | 80 || prod_replica | 192.168.33.14 | 3306 | | 60 |+-------------------+---------------+------+-------------------+--------+3 rows in set (0.00 sec)

Ok all good, I can activate the auto_failover. Let’s stop node 1 (192.168.33.12) MySQL. ProxySQL will promote the next suitable master.

[root@centos12 lib]# service mysqld stopRedirecting to /bin/systemctl stop mysqld.service

In the Replica Server

mysql> show replica statusG*************************** 1. row *************************** Slave_IO_State: Reconnecting after a failed master event read Master_Host: 192.168.33.12 Master_User: repl Master_Port: 3306 Connect_Retry: 6 Master_Log_File: binlog.000004 Read_Master_Log_Pos: 1143 Relay_Log_File: relay-bin-testing.000006 Relay_Log_Pos: 1352 Relay_Master_Log_File: binlog.000004 Slave_IO_Running: Connecting Slave_SQL_Running: Yes Replicate_Do_DB: Last_IO_Error: error reconnecting to master '[emailprotected]:3306' - retry-time: 10 retries: 2 message: Can't connect to MySQL server on '192.168.33.12' (111)

The IO thread is in “connecting” state. This means it is trying to establish the connection from the existing source (node 1) based on the master_retry_countand master_connect_retrysettings.

After a few seconds, you can see the source_host was changed to node 2 (192.168.33.13). Now the failover is done.

mysql> show replica statusG*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.33.13 Master_User: repl Master_Port: 3306 Connect_Retry: 6 Master_Log_File: binlog.000004 Read_Master_Log_Pos: 1162 Relay_Log_File: relay-bin-testing.000007 Relay_Log_Pos: 487 Relay_Master_Log_File: binlog.000004 Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_IO_Error:

From the Error Log

2020-10-29T22:22:05.679951Z 54 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'prod_replica': error reconnecting to master '[emailprotected]:3306' - retry-time: 10 retries: 3 message: Can't connect to MySQL server on '192.168.33.12' (111), Error_code: MY-0020032020-10-29T22:22:05.681121Z 58 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.2020-10-29T22:22:05.682830Z 58 [System] [MY-010562] [Repl] Slave I/O thread for channel 'prod_replica': connected to master '[emailprotected]:3306',replication started in log 'FIRST' at position 26602020-10-29T22:22:05.685175Z 58 [Warning] [MY-010549] [Repl] The master's UUID has changed, although this should not happen unless you have changed it manually. The old UUID was 31b5b7d0-1a25-11eb-8076-080027090068.(END)

Test Case 2

While running the change master statement, there is no need to mention any channel name, whether you are using channel based replication or not.

Example

mysql> change master to master_user='repl',master_password='Repl@123',master_host='192.168.33.12',master_auto_position=1,source_connection_auto_failover=1,master_retry_count=3,master_connect_retry=10;Query OK, 0 rows affected, 2 warnings (0.01 sec)mysql> start replica;Query OK, 0 rows affected (0.00 sec)

Then add the failover settings like below,

select asynchronous_connection_failover_add_source('', '192.168.33.12', 3306, '', 100);select asynchronous_connection_failover_add_source('', '192.168.33.13', 3306, '', 80);select asynchronous_connection_failover_add_source('', '192.168.33.14', 3306, '', 60); mysql> select * from mysql.replication_asynchronous_connection_failover;+--------------+---------------+------+-------------------+--------+| Channel_name | Host | Port | Network_namespace | Weight |+--------------+---------------+------+-------------------+--------+| | 192.168.33.12 | 3306 | | 100 || | 192.168.33.13 | 3306 | | 80 || | 192.168.33.14 | 3306 | | 60 |+--------------+---------------+------+-------------------+--------+3 rows in set (0.00 sec)

Now, I’m going to stop node 1 (192.168.33.12).

Replication Error

Last_IO_Error: error connecting to master '[emailprotected]:3306' - retry-time: 10 retries: 2 message: Can't connect to MySQL server on '192.168.33.12' (111)

From the Error Log

2020-10-30T00:38:03.471482Z 27 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error connecting to master '[emailprotected]:3306' - retry-time: 10 retries: 3 message: Can't connect to MySQL server on '192.168.33.12' (111), Error_code: MY-0020032020-10-30T00:38:03.472002Z 29 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.2020-10-30T00:38:03.473493Z 29 [System] [MY-010562] [Repl] Slave I/O thread for channel '': connected to master '[emailprotected]:3306',replication started in log 'FIRST' at position 2342020-10-30T00:38:03.475471Z 29 [Warning] [MY-010549] [Repl] The master's UUID has changed, although this should not happen unless you have changed it manually. The old UUID was 1ff8a919-1a39-11eb-a27a-080027090068.

Using ClusterControl

Now we willuse ClusterControl to repeat this automatic failover process. I have three nodes pxc (5.7) deployed by ClusterControl. I have an 8.0.22 replication slave under my PXC node2 and we will going to add this read replica using ClusterControl.

Step 1

Setup the passwordless SSH login from ClusterControl node to read replica node.

$ ssh-copy-id -i ~/.ssh/id_rsa 192.168.33.15

Step 2

Go to ClusterControl and click the drop down icon and select Add Replication slave option.

Asynchronous Replication Automatic Failover in MySQL 8.0.22 (2)

Step 3

Then choose the “Existing Replication Slave” option and enter the read replica IP then click “Add Replication Slave”.

Asynchronous Replication Automatic Failover in MySQL 8.0.22 (3)

Step 4

A job will be triggered and you can monitor the progress at ClusterControl > Logs > Jobs. Once the process is complete, the slave will show up in your Overview page as highlighted in the following screenshot.

Asynchronous Replication Automatic Failover in MySQL 8.0.22 (4)

Now you can check the current topology at ClusterControl > Topology

Asynchronous Replication Automatic Failover in MySQL 8.0.22 (5)

Replica Auto Failover Process

Now I’m going to do failover testing and added the below settings to this function (asynchronous_connection_failover_add_source) in my read replica.

 select asynchronous_connection_failover_add_source('prod_replica', '192.168.33.12', 3306, '', 100); select asynchronous_connection_failover_add_source('prod_replica', '192.168.33.13', 3306, '', 80); select asynchronous_connection_failover_add_source('prod_replica', '192.168.33.14', 3306, '', 60);mysql> select * from mysql.replication_asynchronous_connection_failover;+--------------+---------------+------+-------------------+--------+| Channel_name | Host | Port | Network_namespace | Weight |+--------------+---------------+------+-------------------+--------+| prod_replica | 192.168.33.12 | 3306 | | 100 || prod_replica | 192.168.33.13 | 3306 | | 80 || prod_replica | 192.168.33.14 | 3306 | | 60 |+--------------+---------------+------+-------------------+--------+3 rows in set (0.00 sec)mysql> select CONNECTION_RETRY_INTERVAL,CONNECTION_RETRY_COUNT,SOURCE_CONNECTION_AUTO_FAILOVER from performance_schema.replication_connection_configuration;+---------------------------+------------------------+---------------------------------+| CONNECTION_RETRY_INTERVAL | CONNECTION_RETRY_COUNT | SOURCE_CONNECTION_AUTO_FAILOVER |+---------------------------+------------------------+---------------------------------+| 6 | 3 | 1 |+---------------------------+------------------------+---------------------------------+1 row in set (0.00 sec)

I’m going to stop node 2 (192.168.33.13). In ClusterControl, the (enable_cluster_autorecovery) parameter is enabled so it will promote the next suitable master.

Asynchronous Replication Automatic Failover in MySQL 8.0.22 (6)

Now my current master is down, so the read replica is retrying to connect the master.

Replication Error From Read Replica

Last_IO_Error: error connecting to master '[emailprotected]:3306' - retry-time: 6 retries: 2 message: Can't connect to MySQL server on '192.168.33.13' (111)

Once the ClusterControl promotes the next suitable master, my read replica will connect to any one of the available cluster nodes.

Asynchronous Replication Automatic Failover in MySQL 8.0.22 (7)

The automatic failover process is completed and my read replica joined back to node 1 (192.168.33.13) server.

Conclusion

This is one of the great features in MySQL, there is no manual intervention needed. This auto failover process can save you some time. And it reduces the replica server outage. Worth noting, when my old master came back to rotation, the replication connection would not switch back to the old master.

Asynchronous Replication Automatic Failover in MySQL 8.0.22 (2024)
Top Articles
Latest Posts
Recommended Articles
Article information

Author: Neely Ledner

Last Updated:

Views: 5989

Rating: 4.1 / 5 (62 voted)

Reviews: 93% of readers found this page helpful

Author information

Name: Neely Ledner

Birthday: 1998-06-09

Address: 443 Barrows Terrace, New Jodyberg, CO 57462-5329

Phone: +2433516856029

Job: Central Legal Facilitator

Hobby: Backpacking, Jogging, Magic, Driving, Macrame, Embroidery, Foraging

Introduction: My name is Neely Ledner, I am a bright, determined, beautiful, adventurous, adventurous, spotless, calm person who loves writing and wants to share my knowledge and understanding with you.