The mirror software cluster

High-availability cluster with real-time file replication and application failover

The mirror software cluster is a primary-backup high-availability solution. The application runs on a primary server and is restarted automatically on a secondary server if the primary server fails.
The mirror cluster can be configured with or without file replication. With file replication enabled, this architecture is particularly suited to providing high availability for back-end applications whose critical data must be protected against failure. Microsoft SQL Server.Safe, MySQL.Safe and Oracle.Safe are examples of "mirror" type application modules. You can write your own mirror module for your application, based on the generic module Mirror.Safe.

The mirror software cluster works as follows.

Step 1. Normal operation
For replication, only the names of file directories are configured in SafeKit. There are no prerequisites on disk organization for the two servers. Directories to replicate may be located on the system disk. Server 1 (PRIM) runs the application. SafeKit replicates the files opened by the application. Only the changes made by the application in the files are replicated in real time across the network, thus limiting traffic.

Step 2. Failover
When Server 1 fails, Server 2 takes over. SafeKit switches the cluster's virtual IP address and restarts the application automatically on Server 2. The application finds the files replicated by SafeKit up to date on Server 2, thanks to the synchronous replication between Server 1 and Server 2. The application continues to run on Server 2, modifying its files locally; these changes are no longer replicated to Server 1. The switch-over time is equal to the fault-detection time (set to 30 seconds by default) plus the application start-up time. Unlike disk replication solutions, there is no delay for remounting file systems and running recovery procedures.

Step 3. Failback and reintegration
Failback involves restarting Server 1 after fixing the problem that caused it to fail. SafeKit automatically resynchronizes the files, updating only the files modified on Server 2 while Server 1 was halted. This reintegration takes place without disturbing the applications, which can continue running on Server 2. This is a major feature that differentiates SafeKit from other solutions, which require you to stop the applications on Server 2 in order to resynchronize Server 1.

Step 4. Return to normal operation
After reintegration, the files are once again in mirror mode, as in step 1. The system is back in high-availability mode, with the application running on Server 2 and SafeKit replicating file updates to the backup Server 1. If the administrator wants the application to run on Server 1, they can execute a "swap" command, either manually at an appropriate time or automatically through configuration.

Synchronous replication versus asynchronous replication
There is a significant difference between synchronous replication, as offered by the SafeKit mirror solution, and asynchronous replication, as traditionally offered by other file replication solutions.

With synchronous replication, when a disk IO is performed inside a replicated file by the application or by the file system cache on the primary server, SafeKit waits for the IO acknowledgement from the local disk and from the secondary server before sending the IO acknowledgement to the application or to the file system cache. This mechanism is essential for the recovery of transactional applications. Synchronous data replication requires LAN bandwidth between the servers, possibly with an extended LAN between two geographically remote computer rooms.

With asynchronous replication as implemented by other solutions, the IOs are placed in a queue on the primary server, but the primary server does not wait for the IO acknowledgements from the secondary server. All data that has not yet been copied across the network to the second server is therefore lost if the first server fails. In particular, a transactional application loses committed data in case of failure. Asynchronous replication can be used for data replication through a low-speed WAN, to back up data remotely.

SafeKit provides an asynchronous solution in which the asynchrony is implemented not on the primary machine but on the secondary one. In this solution, SafeKit always waits for the acknowledgement of the two machines before sending the acknowledgement to the application or to the file system cache. On the secondary, however, there are two options: asynchronous or synchronous. In the asynchronous case, the secondary sends the acknowledgement to the primary upon receipt of the IO and writes to disk afterwards. In the synchronous case, the secondary writes the IO to disk and then sends the acknowledgement to the primary.
The synchronous mode on the secondary is required to cover a simultaneous power outage of both servers in which the former primary server cannot be restarted and the application must be restarted on the secondary.
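
The difference between the two secondary-side options can be illustrated with a small in-memory model. This is only a sketch of the acknowledgement ordering described above, not SafeKit code: the class names, the `pending` buffer and the `power_loss` helper are hypothetical, and disks are represented by plain Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class Secondary:
    """Toy stand-in for the backup server (hypothetical model)."""
    sync_mode: bool                               # True: write to disk before acknowledging
    disk: list = field(default_factory=list)      # IOs safely written to disk
    pending: list = field(default_factory=list)   # IOs received but not yet on disk

    def receive(self, io: str) -> bool:
        if self.sync_mode:
            self.disk.append(io)      # synchronous: write first, then acknowledge
        else:
            self.pending.append(io)   # asynchronous: acknowledge on receipt, write later
        return True                   # acknowledgement returned to the primary

    def flush(self) -> None:
        """Write the IOs that were acknowledged before reaching the disk."""
        self.disk.extend(self.pending)
        self.pending.clear()

    def power_loss(self) -> None:
        """IOs held only in memory are lost."""
        self.pending.clear()


@dataclass
class Primary:
    """Toy stand-in for the server running the application."""
    secondary: Secondary
    disk: list = field(default_factory=list)

    def write(self, io: str) -> bool:
        """Return the acknowledgement given to the application."""
        self.disk.append(io)                      # local disk acknowledgement
        remote_ack = self.secondary.receive(io)   # wait for the secondary
        return remote_ack                         # only now acknowledge the application


def surviving_on_secondary(sync_mode: bool) -> list:
    """Writes found on the secondary's disk after both servers lose power."""
    secondary = Secondary(sync_mode=sync_mode)
    primary = Primary(secondary)
    for io in ("tx1", "tx2"):
        assert primary.write(io)   # the application sees every write acknowledged
    secondary.power_loss()         # double outage before any deferred write
    return secondary.disk
```

In this model, `surviving_on_secondary(True)` keeps both writes, whereas `surviving_on_secondary(False)` keeps none, because the outage happens before the deferred writes reach the disk. In both modes the primary waits for the secondary's acknowledgement, so a failure of the primary alone loses no acknowledged write.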
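
Returning to the four-step cycle described earlier (normal operation, failover, reintegration, return to normal operation), the role changes can be summarized as a small state machine. The sketch below is illustrative only: the class, the method names and the "mirror"/"alone" labels are assumptions chosen for this example and do not reflect SafeKit's actual commands or configuration. The only figures taken from the text are the 30-second default fault-detection time and the rule that switch-over time is detection time plus application start-up time.

```python
from typing import Optional


class MirrorCluster:
    """Hypothetical model of the primary/secondary roles in a mirror cluster."""

    def __init__(self) -> None:
        self.primary: str = "server1"               # runs the application
        self.secondary: Optional[str] = "server2"   # receives replicated changes
        self.stale: Optional[str] = None            # failed server awaiting resynchronization

    def state(self) -> str:
        # "mirror": files are replicated; "alone": the primary runs without a backup
        return "mirror" if self.secondary is not None else "alone"

    def fail_primary(self) -> None:
        """Step 2: the secondary takes over and runs alone."""
        if self.secondary is None:
            raise RuntimeError("no up-to-date secondary to fail over to")
        self.stale = self.primary
        self.primary, self.secondary = self.secondary, None

    def reintegrate(self) -> None:
        """Steps 3 and 4: the repaired server resynchronizes and becomes the backup."""
        if self.stale is None:
            raise RuntimeError("no server is waiting for reintegration")
        self.secondary, self.stale = self.stale, None

    def swap(self) -> None:
        """Optional: exchange the roles so the application runs on the other server."""
        if self.secondary is None:
            raise RuntimeError("swap requires a synchronized secondary")
        self.primary, self.secondary = self.secondary, self.primary


def switchover_seconds(app_startup: float, fault_detection: float = 30.0) -> float:
    """Switch-over time = fault-detection time + application start-up time."""
    return fault_detection + app_startup
```

For example, after `fail_primary()` followed by `reintegrate()`, the application runs on `server2` with `server1` as the backup, and `swap()` returns it to `server1`. With an assumed start-up time of 15 seconds, `switchover_seconds(15.0)` gives 45 seconds.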