A SafeKit mirror cluster with file replication at byte level provides a simple high availability solution to critical database applications. The SafeKit software implementing a mirror cluster runs either on Windows, Linux or AIX. It implements synchronous real-time byte-level file replication. The resulting solution is working like a replicated mirror SAN but without the costs and the complexity of hardware failover cluster.
The mirror cluster is a primary-backup high availability solution. The application runs on a primary server and is restarted automatically on a secondary server if the primary server fails.
The software data replication is configured at the file level with the name of the file directories to replicate. The directory can contain database files or flat files. With synchronous byte-level file replication, this architecture is particularly suited to providing high availability for back-end applications with critical data to protect against failure.
SafeKit provides a generic mirror module to build a mirror cluster. You can write your own mirror module for your application. Microsoft SQL Server, MySQL, Oracle are examples of mirror modules. And from a mirror module, you can also replicate a full Virtual Machine with automatic failover inside an Hyper-V cluster. Note that this article explains the difference between VM HA vs Application HA.
The SafeKit mirror cluster with byte-level file replication works as follows.
This step corresponds to the previous figure. Server 1 (PRIM) runs the application. Users are connected to the virtual IP address of the mirror cluster. SafeKit replicates files opened by the application in real time. Only changes made by the application in the files are replicated across the network, thus limiting traffic (byte-level file replication).
With a software data replication at the file level, only names of file directories are configured in SafeKit. There are no pre-requisites on disk organization for the two servers. Directories to replicate may be located in the system disk.
When Server 1 fails, Server 2 takes over. SafeKit switches the cluster's virtual IP address and restarts the application automatically on Server 2. The application finds the files replicated by SafeKit uptodate on Server 2, thanks to the synchronous replication between Server 1 and Server 2. The application continues to run on Server 2 by locally modifying its files that are no longer replicated to Server 1.
The failover time is equal to the fault-detection time (set to 30 seconds by default) plus the application start-up time. Unlike disk replication solutions, there is no delay for remounting file system and running file system recovery procedures.
Failback involves restarting Server 1 after fixing the problem that caused it to fail. SafeKit automatically resynchronizes the files, updating only the files modified on Server 2 while Server 1 was halted.
This reintegration takes place without disturbing the applications, which can continue running on Server 2. The automatic failback is a major feature that differentiates SafeKit from other solutions, which require you to stop manually the applications on Server 2 in order to resynchronize Server 1.
After reintegration, the files are once again in mirror mode, as in step 1. The system is back in high-availability mode, with the application running on Server 2 and SafeKit replicating data file updates to the backup Server 1.
If the administrator wishes the application to run on Server 1, he/she can execute a "swap" command either manually at an appropriate time, or automatically through configuration.