File replication at byte level and application failover in a mirror cluster

Critical database application high availability

A SafeKit mirror cluster with byte-level file replication provides a simple high-availability solution for critical database applications. The SafeKit software that implements a mirror cluster runs on Windows or Linux, including Windows editions for PCs. It performs synchronous, real-time, byte-level file replication. The resulting solution works like a cluster attached to a replicated mirror SAN, but without the cost and complexity of hardware clustering solutions.

A mirror cluster: file replication at byte level and failover

The mirror cluster is a primary-backup high-availability solution. The application runs on a primary server and is restarted automatically on a secondary server if the primary server fails. Software data replication is configured at the file level, by naming the directories to replicate. These directories can contain database files or flat files. With synchronous byte-level file replication, this architecture is particularly suited to back-end applications whose critical data must be protected against failure.

SafeKit provides a generic mirror module on Windows and Linux to build a mirror cluster, as presented in the following video. You can also write your own mirror module for your application. Microsoft SQL Server, MySQL, Oracle, PostgreSQL and Firebird are examples of mirror modules. From a mirror module, you can also replicate a full virtual machine with automatic failover inside a Hyper-V cluster. A separate article explains the difference between VM HA and application HA.

Example: Microsoft SQL Server 2012 cluster with replication and failover

If you want to implement this demonstration of a Microsoft SQL Server 2012 cluster with replication and failover, read the following article.

How does the SafeKit mirror cluster work?

Step 1. File replication at byte level in a mirror cluster

Server 1 (PRIM) runs the application. Users are connected to the virtual IP address of the mirror cluster. SafeKit replicates the files opened by the application in real time. Only the changes the application makes inside the files are sent across the network, which limits traffic (byte-level file replication). Because replication is configured at the file level, only the names of the directories to replicate are set in SafeKit. There are no prerequisites on disk organization for the two servers, and the replicated directories may be located on the system disk. SafeKit implements synchronous replication, so no data is lost on failure, unlike with asynchronous replication.
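The idea of synchronous byte-level replication can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SafeKit's implementation: the class `MirroredFile`, its method `write_at` and the in-memory buffers standing in for the two servers are all hypothetical. The sketch shows that only the changed bytes are shipped, and that the write returns only after the second copy has been updated.

```python
import io


class MirroredFile:
    """Hypothetical model of one replicated file (not SafeKit code)."""

    def __init__(self, primary: io.BytesIO, secondary: io.BytesIO):
        self.primary = primary        # copy on Server 1
        self.secondary = secondary    # copy on Server 2
        self.bytes_sent = 0           # simulated network traffic

    def write_at(self, offset: int, data: bytes) -> int:
        # Apply the change to the primary copy.
        self.primary.seek(offset)
        self.primary.write(data)
        # Ship only (offset, data) to the secondary, never the whole file.
        self._replicate(offset, data)
        # Synchronous: return to the caller only after both copies match.
        return len(data)

    def _replicate(self, offset: int, data: bytes) -> None:
        # Stand-in for a network round trip to the secondary server.
        self.secondary.seek(offset)
        self.secondary.write(data)
        self.bytes_sent += len(data)


primary = io.BytesIO(b"0" * 1024)
secondary = io.BytesIO(b"0" * 1024)
mirror = MirroredFile(primary, secondary)
mirror.write_at(100, b"COMMIT")
```

In this sketch a 6-byte change to a 1024-byte file costs 6 bytes of traffic, and the secondary copy is identical to the primary as soon as `write_at` returns.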

Step 2. Failover

When Server 1 fails, Server 2 takes over. SafeKit switches the cluster's virtual IP address and restarts the application automatically on Server 2. The application finds the files replicated by SafeKit up to date on Server 2, thanks to the synchronous replication between Server 1 and Server 2. The application continues to run on Server 2, modifying its files locally; these changes are no longer replicated to Server 1. The failover time equals the fault-detection time (30 seconds by default) plus the application start-up time. Unlike disk replication solutions, there is no delay for remounting the file system and running file system recovery procedures.
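The failover time stated above is a simple sum, which the following Python sketch makes explicit. The 30-second default comes from the text; the function names and the heartbeat-timeout check are assumptions made for illustration, since the text does not describe how SafeKit detects a fault.

```python
# Default fault-detection time stated in the text, in seconds.
DEFAULT_DETECTION_S = 30.0


def primary_is_failed(last_heartbeat_s: float, now_s: float,
                      timeout_s: float = DEFAULT_DETECTION_S) -> bool:
    # Assumed mechanism: the primary is declared failed once no heartbeat
    # has been seen for the whole detection timeout.
    return now_s - last_heartbeat_s >= timeout_s


def failover_time_s(app_startup_s: float,
                    detection_s: float = DEFAULT_DETECTION_S) -> float:
    # Failover time = fault-detection time + application start-up time.
    return detection_s + app_startup_s


# Example: an application that takes 12 seconds to start.
total = failover_time_s(12.0)
```

With the default detection time, an application that needs 12 seconds to start is back in service about 42 seconds after the failure.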

Step 3. Failback and reintegration

Failback involves restarting Server 1 after fixing the problem that caused it to fail. SafeKit automatically resynchronizes the files, updating only those modified on Server 2 while Server 1 was halted. This reintegration takes place without disturbing the application, which continues to run on Server 2. Automatic failback is a major feature that differentiates SafeKit from other solutions, which require you to manually stop the application on Server 2 in order to resynchronize Server 1.

In order to optimize file reintegration, different cases are considered:

  1. If SafeKit was cleanly stopped on Server 1, then at restart only the modified zones of modified files are reintegrated, according to modification tracking bitmaps.
  2. If Server 1 crashed (power off) or was incorrectly stopped (exception in the replication process), the modification bitmaps are not reliable and are therefore discarded. All files with a modification timestamp more recent than the last known synchronization point minus a grace delay (typically one hour) are reintegrated.
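The two cases above can be sketched as a small decision function. This is a minimal illustration, not SafeKit's algorithm: `FileState`, `plan_reintegration` and the representation of a bitmap as a list of (offset, length) zones are hypothetical, and copying the whole file in the crash case is an assumption. Only the one-hour grace delay is taken from the text.

```python
from dataclasses import dataclass, field

GRACE_DELAY_S = 3600  # "typically one hour" in the text


@dataclass
class FileState:
    path: str
    mtime_s: float                                    # last modification time
    dirty_zones: list = field(default_factory=list)   # (offset, length) pairs


def plan_reintegration(files, clean_stop, last_sync_s, grace_s=GRACE_DELAY_S):
    """Return {path: zones to copy, or "whole file"} for the restarted server."""
    plan = {}
    if clean_stop:
        # Case 1: bitmaps are trusted, copy only the modified zones.
        for f in files:
            if f.dirty_zones:
                plan[f.path] = list(f.dirty_zones)
    else:
        # Case 2: bitmaps are discarded, fall back to timestamps with a
        # safety margin before the last known synchronization point.
        cutoff_s = last_sync_s - grace_s
        for f in files:
            if f.mtime_s > cutoff_s:
                plan[f.path] = "whole file"  # assumption: full copy
    return plan


files = [
    FileState("db.mdf", mtime_s=10_000, dirty_zones=[(4096, 512)]),
    FileState("recent.log", mtime_s=6_000),
    FileState("old.log", mtime_s=1_000),
]
after_clean_stop = plan_reintegration(files, clean_stop=True, last_sync_s=9_000)
after_crash = plan_reintegration(files, clean_stop=False, last_sync_s=9_000)
```

Note how the grace delay widens the crash case: `recent.log` was last modified before the synchronization point, yet it is still reintegrated because it falls inside the one-hour margin.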

Step 4. Return to byte-level file replication in the mirror cluster

After reintegration, the files are once again in mirror mode, as in step 1. The system is back in high-availability mode, with the application running on Server 2 and SafeKit replicating file updates to the backup Server 1. If the administrator wants the application to run on Server 1, they can execute a "swap" command, either manually at an appropriate time or automatically through configuration.
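The four steps can be summarized as a tiny state model. The sketch below is purely illustrative: the class `MirrorCluster`, its method names and the rule that a swap is refused before reintegration completes are assumptions, not SafeKit commands or documented behavior.

```python
class MirrorCluster:
    """Hypothetical model of the four steps; names are illustrative only."""

    SERVERS = ("server1", "server2")

    def __init__(self):
        self.primary = "server1"   # Step 1: Server 1 runs the application
        self.replicating = True    # files are mirrored to the other server

    def _other(self):
        first, second = self.SERVERS
        return second if self.primary == first else first

    def fail_primary(self):
        # Step 2: the surviving server takes over and runs alone.
        self.primary = self._other()
        self.replicating = False

    def reintegrate(self):
        # Steps 3 and 4: the repaired server is resynchronized,
        # then mirroring resumes toward it.
        self.replicating = True

    def swap(self):
        # Assumption: roles are exchanged only when both copies match.
        if not self.replicating:
            raise RuntimeError("cannot swap before reintegration completes")
        self.primary = self._other()


cluster = MirrorCluster()
cluster.fail_primary()   # application now on server2, no replication
cluster.reintegrate()    # server1 is back as backup
cluster.swap()           # administrator moves the application back
```

After this sequence the application runs on Server 1 again with replication active, which is the state the cluster started in.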

Note that you can deploy several mirror modules on the same cluster and then implement an active-active cluster with crossed replication.

