Synchronous replication as implemented by the SafeKit software is essential for failover of transactional applications. With synchronous replication, all committed data on the disk of the first server is also on the disk of the second server. With asynchronous replication, committed data on the disk of the first server can be lost in case of failure. There is also an alternative named semi-synchronous replication, in which committed data has reached the second server but is not necessarily on its disk yet.
To help you make the right decision when choosing between synchronous and asynchronous replication, we now explain the technical mechanisms and their impact on application failover.
Synchronous replication requires the bandwidth of a LAN between the servers, possibly an extended LAN spanning two geographically remote computer rooms. Asynchronous replication can be implemented on a low-speed WAN.
With synchronous file-based replication as implemented by the SafeKit high availability software, when a disk IO is performed by the application or by the file system cache on the primary server inside a replicated file, SafeKit waits for the IO acknowledgement from the local disk and from the secondary server before sending the IO acknowledgement to the application or to the file system cache. This mechanism is essential for failover of transactional applications. Note that SafeKit performs byte-level file replication by replicating directories and not entire disks, which greatly simplifies the configuration of a cluster.
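The acknowledgement order described above can be illustrated with a minimal sketch. It is not SafeKit code: the `Node` class and the `synchronous_write` function are hypothetical names, the disks are in-memory lists, and the call to the secondary stands in for a network round trip.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical server; its 'disk' is just an in-memory list."""
    name: str
    disk: list = field(default_factory=list)

    def write_to_disk(self, data: bytes) -> bool:
        self.disk.append(data)
        return True  # acknowledgement from this node's disk


def synchronous_write(primary: Node, secondary: Node, data: bytes) -> bool:
    """Acknowledge the application only after BOTH nodes acknowledged the IO."""
    local_ack = primary.write_to_disk(data)
    remote_ack = secondary.write_to_disk(data)  # stands in for a network round trip
    return local_ack and remote_ack


primary, secondary = Node("server1"), Node("server2")
acked = synchronous_write(primary, secondary, b"COMMIT tx-1")
```

Because the application only sees the acknowledgement once both writes have completed, a failure of the primary right after a commit leaves the same data on the secondary.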
With asynchronous file-based replication, as implemented by most solutions, the IOs are placed in a queue on the primary server, but the primary server does not wait for the IO acknowledgements of the secondary server. So all data that did not have time to be copied across the network to the second server is lost if the first server fails. In particular, a transactional application loses committed data in case of failure.
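To make the data-loss window concrete, here is a small simulation of the queue-based scheme. It is only a sketch under simplifying assumptions: `async_write` and `replicate` are hypothetical names, the disks are in-memory lists, and a slow network is modelled by letting a single IO through before the failure.

```python
from collections import deque


def async_write(primary_disk: list, queue: deque, data: bytes) -> bool:
    """Write locally, queue the IO for replication, acknowledge at once."""
    primary_disk.append(data)
    queue.append(data)
    return True  # the secondary has not acknowledged anything yet


def replicate(queue: deque, secondary_disk: list, max_ios: int) -> None:
    """Copy at most max_ios queued IOs to the secondary (the network is slow)."""
    for _ in range(min(max_ios, len(queue))):
        secondary_disk.append(queue.popleft())


primary_disk, secondary_disk, queue = [], [], deque()
for tx in (b"tx-1", b"tx-2", b"tx-3"):
    async_write(primary_disk, queue, tx)  # all three are acknowledged as committed

replicate(queue, secondary_disk, max_ios=1)  # only one IO crosses the network

# The primary fails now: its disk and its queue are gone.
lost = [tx for tx in primary_disk if tx not in secondary_disk]
print(lost)  # [b'tx-2', b'tx-3']
```

The two lost transactions had already been acknowledged to the application, which is exactly the situation a transactional application cannot tolerate after failover.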
With semi-synchronous file-based replication as implemented by the SafeKit high availability software, the asynchrony is not on the primary server but on the secondary one. In this solution, SafeKit always waits for the acknowledgement of the two servers before sending the acknowledgement to the application or the file system cache. But on the secondary, there are two options: asynchronous or synchronous.
In the semi-synchronous case, the secondary sends the acknowledgement to the primary upon receipt of the IO and writes it to disk afterwards. In the synchronous case, the secondary writes the IO to disk and then sends the acknowledgement to the primary.
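Be careful, however: the synchronous mode on the secondary server is required to cover a simultaneous power outage of both servers, where the former primary server cannot be restarted and the application must restart on the secondary. In semi-synchronous mode, IOs that the secondary acknowledged but had not yet written to disk are lost in that scenario.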
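The difference between the two secondary modes under a double power outage can be sketched as follows. This is an illustrative model, not SafeKit's implementation: the `Secondary` class, its `write_before_ack` flag and the helper function are hypothetical, and volatile memory is modelled as a list that is cleared on power loss.

```python
class Secondary:
    """Hypothetical secondary server with a volatile buffer and a disk."""

    def __init__(self, write_before_ack: bool):
        self.write_before_ack = write_before_ack
        self.memory = []  # IOs received but not yet on disk (lost on power loss)
        self.disk = []    # IOs that survive a power outage

    def receive(self, data: bytes) -> bool:
        if self.write_before_ack:
            self.disk.append(data)    # synchronous: write, then acknowledge
        else:
            self.memory.append(data)  # semi-synchronous: acknowledge, write later
        return True  # acknowledgement sent to the primary

    def power_outage(self) -> None:
        self.memory.clear()


def surviving_after_double_outage(write_before_ack: bool) -> list:
    secondary = Secondary(write_before_ack)
    secondary.receive(b"COMMIT tx-1")  # primary got the ack and acknowledged the app
    secondary.power_outage()           # both servers lose power; primary cannot restart
    return secondary.disk


sync_result = surviving_after_double_outage(write_before_ack=True)
semi_result = surviving_after_double_outage(write_before_ack=False)
```

In this model the synchronous secondary still holds the committed transaction after the outage, while the semi-synchronous one restarts with an empty disk for that IO.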
As you can see, merely delaying the write on the secondary server has a direct impact on critical application failover. So be very careful when choosing between synchronous and asynchronous replication. Always prefer synchronous or semi-synchronous replication for a critical application.
If you want to understand the consequences of a bad choice through a real-life crisis situation in an airport, we recommend this HA Guide (High Availability Guide).
And you can also check the following important points: