What are RPO and RTO, with examples?
Evidian SafeKit
What are RPO and RTO, with examples of high availability and backup solutions?
Overview
This article explores RPO (Recovery Point Objective) and RTO (Recovery Time Objective) with examples of high availability and backup solutions.
High availability and backup solutions are complementary. The first is for automatic failover in the event of a failure and the second is for data recovery in the event of a disaster such as ransomware encrypting all data.
The article explains in detail the RTO and RPO of SafeKit, a high availability software product.
What is RPO?
RPO (Recovery Point Objective) is the maximum amount of data, expressed as a period of time, that may be lost in the event of a failure.
If you are looking for a high availability cluster with automatic failover, the RPO should be 0, so that the application restarts without data loss. You can choose either a hardware high availability cluster with a shared disk or a software high availability cluster with synchronous real-time replication.
If you are implementing a backup solution, the RPO is greater than 0 and recovery is not automatic. Administrators decide how often to back up and how many backups to keep; data written after the most recent backup is lost on restore.
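As a rough illustration of why a backup schedule implies an RPO greater than 0, the sketch below computes the data loss window for a failure at a given time. The function name, the 6-hour schedule and the timestamps are hypothetical and serve only as an example.

```python
from datetime import datetime, timedelta

def data_loss_window(backup_times, failure_time):
    """Return how much recent work is lost when restoring the latest
    backup taken at or before failure_time."""
    usable = [t for t in backup_times if t <= failure_time]
    if not usable:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(usable)

# Hypothetical schedule: one backup every 6 hours (00:00, 06:00, 12:00, 18:00).
start = datetime(2024, 1, 1, 0, 0)
backups = [start + timedelta(hours=6 * i) for i in range(4)]

# A failure at 17:30 forces a restore of the 12:00 backup.
failure = datetime(2024, 1, 1, 17, 30)
loss = data_loss_window(backups, failure)
print(loss)  # 5:30:00
```

With this schedule the loss can approach the full 6-hour interval, which is why the backup frequency effectively sets the RPO.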
What is RTO?
RTO (Recovery Time Objective) is the time during which an application is unavailable in the event of a failure.
For a critical application, RTO should be minimal. This requires a high availability solution that automatically restarts the application after a hardware or software failure. RTO is then approximately one minute: the failure detection time plus the automatic restart time of the application.
With a backup solution, RTO is generally several hours or more. Administrators first attempt to repair the hardware and restart the application on up-to-date data. Restoring from a backup is the last resort when those actions fail, because it leads to data loss.
RTO with the example of a SafeKit mirror cluster
The SafeKit mirror cluster is a software high availability cluster with synchronous real-time data replication and automatic application failover.
RTO of the SafeKit mirror cluster is in the order of 1 minute and can be reduced by lowering the heartbeat timeout.
For a hardware failure, RTO = heartbeat timeout (default 30 s) + time to restart the application.
For a software failure or an administrator restart, RTO = time to stop the application + time to restart it.
With solutions that reboot a full virtual machine in case of failure, the RTO includes the reboot time of the virtual machine.
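The two RTO formulas above can be sketched as simple arithmetic. Only the 30 s default heartbeat timeout comes from the text; the stop and restart durations below are made-up values for a hypothetical application, and the function names are illustrative.

```python
def rto_hardware_failure(heartbeat_timeout_s, restart_s):
    """RTO = heartbeat timeout + time to restart the application."""
    return heartbeat_timeout_s + restart_s

def rto_software_failure(stop_s, restart_s):
    """RTO = time to stop the application + time to restart it."""
    return stop_s + restart_s

# Assumed application timings (hypothetical): 10 s to stop, 20 s to restart.
STOP_S = 10
RESTART_S = 20

default_hw = rto_hardware_failure(30, RESTART_S)   # default 30 s heartbeat timeout
tuned_hw = rto_hardware_failure(10, RESTART_S)     # hypothetical lower timeout
software = rto_software_failure(STOP_S, RESTART_S)

print(default_hw, tuned_hw, software)  # 50 30 30
```

Under these assumptions, lowering the heartbeat timeout shortens only the hardware-failure RTO; the application's own stop and restart times set the floor in both cases.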
RTO with the example of a SafeKit farm cluster
The SafeKit farm cluster is a software high availability cluster with network load balancing and automatic failover.
RTO of a SafeKit farm cluster is in the order of a few seconds.
For a hardware failure, RTO = failure detection timeout through monitoring channels (default a few seconds). After the timeout the load balancing filters are reconfigured.
For a software failure or an administrator restart, RTO = time to stop the application + time to restart it.
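To give an idea of what reconfiguring load balancing filters after a failure can look like, here is a minimal sketch of a generic hash-based filter. It is an assumption for illustration only: the source does not describe SafeKit's actual filtering algorithm, and the node names, client addresses and functions are hypothetical.

```python
import hashlib

def owner(client_ip, alive_nodes):
    """Pick which alive node handles traffic from client_ip."""
    if not alive_nodes:
        raise RuntimeError("no node available")
    nodes = sorted(alive_nodes)  # same ordering on every node
    digest = hashlib.sha256(client_ip.encode()).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]

def accepts(node, client_ip, alive_nodes):
    """Filter decision taken locally by each node."""
    return owner(client_ip, alive_nodes) == node

alive = {"node1", "node2", "node3"}
clients = ["10.0.0.%d" % i for i in range(1, 9)]
before = {c: owner(c, alive) for c in clients}

# node2 is declared failed after the detection timeout: every remaining
# node recomputes its filter with the reduced node list.
alive.discard("node2")
after = {c: owner(c, alive) for c in clients}

for c in clients:
    print(c, before[c], "->", after[c])
```

In this sketch each client is accepted by exactly one surviving node as soon as the node list is updated, so the outage seen by clients is bounded by the detection timeout.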
RPO with the example of a SafeKit mirror cluster
RPO of the SafeKit mirror cluster is 0 as the replication is synchronous and real-time.
Note that with asynchronous replication, RPO is not 0: data is lost when the application restarts on the secondary server after a failure.
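The difference between the two replication modes can be illustrated with a toy model. This is a simplified sketch, not SafeKit's implementation: the `Mirror` class and its methods are hypothetical, and real replication works on file writes over a network rather than on Python lists.

```python
class Mirror:
    """Toy primary/secondary pair with synchronous or asynchronous replication."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self.pending = []  # writes not yet sent to the secondary (async only)

    def write(self, record):
        self.primary.append(record)
        if self.synchronous:
            # The write is acknowledged only once the secondary has it.
            self.secondary.append(record)
        else:
            # The write is acknowledged immediately and shipped later.
            self.pending.append(record)
        return "ack"

    def flush(self):
        """Ship queued writes to the secondary (async mode)."""
        self.secondary.extend(self.pending)
        self.pending.clear()

    def fail_primary(self):
        """Primary is lost: return acknowledged records missing on the secondary."""
        return [r for r in self.primary if r not in self.secondary]

sync = Mirror(synchronous=True)
for r in ("a", "b", "c"):
    sync.write(r)
sync_lost = sync.fail_primary()

asyn = Mirror(synchronous=False)
asyn.write("a")
asyn.write("b")
asyn.flush()
asyn.write("c")  # acknowledged but not yet replicated
async_lost = asyn.fail_primary()

print(sync_lost, async_lost)  # [] ['c']
```

In the synchronous case nothing acknowledged is missing on the secondary (RPO = 0); in the asynchronous case the writes still queued at the time of the failure are lost.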
RPO with the example of a SafeKit farm cluster
Not relevant: a farm cluster does not replicate any data.
Partners, the success with SafeKit
This platform-agnostic solution is ideal for a partner who resells a critical application and wants to offer many customers a redundancy and high availability option that is easy to deploy.
With many references in many countries won by partners, SafeKit has proven to be the easiest solution to implement for redundancy and high availability of building management, video management, access control, SCADA software...
Building Management Software (BMS)
Video Management Software (VMS)
Electronic Access Control Software (EACS)
SCADA Software (Industry)
What are the advantages of a mirror cluster?
- Low Complexity
- Plug&Play deployment with no specific skills
- Suitable for large deployments in many sites (very simple to deploy)
- 2 physical or virtual nodes
- No shared storage requirement
- No Domain Controller requirement
- Same solution on Windows and Linux
- Supports Windows Server and Client OS editions
- Well documented API and support
- Synchronous data replication (no data loss in case of failure)
- Replicated directories can be in the system disk
- Supports multiple heartbeats and virtual IP addresses
- Offers configurable software, hardware and network checkers
- Handles split brain and quorum without a special disk, a third machine or a dedicated link between the two servers
- Automatic failover of application with a recovery time in the order of one minute
- Automatic failback when a server comes back after a failure (no manual operation)
- A very simple console to deploy the solution and to maintain it afterwards for the end customer
- Supports hardware and environment failures (20% of causes of unavailability), including the complete failure of a computer room with 2 nodes in two remote sites
- Supports software failures (40% of causes of unavailability): software bug, regression on software update (N and N+1 versions can coexist)
- Supports human errors (40% of causes of unavailability): simplicity of use helps avoid administration errors on the critical application
What are the advantages of a farm cluster?
- Low Complexity
- Plug&Play deployment with no specific skills
- Suitable for large deployments in many sites (very simple to deploy)
- 2 physical or virtual nodes or more
- No network load balancers requirement
- No proxy server requirement (above the farm cluster)
- No Domain Controller requirement
- No restriction in VMware due to multicast or unicast address
- Same solution on Windows and Linux
- Supports Windows Server and Client OS editions
- Well documented API and support
- Supports multiple monitoring channels on multiple networks for server failure detection
- Supports multiple virtual IP addresses
- Offers configurable software, hardware and network checkers
- Offers the mirror cluster with synchronous real-time replication and failover to implement a farm+mirror 3-tiers architecture
- Automatic failover with a recovery time in the order of a few seconds
- Automatic failback when a server comes back after a failure (no manual operation)
- A very simple console to deploy the solution and to maintain it afterwards for the end customer
- Supports hardware and environment failures (20% of causes of unavailability), including the complete failure of a computer room with 2 nodes in two remote sites
- Supports software failures (40% of causes of unavailability): software bug, regression on software update (N and N+1 versions can coexist)
- Supports human errors (40% of causes of unavailability): simplicity of use helps avoid administration errors on the critical application
Network load balancing and failover
Windows farm | Linux farm
Generic Windows farm | Generic Linux farm
Microsoft IIS | -
NGINX
Apache
Amazon AWS farm
Microsoft Azure farm
Google GCP farm
Other cloud
Advanced clustering architectures
Several modules can be deployed on the same cluster. Thus, advanced clustering architectures can be implemented:
- the farm+mirror cluster built by deploying a farm module and a mirror module on the same cluster,
- the active/active cluster with replication built by deploying several mirror modules on 2 servers,
- the Hyper-V cluster or KVM cluster with real-time replication and failover of full virtual machines between 2 active hypervisors,
- the N-1 cluster built by deploying N mirror modules on N+1 servers.
Evidian SafeKit mirror cluster with real-time file replication and failover
- 3 products in 1
- Very simple configuration
- Synchronous replication
- Fully automated failback
- Replication of any type of data
- File replication vs disk replication
- File replication vs shared disk
- Remote sites and virtual IP address
- Quorum and split brain
- Active/active cluster
- Uniform high availability solution
- RTO / RPO
Evidian SafeKit farm cluster with load balancing and failover
- No load balancer or dedicated proxy servers or special multicast Ethernet address
- All clustering features
- Remote sites and virtual IP address
- Uniform high availability solution
Software clustering vs hardware clustering
Shared nothing vs a shared disk cluster
Application High Availability vs Full Virtual Machine High Availability
High availability vs fault tolerance
Synchronous replication vs asynchronous replication
Byte-level file replication vs block-level disk replication
Heartbeat, failover and quorum to avoid 2 master nodes
Virtual IP address primary/secondary, network load balancing, failover
User's Guide
Application Modules
Release Notes
Presales documentation
Introduction
- Features
- Architectures
- Distinctive advantages
- Hardware vs software cluster
- Synchronous vs asynchronous replication
- File vs disk replication
- High availability vs fault tolerance
- Hardware vs software load balancing
- Virtual machine vs application HA
Installation, Console, CLI
- Install and setup / pptx
- Package installation
- Nodes setup
- Cluster configuration
- Upgrade
- Web console / pptx
- Cluster configuration
- Configuration tab
- Control tab
- Monitor tab
- Advanced Configuration tab
- Command line / pptx
- Silent installation
- Cluster administration
- Module administration
- Command line interface
Advanced configuration
- Mirror module / pptx
- userconfig.xml + restart scripts
- Heartbeat (<heartbeat>)
- Virtual IP address (<vip>)
- Real-time file replication (<rfs>)
- Farm module / pptx
- userconfig.xml + restart scripts
- Farm configuration (<farm>)
- Virtual IP address (<vip>)
- Checkers / pptx
- Failover machine (<failover>)
- Process monitoring (<errd>)
- Network and duplicate IP checkers
- Custom checker (<custom>)
- Split brain checker (<splitbrain>)
- TCP, ping, module checkers
Support
- Support tools / pptx
- Analyze snapshots
- Evidian support / pptx
- Get permanent license key
- Register on support.evidian.com
- Call desk