Amazon AWS: The Simplest Load Balancing Cluster with Failover on Windows and Linux
Evidian SafeKit
The solution in Amazon AWS
Evidian SafeKit brings load balancing and failover in Amazon AWS between two or more redundant Windows or Linux servers.
This article explains how to quickly implement an Amazon AWS cluster without specific skills.
A generic product
Note that SafeKit is a generic product on Windows and Linux.
With the same product, you can implement real-time replication and failover of any file directory and service, databases, complete Hyper-V or KVM virtual machines, Docker, Kubernetes and Cloud applications.
Architecture
How does it work in Amazon AWS?
- The servers are running in different availability zones.
- The critical application is running in all servers of the farm.
- Users are connected to a virtual IP address which is configured in the Amazon AWS load balancer.
- SafeKit provides a generic health check for the load balancer.
When the farm module is stopped on a server, the health check returns NOK to the load balancer, which stops load balancing requests to that server. The same behavior happens when there is a hardware failure.
- On each server, SafeKit monitors the critical application with process checkers and custom checkers.
- SafeKit automatically restarts the critical application on a server when there is a software failure, thanks to the restart scripts.
- A connector for the SafeKit web console is installed in each server.
Thus, the load balancing cluster can be managed in a very simple way to avoid human errors.
Partners, the success with SafeKit
This platform-agnostic solution is ideal for a partner reselling a critical application who wants to provide an easy-to-deploy redundancy and high availability option to many customers.
With many references won by partners in many countries, SafeKit has proven to be the easiest solution to implement for the redundancy and high availability of building management, video management, access control and SCADA software...
Building Management Software (BMS)
Video Management Software (VMS)
Electronic Access Control Software (EACS)
SCADA Software (Industry)
Manual installation on existing VMs in Amazon AWS of a load balancing cluster with failover (Windows or Linux)
- Configuration of the Amazon AWS load balancer
- Configuration of the Amazon AWS network security
- Package installation on Windows
- Package installation on Linux
- Configuration of SafeKit
- Tests
Configuration of the Amazon AWS load balancer
The load balancer must be configured to periodically send health packets to virtual machines. For that, SafeKit provides a health check which runs inside the virtual machines and which
- returns OK when the farm module state is UP (green)
- returns NOT FOUND in all other states
You must configure the Amazon AWS load balancer with:
- HTTP protocol
- port 9010, the SafeKit web server port
- URL /var/modules/farm/ready.txt (if farm is the module name that you will deploy later)
For more information, see the configuration of the Amazon AWS load balancer.
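Before wiring the health check into the load balancer, you can verify it from a shell on any machine that can reach the nodes, and apply the same settings with the AWS CLI if you script the target group. The following is a minimal sketch: 10.0.1.10 and the target group ARN are placeholders, and farm is the module name assumed in this article.

# Query the SafeKit health check on a node (replace 10.0.1.10 with the node IP)
# Expect HTTP 200 when the farm module is UP on that node, 404 (NOT FOUND) otherwise
curl -i http://10.0.1.10:9010/var/modules/farm/ready.txt

# Optionally, set the health check of an existing target group with the AWS CLI
# (replace TARGET_GROUP_ARN with the ARN of your target group)
aws elbv2 modify-target-group --target-group-arn "$TARGET_GROUP_ARN" \
  --health-check-protocol HTTP --health-check-port 9010 \
  --health-check-path /var/modules/farm/ready.txt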
Configuration of the Amazon AWS network security
The network security must be configured to enable communications for the following protocols and ports:
- UDP 4800 for the safeadmin service (between SafeKit nodes)
- TCP 9010 for the load-balancer health check and for the SafeKit web console running in the http mode
- TCP 9001 to configure the https mode for the console
- TCP 9453 for the SafeKit web console running in the https mode
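If you prefer to script the network security with the AWS CLI, the ingress rules for an existing security group can be added along these lines. This is a sketch only: the security group ID and the 10.0.0.0/16 source CIDR are placeholders to adapt to your VPC.

SG_ID=sg-0123456789abcdef0   # placeholder security group ID
# UDP 4800: safeadmin service between SafeKit nodes
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol udp --port 4800 --cidr 10.0.0.0/16
# TCP 9010: load-balancer health check and web console in http mode
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 9010 --cidr 10.0.0.0/16
# TCP 9001: configuration of the https mode for the console
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 9001 --cidr 10.0.0.0/16
# TCP 9453: web console in https mode
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 9453 --cidr 10.0.0.0/16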
Package installation on Windows
On both Windows servers
- Install the free version of SafeKit for Cloud on 2 Windows nodes
- The module farm.safe is delivered inside the package.
- To open the firewall, start a command line as administrator, go to C:\safekit\private\bin and type .\firewallcfg.cmd add on both nodes
Package installation on Linux
On both Linux servers
- Install the free version of SafeKit for Cloud on 2 Linux nodes
- After downloading the safekit_xx.bin package, execute it to extract the rpm and the safekitinstall script, then execute the safekitinstall script (see the sketch after this list)
- Answer yes to firewall automatic configuration
- The module farm.safe is delivered inside the package.
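In practice, the Linux installation above boils down to a few shell commands (a sketch; safekit_xx.bin stands for the name of the downloaded package):

# Make the downloaded package executable and run it to extract its content
chmod +x safekit_xx.bin
./safekit_xx.bin
# Run the installer extracted by the previous step
# (answer yes to the firewall automatic configuration)
./safekitinstall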
Configuration of SafeKit
The configuration is presented with the web console connected to 2 Windows servers but it is the same thing with 2 Linux servers.
Important: all the configuration must be done from a single browser.
It is recommended to configure the web console in the https mode by connecting to https://<IP address of 1 VM>:9453 (next image). In this case, you must first configure the https mode by using the wizard described in the User's Guide: see "11.1 HTTPS Quick Configuration with the Configuration Wizard".
Or you can use the web console in the http mode by connecting to http://<IP address of 1 VM>:9010 (next image).
Note that you can also make a configuration with DNS names, especially if the IP addresses are not static.
Enter the IP address of the first node and click on Confirm (next image)
Click on New node and enter the IP address of the second node (next image)
Click on the red floppy disk to save the configuration (previous image)
In the Configuration tab, click on farm.safe then enter farm as the module name and Confirm (next images with farm instead of xxx)
Click on Validate (next image)
Do not configure a virtual IP address (next image) because this configuration is already made in the Amazon AWS load balancer. This section is useful for on-premise configuration only.
If a process is defined in the Process Checker section (next image), it will be monitored with the restart action in case of failure. The services will be stopped and restarted locally on the local server if this process disappears from the list of running processes. After 3 unsuccessful local restarts, the module is stopped on the local server. As a consequence, the health check answers NOT FOUND to the Amazon AWS load balancer and the load balancing is reconfigured to distribute the traffic on the remaining servers of the farm.
start_both and stop_both (next image) contain the start and the stop of services.
Click on Validate (previous image)
Click on Configure (previous image)
Check the success green message on both servers and click on Next (previous image)
Start the cluster on both nodes (previous image). Check that the status becomes UP (green) - UP (green) (next image).
The cluster is operational with services running on both UP nodes (previous image).
Be careful, components which are clients of the services must be configured with the virtual IP address. The configuration can be made with a DNS name (if a DNS name has been created and associated with the virtual IP address).
Tests
Check with the Windows Microsoft Management Console (MMC) or with Linux command lines that the services are started on both UP nodes. Set the services with Boot Startup Type = Manual (SafeKit controls the start of the services).
Stop one UP node by scrolling down the menu of the node and clicking on Stop. Check that the load balancing is reconfigured so that the remaining node takes all the TCP connections. Then check that the services are stopped on the STOP node with the Windows Microsoft Management Console (MMC) or with Linux command lines (see the sketch below).
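On Linux, the checks above can be done from a shell (a sketch; myservice is a hypothetical service name to replace with the services managed by start_both/stop_both):

# Verify that the service is running on an UP node (and stopped on a STOP node)
systemctl status myservice
# Equivalent of Boot Startup Type = Manual: let SafeKit control the start of the service
systemctl disable myservice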
More information on tests in the User's Guide
Automatic start of the module at boot
Configure boot start (next image on the right side) configures the automatic start of the module when the server boots. Do this configuration on both nodes once the load balancing and failover solution is correctly running.
Note that for synchronizing SafeKit at boot and at shutdown on Windows, we assume that a command line has been run as administrator on both nodes during installation: .\addStartupShutdown.cmd in C:\safekit\private\bin (otherwise do it now).
For reading the SafeKit logs, go to the Troubleshooting tab
For editing userconfig.xml, start_both and stop_both, go to the Advanced Configuration tab
Troubleshooting with the SafeKit module and application logs
Module log
Read the module log to understand the reasons for a failover, for a waiting state on the availability of a resource, etc.
To see the module log of node 1 (next image):
- click on the Control tab
- click on node 1/UP (it becomes blue) on the left side
- click on Module Log
- click on the Refresh icon (green arrows) to update the console
- click on the floppy disk to save the module log in a .txt file and to analyze in a text editor
Repeat the same operation to see the module log of node 2.
Application log
Read the application log to see the output messages of the start_both and stop_both restart scripts.
To see the application log of node 1 (next image):
- click on the Control tab
- click on node 1/UP (it becomes blue) on the left side to select the server
- click on Application Log to see messages when starting and stopping services
- click on the Refresh icon (green arrows) to update the console
- click on the floppy disk to save the application log in a .txt file and to analyze in a text editor
Repeat the same operation to see the application log of node 2.
More information on troubleshooting in the User's Guide
For support, open the Support section
Advanced configuration of SafeKit for implementing the load balancing cluster with failover
Advanced configuration
In the Advanced Configuration tab (next image), you can edit internal files of the module: bin/start_both, bin/stop_both and conf/userconfig.xml (next image on the left side). If you make changes in the internal files here, you must apply the new configuration by a right click on the blue icon/xxx on the left side (next image): the interface will allow you to redeploy the modified files on both servers.
More information on userconfig.xml in the User's Guide
For an example of userconfig.xml, start_both and stop_both, open the Internals section below
Support of SafeKit
Support
For getting support on the call desk of https://support.evidian.com, take 2 snapshots (2 .zip files), one for each server, and upload them in the call desk tool (next image).
Internal files of a SafeKit / Amazon AWS load balancing cluster with failover
Go to the Advanced Configuration tab for editing these files.
Internal files of the Windows farm.safe module
userconfig.xml on Windows (description in the User's Guide)
<!DOCTYPE safe>
<safe>
<service mode="farm" maxloop="3" loop_interval="24">
<!-- Farm topology configuration -->
<!-- Names or IP addresses on the default network are set during initialization in the console -->
<farm>
<lan name="default" />
</farm>
<!-- Software Error Detection Configuration -->
<!-- Replace
* PROCESS_NAME by the name of the process to monitor
-->
<errd polltimer="10">
<proc name="PROCESS_NAME" atleast="1" action="restart" class="both" />
</errd>
<!-- User scripts activation -->
<user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>
start_both.cmd on Windows
@echo off
rem Script called on all servers for starting applications
rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"
rem stdout goes into Application log
echo "Running start_both %*"
set res=0
rem Fill with your services start call
set res=%errorlevel%
if %res% == 0 goto end
:stop
set res=%errorlevel%
"%SAFE%\safekit" printe "start_both failed"
rem uncomment to stop SafeKit when critical
rem "%SAFE%\safekit" stop -i "start_both"
:end
stop_both.cmd on Windows
@echo off
rem Script called on all servers for stopping application
rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"
rem ----------------------------------------------------------
rem
rem 2 stop modes:
rem
rem - graceful stop
rem call standard application stop with net stop
rem
rem - force stop (%1=force)
rem kill application's processes
rem
rem ----------------------------------------------------------
rem stdout goes into Application log
echo "Running stop_both %*"
set res=0
rem default: no action on forcestop
if "%1" == "force" goto end
rem Fill with your services stop call
rem If necessary, uncomment to wait for the real stop of services
rem "%SAFEBIN%\sleep" 10
if %res% == 0 goto end
"%SAFE%\safekit" printe "stop_both failed"
:end
Internal files of the Linux farm.safe module
userconfig.xml on Linux (description in the User's Guide)
<!DOCTYPE safe>
<safe>
<service mode="farm" maxloop="3" loop_interval="24">
<!-- Farm topology configuration for the membership protocol -->
<!-- Names or IP addresses on the default network are set during initialization in the console -->
<farm>
<lan name="default" />
</farm>
<!-- Software Error Detection Configuration -->
<!-- Replace
* PROCESS_NAME by the name of the process to monitor
-->
<errd polltimer="10">
<proc name="PROCESS_NAME" atleast="1" action="restart" class="both" />
</errd>
<!-- User scripts activation -->
<user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>
start_both on Linux
#!/bin/sh
# Script called on all servers for starting the application
# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message"
# stdout goes into Application log
echo "Running start_both $*"
res=0
# Fill with your application start call
if [ $res -ne 0 ] ; then
$SAFE/safekit printe "start_both failed"
# uncomment to stop SafeKit when critical
# $SAFE/safekit stop -i "start_both"
fi
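For illustration, here is how start_both could be filled in for a hypothetical systemd service named myservice (a sketch only, not the delivered template; adapt the service name and error handling to your application):

#!/bin/sh
# Script called on all servers for starting the application
# stdout goes into Application log
echo "Running start_both $*"
res=0
# Start the application service (myservice is a placeholder)
systemctl start myservice || res=$?
if [ $res -ne 0 ] ; then
  $SAFE/safekit printe "start_both failed"
  # uncomment to stop SafeKit when critical
  # $SAFE/safekit stop -i "start_both"
fi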
stop_both on Linux
#!/bin/sh
# Script called on all servers for stopping the application
# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message"
#----------------------------------------------------------
#
# 2 stop modes:
#
# - graceful stop
# call standard application stop
#
# - force stop ($1=force)
# kill application's processes
#
#----------------------------------------------------------
# stdout goes into Application log
echo "Running stop_both $*"
res=0
# default: no action on forcestop
[ "$1" = "force" ] && exit 0
# Fill with your application stop call
[ $res -ne 0 ] && $SAFE/safekit printe "stop_both failed"
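And the matching stop_both for the same hypothetical service (again a sketch; as in the template, the force mode performs no action):

#!/bin/sh
# Script called on all servers for stopping the application
# stdout goes into Application log
echo "Running stop_both $*"
res=0
# default: no action on forcestop
[ "$1" = "force" ] && exit 0
# Stop the application service (myservice is a placeholder)
systemctl stop myservice || res=$?
[ $res -ne 0 ] && $SAFE/safekit printe "stop_both failed"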
Advanced clustering architectures
Several modules can be deployed on the same cluster. Thus, advanced clustering architectures can be implemented:
- the farm+mirror cluster built by deploying a farm module and a mirror module on the same cluster,
- the active/active cluster with replication built by deploying several mirror modules on 2 servers,
- the Hyper-V cluster or KVM cluster with real-time replication and failover of full virtual machines between 2 active hypervisors,
- the N-1 cluster built by deploying N mirror modules on N+1 servers.