Microsoft Azure: The Simplest Load Balancing Cluster with Failover on Windows and Linux
Evidian SafeKit
The solution in Microsoft Azure
Evidian SafeKit brings load balancing and failover to Microsoft Azure between two or more redundant Windows or Linux servers.
This article explains how to quickly implement a Microsoft Azure cluster without specific skills.
A generic product
Note that SafeKit is a generic product on Windows and Linux.
With the same product, you can implement real-time replication and failover of any file directory and service, databases, complete Hyper-V or KVM virtual machines, Docker, Kubernetes and cloud applications.
Architecture
How does it work in Microsoft Azure?
- The servers are running in different availability zones.
- The critical application runs on all servers of the farm.
- Users are connected to a virtual IP address that is configured in the Microsoft Azure load balancer.
- SafeKit provides a generic health check for the load balancer.
When the farm module is stopped on a server, the health check returns NOK to the load balancer, which stops load balancing requests to that server (see the probe sketch after this list). The same behavior occurs when there is a hardware failure.
- On each server, SafeKit monitors the critical application with process checkers and custom checkers.
- SafeKit automatically restarts the critical application on a server after a software failure, thanks to restart scripts.
- A connector for the SafeKit web console is installed on each server.
Thus, the load balancing cluster can be managed in a very simple way, avoiding human errors.
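As a sketch, you can query this health check yourself once a node is installed. node1 is a placeholder for a node address; the probe URL and port 9010 are the ones detailed in the manual installation section below:
# prints 200 when the farm module is UP, 404 (NOT FOUND) otherwise
curl -s -o /dev/null -w "%{http_code}\n" http://node1:9010/var/modules/farm/ready.txt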
Partners: the success with SafeKit
This platform-agnostic solution is ideal for a partner who resells a critical application and wants to offer a redundancy and high availability option that is easy to deploy to many customers.
With many references won by partners in many countries, SafeKit has proven to be the easiest solution to implement for redundancy and high availability of building management, video management, access control and SCADA software...
Building Management Software (BMS)
Video Management Software (VMS)
Electronic Access Control Software (EACS)
SCADA Software (Industry)
Azure template for a quick deployment of a farm cluster in Microsoft Azure on Windows or Linux
The Evidian SafeKit farm cluster has been validated in the Microsoft Azure quickstart templates.
To deploy the Evidian SafeKit load balancing cluster with failover in Microsoft Azure, just click on the following button which deploys everything:
Deploy to Microsoft Azure a Farm Cluster (Windows or Linux) >
Go to the Template Guide tab for more information
Automatic installation in Microsoft Azure of a load balancing cluster with failover (Windows or Linux)
- Automatic deployment
- Configure deployment
- After deployment
- Video of the deployment
- Access VMs in Microsoft Azure
- Deployed resources
Automatic deployment of the Microsoft Azure template for a farm cluster
To deploy the Evidian SafeKit load balancing cluster with failover in Microsoft Azure, just click on the following button which deploys everything:
Deploy to Azure a Farm Cluster (Windows or Linux) >
Configure the Microsoft Azure template for a farm cluster
After clicking:
- in "Resource group", click on "Create new" and set a name
- choose the geographical "Location" where the cluster will be deployed
- choose the "OS": Windows or Linux
- choose the number of "Cluster Nodes"
- choose an "Admin User" name (not Administrator, not root)
- choose an "Admin Password". Passwords must be between 12 and 72 characters and contain 3 of the following: 1 lower case, 1 upper case, 1 number, and 1 special character
- click on "I agree..." and then on "Purchase" (no fee for the SafeKit free trial, only for the Microsoft Azure infrastructure)
- wait for the end of the deployment of the load balancing and failover cluster
After deployment
After deployment, click on 'Microsoft.Template' (previous image), then go to the output panel and:
- visit the credential URL to install the client and CA certificates in your web browser. Force the load of the unsafe page. Use 'CA_admin' as the user with the password you entered during the template configuration. Be careful to put the second certificate in the 'Trusted Root Certification Authorities' store
- after the certificates installation, start the web console of the cluster
- test the load balanced virtual IP address with the test URL in the output. A load balancing rule has been set for external port 9453 / internal port 9453. A mosaic of server names is displayed according to the server answering the TCP session
Video of the Microsoft Azure farm template deployment
Accessing the VMs through SSH (Linux) or Remote Desktop (Windows)
If you want to connect to the virtual machines through SSH (Linux) or Remote Desktop (Windows), you can use the SafeKit web console to find the IP addresses or DNS names of the VMs (next images). Use the user/password entered during the template configuration to access the VMs.
Deployed resources
In terms of VMs, this template deploys:
- from 2 to 4 VMs (Windows or Linux)
- each VM has a public IP address
- the SafeKit free trial is installed on all VMs
- a SafeKit farm module is configured on all VMs
In terms of load balancer, this template deploys:
- a public load balancer
- a public IP address associated with the public load balancer, which plays the role of the virtual IP
- all VMs in the backend pool of the load balancer
- a health probe that checks the farm module state on all VMs
- a load balancing rule for external port 9453 / internal port 9453, set to test the load balanced virtual IP
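To check what the template created, here is a minimal sketch with the Azure CLI, assuming myResourceGroup is the resource group name you chose during the configuration:
# list all resources deployed by the template
az resource list --resource-group myResourceGroup --output table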
For a manual installation, go to the Manual Installation tab
Manual installation on existing VMs in Microsoft Azure of a load balancing cluster with failover (Windows or Linux)
- Configuration of the Microsoft Azure load balancer
- Configuration of the Microsoft Azure network security
- Package installation on Windows
- Package installation on Linux
- Configuration of SafeKit
- Tests
Configuration of the Microsoft Azure load balancer
The load balancer must be configured to periodically send health packets to the virtual machines. For that, SafeKit provides a health probe which runs inside the virtual machines and which:
- returns OK when the farm module state is UP (green)
- returns NOT FOUND in all other states
You must configure the Microsoft Azure load balancer with:
- HTTP protocol
- port 9010, the SafeKit web server port
- URL /var/modules/farm/ready.txt (if farm is the module name that you will deploy later)
For more information, see the configuration of the Microsoft Azure load balancer.
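As a sketch, an equivalent health probe can be created with the Azure CLI; the resource group, load balancer and probe names below are placeholders:
# HTTP health probe calling the SafeKit health check on each VM of the backend pool
az network lb probe create \
  --resource-group myResourceGroup \
  --lb-name myLoadBalancer \
  --name safekit-farm-probe \
  --protocol Http \
  --port 9010 \
  --path /var/modules/farm/ready.txt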
Configuration of the Microsoft Azure network security
The network security must be configured to enable communications for the following protocols and ports:
- UDP - 4800 for the safeadmin service (between SafeKit nodes)
- TCP - 9010 for the load-balancer health probe and for the SafeKit web console running in the http mode
- TCP - 9001 to configure the https mode for the console
- TCP - 9453 for the SafeKit web console running in the https mode
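For example, with the Azure CLI (the resource group, network security group and rule names are placeholders):
# allow the SafeKit heartbeats between the nodes
az network nsg rule create --resource-group myResourceGroup --nsg-name myNsg \
  --name safekit-udp --priority 100 --protocol Udp \
  --destination-port-ranges 4800 --access Allow
# allow the health probe and the web console ports
az network nsg rule create --resource-group myResourceGroup --nsg-name myNsg \
  --name safekit-tcp --priority 110 --protocol Tcp \
  --destination-port-ranges 9001 9010 9453 --access Allow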
Package installation on Windows
On both Windows servers
- Install the free version of SafeKit (click here) on the 2 Windows nodes
- The farm.safe module is delivered inside the package.
- To open the Windows firewall, start a PowerShell as administrator on both nodes and type c:/safekit/private/bin/firewallcfg add
- To initialize the password for the default admin user of the web console, start a PowerShell as administrator on both nodes and type c:/safekit/private/bin/webservercfg.ps1 -passwd pwd (same pwd on both nodes)
Package installation on Linux
On both Linux servers
- Install the free version of SafeKit (click here) on the 2 Linux nodes
- After downloading the safekit_xx.bin package, execute it to extract the rpm and the safekitinstall script, and then execute the safekitinstall script (see the sketch after this list)
- Answer yes to the firewall automatic configuration
- Set the password for the web console and the default admin user (same password on both nodes)
- The farm.safe module is delivered inside the package.
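A minimal sketch of these steps on one Linux node (the exact package file name depends on the version you downloaded):
chmod +x safekit_xx.bin
./safekit_xx.bin        # extracts the rpm and the safekitinstall script
./safekitinstall        # answer yes to the firewall question, then set the console password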
Configuration of SafeKit
The configuration is presented with the web console connected to 2 Windows servers, but it is the same with 2 Linux servers.
Important: all the configuration must be done from a single browser.
It is recommended to configure the web console in the https mode by connecting to https://<IP address of 1 VM>:9453 (next image). In this case, you must first configure the https mode by using the wizard described in the User's Guide: see "11.1 HTTPS Quick Configuration with the Configuration Wizard".
Or you can use the web console in the http mode by connecting to http://<IP address of 1 VM>:9010 (next image).
Note that you can also make a configuration with DNS names, especially if the IP addresses are not static.
Enter the IP address of the first node and click on Confirm (next image)
Click on New node and enter the IP address of the second node (next image)
Click on the red floppy disk to save the configuration (previous image)
In the Configuration tab, click on farm.safe then enter farm as the module name and Confirm (next images with farm instead of xxx)
Click on Validate (next image)
Do not configure a virtual IP address (next image) because this configuration is already made in the Microsoft Azure load balancer. This section is useful for on-premises configurations only.
If a process is defined in the Process Checker section (next image), it will be monitored with a restart action in case of failure. If this process disappears from the list of running processes, the services will be stopped and restarted locally on the server. After 3 unsuccessful local restarts, the module is stopped on the local server. As a consequence, the health probe answers NOT FOUND to the Microsoft Azure load balancer, which reconfigures the load balancing to distribute the traffic to the remaining servers of the farm.
start_both and stop_both (next image) contain the start and stop commands of the services.
Click on Validate (previous image)
Click on Configure (previous image)
Check the green success message on both servers and click on Next (previous image)
Start the cluster on both nodes (previous image). Check that the status becomes UP (green) - UP (green) (next image).
The cluster is operational with services running on both UP nodes (previous image).
Be careful: components that are clients of the services must be configured with the virtual IP address. The configuration can be made with a DNS name (if a DNS name has been created and associated with the virtual IP address).
Tests
Check with the Windows Microsoft Management Console (MMC) or with Linux command lines that the services are started on both UP nodes. Set the services to Boot Startup Type = Manual: SafeKit controls the start of the services (see the sketch below for Linux).
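On Linux with systemd, for example (myservice is a placeholder for your service name):
# the service must not start at boot: SafeKit starts it through start_both
sudo systemctl disable myservice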
Stop one UP node by scrolling down the menu of the node and clicking on Stop. Check that the load balancing is reconfigured, with the other node taking all the TCP connections. And check that the services are stopped on the STOP node with the Windows Microsoft Management Console (MMC) or with Linux command lines.
More information on tests in the User's Guide
Automatic start of the module at boot
Configure boot start (next image on the right side) configures the module to start automatically when the server boots. Do this configuration on both nodes once the load balancing and failover solution is running correctly.
Note that, for synchronizing SafeKit at boot and at shutdown on Windows, we assume that the following command has been run as administrator on both nodes during installation: .\addStartupShutdown.cmd in C:\safekit\private\bin (otherwise do it now).
For reading the SafeKit logs, go to the Troubleshooting tab
For editing userconfig.xml, start_both and stop_both, go to the Advanced Configuration tab
Troubleshooting with the SafeKit module and application logs
Module log
Read the module log to understand the reasons for a failover, a waiting state on the availability of a resource, etc.
To see the module log of node 1 (next image):
- click on the Control tab
- click on node 1/UP (it becomes blue) on the left side
- click on Module Log
- click on the Refresh icon (green arrows) to update the console
- click on the floppy disk to save the module log in a .txt file and analyze it in a text editor
Repeat the same operation to see the module log of node 2.
Application log
Read the application log to see the output messages of the start_both and stop_both scripts.
To see the application log of node 1 (next image):
- click on the Control tab
- click on node 1/UP (it becomes blue) on the left side to select the server
- click on Application Log to see messages when starting and stopping services
- click on the Refresh icon (green arrows) to update the console
- click on the floppy disk to save the application log in a .txt file and analyze it in a text editor
Repeat the same operation to see the application log of node 2.
More information on troubleshooting in the User's Guide
For support, open the Support section
Advanced configuration of SafeKit for implementing the load balancing cluster with failover
Advanced configuration
In the Advanced Configuration tab (next image), you can edit the internal files of the module: bin/start_both, bin/stop_both and conf/userconfig.xml (next image on the left side). If you make changes to the internal files here, you must apply the new configuration with a right click on the blue icon/xxx on the left side (next image): the interface will allow you to redeploy the modified files on both servers.
More information on userconfig.xml in the User's Guide
For an example of userconfig.xml, start_both and stop_both, open the Internals section below
Support of SafeKit
Support
For getting support on the call desk of https://support.evidian.com, take 2 snapshots (2 .zip files), one for each server, and upload them in the call desk tool (next image).
Internal files of a SafeKit / Microsoft Azure load balancing cluster with failover
Go to the Advanced Configuration tab for editing these files
Internal files of the Windows farm.safe module
userconfig.xml on Windows (description in the User's Guide)
<!DOCTYPE safe>
<safe>
<service mode="farm" maxloop="3" loop_interval="24">
<!-- Farm topology configuration -->
<!-- Names or IP addresses on the default network are set during initialization in the console -->
<farm>
<lan name="default" />
</farm>
<!-- Software Error Detection Configuration -->
<!-- Replace
* PROCESS_NAME by the name of the process to monitor
-->
<errd polltimer="10">
<proc name="PROCESS_NAME" atleast="1" action="restart" class="both" />
</errd>
<!-- User scripts activation -->
<user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>
start_both.cmd on Windows
@echo off
rem Script called on all servers for starting applications
rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"
rem stdout goes into Application log
echo "Running start_both %*"
set res=0
rem Fill with your services start call
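rem Example (hypothetical service name): net start "MyService"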
set res=%errorlevel%
if %res% == 0 goto end
:stop
set res=%errorlevel%
"%SAFE%\safekit" printe "start_both failed"
rem uncomment to stop SafeKit when critical
rem "%SAFE%\safekit" stop -i "start_both"
:end
stop_both.cmd on Windows
@echo off
rem Script called on all servers for stopping application
rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"
rem ----------------------------------------------------------
rem
rem 2 stop modes:
rem
rem - graceful stop
rem call standard application stop with net stop
rem
rem - force stop (%1=force)
rem kill application's processes
rem
rem ----------------------------------------------------------
rem stdout goes into Application log
echo "Running stop_both %*"
set res=0
rem default: no action on forcestop
if "%1" == "force" goto end
rem Fill with your services stop call
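rem Example (hypothetical service name): net stop "MyService"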
rem If necessary, uncomment to wait for the real stop of services
rem "%SAFEBIN%\sleep" 10
if %res% == 0 goto end
"%SAFE%\safekit" printe "stop_both failed"
:end
Internal files of the Linux farm.safe module
userconfig.xml on Linux (description in the User's Guide)
<!DOCTYPE safe>
<safe>
<service mode="farm" maxloop="3" loop_interval="24">
<!-- Farm topology configuration for the membership protocol -->
<!-- Names or IP addresses on the default network are set during initialization in the console -->
<farm>
<lan name="default" />
</farm>
<!-- Software Error Detection Configuration -->
<!-- Replace
* PROCESS_NAME by the name of the process to monitor
-->
<errd polltimer="10">
<proc name="PROCESS_NAME" atleast="1" action="restart" class="both" />
</errd>
<!-- User scripts activation -->
<user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>
start_both on Linux
#!/bin/sh
# Script called on all servers for starting application
# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message"
# stdout goes into Application log
echo "Running start_both $*"
res=0
# Fill with your application start call
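# Example (hypothetical init script): /etc/init.d/myservice start; res=$?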
if [ $res -ne 0 ] ; then
$SAFE/safekit printe "start_both failed"
# uncomment to stop SafeKit when critical
# $SAFE/safekit stop -i "start_both"
fi
stop_both on Linux
#!/bin/sh
# Script called on all servers for stopping application
# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message"
#----------------------------------------------------------
#
# 2 stop modes:
#
# - graceful stop
# call standard application stop
#
# - force stop ($1=force)
# kill application's processes
#
#----------------------------------------------------------
# stdout goes into Application log
echo "Running stop_both $*"
res=0
# default: no action on forcestop
[ "$1" = "force" ] && exit 0
# Fill with your application stop call
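# Example (hypothetical init script): /etc/init.d/myservice stop; res=$?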
[ $res -ne 0 ] && $SAFE/safekit printe "stop_both failed"
Advanced clustering architectures
Several modules can be deployed on the same cluster. Thus, advanced clustering architectures can be implemented:
- the farm+mirror cluster built by deploying a farm module and a mirror module on the same cluster,
- the active/active cluster with replication built by deploying several mirror modules on 2 servers,
- the Hyper-V cluster or KVM cluster with real-time replication and failover of full virtual machines between 2 active hypervisors,
- the N-1 cluster built by deploying N mirror modules on N+1 servers.
Evidian SafeKit farm cluster with load balancing and failover
- No load balancer, dedicated proxy servers or special multicast Ethernet address
- All clustering features
- Remote sites and virtual IP address
- Uniform high availability solution