Google GCP: The Simplest Load Balancing Cluster with Failover


Evidian SafeKit provides load balancing and failover in Google GCP, the Google cloud. This article explains how to quickly implement a load balancing and failover cluster in Google GCP. A free trial is offered in the installation instructions section.

This clustering solution is recognized as the simplest to implement by our customers and partners. It is also a complete solution that solves hardware failures (20% of problems), including the complete failure of a computer room; software failures (40% of problems), including software error detection and automatic restart; and human errors (40% of problems), thanks to its simplicity of administration.

How does the Evidian SafeKit software simply implement a load balancing cluster with failover in Google GCP?

How the Evidian SafeKit farm cluster implements load balancing and failover in Google GCP

In the previous figure, the farm module of SafeKit runs on the virtual machines of the farm: the Google GCP load balancer distributes the incoming traffic to the servers whose health check returns OK, and the application services run on all the UP nodes.


Key differentiators of load balancing and failover with the Evidian SafeKit farm cluster

Evidian SafeKit farm cluster with load balancing and failover

All clustering features

Like: The solution includes all clustering features: virtual IP address, load balancing on client IP address or on sessions, server failure monitoring, network failure monitoring, software failure monitoring, automatic application restart with a quick recovery time, and a replication option with a mirror module.

Dislike: This is not the case with other load balancing solutions. They provide load balancing, but they do not include a full clustering solution with restart scripts and automatic application restart in case of failure. And they do not offer a replication option.

Like: The cluster configuration is very simple and made by means of a high availability application module. There is no domain controller or Active Directory to configure on Windows. The solution works on Windows and Linux.

Remote sites

Like: If servers are connected to the same IP network through an extended LAN between remote sites, the virtual IP address of SafeKit works with load balancing at level 2.

Like: If servers are connected to different IP networks between remote sites, the virtual IP address can be configured at the level of a load balancer. For that, SafeKit offers a health check: the load balancer is configured with a URL managed by SafeKit, which returns OK on the UP servers and NOT FOUND otherwise. This solution is implemented for SafeKit in the cloud, but it can also be implemented with an on-premises load balancer. Thus you can implement load balancing together with all the clustering features of SafeKit, including an easy administration of the cluster through the SafeKit web console (see the sketch below).
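For illustration, the health check can be tested from a shell. The address, port and URL path in this minimal sketch are assumptions; the real URL is given by the SafeKit documentation for your module:

# Probe the SafeKit health check on one server (hypothetical address,
# port and path; adapt them to your SafeKit installation).
curl -i http://10.0.0.4:9010/var/modules/farm/ready.txt
# An UP server answers OK; a down server answers NOT FOUND, so the
# load balancer stops sending traffic to it.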

Uniform high availability solution

Like: SafeKit implements a farm cluster with load balancing and failover. It also implements a mirror cluster with replication and failover. Thus an N-tier architecture can be made highly available and load balanced with the same solution on Windows and Linux (same installation, configuration and administration with the SafeKit console or with the command line interface). This is unique on the market.

Dislike: This is not the case with an architecture mixing different technologies for load balancing, replication and failover.

Farm cluster in Google GCP: installation on existing Google GCP virtual machines (Windows or Linux)

Configuration of the Google GCP load balancer

The load balancer must be configured to periodically send health packets to the virtual machines. For that, SafeKit provides a health check which runs inside the virtual machines and which returns OK when the farm module is running on the server and NOT FOUND otherwise.

You must configure the Google GCP load balancer with this health check.

For more information, see the configuration of the Google GCP load balancer.
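As a sketch with the gcloud CLI, the health check and its attachment to a backend service might look like the following; the names, port and request path are assumptions to adapt to your deployment:

# Create an HTTP health check pointing to the URL managed by SafeKit
# (hypothetical port and request path; see the SafeKit documentation).
gcloud compute health-checks create http safekit-farm-check \
    --port 9010 \
    --request-path /var/modules/farm/ready.txt \
    --check-interval 10s \
    --timeout 5s

# Attach it to the backend service of the load balancer
# (hypothetical backend service name).
gcloud compute backend-services update farm-backend \
    --health-checks safekit-farm-check \
    --global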

Configuration of the Google GCP network security

The network security must be configured to enable communications for the protocols and ports used by SafeKit: in particular the web console ports used below (TCP 9010 in HTTP mode, TCP 9453 in HTTPS mode) and the internal communication ports of SafeKit between the nodes (see the SafeKit User's Guide for the full list).
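For illustration, a hedged sketch of a firewall rule opening the web console ports with the gcloud CLI; the rule name, source range and target tag are assumptions:

# Allow the SafeKit web console ports (HTTP 9010, HTTPS 9453) from an
# administration range (hypothetical rule name, range and network tag).
gcloud compute firewall-rules create safekit-console \
    --allow tcp:9010,tcp:9453 \
    --source-ranges 203.0.113.0/24 \
    --target-tags safekit-node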

Package installation on Windows

On both Windows servers, download and install the SafeKit package.

Package installation on Linux

On both Linux servers, download and install the SafeKit package.
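As an illustration only, a typical installation on an rpm-based Linux distribution might look like the following; the package file name is hypothetical and depends on the version downloaded from Evidian:

# Install the SafeKit package on each Linux server
# (hypothetical file name; use the package you downloaded).
sudo rpm -ivh safekit_x86_64.rpm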

Configuration instructions

The configuration is presented with the web console connected to 2 Windows servers, but the procedure is the same with 2 Linux servers.

Important: all the configuration must be done from a single browser.

It is recommended to configure the web console in the HTTPS mode by connecting to https://<IP address of 1 VM>:9453 (next image). In this case, you must first configure the HTTPS mode by using the wizard described in the User's Guide: see "11.1 HTTPS Quick Configuration with the Configuration Wizard".

Start the SafeKit web console in the HTTPS mode for the configuration

Or you can use the web console in the HTTP mode by connecting to http://<IP address of 1 VM>:9010 (next image).

Start the SafeKit web console in the HTTP mode for the configuration

Note that you can also make a configuration with DNS names, especially if the IP addresses are not static.

Enter the IP address of the first node and click on Confirm (next image).

SafeKit web console - first node in the cluster

Click on New node and enter the IP address of the second node (next image).

SafeKit web console - second node in the cluster

Click on the red floppy disk to save the configuration (previous image)

In the Configuration tab, click on farm.safe, then enter farm as the module name and click on Confirm (next images, with farm instead of xxx).

SafeKit web console - start the configuration of the farm module
SafeKit web console - enter the module name

Click on Validate (next image)

SafeKit web console - enter the module nodes

Do not configure a virtual IP address (next image): this configuration is already made in the Google GCP load balancer. This section is useful for on-premises configurations only.

If a process is defined in the Process Checker section (next image), it will be monitored with the restart action in case of failure. If this process disappears from the list of running processes, the services will be stopped and restarted locally on the server. After 3 unsuccessful local restarts, the module is stopped on the local server. As a consequence, the health check answers NOT FOUND to the Google GCP load balancer, and the load balancing is reconfigured to distribute the traffic to the remaining servers of the farm.

start_both and stop_both (next image) contain the start and the stop of services.

SafeKit web console - enter the parameters

Click on Validate (previous image)

SafeKit web console - stop the farm module before configuration

Click on Configure (previous image)

SafeKit web console - check the green success messages of the configuration

Check the green success messages on both servers and click on Next (previous image).

SafeKit web console - start the cluster on both nodes

Start the cluster on both nodes (previous image). Check that the status becomes UP (green) - UP (green) (next image).

SafeKit web console - cluster started

The cluster is operational with services running on both UP nodes (previous image).

Be careful: components which are clients of the services must be configured with the virtual IP address. The configuration can be made with a DNS name (if a DNS name has been created and associated with the virtual IP address).

Tests

Check with the Windows Microsoft Management Console (MMC) or with Linux command lines that the services are started on both UP nodes. Set the services to Boot Startup Type = Manual (SafeKit controls the start of the services).
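On Linux, the equivalent is to disable the automatic start of the service at boot, since SafeKit starts and stops it through start_both and stop_both. A minimal sketch, assuming a systemd service with the hypothetical name myapp:

# SafeKit controls the service through the module scripts, so disable
# the distribution's automatic start at boot (hypothetical unit name).
sudo systemctl disable myapp
sudo systemctl stop myapp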

Stop one UP node by scrolling down the menu of the node and by clicking on Stop. Check that the load balancing is reconfigured and that the other node takes all the TCP connections. And check with the Windows Microsoft Management Console (MMC) or with Linux command lines that the services are stopped on the STOP node.
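To observe the reconfiguration from a client during this test, you can poll the virtual IP address in a loop. A minimal sketch, assuming a hypothetical virtual IP address and an HTTP service:

# Poll the virtual IP address once per second; during the test, all
# answers should come from the remaining UP server.
while true; do
  curl -s --max-time 2 http://203.0.113.10/ && echo
  sleep 1
done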

To understand what happens in the cluster, check the SafeKit logs of node 1 and node 2.

To see the module log of node 1 (next image):

SafeKit web console - Module Log of node 1

To see the application log of node 1 (next image):

SafeKit web console - Application Log of node 1

To see the logs of node 2 (previous image), click on W12R2server75/UP on the left side (it will become blue) and repeat the same operations.

Advanced configuration

In the Advanced Configuration tab (next image), you can edit the internal files of the module: bin/start_both, bin/stop_both and conf/userconfig.xml (next image on the left side). If you make changes in the internal files here, you must apply the new configuration with a right click on the blue icon/farm on the left side (next image): the interface will allow you to redeploy the modified files on both servers.

SafeKit web console - advanced configuration of the farm module

Configure boot start (next image on the right side) sets the automatic start of the module when the server boots. Do this configuration on both nodes once the load balancing and failover solution is correctly running.

SafeKit web console - automatic boot of the farm module
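The same setting can presumably be applied with the command line interface mentioned earlier; the exact syntax below is an assumption to check against the SafeKit User's Guide:

# Assumption: enable the automatic start of the farm module at boot
# (verify the exact command in the SafeKit User's Guide).
$SAFE/safekit boot -m farm on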

Support

To get support on the call desk of https://support.evidian.com, take 2 snapshots (2 .zip files), one for each server, and upload them in the call desk tool (next image).

SafeKit web console - snapshots for support

Internal files of the Windows farm.safe module

userconfig.xml

<!DOCTYPE safe>
<safe>
<service mode="farm" maxloop="3" loop_interval="24">
  <!-- Farm topology configuration -->
  <!-- Names or IP addresses on the default network are set during initialization in the console -->
  <farm>
    <lan name="default" />
  </farm>
  <!-- Software Error Detection Configuration -->
  <!-- Replace
       * PROCESS_NAME by the name of the process to monitor
  -->
  <errd polltimer="10">
    <proc name="PROCESS_NAME" atleast="1" action="restart" class="both" />
  </errd>
  <!-- User scripts activation -->
  <user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>

start_both.cmd

@echo off

rem Script called on all servers for starting applications

rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"

rem stdout goes into Application log
echo "Running start_both %*" 

set res=0

rem Fill with your services start call

set res=%errorlevel%
if %res% == 0 goto end

:stop
set res=%errorlevel%
"%SAFE%\safekit" printe "start_both failed"

rem uncomment to stop SafeKit when critical
rem "%SAFE%\safekit" stop -i "start_both"

:end

stop_both.cmd

@echo off

rem Script called on all servers for stopping applications

rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"

rem ----------------------------------------------------------
rem
rem 2 stop modes:
rem
rem - graceful stop
rem   call standard application stop with net stop
rem
rem - force stop (%1=force)
rem   kill application's processes
rem
rem ----------------------------------------------------------

rem stdout goes into Application log
echo "Running stop_both %*" 

set res=0

rem default: no action on forcestop
if "%1" == "force" goto end

rem Fill with your services stop call

rem If necessary, uncomment to wait for the real stop of services
rem "%SAFEBIN%\sleep" 10

if %res% == 0 goto end

"%SAFE%\safekit" printe "stop_both failed"

:end

Internal files of the Linux farm.safe module

userconfig.xml

<!DOCTYPE safe>
<safe>
<service mode="farm" maxloop="3" loop_interval="24">
  <!-- Farm topology configuration for the membership protocol -->
  <!-- Names or IP addresses on the default network are set during initialization in the console -->
  <farm>
    <lan name="default" />
  </farm>
  <!-- Software Error Detection Configuration -->
  <!-- Replace
       * PROCESS_NAME by the name of the process to monitor
  -->
  <errd polltimer="10">
    <proc name="PROCESS_NAME" atleast="1" action="restart" class="both" />
  </errd>
  <!-- User scripts activation -->
  <user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>

start_both

#!/bin/sh
# Script called on all servers for starting applications

# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message" 

# stdout goes into Application log
echo "Running start_both $*" 

res=0

# Fill with your application start call

if [ $res -ne 0 ] ; then
  $SAFE/safekit printe "start_both failed"

  # uncomment to stop SafeKit when critical
  # $SAFE/safekit stop -i "start_both"
fi

stop_both

#!/bin/sh
# Script called on all servers for stopping applications

# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message" 

#----------------------------------------------------------
#
# 2 stop modes:
#
# - graceful stop
#   call standard application stop
#
# - force stop ($1=force)
#   kill application's processes
#
#----------------------------------------------------------

# stdout goes into Application log
echo "Running stop_both $*" 

res=0

# default: no action on forcestop
[ "$1" = "force" ] && exit 0

# Fill with your application stop call

[ $res -ne 0 ] && $SAFE/safekit printe "stop_both failed"
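
As an illustration, the placeholder sections of the Linux scripts might be filled as follows, assuming a systemd service with the hypothetical name myapp:

# In start_both, replace "Fill with your application start call" with:
systemctl start myapp || res=1

# In stop_both, replace "Fill with your application stop call" with:
systemctl stop myapp || res=1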