Google GCP: The Simplest Load Balancing Cluster with Failover


Evidian SafeKit provides load balancing and failover in Google GCP, the Google cloud. This article explains how to quickly implement a load balancing and failover cluster in Google GCP. A free trial is offered in the installation instructions section.

This clustering solution is recognized as the simplest to implement by our customers and partners. It is also a complete solution that solves hardware failures (20% of problems), including the complete failure of a computer room; software failures (40% of problems), including software error detection and automatic restart; and human errors (40% of problems), thanks to its simplicity of administration.

How does the Evidian SafeKit software simply implement a load balancing cluster with failover in Google GCP?

How the Evidian SafeKit farm cluster implements load balancing and failover in Google GCP

In the previous figure, the farm module of SafeKit runs on the virtual machines of the farm: the Google GCP load balancer distributes the incoming traffic to the servers whose health check returns OK, and the application services run on all the UP nodes.


Key differentiators of load balancing and failover with the Evidian SafeKit farm cluster

Evidian SafeKit farm cluster with load balancing and failover

All clustering features

Like: The solution includes all clustering features: virtual IP address, load balancing on client IP address or on sessions, server failure monitoring, network failure monitoring, software failure monitoring, automatic application restart with a quick recovery time, and a replication option with a mirror module.

Dislike: This is not the case with other load balancing solutions. They provide load balancing, but they do not include a full clustering solution with restart scripts and automatic application restart in case of failure. And they do not offer a replication option.

Like: The cluster configuration is very simple and made by means of a high availability application module. There is no domain controller or Active Directory to configure on Windows. The solution works on Windows and Linux.

Remote sites

Like: If servers are connected to the same IP network through an extended LAN between remote sites, the virtual IP address of SafeKit works with load balancing at level 2.

Like: If servers are connected to different IP networks between remote sites, the virtual IP address can be configured at the level of a load balancer. For that, SafeKit offers a health check: the load balancer is configured with a URL managed by SafeKit, which returns OK on the UP servers and NOT FOUND otherwise. This solution is implemented for SafeKit in the cloud, but it can also be implemented with an on-premises load balancer. Thus you can implement load balancing together with all the clustering features of SafeKit, including an easy administration of the cluster through the SafeKit web console (see the sketch below).
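For illustration, the health check can be tested from a shell. The address, port and URL path in this minimal sketch are assumptions; the real URL is given by the SafeKit documentation for your module:

# Probe the SafeKit health check on one server (hypothetical address,
# port and path; adapt them to your SafeKit installation).
curl -i http://10.0.0.4:9010/var/modules/farm/ready.txt
# An UP server answers OK; a down server answers NOT FOUND, so the
# load balancer stops sending traffic to it.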

Uniform high availability solution

Like: SafeKit implements a farm cluster with load balancing and failover. It also implements a mirror cluster with replication and failover. Thus an N-tier architecture can be made highly available and load balanced with the same solution on Windows and Linux (same installation, configuration and administration with the SafeKit console or with the command line interface). This is unique on the market.

Dislike: This is not the case with an architecture mixing different technologies for load balancing, replication and failover.

Farm cluster in Google GCP: installation on existing Google GCP virtual machines (Windows or Linux)

Configuration of the Google GCP load balancer

The load balancer must be configured to periodically send health packets to the virtual machines. For that, SafeKit provides a health check which runs inside the virtual machines and which returns OK when the farm module is running on the server and NOT FOUND otherwise.

You must configure the Google GCP load balancer with this health check.

For more information, see the configuration of the Google GCP load balancer.
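As a sketch with the gcloud CLI, the health check and its attachment to a backend service might look like the following; the names, port and request path are assumptions to adapt to your deployment:

# Create an HTTP health check pointing to the URL managed by SafeKit
# (hypothetical port and request path; see the SafeKit documentation).
gcloud compute health-checks create http safekit-farm-check \
    --port 9010 \
    --request-path /var/modules/farm/ready.txt \
    --check-interval 10s \
    --timeout 5s

# Attach it to the backend service of the load balancer
# (hypothetical backend service name).
gcloud compute backend-services update farm-backend \
    --health-checks safekit-farm-check \
    --global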

Configuration of the Google GCP network security

The network security must be configured to enable communications for the protocols and ports used by SafeKit: in particular the web console ports used below (TCP 9010 in HTTP mode, TCP 9453 in HTTPS mode) and the internal communication ports of SafeKit between the nodes (see the SafeKit User's Guide for the full list).
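For illustration, a hedged sketch of a firewall rule opening the web console ports with the gcloud CLI; the rule name, source range and target tag are assumptions:

# Allow the SafeKit web console ports (HTTP 9010, HTTPS 9453) from an
# administration range (hypothetical rule name, range and network tag).
gcloud compute firewall-rules create safekit-console \
    --allow tcp:9010,tcp:9453 \
    --source-ranges 203.0.113.0/24 \
    --target-tags safekit-node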

Package installation on Windows

On both Windows servers, download and install the SafeKit package.

Package installation on Linux

On both Linux servers, download and install the SafeKit package.
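As an illustration only, a typical installation on an rpm-based Linux distribution might look like the following; the package file name is hypothetical and depends on the version downloaded from Evidian:

# Install the SafeKit package on each Linux server
# (hypothetical file name; use the package you downloaded).
sudo rpm -ivh safekit_x86_64.rpm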

Configuration instructions

The configuration is presented with the web console connected to 2 Windows servers, but the procedure is the same with 2 Linux servers.

Important: all the configuration must be done from a single browser.

It is recommended to configure the web console in the HTTPS mode by connecting to https://<IP address of 1 VM>:9453 (next image). In this case, you must first configure the HTTPS mode by using the wizard described in the User's Guide: see "11.1 HTTPS Quick Configuration with the Configuration Wizard".

Start the SafeKit web console in the HTTPS mode for the configuration

Or you can use the web console in the HTTP mode by connecting to http://<IP address of 1 VM>:9010 (next image).

Start the SafeKit web console in the HTTP mode for the configuration

Note that you can also make a configuration with DNS names, especially if the IP addresses are not static.

Enter the IP address of the first node and click on Confirm (next image).

SafeKit web console - first node in the cluster

Click on New node and enter the IP address of the second node (next image).

SafeKit web console - second node in the cluster

Click on the red floppy disk to save the configuration (previous image)

In the Configuration tab, click on farm.safe, then enter farm as the module name and click on Confirm (next images, with farm instead of xxx).

SafeKit web console - start the configuration of the farm module
SafeKit web console - enter the module name

Click on Validate (next image)

SafeKit web console - enter the module nodes

Do not configure a virtual IP address (next image): this configuration is already made in the Google GCP load balancer. This section is useful for on-premises configurations only.

If a process is defined in the Process Checker section (next image), it will be monitored with the restart action in case of failure. If this process disappears from the list of running processes, the services will be stopped and restarted locally on the server. After 3 unsuccessful local restarts, the module is stopped on the local server. As a consequence, the health check answers NOT FOUND to the Google GCP load balancer, and the load balancing is reconfigured to distribute the traffic to the remaining servers of the farm.

start_both and stop_both (next image) contain the start and the stop of services.

SafeKit web console - enter the parameters

Click on Validate (previous image)

SafeKit web console - stop the farm module before configuration

Click on Configure (previous image)

SafeKit web console - check the green success messages of the configuration

Check the green success messages on both servers and click on Next (previous image).

SafeKit web console - start the cluster on both nodes

Start the cluster on both nodes (previous image). Check that the status becomes UP (green) - UP (green) (next image).

SafeKit web console - cluster started

The cluster is operational with services running on both UP nodes (previous image).

Be careful: components which are clients of the services must be configured with the virtual IP address. The configuration can be made with a DNS name (if a DNS name has been created and associated with the virtual IP address).

Tests

Check with the Windows Microsoft Management Console (MMC) or with Linux command lines that the services are started on both UP nodes. Set the services to Boot Startup Type = Manual (SafeKit controls the start of the services).
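On Linux, the equivalent is to disable the automatic start of the service at boot, since SafeKit starts and stops it through start_both and stop_both. A minimal sketch, assuming a systemd service with the hypothetical name myapp:

# SafeKit controls the service through the module scripts, so disable
# the distribution's automatic start at boot (hypothetical unit name).
sudo systemctl disable myapp
sudo systemctl stop myapp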

Stop one UP node by scrolling down the menu of the node and by clicking on Stop. Check that the load balancing is reconfigured and that the other node takes all the TCP connections. And check with the Windows Microsoft Management Console (MMC) or with Linux command lines that the services are stopped on the STOP node.
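To observe the reconfiguration from a client during this test, you can poll the virtual IP address in a loop. A minimal sketch, assuming a hypothetical virtual IP address and an HTTP service:

# Poll the virtual IP address once per second; during the test, all
# answers should come from the remaining UP server.
while true; do
  curl -s --max-time 2 http://203.0.113.10/ && echo
  sleep 1
done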

To understand what happens in the cluster, check the SafeKit logs of node 1 and node 2.

To see the module log of node 1 (next image):

SafeKit web console - Module Log of node 1

To see the application log of node 1 (next image):

SafeKit web console - Application Log of node 1

To see the logs of node 2 (previous image), click on W12R2server75/UP on the left side (it will become blue) and repeat the same operations.

Advanced configuration

In the Advanced Configuration tab (next image), you can edit the internal files of the module: bin/start_both, bin/stop_both and conf/userconfig.xml (next image on the left side). If you make changes in the internal files here, you must apply the new configuration with a right click on the blue icon/farm on the left side (next image): the interface will allow you to redeploy the modified files on both servers.

SafeKit web console - advanced configuration of the farm module

Configure boot start (next image on the right side) sets the automatic start of the module when the server boots. Do this configuration on both nodes once the load balancing and failover solution is correctly running.

SafeKit web console - automatic boot of the farm module
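The same setting can presumably be applied with the command line interface mentioned earlier; the exact syntax below is an assumption to check against the SafeKit User's Guide:

# Assumption: enable the automatic start of the farm module at boot
# (verify the exact command in the SafeKit User's Guide).
$SAFE/safekit boot -m farm on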

Support

To get support on the call desk of https://support.evidian.com, take 2 snapshots (2 .zip files), one for each server, and upload them in the call desk tool (next image).

SafeKit web console - snapshots for support

Internal files of the Windows farm.safe module

userconfig.xml

<!DOCTYPE safe>
<safe>
<service mode="farm" maxloop="3" loop_interval="24">
  <!-- Farm topology configuration -->
  <!-- Names or IP addresses on the default network are set during initialization in the console -->
  <farm>
    <lan name="default" />
  </farm>
  <!-- Software Error Detection Configuration -->
  <!-- Replace
       * PROCESS_NAME by the name of the process to monitor
  -->
  <errd polltimer="10">
    <proc name="PROCESS_NAME" atleast="1" action="restart" class="both" />
  </errd>
  <!-- User scripts activation -->
  <user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>

start_both.cmd

@echo off

rem Script called on all servers for starting applications

rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"

rem stdout goes into Application log
echo "Running start_both %*" 

set res=0

rem Fill with your services start call

set res=%errorlevel%
if %res% == 0 goto end

:stop
set res=%errorlevel%
"%SAFE%\safekit" printe "start_both failed"

rem uncomment to stop SafeKit when critical
rem "%SAFE%\safekit" stop -i "start_both"

:end

stop_both.cmd

@echo off

rem Script called on all servers for stopping applications

rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"

rem ----------------------------------------------------------
rem
rem 2 stop modes:
rem
rem - graceful stop
rem   call standard application stop with net stop
rem
rem - force stop (%1=force)
rem   kill application's processes
rem
rem ----------------------------------------------------------

rem stdout goes into Application log
echo "Running stop_both %*" 

set res=0

rem default: no action on forcestop
if "%1" == "force" goto end

rem Fill with your services stop call

rem If necessary, uncomment to wait for the real stop of services
rem "%SAFEBIN%\sleep" 10

if %res% == 0 goto end

"%SAFE%\safekit" printe "stop_both failed"

:end

Internal files of the Linux farm.safe module

userconfig.xml

<!DOCTYPE safe>
<safe>
<service mode="farm" maxloop="3" loop_interval="24">
  <!-- Farm topology configuration for the membership protocol -->
  <!-- Names or IP addresses on the default network are set during initialization in the console -->
  <farm>
    <lan name="default" />
  </farm>
  <!-- Software Error Detection Configuration -->
  <!-- Replace
       * PROCESS_NAME by the name of the process to monitor
  -->
  <errd polltimer="10">
    <proc name="PROCESS_NAME" atleast="1" action="restart" class="both" />
  </errd>
  <!-- User scripts activation -->
  <user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>

start_both

#!/bin/sh
# Script called on all servers for starting applications

# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message" 

# stdout goes into Application log
echo "Running start_both $*" 

res=0

# Fill with your application start call

if [ $res -ne 0 ] ; then
  $SAFE/safekit printe "start_both failed"

  # uncomment to stop SafeKit when critical
  # $SAFE/safekit stop -i "start_both"
fi

stop_both

#!/bin/sh
# Script called on all servers for stopping applications

# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message" 

#----------------------------------------------------------
#
# 2 stop modes:
#
# - graceful stop
#   call standard application stop
#
# - force stop ($1=force)
#   kill application's processes
#
#----------------------------------------------------------

# stdout goes into Application log
echo "Running stop_both $*" 

res=0

# default: no action on forcestop
[ "$1" = "force" ] && exit 0

# Fill with your application stop call

[ $res -ne 0 ] && $SAFE/safekit printe "stop_both failed"
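
As an illustration, the placeholder sections of the Linux scripts might be filled as follows, assuming a systemd service with the hypothetical name myapp:

# In start_both, replace "Fill with your application start call" with:
systemctl start myapp || res=1

# In stop_both, replace "Fill with your application stop call" with:
systemctl stop myapp || res=1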