PostgreSQL High Availability with Synchronous Replication and Failover

Evidian SafeKit brings high availability to PostgreSQL. This article explains how to quickly implement a PostgreSQL cluster without shared disk and without specific skills. A free trial is offered.

How the Evidian SafeKit software simply implements PostgreSQL high availability with real-time synchronous replication and failover without shared disk

How does the Evidian SafeKit mirror cluster implement PostgreSQL high availability with synchronous replication and failover?

In the previous figure, server 1/PRIM (Windows or Linux) runs PostgreSQL (any edition). Users connect to the virtual IP address of the mirror cluster. SafeKit replicates the files opened by PostgreSQL in real time. Only the changes made to the files are sent across the network, which limits traffic (byte-level file replication). The directories containing the PostgreSQL database are simply declared in the SafeKit configuration. There are no prerequisites on disk organization for the two servers, and the replicated directories may be located on the system disk. SafeKit implements synchronous replication, so no data is lost on failure, unlike with asynchronous replication.

If server 1 fails, SafeKit automatically fails over to server 2 and restarts PostgreSQL there. When server 1 is restarted, SafeKit performs failback by reintegrating the PostgreSQL database without stopping PostgreSQL on server 2. The system then returns to synchronous replication between server 2 and server 1. The administrator can decide to swap the primary and secondary roles so that PostgreSQL runs on server 1 again. The swap can also be done automatically by configuration.
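
During a failover, clients lose their connection and must reconnect to the virtual IP address once PostgreSQL has restarted on the other server. A minimal sketch of a client-side wait loop follows. It assumes the standard PostgreSQL utility pg_isready is installed on the client; the function name wait_for_postgres and the variables PG_CHECK and RETRY_DELAY are hypothetical names introduced for this example and are not part of SafeKit.

```shell
#!/bin/sh
# Sketch: wait until PostgreSQL answers again on the cluster's virtual IP.
# PG_CHECK and RETRY_DELAY are hypothetical knobs introduced for this example.
PG_CHECK=${PG_CHECK:-pg_isready}   # command used to probe the server
RETRY_DELAY=${RETRY_DELAY:-2}      # seconds between two probes

# wait_for_postgres HOST PORT MAX_TRIES
# Returns 0 as soon as one probe succeeds, 1 if every probe fails.
wait_for_postgres()
{
  host=$1
  port=$2
  tries=$3
  i=0
  while [ "$i" -lt "$tries" ] ; do
    if $PG_CHECK -h "$host" -p "$port" >/dev/null 2>&1 ; then
      return 0
    fi
    i=$((i + 1))
    sleep "$RETRY_DELAY"
  done
  return 1
}

# Example call (the address is a placeholder, as in userconfig.xml):
# wait_for_postgres VIRTUAL_TO_BE_DEFINED 5432 30 && echo "PostgreSQL is back"
```

Because the probe targets the virtual IP address, the same call works whichever server currently holds the primary role.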

Configuration overview of PostgreSQL high availability with synchronous replication and failover

With SafeKit, you can configure either a farm application module or a mirror application module, according to the high availability architecture suited to the application. For PostgreSQL high availability with synchronous replication and failover, the right module is the mirror module.

The configuration files for PostgreSQL high availability are given below for Windows and for Linux.

They include:

1. the PostgreSQL stop and start scripts,

2. the configuration file userconfig.xml, which contains the heartbeat, virtual IP, software error detection, file replication and user script settings.

Deployment of PostgreSQL high availability with synchronous replication and failover requires no specific IT skills:

    • install PostgreSQL on two standard servers
    • install the SafeKit software on both servers
    • install the postgresql.safe module
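
Before installing the module, it can help to confirm that both servers have the same PostgreSQL layout. The following is only a sketch of such a pre-flight check. The function name check_postgres_layout is hypothetical and not a SafeKit command, and the paths in the example call are the ones used by the Linux scripts in this article.

```shell
#!/bin/sh
# Sketch: pre-flight check to run on each server before installing the module.
# check_postgres_layout is a hypothetical helper, not part of SafeKit.

# check_postgres_layout PGBIN PGDATA
# Returns 0 when PGBIN/pg_ctl is executable and PGDATA is a directory.
check_postgres_layout()
{
  pgbin=$1
  pgdata=$2
  status=0
  if [ ! -x "$pgbin/pg_ctl" ] ; then
    echo "pg_ctl not found in $pgbin"
    status=1
  fi
  if [ ! -d "$pgdata" ] ; then
    echo "data directory $pgdata not found"
    status=1
  fi
  return $status
}

# Example call with the paths used by the Linux scripts in this article:
# check_postgres_layout /usr/local/pgsql/bin /usr/local/pgsql/data
```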

SafeKit configuration files on Windows for PostgreSQL high availability with synchronous replication and failover

Installation instructions

userconfig.xml

<!DOCTYPE safe>
<safe>
<service mode="mirror" defaultprim="alone" maxloop="3" loop_interval="24" failover="on">
  <!-- Heartbeat Configuration -->
  <!-- Names or IP addresses on the default network are set during initialization in the console -->
  <heart pulse="700" timeout="30000">
    <heartbeat name="default" ident="flow">
    </heartbeat>
  </heart>
  <!-- Virtual IP Configuration -->
  <!-- Replace
     * VIRTUAL_TO_BE_DEFINED by the name of your virtual server 
  -->
  <vip>
    <interface_list>
        <interface check="on" arpreroute="on"> 
	  <real_interface>
               <virtual_addr addr="VIRTUAL_TO_BE_DEFINED" where="one_side_alias" />
          </real_interface>
        </interface>
    </interface_list>
  </vip>
  <!-- Software Error Detection Configuration -->
  <errd polltimer="10">
    <!-- PostgreSQL Server -->
    <proc name="pg_ctl.exe" atleast="1" action="restart" class="prim" />
  </errd>
  <!-- File Replication Configuration -->
  <!-- Replicate
     * C:\Program Files\PostgreSQL\9.5\data\ default directory path of PostgreSQL database and redo log
  -->
  <rfs async="second" acl="off" nbrei="3">
	<replicated dir="C:\Program Files\PostgreSQL\9.5\data\" mode="read_only" />
  </rfs>
  <!-- User scripts activation -->
  <user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>

start_prim.cmd

@echo off
rem Script called on the primary server for starting application services 

rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"

rem stdout goes into Application log
echo "Running start_prim %*" 

set res=0

net start postgresql-9.5 > nul
rem keep the result of net start so that the failure branch below is reached
set res=%errorlevel%
if not %res% == 0 (
  "%SAFE%\safekit" printe "PostgreSQL start failed"
) else (
  "%SAFE%\safekit" printi "PostgreSQL started"
)

if %res% == 0 goto end

:stop
"%SAFE%\safekit" printe "start_prim failed"

rem uncomment to stop SafeKit when critical
rem "%SAFE%\safekit" stop -i "start_prim"

:end

stop_prim.cmd

@echo off
rem Script called on the primary server for stopping application services 

rem ----------------------------------------------------------
rem
rem 2 stop modes:
rem
rem - graceful stop
rem   call standard application stop with net stop
rem
rem - force stop (%1=force)
rem   kill application's processes
rem
rem ----------------------------------------------------------

rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"

rem stdout goes into Application log
echo "Running stop_prim %*" 

set res=0

rem default: no action on forcestop
if "%1" == "force" goto end

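net stop postgresql-9.5 > nul
rem report the real result of net stop instead of always logging success
if not %errorlevel% == 0 (
  "%SAFE%\safekit" printe "PostgreSQL stop failed"
) else (
  "%SAFE%\safekit" printi "PostgreSQL stopped"
)

rem wait a little for a real stop of services
"%SAFEBIN%\sleep" 10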

:end 

SafeKit configuration files on Linux for PostgreSQL high availability with synchronous replication and failover

Installation instructions

userconfig.xml

<!DOCTYPE safe>
<safe>
<service mode="mirror" defaultprim="alone" maxloop="3" loop_interval="24" failover="on">
  <!-- Heartbeat Configuration -->
  <!-- Names or IP addresses on the default network are set during initialization in the console -->
  <heart pulse="700" timeout="30000">
    <heartbeat name="default" ident="flow">
    </heartbeat>
  </heart>
  <!-- Virtual IP Configuration -->
  <!-- Replace
     * VIRTUAL_TO_BE_DEFINED by the name of your virtual server 
  -->
  <vip>
    <interface_list>
        <interface check="on" arpreroute="on"> 
	  <real_interface>
               <virtual_addr addr="VIRTUAL_TO_BE_DEFINED" where="one_side_alias" />
          </real_interface>
        </interface>
    </interface_list>
  </vip>
  <!-- Software Error Detection Configuration -->
  <errd polltimer="10">
    <!-- PostgreSQL Server -->
    <proc name="postgres" atleast="1" action="restart" class="prim" />
  </errd>
  <!-- File Replication Configuration -->
  <!-- Replicate
     * /usr/local/pgsql/data: default directory path of PostgreSQL database and redo log
  -->
  <rfs mountover="off" async="second" acl="off" nbrei="3">
	<replicated dir="/usr/local/pgsql/data" mode="read_only" />
  </rfs>
  <!-- User scripts activation -->
  <user nicestoptimeout="300" forcestoptimeout="300" logging="userlog" />
</service>
</safe>
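
The configuration above is only usable once the placeholder VIRTUAL_TO_BE_DEFINED has been replaced by the name or address of the virtual server. A minimal sketch of a check to run before applying the configuration is shown below. The function name check_userconfig is a hypothetical helper and not a SafeKit command.

```shell
#!/bin/sh
# Sketch: refuse a userconfig.xml that still contains the placeholder.
# check_userconfig is a hypothetical helper, not a SafeKit command.

# check_userconfig FILE
# Returns 0 when FILE is readable and no longer contains VIRTUAL_TO_BE_DEFINED,
# 1 when the placeholder is still present, 2 when FILE cannot be read.
check_userconfig()
{
  file=$1
  if [ ! -r "$file" ] ; then
    echo "cannot read $file"
    return 2
  fi
  if grep -q "VIRTUAL_TO_BE_DEFINED" "$file" ; then
    echo "virtual address is not set in $file"
    return 1
  fi
  echo "virtual address is set in $file"
  return 0
}

# Example call:
# check_userconfig ./userconfig.xml
```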

start_prim

#!/bin/sh 
# Script called on the primary server for starting applications 

# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message" 

#---------- Clean PostgreSQL residual processes 
# Call this function before starting any PostgreSQL databases 
# to clean any residual PostgreSQL processes
clean_PostgreSQL()
{
  retval=0

  $SAFE/safekit printw "Cleaning PostgreSQL processes"

  # kill started PostgreSQL processes
  ps -e -o pid,comm | grep postgres | $AWK '{print "kill " $1}'| sh >/dev/null 2>&1

  return $retval
}

#---------- PostgreSQL Databases
# Call this function for starting PostgreSQL Server
start_PostgreSQL()
{
  retval=0

  $SAFE/safekit printw "Starting PostgreSQL Server"

  # PostgreSQL - Database Starting 
  /bin/su - postgres -c "/usr/local/pgsql/bin/pg_ctl start -D /usr/local/pgsql/data"
  # keep the exit status so that the caller can detect the failure
  retval=$?
  if [ $retval -ne 0 ] ; then
    $SAFE/safekit printw "PostgreSQL server start failed"
  else
    $SAFE/safekit printw "PostgreSQL server started"
  fi

  return $retval
}

# stdout goes into Application log
echo "Running start_prim $*" 

res=0

[ -z "$OSNAME" ] && OSNAME=`uname -s`
case "$OSNAME" in
    Linux)
	AWK=/bin/awk
	;;
    *)
	AWK=/usr/bin/awk
	;;
esac

# TODO
# remove PostgreSQL boot start 

# Clean PostgreSQL residual processes 
clean_PostgreSQL || res=$?

# Start PostgreSQL databases
start_PostgreSQL || res=$?

if [ $res -ne 0 ] ; then
  $SAFE/safekit printi "start_prim failed"

  # uncomment to stop SafeKit when critical
  # $SAFE/safekit stop -i "start_prim"
fi

exit 0
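
The clean_PostgreSQL function above selects processes with grep postgres, which also matches any process whose command name merely contains that string. A stricter filter can be sketched as follows. The function name postgres_pids is hypothetical, and the sketch calls awk from the PATH rather than through the $AWK variable set by the script.

```shell
#!/bin/sh
# Sketch: select only processes whose command name is exactly "postgres".
# postgres_pids is a hypothetical helper; it reads "pid comm" lines on stdin
# and prints one matching pid per line.
postgres_pids()
{
  awk '$2 == "postgres" { print $1 }'
}

# Example call, using the same ps invocation as the script above:
# ps -e -o pid,comm | postgres_pids
```

The header line printed by ps is skipped naturally, because its second column is not equal to postgres.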

stop_prim

#!/bin/sh
# Script called on the primary server for stopping application services

# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message" 

#----------------------------------------------------------
#
# 2 stop modes:
#
# - graceful stop
#   call standard application stop
#
# - force stop ($1=force)
#   kill application's processes
#
#----------------------------------------------------------

#---------- Clean PostgreSQL residual processes
# Call this function on force stop 
# to clean any residual PostgreSQL processes
clean_PostgreSQL()
{
  retval=0

  $SAFE/safekit printw "Cleaning PostgreSQL processes "

  # kill started PostgreSQL 
  ps -e -o pid,comm | grep postgres | $AWK '{print "kill -9 " $1}'| sh >/dev/null 2>&1

  return $retval
}

#---------- PostgreSQL databases
# Call this function for stopping PostgreSQL databases
stop_PostgreSQL()
{
  retval=0

  if [ "$1" = "force" ] ; then
    # PostgreSQL databases force stop
    clean_PostgreSQL
    return $retval
  fi

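  # PostgreSQL databases graceful stop
  $SAFE/safekit printw "Stopping PostgreSQL server"
  # pass the same data directory as in start_prim
  /bin/su - postgres -c "/usr/local/pgsql/bin/pg_ctl stop -D /usr/local/pgsql/data"
  # keep the exit status so that the caller can detect the failure
  retval=$?
  if [ $retval -ne 0 ] ; then
    $SAFE/safekit printw "PostgreSQL server stop failed"
  else
    $SAFE/safekit printw "PostgreSQL server stopped"
  fi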

  return $retval
}

# stdout goes into Application log
echo "Running stop_prim $*" 

res=0

[ -z "$OSNAME" ] && OSNAME=`uname -s`
case "$OSNAME" in
    Linux)
	AWK=/bin/awk
	;;
    *)
	AWK=/usr/bin/awk
	;;
esac

mode=
if [ "$1" = "force" ] ; then
  mode=force
  shift
fi

# Stop PostgreSQL server
stop_PostgreSQL $mode || res=$?

[ $res -ne 0 ] && $SAFE/safekit printi "stop_prim failed"

exit 0

Demonstration

This demonstration was made with Microsoft SQL Server Express, but the operating mode is the same for PostgreSQL.

More on SafeKit

Other examples of high availability modules:

Mirror modules         Windows          Linux

Microsoft SQL Server   Windows module   -
Oracle                 Windows module   Linux module
MySQL                  Windows module   Linux module
PostgreSQL             Windows module   Linux module
Firebird               Windows module   Linux module
Hyper-V                Windows module   -
Hanwha SSM             Windows module   -
Milestone XProtect     Windows module   -
Generic module         Windows module   Linux module

Farm modules           Windows          Linux

IIS module             Windows module   -
Apache module          Windows module   Linux module
Generic module         Windows module   Linux module