Oracle High Availability with Synchronous Replication and Failover

How the Evidian SafeKit software simply implements Oracle high availability with real-time synchronous replication and failover without shared disk

How does the Evidian SafeKit mirror cluster implement Oracle high availability with synchronous replication and failover?

In the previous figure, server 1/PRIM (Windows or Linux) runs Oracle (any edition). Users connect to the virtual IP address of the mirror cluster. SafeKit replicates the files opened by Oracle in real time. Only the changes made to the files are replicated across the network (byte-level file replication), which limits traffic. The names of the directories containing the Oracle database are simply configured in SafeKit. There are no prerequisites on disk organization for the two servers, and the directories to replicate may be located on the system disk. Unlike asynchronous replication, the synchronous replication implemented by SafeKit loses no data on failure.

If server 1 fails, there is an automatic failover to server 2, where Oracle is restarted. Then, when server 1 is restarted, SafeKit performs failback by reintegrating the Oracle database without stopping Oracle on server 2. Finally, the system returns to synchronous replication between server 2 and server 1. The administrator can decide to swap the primary and secondary roles and return to server 1 running Oracle. The swap can also be done automatically by configuration.

Configuration overview of Oracle high availability with synchronous replication and failover

With SafeKit, you can configure either a farm application module or a mirror application module, according to the high availability architecture suited to the application. For Oracle high availability with synchronous replication and failover, the right module is the mirror module.

Starting from the generic mirror module, the Oracle module can be customized with the SafeKit web console (Advanced configuration).

The configuration files for Oracle high availability are given below, first for Windows and then for Linux.

They include:

1. the Oracle stop and start scripts,

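2. the configuration file userconfig.xml, which contains the heartbeat, virtual IP address, software error detection, file replication and user script settings.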

Once the module is configured and tested, you can publish it with the web console. Thus, deployment of Oracle high availability with synchronous replication and failover requires no specific IT skills:

    • install Oracle on two standard Windows or Linux servers
    • install the SafeKit software on both servers

Then with the SafeKit web console (Quick configuration):

SafeKit configuration files on Windows for Oracle high availability with synchronous replication and failover

userconfig.xml

<!DOCTYPE safe>
<safe>
<service mode="mirror" defaultprim="alone" maxloop="3" loop_interval="24" failover="on">
  <!-- Heartbeat Configuration -->
  <!-- Names or IP addresses on the default network are set during initialization in the console -->
  <heart pulse="700" timeout="30000">
    <heartbeat name="default" ident="flow">
    </heartbeat>
  </heart>
  <!-- Virtual IP Configuration (used by Oracle SQL*Net Listener) -->
  <!-- Replace
     * VIRTUAL_TO_BE_DEFINED by the IP address of your virtual server 
  --> 
  <vip>
    <interface_list>
      <interface check="on" arpreroute="on">
        <real_interface>
          <virtual_addr addr="VIRTUAL_TO_BE_DEFINED" where="one_side_alias" />
        </real_interface>
      </interface>
    </interface_list>
  </vip>
  <!-- Software Error Detection Configuration -->
  <errd polltimer="10">
    <!-- Oracle databases 
    For monitoring one specific oracle instance, insert the attribute
    argregex="{.*SID.*}" where SID is the name of the DataBase
    -->
    <proc name="oracle.exe" atleast="1" action="restart" class="prim" />
    <proc name="tnslsnr.exe" atleast="1" action="restart" class="prim" />
  </errd>
  <!-- File Replication Configuration -->
  <!-- Replace
     * ORACLE_DATA_TO_BE_DEFINED by the path of your Oracle database directory and transaction logs
  -->
  <rfs async="second" acl="off" nbrei="3">
    <replicated dir="ORACLE_DATA_TO_BE_DEFINED" mode="read_only" />
  </rfs>
  <!-- User scripts Configuration / Environment variables -->
  <user nicestoptimeout="300" forcestoptimeout="300" logging="userlog">
    <var name="ORACLE_SID" value="ORACLE_SID_TO_BE_DEFINED" />
    <var name="ORACLE_HOME_NAME" value="ORACLE_HOME_NAME_TO_BE_DEFINED" />
  </user>
</service>
</safe>

start_prim.cmd

@echo off
rem Script called on the primary server for starting application services 

rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"

rem stdout goes into Application log
echo "Running start_prim %*" 

set res=0

net start "OracleDbConsole%ORACLE_SID%" > nul
if not %errorlevel% == 0 goto stop
%SAFE%\safekit printi "OracleDbConsole%ORACLE_SID% started"

net start "OracleService%ORACLE_SID%" > nul
if not %errorlevel% == 0 goto stop
%SAFE%\safekit printi "OracleService%ORACLE_SID% started"

net start "Oracle%ORACLE_HOME_NAME%TNSListener" > nul
if not %errorlevel% == 0 goto stop
%SAFE%\safekit printi "Oracle%ORACLE_HOME_NAME%TNSListener started"

if %res% == 0 goto end

:stop
set res=%errorlevel%
%SAFE%\safekit printi "Oracle start failed"

rem uncomment to stop SafeKit when critical
rem %SAFE%\safekit stop -i "start_prim"

:end

stop_prim.cmd

@echo off
rem Script called on the primary server for stopping application services 

rem ----------------------------------------------------------
rem
rem 2 stop modes:
rem
rem - graceful stop
rem   call standard application stop with net stop
rem
rem - force stop (%1=force)
rem   kill application's processes
rem
rem ----------------------------------------------------------

rem For logging into SafeKit log use:
rem "%SAFE%\safekit" printi | printe "message"

rem stdout goes into Application log
echo "Running stop_prim %*" 

set res=0

rem default: no action on forcestop
if "%1" == "force" goto end

net stop OracleService%ORACLE_SID% > nul
%SAFE%\safekit printi "OracleService%ORACLE_SID% stopped"

net stop  OracleDBConsole%ORACLE_SID% > nul
%SAFE%\safekit printi "OracleDBConsole%ORACLE_SID% stopped"

net stop  Oracle%ORACLE_HOME_NAME%TNSListener > nul
%SAFE%\safekit printi "Oracle%ORACLE_HOME_NAME%TNSListener stopped"

rem wait a little for a real stop of services
%SAFEBIN%\sleep 10

:end

SafeKit configuration files on Linux for Oracle high availability with synchronous replication and failover

userconfig.xml

<!DOCTYPE safe>
<safe>
<service mode="mirror" defaultprim="alone" maxloop="3" loop_interval="24" failover="on">
  <!-- Heartbeat Configuration -->
  <!-- Names or IP addresses on the default network are set during initialization in the console -->
  <heart pulse="700" timeout="30000">
    <heartbeat name="default" ident="flow">
    </heartbeat>
  </heart>
  <!-- Virtual IP Configuration (used by Oracle SQL*Net Listener) -->
  <!-- Replace
  * VIRTUAL_TO_BE_DEFINED by the IP address of your virtual server
  -->
  <vip>
    <interface_list>
      <interface check="on" arpreroute="on">
        <real_interface>
          <virtual_addr addr="VIRTUAL_TO_BE_DEFINED" where="one_side_alias" />
        </real_interface>
      </interface>
    </interface_list>
  </vip>
  <!-- Software Error Detection Configuration -->
  <errd polltimer="10">
    <!-- Oracle databases
    For monitoring one specific oracle instance, insert the attribute
    argregex=".*SID$" where SID is the name of the DataBase
    -->
    <proc name="oracle" nameregex="ora_.*" atleast="1" action="restart" class="prim" />
    <proc name="tnslsnr" atleast="1" action="restart" class="prim" />
  </errd>
  <!-- File Replication Configuration -->
  <!-- Replace
  * ORACLE_DATA_TO_BE_DEFINED by the path of your Oracle database directory and transaction logs
  -->
  <rfs mountover="off" packetsize="32768" async="second" acl="off" nbrei="3">
    <replicated dir="ORACLE_DATA_TO_BE_DEFINED" mode="read_only" />
  </rfs>
  <!-- User scripts Configuration / Environment variables -->
  <user nicestoptimeout="300" forcestoptimeout="300" logging="userlog">
    <var name="ORACLE_HOME" value="ORACLE_HOME_TO_BE_DEFINED" />
    <var name="ORACLE_DBA" value="ORACLE_DBA_TO_BE_DEFINED" />
  </user>
</service>
</safe>
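
The placeholders in the userconfig.xml above (VIRTUAL_TO_BE_DEFINED, ORACLE_DATA_TO_BE_DEFINED, ORACLE_HOME_TO_BE_DEFINED, ORACLE_DBA_TO_BE_DEFINED) have to be replaced before the module is used. As a minimal sketch, assuming the file is filled in by a script rather than in the web console, the substitution could look like this. The function name fill_placeholders and every value below (IP address, paths, user name) are hypothetical examples, not SafeKit defaults.

```shell
#!/bin/sh
# Hypothetical example values: replace them with your own settings.
VIP_ADDR="192.168.1.50"
ORACLE_DATA_DIR="/u01/oradata"
ORACLE_HOME_DIR="/u01/app/oracle/product/19.0.0/dbhome_1"
ORACLE_DBA_USER="oracle"

# Read a userconfig.xml template on stdin and write the filled-in text on stdout.
# "|" is used as the sed delimiter because the values contain "/".
fill_placeholders()
{
  sed \
    -e "s|VIRTUAL_TO_BE_DEFINED|$VIP_ADDR|g" \
    -e "s|ORACLE_DATA_TO_BE_DEFINED|$ORACLE_DATA_DIR|g" \
    -e "s|ORACLE_HOME_TO_BE_DEFINED|$ORACLE_HOME_DIR|g" \
    -e "s|ORACLE_DBA_TO_BE_DEFINED|$ORACLE_DBA_USER|g"
}

# Demo on two lines taken from the template
fill_placeholders <<'EOF'
<virtual_addr addr="VIRTUAL_TO_BE_DEFINED" where="one_side_alias" />
<replicated dir="ORACLE_DATA_TO_BE_DEFINED" mode="read_only" />
EOF
```

Values that contain "|" or "&" would need escaping before being passed to sed; the sketch does not handle that case.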

start_prim

#!/bin/sh
# Script called on the primary server for starting applications

# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message" 

#---------- Clean Oracle residual processes and shared memory
# Call this function before starting any Oracle databases
# to clean up any residual Oracle processes and IPC resources
clean_oracle()
{
  retval=0

  $SAFE/safekit printw "Cleaning Oracle processes and shared memory"

  # kill started Oracle databases
  ps -e -o pid,comm |grep ora | $AWK '{print "kill " $1}'| sh >/dev/null 2>&1

  # delete oracle shared memory to start in a clean state
  case $OSNAME in
 	   Linux)
	        ipcs -m |grep oracle |$AWK '{print "shm "$2 | "xargs ipcrm"}' >/dev/null 2>&1
		ipcs -s |grep oracle |$AWK '{print "sem "$2 | "xargs ipcrm"}' >/dev/null 2>&1
		;;
 	   *)
	        ipcs -m |grep oracle |$AWK '{print "-m "$2 | "xargs ipcrm"}' >/dev/null 2>&1
		ipcs -s |grep oracle |$AWK '{print "-s "$2 | "xargs ipcrm"}' >/dev/null 2>&1
		;;
  esac

  # remove leftover SGA definition files (rm -f copes with zero or several matches)
  rm -f "$ORACLE_HOME"/dbs/sgadef*.dbf

  return $retval
}

#---------- Oracle Databases
# Call this function for starting Oracle Databases        
start_oracle()
{
  retval=0

  $SAFE/safekit printw "Starting Oracle databases"

  # Oracle - Database Starting 
  /bin/su - $ORACLE_DBA -c "$ORACLE_HOME/bin/dbstart $ORACLE_HOME" #> /dev/console 2>&1   
  if [ $? -ne 0 ] ; then 
    $SAFE/safekit printw "Oracle databases start failed"
#    retval=1
  else
    $SAFE/safekit printw "Oracle databases started"
  fi

  return $retval
}

#---------- Oracle SQL*Net Listener
# Call this function for starting Oracle Listener
start_listener()
{
  retval=0

  # Oracle - Listener Starting
  LISTENER_STATE=`$SAFEBIN/killit list tnslsnr`     
  if [ "$LISTENER_STATE" != "" ]; then              
    $SAFE/safekit printw "Oracle Listener already started"         
    return $retval                                  
  fi                                                

  $SAFE/safekit printw "Starting Oracle Listener"
  /bin/su - $ORACLE_DBA -c "$ORACLE_HOME/bin/lsnrctl start" #> /dev/console 2>&1
  if [ $? -ne 0 ] ; then
    $SAFE/safekit printw "Oracle Listener start failed"
  else
    $SAFE/safekit printw "Oracle Listener started"
  fi

  return $retval
}

# stdout goes into Application log
echo "Running start_prim $*" 

res=0

[ -z "$OSNAME" ] && OSNAME=`uname -s`
case "$OSNAME" in
    Linux)
	AWK=/bin/awk
	;;
    *)
	AWK=/usr/bin/awk
	;;
esac

# TODO: remove the start of Oracle at boot (Oracle is started by this script)

# WARNING: all databases defined in /etc/oratab are started
#

# Clean Oracle residual processes and shared memory to start Oracle databases in a clean state
clean_oracle || res=$?

# Start Oracle databases
start_oracle || res=$?

# Start Oracle SQL*Net Listener
start_listener || res=$?

if [ $res -ne 0 ] ; then
  $SAFE/safekit printi "start_prim failed"

  # uncomment to stop SafeKit when critical
  # $SAFE/safekit stop -i "start_prim"
fi

exit 0
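
The warning in start_prim says that the databases defined in /etc/oratab are started. Oracle's dbstart acts on oratab entries of the form SID:ORACLE_HOME:Y, so it can help to list them before handing control to SafeKit. The following is a minimal sketch, not part of the SafeKit module; the function name list_autostart_sids and the sample entries are hypothetical.

```shell
#!/bin/sh
# Print the SIDs whose oratab entry has the form SID:ORACLE_HOME:Y.
# Comment lines and blank lines are skipped.
# The optional file argument exists for testing; it defaults to /etc/oratab.
list_autostart_sids()
{
  oratab="${1:-/etc/oratab}"
  [ -r "$oratab" ] || return 1
  awk -F: '!/^[ \t]*#/ && NF >= 3 && $3 == "Y" { print $1 }' "$oratab"
}

# Demo with a throwaway file holding hypothetical entries
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# SID:ORACLE_HOME:autostart flag
PROD:/u01/app/oracle/product/19.0.0/dbhome_1:Y
TEST:/u01/app/oracle/product/19.0.0/dbhome_1:N
EOF
list_autostart_sids "$tmp"
rm -f "$tmp"
```

On a real server, calling list_autostart_sids without an argument reads /etc/oratab directly.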

stop_prim

#!/bin/sh
# Script called on the primary server for stopping applications

# For logging into SafeKit log use:
# $SAFE/safekit printi | printe "message" 

#----------------------------------------------------------
#
# 2 stop modes:
#
# - graceful stop
#   call standard application stop
#
# - force stop ($1=force)
#   kill application's processes
#
#----------------------------------------------------------

#---------- Clean Oracle residual processes and shared memory
# Call this function on force stop
# to clean up any residual Oracle processes and IPC resources
clean_oracle()
{
  retval=0

  $SAFE/safekit printw "Cleaning Oracle processes and shared memory"

  # kill started Oracle databases
  ps -e -o pid,comm |grep ora | $AWK '{print "kill -9 " $1}'| sh >/dev/null 2>&1

  # delete oracle shared memory to start in a clean state
  case $OSNAME in
 	   Linux)
	        ipcs -m |grep oracle |$AWK '{print "shm "$2 | "xargs ipcrm"}' >/dev/null 2>&1
		ipcs -s |grep oracle |$AWK '{print "sem "$2 | "xargs ipcrm"}' >/dev/null 2>&1
		;;
 	   *)
	        ipcs -m |grep oracle |$AWK '{print "-m "$2 | "xargs ipcrm"}' >/dev/null 2>&1
		ipcs -s |grep oracle |$AWK '{print "-s "$2 | "xargs ipcrm"}' >/dev/null 2>&1
		;;
  esac

  # remove leftover SGA definition files (rm -f copes with zero or several matches)
  rm -f "$ORACLE_HOME"/dbs/sgadef*.dbf

  return $retval
}

#---------- Oracle SQL*Net Listener
# Call this function for stopping Oracle Listener
stop_listener()
{
  retval=0

  if [ "$1" = "force" ] ; then
    # Oracle Listener force stop
    $SAFEBIN/killit +KILL tnslsnr 1>/dev/null 2>&1
    return $retval
  fi

  # Oracle - Listener stopping
  LISTENER_STATE=`$SAFEBIN/killit list tnslsnr`            
  if [ "$LISTENER_STATE" = "" ]; then                      
    $SAFE/safekit printw "Oracle Listener already stopped"
    return $retval                                         
  fi            

  $SAFE/safekit printw "Stopping Oracle Listener"
  /bin/su - $ORACLE_DBA -c "$ORACLE_HOME/bin/lsnrctl stop" #> /dev/console 2>&1
  if [ $? -ne 0 ] ; then
    $SAFE/safekit printw "Oracle Listener stop failed"
  else
    $SAFE/safekit printw "Oracle Listener stopped"
  fi

  return $retval
}

#---------- Oracle databases
# Call this function for stopping Oracle databases
stop_oracle()
{
  retval=0

  if [ "$1" = "force" ] ; then
    # Oracle databases force stop
    clean_oracle
    return $retval
  fi

  # Oracle databases graceful stop

  # First stop the startup - shutdown command if it is running
  $SAFEBIN/killit +TERM dbstart dbshut> /dev/null 2>&1 

  # Kill oracle connections that prevent Oracle from stopping gracefully   
  ps -e -o pid,args |grep LOCAL=NO |$AWK '{print "kill " $1}'| sh > /dev/null 2>&1        
  ps -e -o pid,args |grep LOCAL=YES |$AWK '{print "kill " $1}'| sh > /dev/null 2>&1        

  $SAFE/safekit printw "Stopping Oracle databases"  
  /bin/su - $ORACLE_DBA -c "$ORACLE_HOME/bin/dbshut $ORACLE_HOME " #> /dev/console 2>&1
  if [ $? -ne 0 ] ; then 
    $SAFE/safekit printw "Oracle databases stop failed"
  else
    $SAFE/safekit printw "Oracle databases stopped"  
  fi

  return $retval
}

# stdout goes into Application log
echo "Running stop_prim $*" 

res=0

[ -z "$OSNAME" ] && OSNAME=`uname -s`
case "$OSNAME" in
    Linux)
	AWK=/bin/awk
	;;
    *)
	AWK=/usr/bin/awk
	;;
esac

mode=
if [ "$1" = "force" ] ; then
  mode=force
  shift
fi

# WARNING: all databases defined in /etc/oratab are stopped                    

# Stop Oracle SQL*Net Listener
stop_listener $mode || res=$?

# Stop Oracle databases 
stop_oracle $mode || res=$?

[ $res -ne 0 ] && $SAFE/safekit printi "stop_prim failed"

exit 0
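
The clean_oracle function in both Linux scripts selects processes with grep ora, which also matches unrelated commands whose name merely contains "ora". As a minimal sketch of a narrower filter, assuming that Oracle processes on your system are named ora_... (background processes, as in the nameregex of userconfig.xml) or oracle... (server processes), the selection could be written as follows. The function name select_oracle_pids is hypothetical, and this is an alternative to the original filter rather than the module's own code.

```shell
#!/bin/sh
# Read "PID COMMAND" lines (as printed by: ps -e -o pid,comm) on stdin and
# print only the PIDs whose command name starts with "ora_" or "oracle".
# Assumption: these two prefixes cover the Oracle processes on your system.
select_oracle_pids()
{
  awk '$2 ~ /^(ora_|oracle)/ { print $1 }'
}

# Demo on canned ps output; nothing is killed here
select_oracle_pids <<'EOF'
  PID COMMAND
 1201 ora_pmon_PROD
 1350 oraclePROD
 2044 storaged
EOF
```

In a live script the input would come from ps -e -o pid,comm, and the selected PIDs would then be passed to kill. In the demo, the storaged line is ignored, whereas grep ora would have selected it.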

 
