Table of Contents

Overview

1. Technical overview

1.1 Generalities, solutions, architectures

1.1.1 Introduction to SafeKit

1.1.2 SafeKit solutions

1.1.3 SafeKit architectures

1.1.4 SafeKit cluster definition

1.1.5 SafeKit module definition

1.1.6 SafeKit limitations

1.2 The SafeKit mirror cluster

1.2.1 Real time file replication and application failover

1.2.2 Step 1. Normal operation

1.2.3 Step 2. Failover

1.2.4 Step 3. Failback and automatic resynchronization

1.2.5 Step 4. Return to normal operation

1.2.6 Synchronous replication versus asynchronous replication

1.2.7 Behavior in case of network isolation

1.2.8 3-node replication

1.2.9 SafeKit on a single node to protect against software failures

1.3 The SafeKit farm cluster

1.3.1 Network load balancing and application failover

1.3.2 Principle of a virtual IP address with network load balancing

1.3.3 Load balancing for stateful or stateless web services

1.3.4 Chain high availability solution in a farm

1.4 Clusters running several modules

1.4.1 The SafeKit farm+mirror cluster

1.4.2 The SafeKit active/active cluster with replication

1.4.3 The SafeKit N-1 cluster

1.5 The SafeKit Hyper-V or KVM cluster

1.5.1 Load balancing, replication, failover of entire virtual machines

1.6 SafeKit clusters in the cloud

1.6.1 Mirror cluster in Azure, AWS and GCP

1.6.2 Farm cluster in Azure, AWS and GCP

2. Installation

2.1 SafeKit install

2.1.1 Download the package

2.1.2 Installation directories and disk space provisioning

2.1.3 SafeKit install procedure

2.1.4 Use the SafeKit web console or command line interface

2.1.5 SafeKit license keys

2.1.6 System specific procedures and characteristics

2.2 Mirror installation recommendation

2.2.1 Hardware and system prerequisites

2.2.2 Network prerequisites

2.2.3 Application prerequisites

2.2.4 File replication prerequisites

2.3 Farm installation recommendation

2.3.1 Hardware and system prerequisites

2.3.2 Network prerequisites

2.3.3 Application prerequisites

2.4 SafeKit upgrade

2.4.1 Prepare the upgrade

2.4.2 Uninstall procedure

2.4.3 Reinstall and postinstall procedure

2.5 SafeKit full uninstall

2.5.1 Uninstall on Windows as administrator

2.5.2 Uninstall on Linux as root

2.6 SafeKit documentation

3. The SafeKit web console

3.1 Start the web console

3.1.1 Start a web browser

3.1.2 Connect to a SafeKit node

3.1.3 List of connection nodes

3.1.4 Use the SafeKit web application

3.1.5 Update the web console

3.2 Configure the cluster

3.2.1 Cluster configuration wizard

3.2.2 Cluster configuration home page

3.3 Configure a module

3.3.1 Select the new module to configure

3.3.2 Module configuration wizard

3.3.3 Modules configuration home page

3.3.4 Edit the module configuration locally and then apply it

3.4 Monitor a module

3.4.1 Monitoring home page

3.4.2 Module state

3.4.3 Module control menus

3.4.4 Module details

3.4.5 Module states timeline

3.5 Snapshots or logs of module for debug and support

3.6 Secure access to the web console

4. Tests

4.1 Installation and tests after boot

4.1.1 Test package installation

4.1.2 Test license and version

4.1.3 Test SafeKit services and modules after boot

4.1.4 Test start of SafeKit web console

4.2 Tests of a mirror module

4.2.1 Test first start of a mirror module on 2 servers STOP (NotReady)

4.2.2 Test start of a mirror module on 2 servers STOP (NotReady)

4.2.3 Test stop of a mirror module on the server PRIM (Ready)

4.2.4 Test start of a mirror module on the server STOP (NotReady)

4.2.5 Test restart of a mirror module on the server PRIM (Ready)

4.2.6 Test virtual IP address of a mirror module

4.2.7 Test file replication of a mirror module

4.2.8 Test shutdown of the server PRIM (Ready)

4.2.9 Test power-off of the server PRIM (Ready)

4.2.10 Test split-brain with a mirror module

4.2.11 Continue your mirror module tests with checkers

4.3 Tests of a farm module

4.3.1 Test start of a farm module on all servers STOP (NotReady)

4.3.2 Test stop of a farm module on one server UP (Ready)

4.3.3 Test restart of a farm module on one server UP (Ready)

4.3.4 Test virtual IP address of a farm module

4.3.5 Test TCP load balancing on a virtual IP address

4.3.6 Test split-brain with a farm module

4.3.7 Test compatibility of the network with invisible MAC address (vmac_invisible)

4.3.8 Test shutdown of a server UP (Ready)

4.3.9 Test power-off of a server UP (Ready)

4.3.10 Continue your farm module tests with checkers

4.4 Tests of checkers common to mirror and farm

4.4.1 Test <errd> checker with action restart or stopstart

4.4.2 Test <tcp> checker with action restart or stopstart

4.4.3 Test <tcp> checker with action wait

4.4.4 Test <interface check="on"> with action wait

4.4.5 Test <ping> checker with action wait

4.4.6 Test <module> checker with action wait

4.4.7 Test <custom> checker with action wait

4.4.8 Test <custom> checker with action restart or stopstart

5. Mirror module administration

5.1 Operating mode of a mirror module

5.2 State automaton of a mirror module (STOP, WAIT, ALONE, PRIM, SECOND - NotReady, Transient, Ready)

5.3 First start-up of a mirror module (safekit prim command)

5.4 Different reintegration cases (use of bitmaps)

5.5 Start-up of a mirror module with the up-to-date data STOP (NotReady) - WAIT (NotReady)

5.6 Degraded replication mode (ALONE (Ready) degraded)

5.7 Automatic or manual failover

5.8 Default primary server (automatic swap after reintegration)

5.9 Prim command fails: why? (safekit primforce command)

6. Farm module administration

6.1 Operating mode of a farm module

6.2 State automaton of a farm module (STOP, WAIT, UP - NotReady, Transient, Ready)

6.3 Start-up of a farm module

7. Troubleshooting

7.1 Connection issues with the web console

7.1.1 Browser check

7.1.2 Browser state clear

7.1.3 Server check

7.2 Connection issues with the HTTPS web console

7.2.1 Check server certificates

7.2.2 Check certificates installed in SafeKit

7.2.3 Revert to HTTP configuration

7.3 How to read logs and resources of the module?

7.4 How to read the commands log of the server?

7.5 Stable module (Ready) and (Ready)

7.6 Degraded module (Ready) and / (NotReady)

7.7 Out of service module / (NotReady) and / (NotReady)

7.8 Module STOP (NotReady): start the module

7.9 Module WAIT (NotReady): repair the resource="down"

7.10 Module oscillating from (Ready) to (Transient)

7.11 Message on stop after maxloop

7.12 Module (Ready) but non-operational application

7.13 Mirror module ALONE (Ready) - WAIT/STOP (NotReady)

7.14 Farm module UP (Ready) but problem of load balancing in a farm

7.14.1 Reported network load share are not coherent

7.14.2 virtual IP address does not respond properly

7.15 Problem with the virtual IP after failover

7.16 Problem after Boot

7.17 Analysis from snapshots of the module

7.17.1 Module configuration files

7.17.2 Module dump files

7.18 Problem with the size of SafeKit databases

7.19 Problem for retrieving the certification authority certificate from an external PKI

7.19.1 Export CA certificate(s) from public certificates

7.20 Issue with email sending by the SafeKit notification agent

7.20.1 Failed to read or parse the configuration file

7.20.2 Email sending test blocked

7.20.3 Curl errors

7.21 Issue with antivirus

7.22 Still in Trouble

8. Evidian support

9. Command line interface

9.1 Commands to control and setup SafeKit

9.1.1 safeadmin service

9.1.2 safewebserver service

9.1.3 Email notification agent

9.1.4 SNMP service

9.2 Command lines to configure and monitor the cluster

9.3 Command lines to control modules

9.4 Command lines to monitor modules

9.5 Command lines to configure modules

9.6 Command lines for support

9.7 Command lines during the maintenance of the module application

9.7.1 Module control for maintenance

9.7.2 Running the application without the module

9.8 Command lines distributed across multiple SafeKit servers

9.9 Examples

9.9.1 Local and distributed command

9.9.2 Cluster configuration with command line

9.9.3 Module configuration with command line

9.9.4 Module snapshot with command line

10. Advanced administration and setup

10.1 SafeKit environment variables and directories

10.1.1 Global

10.1.2 Module

10.2 SafeKit services and daemons

10.2.1 SafeKit services

10.2.2 SafeKit daemons per module

10.3 Firewall settings

10.3.1 Firewall settings in Linux

10.3.2 Firewall settings in Windows

10.3.3 Other firewalls

10.4 Boot and shutdown setup in Windows

10.4.1 Automatic procedure

10.4.2 Manual procedure

10.5 Linux Secure boot settings for SafeKit kernel modules

10.6 Antivirus settings

10.7 Encryption of module communications

10.7.1 Configuration with the SafeKit Web console

10.7.2 Configuration with the Command Line Interface

10.7.3 Advanced configuration

10.8 Encryption of sensitive files in SafeKit

10.9 SafeKit web service settings

10.9.1 Configuration files

10.9.2 Connection ports configuration

10.9.3 HTTP/HTTPS and user authentication configuration

10.9.4 SafeKit API

10.10 SafeKit email notification agent

10.10.1 SafeKit notification agent configuration

10.10.2 SMTP client credentials setup for authentication

10.10.3 Email sending test

10.10.4 SafeKit notification agent activation

10.11 SNMP monitoring

10.11.1 SNMP monitoring in Windows

10.11.2 SNMP monitoring in Linux

10.11.3 The SafeKit MIB

10.12 Commands log of the SafeKit server

10.13 SafeKit log messages in system log

11. Securing the SafeKit web service

11.1 Overview

11.1.1 Default setup

11.1.2 Predefined setups

11.2 HTTP setup

11.2.1 Default setup

11.2.2 Unsecure setup based on identical role for all

11.3 HTTPS setup

11.3.1 HTTPS setup using the SafeKit PKI

11.3.2 HTTPS setup using an external PKI

11.4 User authentication setup

11.4.1 File-based authentication setup

11.4.2 LDAP/AD authentication setup

11.4.3 OpenID authentication setup

12. Cluster.xml for the SafeKit cluster configuration

12.1 Cluster.xml file

12.1.1 Cluster.xml example

12.1.2 Cluster.xml syntax

12.1.3 <lans>, <lan>, <node> attributes

12.2 SafeKit cluster Configuration

12.2.1 Configuration with the SafeKit web console

12.2.2 Configuration with command line

12.2.3 Configuration changes

13. Userconfig.xml for a module configuration

13.1 Time-based attributes

13.1.1 Time-based attribute example

13.1.2 Time-based attribute syntax

13.2 Macros - <macro>

13.2.1 <macro> example

13.2.2 <macro> syntax

13.2.3 <macro> attributes

13.3 Farm or mirror module - <service>

13.3.1 <service> example

13.3.2 <service> syntax

13.3.3 <service> attributes

13.4 Heartbeats - <heart>, <heartbeat>

13.4.1 <heart> example

13.4.2 <heart> syntax

13.4.3 <heart>, <heartbeat> attributes

13.5 Farm topology - <farm>, <lan>

13.5.1 <farm> example

13.5.2 <farm> syntax

13.5.3 <farm>, <lan> attributes

13.6 Virtual IP address - <vip>

13.6.1 <vip> example in a mirror module

13.6.2 <vip> example in a farm module

13.6.3 Alternative to <vip> for servers in different networks

13.6.4 <vip> syntax

13.6.5 <vip><interface_list>, <interface>, <virtual_interface>, <real_interface>, <virtual_addr> attributes

13.6.6 <loadbalancing_list>, <group>, <cluster>, <host> attributes

13.6.7 <vip> Load balancing description

13.7 File replication - <rfs>, <replicated>

13.7.1 <rfs> example

13.7.2 <rfs> syntax

13.7.3 <rfs>, <replicated> attributes

13.7.4 <rfs> description

13.8 Module scripts - <user>, <var>

13.8.1 <user> example

13.8.2 <user> syntax

13.8.3 <user>, <var> attributes

13.9 Virtual hostname - <vhost>, <virtualhostname>

13.9.1 <vhost> example

13.9.2 <vhost> syntax

13.9.3 <vhost>, <virtualhostname> attributes

13.9.4 <vhost> description

13.10 Process or service monitoring - <errd>, <proc>

13.10.1 <errd> example

13.10.2 <errd> syntax

13.10.3 <errd>, <proc> attributes

13.10.4 <errd> commands

13.11 Checkers - <check>

13.11.1 <check> example

13.11.2 <check> syntax

13.11.3 <checker> description

13.12 TCP checker - <tcp>

13.12.1 <tcp> example

13.12.2 <tcp> syntax

13.12.3 <tcp> attributes

13.13 Ping checker - <ping>

13.13.1 <ping> example

13.13.2 <ping> syntax

13.13.3 <ping> attributes

13.14 Interface checker - <intf>

13.14.1 <intf> example

13.14.2 <intf> syntax

13.14.3 <intf> attributes

13.15 IP checker - <ip>

13.15.1 <ip> example

13.15.2 <ip> syntax

13.15.3 <ip> attributes

13.16 Custom checker - <custom>

13.16.1 <custom> example

13.16.2 <custom> syntax

13.16.3 <custom> attributes

13.17 Module checker - <module>

13.17.1 <module> example

13.17.2 <module> syntax

13.17.3 <module> attributes

13.18 Splitbrain checker - <splitbrain>

13.18.1 <splitbrain> example

13.18.2 <splitbrain> syntax

13.18.3 <splitbrain> attributes

13.19 Failover machine - <failover>

13.19.1 <failover> example

13.19.2 <failover> syntax

13.19.3 <failover> attributes

13.19.4 <failover> description

14. Scripts for a module configuration

14.1 List of scripts

14.1.1 Start/stop scripts

14.1.2 Other scripts

14.2 Variables and arguments passed to scripts

14.3 Scripts output

14.3.1 Output into script log

14.3.2 Output into module log

14.4 Scripts execution automaton

14.5 SafeKit special commands for scripts

14.5.1 Commands for Windows

14.5.2 Commands for Linux

14.5.3 Commands for Windows and Linux

15. Examples of module configurations

15.1 Mirror module example with mirror.safe

15.1.1 Cluster configuration with two networks

15.1.2 Mirror module configurations

15.1.3 Mirror module scripts

15.2 Farm module example with farm.safe

15.2.1 Cluster configuration with three nodes

15.2.2 Farm module configurations

15.2.3 Farm module scripts

15.3 Macro and script variables example with hyperv.safe

15.3.1 Module configuration with macros and var

15.3.2 Module scripts with var

15.4 Process monitoring example with softerrd.safe

15.4.1 Module configuration with process monitoring

15.4.2 Advanced configuration of module scripts

15.5 TCP checker example

15.6 Ping checker example

15.7 Custom checker example with customchecker.safe

15.7.1 Module configuration with custom checker

15.7.2 Advanced configuration of module checker script

15.8 Split-brain checker example

15.9 Module checker examples

15.9.1 Example of a farm module depending on a mirror module

15.9.2 Example with leader.safe and follower.safe

15.10 Interface checker example

15.11 IP checker example

15.12 Virtual hostname example with vhost.safe

15.12.1 Module configuration with a virtual hostname

15.12.2 Module scripts with a virtual hostname

16. SafeKit cluster in the cloud

16.1 SafeKit cluster in Amazon AWS

16.1.1 Mirror cluster in AWS

16.1.2 Farm cluster in AWS

16.2 SafeKit cluster in Microsoft Azure

16.2.1 Mirror cluster in Azure

16.2.2 Farm cluster in Azure

16.3 SafeKit cluster in Google GCP

16.3.1 Mirror cluster in GCP

16.3.2 Farm cluster in GCP

17. Third-Party Software

Log Messages Index

Index