SafeKit User's Guide
SafeKit 8.2
Subject | This document covers all the phases of the SafeKit implementation: architecture, installation, tests, administration & troubleshooting, support, and command line interface.
Intended |
Architectures |
Installation |
Console |
Advanced configuration | Cluster.xml for the SafeKit cluster configuration; Userconfig.xml for a module configuration
Administration |
Support |
Other |
Release | SafeKit 8.2
Supported OS | Windows and Linux; for a detailed list of supported OS, see here
Web Site | Evidian marketing site: http://www.evidian.com/safekit; Evidian support site: https://support.evidian.com/safekit
Ref | 39 A2 38MC 04
If you have any comments or questions related to this documentation, please contact us at https://www.evidian.com/company/contact-evidian/ |
Copyright © Evidian, 2024
The trademarks mentioned in this document are the property of their respective owners.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical or otherwise without the prior written permission of the publisher.
Evidian disclaims the implied warranties of merchantability and fitness for a particular purpose and makes no express warranties except as may be stated in its written agreement with and for its customer. In no event is Evidian liable to anyone for any indirect, special, or consequential damages.
The information and specifications in this document are subject to change without notice. Consult your Evidian Marketing Representative for product or service availability.
High Availability Software for Critical Applications
1. High availability architectures
1.1 SafeKit cluster definition
1.2 SafeKit module definition - application integration
1.3 Mirror module: synchronous real time file replication and failover
1.3.1 File replication and failover
1.3.2 Step 1. Normal operation
1.3.3 Step 2. Failover
1.3.4 Step 3. Failback and reintegration
1.3.5 Step 4. Return to normal operation
1.3.6 Synchronous, fault-tolerant replication that loses no data when a server fails
1.4 Farm module: network load balancing and failover
1.4.1 Network load balancing and failover
1.4.2 Principle of a virtual IP address with network load balancing
1.4.3 Load balancing for stateful or stateless web services
1.5 Combining mirror and farm modules
1.5.1 Active/Active: 2 mirror modules backing up each other
1.5.2 N-to-1: N mirror modules with a single backup
1.6 The simplest high availability cluster in the cloud
1.6.1 Mirror cluster in Microsoft Azure, Amazon AWS and Google GCP
1.6.2 Farm cluster in Microsoft Azure, Amazon AWS and Google GCP
2. Installation
2.1 SafeKit install
2.1.1 Download the package
2.1.2 Installation directories and disk space provisioning
2.1.3 SafeKit install procedure
2.1.4 Use the SafeKit web console or command line interface
2.1.5 SafeKit license keys
2.1.6 System specific procedures and characteristics
2.2 Mirror installation recommendation
2.2.1 Hardware prerequisites
2.2.2 Network prerequisites
2.2.3 Application prerequisites
2.2.4 File replication prerequisites
2.3 Farm installation recommendation
2.3.1 Hardware prerequisites
2.3.2 Network prerequisites
2.3.3 Application prerequisites
2.4 SafeKit upgrade
2.4.3 Reinstall and postinstall procedure
2.5 SafeKit full uninstall
2.5.1 Uninstall on Windows as administrator
2.5.2 Uninstall on Linux as root
2.6 SafeKit documentation
3. The SafeKit web console
3.1 Start the web console
3.1.2 Connect to a SafeKit server
3.2 Configure the cluster
3.2.1 Cluster configuration wizard
3.2.2 Cluster configuration home page
3.3 Configure a module
3.3.1 Select the new module to configure
3.3.2 Module configuration wizard
3.3.3 Modules configuration home page
3.3.4 Procedure to locally edit the module configuration and then apply it
3.4.1 Monitoring home page
3.4.5 Module states timeline
3.5 Snapshots or logs of module for debug and support
3.6 Secure access to the web console
4.1 Installation and tests after boot
4.1.1 Test package installation
4.1.2 Test license and version
4.1.3 Test SafeKit services and processes running after boot
4.1.4 Test start of SafeKit web console
4.2 Tests of a mirror module
4.2.1 Test start of a mirror module on 2 servers STOP (NotReady)
4.2.2 Test stop of a mirror module on the server PRIM (Ready)
4.2.3 Test start of a mirror module on the server STOP (NotReady)
4.2.4 Test restart of a mirror module on the server PRIM (Ready)
4.2.5 Test swap of a mirror module from one server to the other
4.2.6 Test virtual IP address of a mirror module
4.2.7 Test file replication of a mirror module
4.2.8 Test mirror module shutdown on the server PRIM (Ready)
4.2.9 Test mirror module power-off on the server PRIM (Ready)
4.2.10 Test split brain with a mirror module
4.2.11 Continue your mirror module tests with checkers
4.3 Tests of a farm module
4.3.1 Test start of a farm module on all servers STOP (NotReady)
4.3.2 Test stop of a farm module on one server UP (Ready)
4.3.3 Test restart of a farm module on one server UP (Ready)
4.3.4 Test virtual IP address of a farm module
4.3.5 Test TCP load balancing on a virtual IP address
4.3.6 Test split brain with a farm module
4.3.7 Test compatibility of the network with invisible MAC address (vmac_invisible)
4.3.8 Test farm module shutdown of a server UP (Ready)
4.3.9 Test farm module power-off of a server UP (Ready)
4.3.10 Continue your farm module tests with checkers
4.4 Tests of checkers common to mirror and farm
4.4.1 Test <errd>: checker of process with action restart or stopstart
4.4.2 Test <tcp> checker of the local application with action restart or stopstart
4.4.3 Test <tcp> checker of an external service with action wait
4.4.4 Test <interface check="on"> on a local network interface and with action wait
4.4.5 Test <ping> checker with action wait
4.4.6 Test <module> checker with action wait
4.4.7 Test <custom> checker with action wait
4.4.8 Test <custom> checker with action restart or stopstart
5. Mirror module administration
5.1 Operating mode of a mirror module
5.3 First start-up of a mirror module (safekit prim command)
5.4 Different reintegration cases (use of bitmaps)
5.5 Start-up of a mirror module with the up-to-date data STOP (NotReady) - WAIT (NotReady)
5.6 Degraded replication mode (ALONE (Ready) degraded)
5.7 Automatic or manual failover
5.8 Default primary server (automatic swap after reintegration)
5.9 Prim command fails: why? (safekit primforce command)
6. Farm module administration
6.1 Operating mode of a farm module
6.2 State automaton of a farm module (STOP, WAIT, UP - NotReady, Transient, Ready)
6.3 Start-up of a farm module
7.1 Connection issues with the web console
7.2 Connection issues with the HTTPS web console
7.2.1 Check server certificates
7.2.2 Check certificates installed in SafeKit
7.2.3 Revert to HTTP configuration
7.3 How to read logs and resources of the module?
7.4 How to read the commands log of the server?
7.5 Stable module (Ready) and (Ready)
7.6 Degraded module (Ready) and /(NotReady)
7.7 Out of service module /(NotReady) and /(NotReady)
7.8 Module STOP (NotReady): restart the module
7.9 Module WAIT (NotReady): repair the resource="down"
7.10 Module oscillating from (Ready) to (Transient)
7.11 Message on stop after maxloop
7.12 Module (Ready) but non-operational application
7.13 Mirror module ALONE (Ready) - WAIT/STOP (NotReady)
7.14 Farm module UP (Ready) but problem of load balancing in a farm
7.14.1 Reported network load shares are not coherent
7.14.2 Virtual IP address does not respond properly
7.15 Problem after boot
7.16 Analysis from snapshots of the module
7.16.1 Module configuration files
7.17 Problem with the size of SafeKit databases
7.18 Problem for retrieving the certification authority certificate from an external PKI
7.18.1 Export CA certificate(s) from public certificates
8. Access to Evidian support
8.1 Home page of support site
8.2 Permanent license keys
8.3 Create an account
8.4 Access to your account
8.5 Call desk to open a trouble ticket
8.5.1 Call desk operations
8.5.3 Attach the snapshots
8.5.4 Answers to a call and exchange with support
8.6 Download and upload area
8.6.1 Two areas of download and upload
8.6.2 Product download area
8.6.3 Private upload area
9. Command line interface
9.1 Distributed commands
9.2 Command lines for boot and for shutdown
9.3 Command lines to configure and monitor safekit cluster
9.4 Command lines to control modules
9.5 Command lines to monitor modules
9.6 Command lines to configure modules
9.7 Command lines for support
9.8.1 Cluster configuration with command line
9.8.2 New module configuration with command line
9.8.3 Module snapshot with command line
10. Advanced administration
10.1 SafeKit environment variables and directories
10.2 SafeKit processes and services
10.3.1 Firewall settings in Linux
10.3.2 Firewall settings in Windows
10.4 Boot and shutdown setup in Windows
10.4.1 Automatic procedure
10.5 Securing module internal communications
10.5.1 Configuration with the SafeKit Web console
10.5.2 Configuration with the Command Line Interface
10.5.3 Advanced configuration
10.6 SafeKit web service configuration
10.6.1 Configuration files
10.6.2 Connection ports configuration
10.6.3 HTTP/HTTPS and user authentication configuration
10.7 Mail notification
10.8.1 SNMP monitoring in Windows
10.8.2 SNMP monitoring in Linux
10.9 Commands log of the SafeKit server
10.10 SafeKit log messages in system journal
11. Securing the SafeKit web service
11.2.2 Unsecure setup based on identical role for all
11.3.1 HTTPS setup using the SafeKit PKI
11.3.2 HTTPS setup using an external PKI
11.4 User authentication setup
11.4.1 File-based authentication setup
11.4.2 LDAP/AD authentication setup
11.4.3 OpenID authentication setup
12. Cluster.xml for the SafeKit cluster configuration
12.1.1 Cluster.xml example
12.1.2 Cluster.xml syntax
12.1.3 <lans>, <lan>, <node> attributes
12.2 SafeKit cluster configuration
12.2.1 Configuration with the SafeKit web console
12.2.2 Configuration with command line
12.2.3 Configuration changes
13. Userconfig.xml for a module configuration
13.1 Macro definition - <macro>
13.2 Farm or mirror module - <service>
13.2.3 <service> attributes
13.3 Heartbeats - <heart>, <heartbeat>
13.3.3 <heart>, <heartbeat> attributes
13.4 Farm topology - <farm>, <lan>
13.4.3 <farm>, <lan> attributes
13.5 Virtual IP address - <vip>
13.5.1 <vip> example in a mirror module
13.5.2 <vip> example in a farm module
13.5.3 Alternative to <vip> for servers in different networks
13.5.6 <loadbalancing_list>, <group>, <cluster>, <host> attributes
13.5.7 <vip> Load balancing description
13.6 File replication - <rfs>, <replicated>
13.6.3 <rfs>, <replicated> attributes
13.7 Enable module scripts - <user>, <var>
13.7.3 <user>, <var> attributes
13.8 Virtual hostname - <vhost>, <virtualhostname>
13.8.3 <vhost>, <virtualhostname> attributes
13.8.4 <vhost> description
13.9 Process or service monitoring - <errd>, <proc>
13.9.3 <errd>, <proc> attributes
13.10 Checkers - <check>
13.11 TCP checker - <tcp>
13.12 Ping checker - <ping>
13.12.3 <ping> attributes
13.13 Interface checker - <intf>
13.13.3 <intf> attributes
13.15 Custom checker - <custom>
13.15.3 <custom> attributes
13.16 Module checker - <module>
13.16.3 <module> attributes
13.17 Splitbrain checker - <splitbrain>
13.17.1 <splitbrain> example
13.17.2 <splitbrain> syntax
13.17.3 <splitbrain> attributes
13.18 Failover machine - <failover>
13.18.1 <failover> example
13.18.2 <failover> syntax
13.18.3 <failover> attributes
13.18.4 <failover> commands
14. Scripts for a module configuration
14.1.1 Start/stop scripts
14.2.1 Output into script log
14.2.2 Output into module log
14.3 Scripts execution automaton
14.4 Variables and arguments passed to scripts
14.5 SafeKit special commands for scripts
14.5.1 Commands for Windows
14.5.2 Commands for Linux
14.5.3 Commands for Windows and Linux
15. Examples of module configurations
15.1 Mirror module example with mirror.safe
15.1.1 Cluster configuration with two networks
15.1.2 Mirror module configurations
15.1.3 Mirror module scripts
15.2 Farm module example with farm.safe
15.2.1 Cluster configuration with three nodes
15.2.2 Farm module configurations
15.2.3 Farm module scripts
15.3 Macro and script variables example with hyperv.safe
15.3.1 Module configuration with macros and var
15.3.2 Module scripts with var
15.4 Process monitoring example with softerrd.safe
15.4.1 Module configuration with process monitoring
15.4.2 Advanced configuration of module scripts
15.5 TCP checker example
15.6 Ping checker example
15.7 Custom checker example with customchecker.safe
15.8 Splitbrain checker example
15.9 Module checker examples
15.9.1 Example of a farm module depending on a mirror module
15.9.2 Example with leader.safe and follower.safe
15.10 Interface checker example
15.11 IP checker example
15.12 Mail notification example with notification.safe
15.12.1 Notification on the start and the stop of the module
15.12.2 Notification on module state changes
15.13 Virtual hostname example with vhost.safe
16. SafeKit cluster in the cloud
16.1 SafeKit cluster in Amazon AWS
16.1.1 Mirror cluster in AWS
16.1.2 Farm cluster in AWS
16.2 SafeKit cluster in Microsoft Azure
16.2.1 Mirror cluster in Azure
16.2.2 Farm cluster in Azure
16.3 SafeKit cluster in Google GCP
16.3.1 Mirror cluster in GCP
16.3.2 Farm cluster in GCP
1. High availability architectures
Section 1.1 “SafeKit cluster definition”
Section 1.2 “SafeKit module definition - application integration”
Section 1.3 “Mirror module: synchronous real time file replication and failover”
Section 1.4 “Farm module: network load balancing and failover”
Section 1.5 “Combining mirror and farm modules”
Section 1.6 “The simplest high availability cluster in the cloud”
1.1 SafeKit cluster definition
A SafeKit cluster is a set of servers where SafeKit is installed and running.
All servers belonging to a given SafeKit cluster share the same cluster configuration (list of servers and networks used) and communicate with each other to have a global view of the SafeKit module configurations. A server cannot belong to more than one SafeKit cluster.
Since SafeKit 7.2 and its web console, setting the cluster configuration is a prerequisite for installing and configuring SafeKit modules. The cluster configuration is set through the web console, as described in section 3.2. The web console can administer one or more SafeKit clusters.
1.2 SafeKit module definition - application integration
A SafeKit module is associated with an application. A module is customizable by the user, and it defines the behavior of the high availability solution for the application. Different modules can be defined for different applications.
In practice, an application module is an easy-to-set-up file that contains:
· a main configuration file, userconfig.xml, which lists the networks used for communication between servers, the files to replicate in real time (for a mirror module), the virtual IP configuration, the network load balancing criteria (for a farm module), and more
· application stop and start scripts
SafeKit offers two types of modules, detailed in this chapter:
· the mirror module
· the farm module
Combining multiple application modules allows the implementation of advanced architectures:
· active/active: 2 mirror modules backing up each other
· N-to-1: N mirror modules with a single backup
· mixed farm and mirror: mixing network load balancing, file replication and failover
1.3 Mirror module: synchronous real time file replication and failover
1.3.1 File replication and failover
The mirror architecture is a primary-backup high-availability solution that is suitable for all applications. The application runs on a primary server and is restarted automatically on a secondary server if the primary server fails.
The mirror architecture can be configured with or without file replication. With its file replication function, this architecture is particularly suitable for providing high availability for back-end applications with critical data to protect against failure. The data on the secondary server is kept synchronized with the primary server, so failover to the secondary server happens on the most up-to-date data. If application availability is more critical than data synchronization, the default policy can be relaxed to allow a failover on the secondary server, even when it is not fully synchronized, as long as the time elapsed since the last synchronization is below a configurable delay.
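The default and relaxed failover policies can be summarized as a small decision function. This is a minimal Python sketch under stated assumptions, not SafeKit code: the function name and the max_delay_s parameter are hypothetical stand-ins for the configurable delay, whose actual userconfig.xml attribute is not named in this section.

```python
from __future__ import annotations


def secondary_may_take_over(in_sync: bool,
                            seconds_since_last_sync: float,
                            max_delay_s: float | None = None) -> bool:
    """Decide whether the secondary may restart the application.

    Hypothetical model of the policy described in the text,
    not the SafeKit implementation.
    """
    # Default policy: fail over only onto fully synchronized data
    if in_sync:
        return True
    # Relaxed policy (delay configured): accept slightly stale data,
    # i.e. data last synchronized less than max_delay_s seconds ago
    if max_delay_s is not None:
        return seconds_since_last_sync < max_delay_s
    # Default policy with stale data: wait rather than lose data
    return False
```

With the default policy, secondary_may_take_over(False, 120.0) refuses the failover; with a hypothetical 300-second delay, secondary_may_take_over(False, 120.0, max_delay_s=300.0) accepts it.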
Microsoft SQL Server.Safe, MySQL.Safe, and Oracle.Safe are examples of "mirror" type application modules. You can write your own mirror module for your application, based on the generic module Mirror.Safe.
The failover mechanism works as follows.
1.3.2 Step 1. Normal operation
For replication, only the names of file directories are configured in SafeKit. There are no prerequisites on the disk organization of the two servers. Directories to replicate can be located on the system disk.
Server 1 (PRIM) runs the application.
SafeKit replicates files opened by the application. Only the changes made by the application in the files are replicated in real time across the network, thus limiting traffic.
Thanks to the synchronous replication of file write operations on the disks of both servers, no data is lost in case of failure.
1.3.3 Step 2. Failover
When Server 1 fails, Server 2 takes over. SafeKit switches the cluster’s virtual IP address and restarts the application automatically on Server 2. Thanks to synchronous replication, the application finds the files replicated by SafeKit in the same state as when Server 1 failed. The application continues to run on Server 2, locally modifying its files, which are no longer replicated to Server 1.
The switch-over time is equal to the fault-detection time (set to 30 seconds by default) plus the application start-up time. Unlike disk replication solutions, there is no delay for remounting file systems and running recovery procedures.
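The switchover time formula is simple arithmetic. The sketch below only restates it in Python; the 45-second application start-up time is a hypothetical value chosen for illustration.

```python
DEFAULT_DETECTION_S = 30.0  # default fault-detection time stated above


def failover_time(app_startup_s: float,
                  detection_s: float = DEFAULT_DETECTION_S) -> float:
    """Switchover time = fault-detection time + application start-up time."""
    # No file system remount or recovery procedure is added,
    # unlike with disk replication solutions
    return detection_s + app_startup_s


# Hypothetical application that needs 45 s to start: it is available
# again on Server 2 about 75 s after Server 1 fails
print(failover_time(45.0))  # 75.0
```

Reducing the detection time shortens the switchover, but the application start-up time remains incompressible from SafeKit's point of view.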
1.3.4 Step 3. Failback and reintegration
Failback involves restarting Server 1 after fixing the problem that caused it to fail. SafeKit automatically resynchronizes the files, updating only the files that were modified on Server 2 while Server 1 was stopped.
This reintegration takes place without disturbing the applications, which can continue to run on Server 2. This is a major feature that differentiates SafeKit from other solutions, which require you to stop the applications on Server 2 to resynchronize Server 1.
To optimize file reintegration, different cases are considered:
1. Modification tracking with bitmaps is enabled only once the module has completed a reintegration; on its first start, the module runs a full reintegration.
2. If the module was cleanly stopped on the server, then when the secondary restarts, only the modified zones of modified files are reintegrated, according to a set of modification tracking bitmaps.
3. If the secondary crashed (power off) or was stopped abnormally (exception in the nfsbox replication process), the modification bitmaps are not reliable and are therefore discarded. All files with a modification timestamp more recent than the last known synchronization point minus a grace delay (typically one hour) are reintegrated.
4. Running the special command second fullsync triggers a full reintegration of all replicated directories on the secondary when it is restarted.
5. If files have been modified on the primary or secondary server while SafeKit was stopped, the replicated directories are fully reintegrated on the secondary.
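The five reintegration cases can be read as a decision procedure. This Python sketch is a hypothetical model: the enum, the function names and the precedence between cases (any reason for a full resynchronization wins over the bitmaps) are assumptions, and only the one-hour grace delay comes from the text.

```python
from __future__ import annotations

from enum import Enum

GRACE_DELAY_S = 3600.0  # "typically one hour", per the text


class Reintegration(Enum):
    FULL = "full"            # all replicated directories
    BITMAP = "bitmap"        # only modified zones tracked in bitmaps
    TIMESTAMP = "timestamp"  # files newer than last sync point - grace delay


def choose_reintegration(first_start: bool,
                         fullsync_requested: bool,
                         modified_while_stopped: bool,
                         clean_stop: bool) -> Reintegration:
    # Cases 1, 4 and 5: no reliable tracking, resynchronize everything
    if first_start or fullsync_requested or modified_while_stopped:
        return Reintegration.FULL
    # Case 2: clean stop, the modification bitmaps can be trusted
    if clean_stop:
        return Reintegration.BITMAP
    # Case 3: crash or abnormal stop, the bitmaps are discarded
    return Reintegration.TIMESTAMP


def files_for_timestamp_case(mtimes: dict[str, float],
                             last_sync: float,
                             grace_s: float = GRACE_DELAY_S) -> list[str]:
    """Files modified after (last synchronization point - grace delay)."""
    cutoff = last_sync - grace_s
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)
```

The grace delay widens the selection on purpose: after a crash, the exact synchronization point is uncertain, so reintegrating a few extra files is cheaper than missing one.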
1.3.5 Step 4. Return to normal operation
After reintegration, the files are once again in mirror mode, as in step 1. The system is back in high-availability mode, with the application running on Server 2 and SafeKit replicating file updates to the backup Server 1.
If the administrator wants the application to run on Server 1 again, a swap command can be executed either manually at an appropriate time or automatically through configuration.
1.3.6 Synchronous, fault-tolerant replication that loses no data when a server fails
There is a significant difference between synchronous replication, as offered by the SafeKit mirror solution, and asynchronous replication traditionally offered by other file replication solutions.
With synchronous replication, when a disk IO is performed on a replicated file by the application or by the file system cache on the primary server, SafeKit waits for the IO acknowledgement from the local disk and from the secondary server before sending the IO acknowledgement to the application or to the file system cache.
Synchronous real-time replication of the files updated by an application eliminates data loss in case of server failure. Synchronous replication ensures that any data committed on a disk by a transactional application is also present on the secondary server.
The bandwidth required to implement synchronous data replication is on the order of that of a typical modern LAN, or of an extended LAN between two computer rooms located a few kilometers apart.
With asynchronous replication, as implemented by other solutions, the IOs are placed in a queue on the primary server, but the primary server does not wait for the IO acknowledgements of the secondary server. So, data that did not have time to be copied across the network to the secondary server is lost if the primary server fails. In particular, a transactional application loses committed data in case of failure. Asynchronous replication can be used to replicate data through a low-speed WAN, for example to back up data remotely more than 100 kilometers away.
SafeKit also provides an asynchronous mode with no data loss when a single server fails, in which the asynchrony is on the secondary server rather than on the primary. In this mode, SafeKit always waits for the acknowledgement of both servers before acknowledging the IO to the application or to the file system cache. On the secondary, two options are available: asynchronous or synchronous. In the asynchronous case (<rfs async="second">), the secondary acknowledges the IO to the primary as soon as it receives it and writes it to disk afterwards. In the synchronous case (<rfs async="none">), the secondary writes the IO to disk and then acknowledges it to the primary. The async="none" mode is required to cover a simultaneous power outage of both servers when the former primary server cannot be restarted and the module must restart on the secondary.
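The difference between async="second" and async="none" is only the order of "acknowledge" and "write to disk" on the secondary. The Python sketch below is a toy in-memory model of that ordering, not the nfsbox replication process: the classes and helper names are hypothetical, and only the two mode values come from the text. It replays the double power outage scenario described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Secondary:
    """Toy secondary server; mode mirrors the <rfs async="..."> values."""
    mode: str                                         # "second" or "none"
    disk: list[str] = field(default_factory=list)     # durably written IOs
    pending: list[str] = field(default_factory=list)  # received, not written

    def receive(self, io: str) -> str:
        if self.mode == "none":
            self.disk.append(io)  # write to disk first...
            return "ack"          # ...then acknowledge the primary
        self.pending.append(io)   # async="second": acknowledge on receipt,
        return "ack"              # write to disk later (see flush)

    def flush(self) -> None:
        self.disk.extend(self.pending)
        self.pending.clear()

    def power_outage(self) -> None:
        self.pending.clear()      # IOs not yet on disk are lost


@dataclass
class Primary:
    secondary: Secondary
    disk: list[str] = field(default_factory=list)

    def write(self, io: str) -> str:
        self.disk.append(io)                  # 1. local disk acknowledgement
        ack = self.secondary.receive(io)      # 2. secondary acknowledgement
        if ack != "ack":
            raise RuntimeError("secondary did not acknowledge")
        return "ack"                          # 3. only now ack the application


def survives_double_outage(mode: str) -> bool:
    """Commit one IO, then cut power on both servers before any flush."""
    secondary = Secondary(mode)
    primary = Primary(secondary)
    primary.write("commit-1")   # the application sees this as committed
    secondary.power_outage()    # the former primary cannot be restarted
    return "commit-1" in secondary.disk
```

In this model, survives_double_outage("none") is True and survives_double_outage("second") is False, which is why the synchronous secondary mode is required for that scenario. With a single failure of the primary, both modes keep the committed IO, since the secondary keeps running and eventually flushes.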
1.4 Farm module: network load balancing and failover
1.4.1 Network load balancing and failover
The farm architecture provides both network load balancing, through transparent distribution of network traffic, and software and hardware failover. This architecture provides a simple way to scale out as the load increases. The same application runs on each server, and the load is balanced by distributing network activity among the different servers of the farm.
The farm architecture is well suited to front-end applications such as web services. Apache_farm.Safe and Microsoft IIS_farm.safe are examples of farm application modules. You can build your own farm module, adapted to your application, from the generic module Farm.safe.
1.4.2 Principle of a virtual IP address with network load balancing
The virtual IP address is configured locally on each server of the farm. The incoming traffic for this address is split among the servers at a low level, by a filter inside each server's kernel.
The load balancing algorithm inside the filter is based on the identity of the client packets (client IP address, client TCP port). Based on the client identity of each incoming packet, exactly one filter instance in the farm passes the packet to the upper network layers; the filter instances on the other servers drop it. Once a packet is accepted by the filter on a server, only the CPU and memory of that server are used by the application that responds to the client's request. The output messages are sent directly from the application server to the client.
If a server fails, the SafeKit membership protocol reconfigures the filters in the farm to re-balance the traffic on the remaining available servers.
1.4.3 Load balancing for stateful or stateless web services
With a stateful server, there is session affinity: the same client must be connected to the same server across multiple HTTP/TCP sessions to retrieve its context from the server. In this case, the SafeKit load balancing rule is configured on the client IP address. Thus, the same client is always connected to the same server across multiple TCP sessions, while different clients are distributed across the servers of the farm. This configuration is used when session affinity is needed.
With a stateless server, there is no session affinity. The same client can be connected to different servers in the farm across multiple HTTP/TCP sessions, because no context is stored locally on a server from one session to the next. In this case, the SafeKit load balancing criterion is the identity of the client TCP session. This configuration gives the best distribution of sessions between servers, but it can only be used for a TCP service without session affinity.
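The filter behavior and the two load balancing criteria can be illustrated with a shared hash table. This Python sketch is an assumption-based model, not the SafeKit kernel filter: the table size, the CRC32 hash, the round-robin assignment and the rule names "client_ip" and "client_session" are all hypothetical and are not userconfig.xml values. It shows the key properties described above: every server computes the same decision, exactly one server accepts each packet, and a membership change only moves the failed server's share.

```python
from __future__ import annotations

import zlib

NBUCKETS = 256  # hypothetical size of the table shared by all filters


def identity(client_ip: str, client_port: int, rule: str) -> str:
    """Key hashed by the filter, depending on the (hypothetical) rule."""
    if rule == "client_ip":       # stateful service: session affinity
        return client_ip
    if rule == "client_session":  # stateless service: per TCP session
        return f"{client_ip}:{client_port}"
    raise ValueError(f"unknown rule: {rule}")


def bucket(key: str) -> int:
    # Deterministic hash: every server computes the same bucket
    return zlib.crc32(key.encode()) % NBUCKETS


def initial_table(servers: list[str]) -> dict[int, str]:
    # Spread the buckets round-robin over the farm servers
    ordered = sorted(servers)
    return {b: ordered[b % len(ordered)] for b in range(NBUCKETS)}


def rebalance(table: dict[int, str], failed: str) -> dict[int, str]:
    # Membership change: only the failed server's buckets are moved
    survivors = sorted(set(table.values()) - {failed})
    new_table = dict(table)
    orphans = [b for b, owner in sorted(table.items()) if owner == failed]
    for i, b in enumerate(orphans):
        new_table[b] = survivors[i % len(survivors)]
    return new_table


def accepts(server: str, key: str, table: dict[int, str]) -> bool:
    # Local filter decision: pass the packet up the stack or drop it
    return table[bucket(key)] == server
```

With the "client_ip" rule, all connections from 10.0.0.7 land on the same server whatever the client port; with "client_session", successive connections from that client are spread over the farm. After rebalance(table, "node2"), clients previously served by node1 or node3 keep their server, which preserves their sessions.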
Other load balancing algorithms are available for UDP services.
1.5 Combining mirror and farm modules
1.5.1 Active/Active: 2 mirror modules backing up each other
Two active servers mirroring each other
In an active/active architecture, there are two servers and two mirror application modules in mutual takeover (Appli1.Safe and Appli2.Safe). Each application server is a backup of the other server.
If one application server fails, both applications run on the same physical server. Once the failed server is restarted, its application runs again on its default primary server.
A mutual takeover cluster is a more economical solution than two separate mirror clusters, because there is no need to invest in back-up servers that will spend most of their time sitting idle waiting for the primary server to fail. Note that during a failure, the remaining server must be able to handle the combined workload of both applications.
1.5.2 N-to-1: N mirror modules with a single backup
Shared backup for multiple active servers
In N-to-1 architecture, there are N mirror application modules installed on N primary servers and one backup server.
If one of the N active servers fails, the single backup server restarts the module of the failed server. Once the problem is fixed and the failed server is restarted, the application switches back to its original server.
Unlike in the active/active architecture, the backup server does not have to handle a double workload when a primary server fails, assuming that there is only one failure at a time. The solution can tolerate several simultaneous primary server failures, but in that case the single backup server must handle the combined workload of all the failed servers.
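The sizing rule behind the active/active and N-to-1 architectures is additive: the server that takes over carries its own load plus the load of every failed server. This Python sketch restates that arithmetic; the load percentages are hypothetical values for illustration, not measurements.

```python
from __future__ import annotations


def load_after_failures(own_load_pct: int, failed_loads_pct: list[int]) -> int:
    """Load on the server that restarts the failed servers' modules."""
    return own_load_pct + sum(failed_loads_pct)


def is_overloaded(own_load_pct: int, failed_loads_pct: list[int],
                  capacity_pct: int = 100) -> bool:
    return load_after_failures(own_load_pct, failed_loads_pct) > capacity_pct


# Active/active: server 2 already runs Appli2 at 60 % and takes over
# Appli1 (50 %) from server 1
print(load_after_failures(60, [50]))      # 110 -> overloaded
# N-to-1: the idle backup takes over one failed primary (70 %)
print(load_after_failures(0, [70]))       # 70
# N-to-1 with two simultaneous primary failures (70 % and 80 %)
print(load_after_failures(0, [70, 80]))   # 150 -> overloaded
```

In practice, this means sizing each active/active server for both applications, and sizing the N-to-1 backup for the largest single primary workload.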
1.5.3 Mixed farm/mirror: network load balancing, file replication, failover
Network load balancing, file replication and failover
You can mix farm and mirror application modules on the same cluster of servers.
This option allows you to implement a multi-tier application architecture, such as Apache_farm.Safe (farm architecture with load balancing and failover) and MySQL.safe (mirror architecture with file replication and failover) on the same application servers.
As a result, load balancing, file replication and failover are managed coherently on the same servers. This mixed architecture is specific to SafeKit and unique on the market.
1.6 The simplest high availability cluster in the cloud
In the Microsoft Azure, Amazon AWS and Google GCP clouds, SafeKit provides the simplest solution for a high availability cluster. It can be implemented on existing virtual machines or on a new virtual infrastructure that you create by simply clicking a button that deploys and configures everything for you in the Azure or AWS cloud.
For a full description, see section 16.
1.6.1 Mirror cluster in Microsoft Azure, Amazon AWS and Google GCP
In the Azure, AWS and GCP clouds, SafeKit provides the simplest solution for a high availability cluster with real-time replication and failover (mirror module).
For a quick start, refer to mirror cluster in Azure, mirror cluster in AWS or mirror cluster in GCP.
· the critical application is running on the PRIM server
· users are connected to a primary/secondary virtual IP address, which is configured in the cloud load balancer
· SafeKit provides a generic checker for the load balancer: the checker returns OK to the load balancer on the PRIM server and NOK on the SECOND server
· on each server, SafeKit monitors the critical application with process checkers and custom checkers
· SafeKit automatically restarts the critical application after a software or hardware failure, using restart scripts
· SafeKit performs synchronous real-time replication of the files containing critical data
· a connector for the SafeKit web console is installed on each server, so the high availability cluster can be managed with little effort, which helps avoid human errors
1.6.2 Farm cluster in Microsoft Azure, Amazon AWS and Google GCP
In the Azure, AWS and GCP clouds, SafeKit provides the simplest solution for a high availability cluster with load balancing and failover (farm module).
For a quick start, refer to farm cluster in Azure, farm cluster in AWS or farm cluster in GCP.
· the critical application is running on all servers of the farm
· users are connected to a virtual IP address, which is configured in the cloud load balancer
· SafeKit provides a generic checker for the load balancer: when the farm module is stopped on a server, the checker returns NOK to the load balancer, which stops sending requests to that server. The same happens when there is a hardware failure
· on each server, SafeKit monitors the critical application with process checkers and custom checkers
· SafeKit automatically restarts the critical application on a server when there is a software failure, using restart scripts
· a connector for the SafeKit web console is installed on each server, so the load balancing cluster can be managed with little effort, which helps avoid human errors
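The load balancer checker described for both cloud architectures boils down to mapping the local module state to a probe answer. SafeKit ships its own generic checker; this Python sketch only models the principle under stated assumptions. The state-to-answer mapping, the 200/503 status codes and the handler factory are hypothetical, and treating ALONE as a serving mirror state is an assumption not stated in this section.

```python
from __future__ import annotations

from http.server import BaseHTTPRequestHandler
from typing import Callable

# Hypothetical mapping of local module states that should receive traffic.
# PRIM (mirror) and UP (farm) come from the text; ALONE is an assumption.
SERVING_STATES = {
    "mirror": {"PRIM", "ALONE"},
    "farm": {"UP"},
}


def probe_response(kind: str, state: str) -> tuple[int, str]:
    """HTTP status and body returned to the cloud load balancer probe."""
    if state in SERVING_STATES[kind]:
        return 200, "OK"
    return 503, "NOK"


def make_probe_handler(kind: str, get_state: Callable[[], str]):
    """Build a request handler that answers the probe from the module state."""

    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            code, body = probe_response(kind, get_state())
            payload = body.encode()
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return ProbeHandler
```

To try it locally, the handler could be served with HTTPServer(("0.0.0.0", 8080), make_probe_handler("mirror", lambda: "PRIM")).serve_forever(), where port 8080 is arbitrary. A hardware failure needs no special code: the probe simply gets no answer, which the load balancer treats like NOK.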
2. Installation
Section 2.1 “SafeKit install”
Section 2.2 “Mirror installation recommendation”
Section 2.3 “Farm installation recommendation”
Section 2.4 “SafeKit upgrade”
Section 2.5 “SafeKit full uninstall”
Section 2.6 “SafeKit documentation”
2.1 SafeKit install
2.1.1 Download the package
1. Connect to https://support.evidian.com/safekit
2. Go to <Version 8.2>/Platforms/<Your platform>/Current versions
3. Download the package
In Windows, two packages are available:
· A Windows Installer package (safekit_windows_x86_64_8_2_x_y.msi), which requires the VS2022 C runtime to be installed beforehand
· A standalone executable bundle (safekit_windows_x86_64_8_2_x_y.exe), which includes both the SafeKit installation and the VS2022 C runtime
Choose the .msi package if the VS2022 C runtime is already installed; otherwise, choose the .exe bundle.
2.1.2 Installation directories and disk space provisioning
SafeKit is installed in:
SAFE
· in Windows: SAFE=C:\safekit
· in Linux: SAFE=/opt/safekit
Minimum free disk space: 97 MB
SAFEVAR
· in Windows: SAFEVAR=C:\safekit\var if %SYSTEMDRIVE%=C:
· in Linux: SAFEVAR=/var/safekit
Minimum free disk space: 20 MB + at least 20 MB (up to 3 GB) per module for dumps
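To check a server against these figures before installing, a minimal Python sketch can compare the free space reported by the standard shutil.disk_usage with the minimums above. It only checks lower bounds (dumps can grow up to 3 GB per module), and the function names safevar_min_mb and check_free_space are hypothetical.

```python
import os
import shutil

# Minimum free disk space from the table above, in MB.
SAFE_MIN_MB = 97            # SAFE (installation directory)
SAFEVAR_BASE_MB = 20        # SAFEVAR, fixed part
SAFEVAR_PER_MODULE_MB = 20  # SAFEVAR, at least 20 MB per module for dumps (may grow up to 3 GB)
MB = 1024 * 1024


def safevar_min_mb(n_modules: int) -> int:
    """Lower bound of free space needed in SAFEVAR for n_modules modules."""
    if n_modules < 0:
        raise ValueError("n_modules must be >= 0")
    return SAFEVAR_BASE_MB + SAFEVAR_PER_MODULE_MB * n_modules


def check_free_space(safe_fs: str, safevar_fs: str, n_modules: int) -> dict:
    """Compare free space with the minimums.

    safe_fs / safevar_fs must be existing directories on the file systems
    that will hold SAFE and SAFEVAR (e.g. "/opt" and "/var", or "C:\\").
    """
    need_safe, need_var = SAFE_MIN_MB, safevar_min_mb(n_modules)
    if os.stat(safe_fs).st_dev == os.stat(safevar_fs).st_dev:
        # Same file system (the Windows default, where SAFEVAR is under SAFE):
        # both minimums must fit together.
        free = shutil.disk_usage(safe_fs).free // MB
        return {"SAFE+SAFEVAR": free >= need_safe + need_var}
    return {
        "SAFE": shutil.disk_usage(safe_fs).free // MB >= need_safe,
        "SAFEVAR": shutil.disk_usage(safevar_fs).free // MB >= need_var,
    }
```

For example, on a Linux server planned for two modules, check_free_space("/opt", "/var", 2) returns one boolean per file system.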
2.1.3 SafeKit install procedure
2.1.3.1 Install on Windows as administrator
2.1.3.1.1 SafeKit package install
1. Log-in as administrator on Windows server
2. Locate the downloaded file safekit_windows_x86_64_8_2_x_y.msi (or safekit_windows_x86_64_8_2_x_y.exe)
3. Install in interactive mode by double-clicking it and following the installer wizard
It is also possible to install the .msi in non-interactive mode by running, in a PowerShell terminal: msiexec /qn /i safekit_windows_x86_64_8_2_x_y.msi
2.1.3.1.2 Firewall setup
This step is mandatory to enable communications between SafeKit cluster nodes and with the web console.
1. Open a PowerShell console as administrator
2. Go to the root of the SafeKit installation directory SAFE (by default SAFE=C:\safekit if %SYSTEMDRIVE%=C:)
cd c:\safekit
3. Run .\private\bin\firewallcfg.cmd add
This configures the Microsoft firewall for SafeKit. For details or other firewalls, see section 10.3.
2.1.3.1.3 Web service initialization
This step is mandatory to initialize the default configuration of the web service, which is accessed by the web console and the global safekit command. By default, authentication is required to access the service. The following script initializes it with the admin user and the given password (pwd in this example).
1. Open a PowerShell console as administrator
2. Go to the root of the SafeKit installation directory SAFE (by default SAFE=C:\safekit if %SYSTEMDRIVE%=C:)
cd c:\safekit
3. Run .\private\bin\webservercfg -passwd pwd
You can then log in to the web console as admin/pwd to access all its features, and run distributed commands. For details, see section 11.2.1.
The password must be identical on all nodes that belong to the same SafeKit cluster. Otherwise, the web console and distributed commands will fail with authentication errors.
On upgrade, this step can be skipped if it was already done during a previous install of SafeKit 8.2. If it is reapplied, the password is reset to the new value.
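When several Windows servers are prepared in bulk, the three steps above can be chained from a script. The sketch below is hypothetical: windows_install_commands and run are illustrative helpers that only assemble the commands shown in this section (assuming the default SAFE=C:\safekit and the .msi package) and run them in order, in dry-run mode by default. It must run as administrator and, per the note above, use the same password on every node.

```python
import subprocess
from pathlib import PureWindowsPath

SAFE = PureWindowsPath(r"C:\safekit")  # default SAFE when %SYSTEMDRIVE% is C:


def windows_install_commands(msi_path: str, password: str) -> list:
    """Return (argv, cwd) pairs for the three Windows steps above, in order."""
    return [
        # 1. Non-interactive install of the Windows Installer package
        (["msiexec", "/qn", "/i", msi_path], None),
        # 2. Firewall setup for SafeKit
        (["cmd.exe", "/c", str(SAFE / "private" / "bin" / "firewallcfg.cmd"), "add"], str(SAFE)),
        # 3. Web service initialization with the admin user; the password is
        #    single-quoted for PowerShell (a password containing ' needs escaping)
        (["powershell.exe", "-NoProfile", "-Command",
          rf".\private\bin\webservercfg -passwd '{password}'"], str(SAFE)),
    ]


def run(commands, dry_run: bool = True) -> None:
    """Print each command; execute it (stopping on the first failure) unless dry_run."""
    for argv, cwd in commands:
        print(("DRY RUN: " if dry_run else "") + " ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run(windows_install_commands("safekit_windows_x86_64_8_2_x_y.msi", "pwd"))
```

Keeping dry-run as the default lets you review the exact command lines before passing dry_run=False.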
2.1.3.2 Install on Linux as root
2.1.3.2.1 SafeKit package install
1. Open a Shell console as root on Linux server
2. Go to the directory that contains the downloaded file safekitlinux_x86_64_8_2_x_y.bin (a self-extracting zip file)
3. Run chmod +x safekitlinux_x86_64_8_2_x_y.bin
4. Run ./safekitlinux_x86_64_8_2_x_y.bin
This extracts the package and the safekitinstall script
5. Install in interactive mode by executing ./safekitinstall
· Reply to “Do you accept that SafeKit automatically configure the local firewall to open these ports (yes|no)?”
If you answer yes, the installer configures the firewalld or iptables Linux firewall for SafeKit. For details or other firewalls, see section 10.3.
· Reply to “Please enter a password or "no" if you want to set it later”
This step is mandatory to initialize the default configuration of the web service, which requires authentication.
The installer initializes the web service with the admin user and the given password (pwd in this example). You can then log in to the web console as admin/pwd to access all its features, and run distributed commands. For details, see section 11.2.1.
The password must be identical on all nodes that belong to the same SafeKit cluster. Otherwise, the web console and distributed commands will fail with authentication errors.
or
5. Install in non-interactive mode, by executing:
· Use the -nofirewall option to disable the automatic firewall setup
· Use the -passwd pwd option to initialize the web service authentication (where pwd is the password set for the admin user)
2.1.3.2.2 Firewall setup
No action is required if the firewall was configured automatically during install. Otherwise, see section 10.3.
2.1.3.2.3 Web service initialization
This step is mandatory to initialize the default configuration of the web service, which is accessed by the web console and the global safekit command. The web service requires authentication. No action is required if the web service was initialized during install. Otherwise, see section 11.2.1.
2.1.4 Use the SafeKit web console or command line interface
Once installed, the SafeKit cluster must be defined. Then modules can be installed, configured, and administered. All these actions can be done with the SafeKit console or the command line interface.
2.1.4.1 The SafeKit web console
1. Start a web browser (Microsoft Edge, Firefox, or Chrome)
2. Connect it to the URL http://host:9010 (where host is the name or IP address of one of the SafeKit nodes)
3. In the login page, enter admin as user’s name and the password you gave on initialization (e.g., pwd)
4. Once the console is loaded, the admin user can access Monitoring and Configuration in the navigation sidebar, since it has the Admin role by default
For details see section 3.
2.1.4.2 The SafeKit command line interface
It is based on a single safekit command located at the root of the SafeKit installation directory. Almost all safekit commands can be applied either locally or on a list of nodes in the SafeKit cluster; the latter is called a global or distributed command.
To use the safekit command:
In Windows:
1. Open a PowerShell console as administrator
2. Go to the root of the SafeKit installation directory SAFE (by default SAFE=C:\safekit if %SYSTEMDRIVE%=C:)
cd c:\safekit
3. Run .\safekit.exe <arguments>
In Linux:
1. Open a Shell console as root
2. Go to the root of the SafeKit installation directory SAFE (by default SAFE=/opt/safekit)
cd /opt/safekit
3. Run ./safekit <arguments>
For details, see section 9.
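When safekit is driven from automation rather than an interactive console, the Windows and Linux procedures above reduce to running the same executable from the SAFE root. A minimal Python sketch, assuming the default installation directories; the helper names safekit_argv and safekit are hypothetical.

```python
import platform
import subprocess
from pathlib import PurePosixPath, PureWindowsPath

# Default SAFE directory per OS (section 2.1.2); Windows assumes %SYSTEMDRIVE% is C:.
SAFE_DIR = {
    "Windows": PureWindowsPath(r"C:\safekit"),
    "Linux": PurePosixPath("/opt/safekit"),
}
EXE_NAME = {"Windows": "safekit.exe", "Linux": "safekit"}


def safekit_argv(*args: str, system=None) -> list:
    """Build the argv of a safekit command for the given OS (default: this host)."""
    system = system or platform.system()
    if system not in SAFE_DIR:
        raise ValueError(f"unsupported OS: {system}")
    return [str(SAFE_DIR[system] / EXE_NAME[system]), *args]


def safekit(*args: str) -> str:
    """Run safekit from the SAFE root (as administrator/root) and return its output."""
    system = platform.system()
    result = subprocess.run(safekit_argv(*args, system=system),
                            cwd=str(SAFE_DIR[system]),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, safekit("level") returns the version and license check output, and safekit("start", "-m", "AM") starts module AM; check=True raises an exception if the command fails.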
2.1.5 SafeKit license keys
· If you do not install any license key, the product stops every 3 days
· You can download a one-month trial key (which is accepted on any hostname/any OS) from the following address: http://www.evidian.com/safekit/requestevalkey.php
· To obtain permanent keys see section 8.2
· Save the key into the SAFE/conf/license.txt file (or any other file in SAFE/conf) on each server
· If the files in SAFE/conf contain more than one license key, the most favorable key is chosen
· Check that the key is valid with the command safekit level
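Since the key must be saved on each server and then checked, the two steps can be scripted. A hedged sketch: install_license and license_report are hypothetical helpers, and the output of safekit level is returned as is rather than parsed, since its format is not described here.

```python
import subprocess
from pathlib import Path


def install_license(safe_dir: str, key: str, filename: str = "license.txt") -> Path:
    """Write a license key to SAFE/conf/<filename>; repeat on each server."""
    conf = Path(safe_dir) / "conf"
    conf.mkdir(parents=True, exist_ok=True)  # normally already created by the install
    target = conf / filename
    target.write_text(key.strip() + "\n")  # overwrites any previous key in this file
    return target


def license_report(safekit_exe: str) -> str:
    """Return the output of 'safekit level' (version and license check)."""
    return subprocess.run([safekit_exe, "level"], capture_output=True,
                          text=True, check=True).stdout
```

For example, on Linux as root: install_license("/opt/safekit", Path("key.txt").read_text()), where key.txt is a hypothetical file holding the key you received, then print(license_report("/opt/safekit/safekit")). Passing another filename keeps the existing license.txt; SafeKit then chooses the most favorable key.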
2.1.6 System specific procedures and characteristics
2.1.6.1 Windows
· Apply a special procedure to properly stop SafeKit modules at machine shutdown and to start safeadmin service at boot: see section 10.4.
· For network interfaces with teaming and with SafeKit load balancing, uncheck "Vip" on the physical network interfaces of the team and keep it checked only on the teaming virtual interface.
2.1.6.2 Linux
· For an updated list of required packages, see the SafeKit Release Notes.
· A safekit user and a safekit group are created: root and all users belonging to the safekit group can execute SafeKit commands
· In a farm module with load balancing on a virtual IP address, the vip kernel module is compiled when the module is configured. To compile it successfully, the required Linux packages must be installed, as well as the devel package matching the installed kernel version (kernel-devel).
· For a farm with SafeKit load balancing on a bonding interface, do not set ARP in the bonding configuration. Otherwise, the association <virtual IP address, invisible virtual MAC address> is broken in client ARP caches by the physical MAC address of the bonding interface: see section 4.3.4
· For a mirror using file replication, install the nfs-utils package and remove the logwatch package (rpm -e logwatch); otherwise, the NFS service and SafeKit are stopped every night
2.2 Mirror installation recommendation
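Figure: mirror architecture, with the virtual IP address ip 1.10, the mirror(app1) module running app1, and the replicated directory dir1 on both servers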
2.2.1 Hardware prerequisites
· 2 servers with the same Operating System
· Supported OS: https://support.evidian.com/supported_versions/#safekit
· A disk drive with write-back cache is recommended for I/O performance
2.2.2 Network prerequisites
· 1 physical IP address per server (ip 1.1 and ip 1.2)
· If you need to set a virtual IP address (ip 1.10), both servers must be in the same IP network with the standard SafeKit configuration (LAN or extended LAN between two remote computer rooms). For setting a virtual IP address with servers in different IP networks, see section 13.5.3.
2.2.3 Application prerequisites
· The application is installed and starts on both servers
· Application can be started and stopped using command lines
· On Linux, command lines like service "service" start|stop or su - user -c "appli-cmd"
· On Windows, command lines like net start|stop "service"
· If necessary, the application provides a procedure to recover after a crash
· Disable the automatic start of the application at boot, and configure the boot start of the module instead
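Before integrating the application into a module, it helps to verify the start/stop prerequisite on each server with exactly the command lines that will later go into the module scripts. A hypothetical Python sketch: app_command and start_stop_ok are illustrative helpers, app1 is a placeholder service name, and the runner parameter only exists so the check can be exercised without a real service.

```python
import subprocess


def app_command(system: str, action: str, service: str) -> list:
    """Command line that starts or stops the application service."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    if system == "Linux":
        return ["service", service, action]   # service "service" start|stop
    if system == "Windows":
        return ["net", action, service]       # net start|stop "service"
    raise ValueError(f"unsupported OS: {system}")


def start_stop_ok(system: str, service: str, runner=subprocess.run) -> bool:
    """Start then stop the application once; True if both commands succeed."""
    for action in ("start", "stop"):
        if runner(app_command(system, action, service)).returncode != 0:
            return False
    return True
```

For example, start_stop_ok("Linux", "app1") run as root on each server should return True before the module is configured.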
2.2.4 File replication prerequisites
· File directories that will be replicated are created on both servers
· They are located at the same place on both servers in the file tree
· It is recommended to synchronize the clocks of both servers (NTP protocol) for file replication
· On Linux, align uids/gids on both servers for owners of replicated directories/files
· See also system specific procedures and characteristics in section 2.1.6
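To check the uid/gid alignment and directory placement listed above, each server can dump the owners of the replicated directories and the two dumps can be compared. A minimal sketch under stated assumptions: /replicated/dir1 is a hypothetical directory, ownership and mismatches are hypothetical helpers, and transferring the JSON dumps between servers is left to you.

```python
import json
from pathlib import Path


def ownership(dirs) -> dict:
    """Map each replicated directory and everything under it to "uid:gid".

    Run on each server; a directory missing on this server maps to None.
    """
    result = {}
    for d in dirs:
        root = Path(d)
        if not root.is_dir():
            result[str(root)] = None
            continue
        for p in [root, *root.rglob("*")]:
            st = p.lstat()
            result[str(p)] = f"{st.st_uid}:{st.st_gid}"
    return result


def mismatches(server1: dict, server2: dict) -> list:
    """Paths present in both dumps whose "uid:gid" owner differs."""
    return sorted(k for k in server1.keys() & server2.keys()
                  if server1[k] != server2[k])


if __name__ == "__main__":
    # On each server, save this output (e.g. owners_<host>.json), then compare
    # the two loaded dumps with mismatches().
    print(json.dumps(ownership(["/replicated/dir1"])))
```

Owners are stored as "uid:gid" strings so that dumps loaded back from JSON compare equal to freshly computed ones; paths that exist on only one server are ignored, since files may not have been replicated yet.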
2.3 Farm installation recommendation
Figure: farm architecture, with the virtual IP address ip 1.20 and the farm(app2) module running app2 on each of the three servers
2.3.1 Hardware prerequisites
· At least 2 servers with the same Operating System
· Supported OS: https://support.evidian.com/supported_versions/#safekit
· On Linux, kernel compilation tools must be installed for the vip kernel module
2.3.2 Network prerequisites
· 1 physical IP address per server (ip 1.1, ip 1.2, ip 1.3)
· If you need to set a virtual IP address (ip 1.20), servers must be in the same IP network with the standard SafeKit configuration (same LAN or extended LAN between remote computer rooms). For setting a virtual IP address with servers in different IP networks, see section 13.5.3.
· See also system specific procedures and characteristics in section 2.1.6
2.3.3 Application prerequisites
The prerequisites are the same as for a mirror module, described in section 2.2.3.
2.4 SafeKit upgrade
If you encounter a problem with SafeKit, see the Software Release Bulletin containing the list of fixes on the product.
If you want to take advantage of new features, see the SafeKit Release Notes. This document also tells you whether your upgrade is a major upgrade (e.g., 7.5 to 8.2), which requires a procedure different from the one presented here.
The upgrade procedure consists of uninstalling the old package and then installing the new one. All nodes in the same cluster must be upgraded.
2.4.1 Prepare the upgrade
1. Note the "on" or "off" state of the SafeKit services and modules started automatically at boot, with the commands safekit boot webstatus; safekit boot status -m AM (where AM is the name of the module); and, in Windows, safekit boot snmpstatus
The start at boot of the module can be defined in its configuration file. If so, the safekit boot command is unnecessary.
2. For a mirror module, note which server is in the ALONE or PRIM state: this server holds the up-to-date replicated files
3. Optionally, take snapshots of the modules
Uninstalling/reinstalling resets the logs and dumps of each module. If you want to keep this information (logs, last 3 dumps and configurations), run the command safekit snapshot -m AM /path/snapshot_xx.zip (replace AM by the module name)
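For clusters with several modules, the information to record in steps 1 and 3 can be gathered by listing the commands per module. A sketch only: prepare_upgrade_commands and the snapshot_<module>.zip naming are hypothetical, and it prints the commands instead of running them so that they can be reviewed first. Step 2 (which server is PRIM or ALONE) still has to be noted from the console or the module state.

```python
import os


def prepare_upgrade_commands(modules, windows: bool, snapshot_dir: str) -> list:
    """safekit commands from steps 1 and 3 above, for the given modules."""
    cmds = [["safekit", "boot", "webstatus"]]
    if windows:
        cmds.append(["safekit", "boot", "snmpstatus"])
    for am in modules:
        cmds.append(["safekit", "boot", "status", "-m", am])
    for am in modules:
        # Optional: keep logs, last 3 dumps and configuration before uninstalling.
        cmds.append(["safekit", "snapshot", "-m", am,
                     os.path.join(snapshot_dir, f"snapshot_{am}.zip")])
    return cmds


if __name__ == "__main__":
    for argv in prepare_upgrade_commands(["AM"], windows=False, snapshot_dir="/tmp"):
        print(" ".join(argv))  # run each from the SAFE root, as root/administrator
```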
2.4.2 Uninstall procedure
On Windows as administrator and on Linux as root:
1. Stop all modules using the command safekit shutdown
For a mirror in the PRIM-SECOND state, stop the SECOND server first to avoid an unnecessary failover
2. Close all editors, file explorers, shells or terminals opened under SAFE and SAFEVAR (to avoid package uninstallation errors)
3. Uninstall the SafeKit package:
· In Windows, use the Control Panel-Add/Remove Programs applet
· In Linux, use the command safekit uninstall
4. Undo all configurations that you have done manually for the firewall setup (see section 10.3)
Uninstalling SafeKit creates a backup of the installed modules in SAFE/Application_Modules/backup, then unconfigures them.
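The ordering rule in step 1 matters because stopping the PRIM first would cause an unnecessary failover to the SECOND. A tiny hypothetical helper (its name and input format are assumptions) that orders the nodes accordingly before safekit shutdown is run on each one:

```python
def shutdown_order(states: dict) -> list:
    """Order nodes so that a mirror's SECOND node is stopped before its PRIM node.

    states maps node name -> mirror state noted before the upgrade.
    """
    # sorted() is stable: nodes other than SECOND keep their original order.
    return sorted(states, key=lambda node: 0 if states[node] == "SECOND" else 1)


if __name__ == "__main__":
    for node in shutdown_order({"node1": "PRIM", "node2": "SECOND"}):
        print(f"on {node}: safekit shutdown")
```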
2.4.3 Reinstall and postinstall procedure
1. Install the new package as described in section 2.1
2. Check with the command safekit level the installed SafeKit version and the validity of the license (which has not been uninstalled)
If you have a problem with the new package and the old key, take a temporary license: see section 2.1.5
3. If you use the web console, clear the browser cache and refresh pages in the web browser
4. Since SafeKit 8.2.1, previously configured modules are automatically reconfigured on upgrade.
However, you may still need to reconfigure a module to apply configuration changes that come with the new version (see the SafeKit Release Notes). Reconfigure the module either with:
· the web console, by navigating to “Configuration/Modules configuration/Configure the module/”
· the web console, by directly entering the URL http://host:9010/console/en/configuration/modules/AM/config/
· the command safekit config -m AM
where AM is the module name
5. If necessary, reconfigure the automatic start of modules at boot
The start at boot of the module can be defined in its configuration file. If so, skip this step. Otherwise, run the command safekit boot -m AM on (replace AM by the module name)
6. Restart the modules
Mirror module
The module must be started as primary on the node holding the up-to-date replicated files (former PRIM or ALONE), either with:
· the web console, by navigating to Monitoring/of the node/Force start/As primary
· the command safekit prim -m AM (replace AM by the module name)
Check that the application works properly once the module is in ALONE state, before starting the other node. On the other node (former SECOND), the module must be started as secondary, either with:
· the web console, by navigating to Monitoring/of the node/Force start/As secondary
· the command safekit second -m AM (replace AM by the module name)
Once this initial start has been performed by selecting the primary and secondary nodes, subsequent starts can be performed with:
· the web console, by navigating to Monitoring/of the node/Start/
· the command safekit start -m AM (replace AM by the module name)
Farm module
Start the module either with:
· the web console, by navigating to Monitoring/of the module/Start/
· the command safekit start -m AM (replace AM by the module name)
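The mirror start order above depends on the state noted during the upgrade preparation. A hypothetical sketch that derives the initial start commands from that note (the helper name and the input format are assumptions); the manual check that the application works in ALONE state between the two starts is not automated here.

```python
def mirror_restart_plan(previous_states: dict, module: str) -> list:
    """(node, argv) pairs for the initial restart of a mirror module after upgrade.

    previous_states maps node name -> state noted before the upgrade (section 2.4.1).
    """
    primaries = [n for n, s in previous_states.items() if s in ("PRIM", "ALONE")]
    if len(primaries) != 1:
        raise ValueError("expected exactly one former PRIM or ALONE node")
    prim = primaries[0]
    # First start as primary where the replicated files are up to date...
    plan = [(prim, ["safekit", "prim", "-m", module])]
    # ...then, once the application works in ALONE state, start the other node as secondary.
    plan += [(n, ["safekit", "second", "-m", module])
             for n in previous_states if n != prim]
    return plan


if __name__ == "__main__":
    for node, argv in mirror_restart_plan({"node1": "PRIM", "node2": "SECOND"}, "AM"):
        print(f"on {node}: {' '.join(argv)}")
```

Refusing to build a plan when zero or two nodes were PRIM/ALONE avoids guessing which copy of the replicated files is the up-to-date one.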
In addition, in the exceptional case where you have modified the default setup of the SafeKit web service or SNMP monitoring:
1. The SafeKit web service safewebserver
· If its automatic start at boot had been disabled, disable it again with the command safekit boot weboff
· If you had modified configuration files and these have evolved in the new version, your modifications are saved into SAFE/web/conf before being overwritten by the new version. Carrying over your old configuration to the new version may require some adaptations. For details on the default setup and all predefined setups, see section 11.
· For HTTPS and login/password configurations, the certificates and the user.conf / group.conf files generated for the previous release should remain compatible.
2. The SafeKit SNMP monitoring
· In Windows, if its automatic start at boot had been enabled, enable it again with the command safekit boot snmpon
· If you had modified configuration files and these have evolved in the new version, your modifications are saved into SAFE/snmp/conf before being overwritten by the new version. Carrying over your old configuration to the new version may require some adaptations. For details, see section 10.8.
2.5 SafeKit full uninstall
For completely removing the SafeKit package, follow the procedure described below.
2.5.1 Uninstall on Windows as administrator
1. Log-in as administrator on Windows server
2. Stop all modules using the command safekit shutdown
3. Close all editors, file explorers, shells or cmd windows opened under SAFE and SAFEVAR (to avoid package uninstallation errors)
(SAFE=C:\safekit if %SYSTEMDRIVE%=C: ; SAFEVAR=C:\safekit\var if %SYSTEMDRIVE%=C:)
4. Uninstall SafeKit using the Control Panel-Add/Remove Programs applet
5. Reboot the server
6. Delete the SAFE folder, which is the installation directory of the previous SafeKit install
7. Undo all configurations that you have done for SafeKit boot/shutdown (see section 10.4)
8. Undo all configurations that you have done for firewall rules (see section 10.3)
2.5.2 Uninstall on Linux as root
1. Open a Shell console as root on Linux server
2. Stop all modules using the command safekit shutdown
3. Close all editors, file explorers, shells or terminals opened under SAFE and SAFEVAR (SAFE=/opt/safekit ; SAFEVAR=/var/safekit)
4. Uninstall SafeKit using the command safekit uninstall -all and answer yes when prompted to delete all SafeKit folders
5. Reboot the server
6. Undo all configurations that you have done for firewall rules (see section 10.3)
7. Delete the user and group created by the previous install (default is safekit/safekit) with the commands:
userdel safekit
groupdel safekit
2.6 SafeKit documentation
· The SafeKit solution is fully described.
· Refer to this online training for a quick start in using SafeKit.
· The SafeKit Release Notes present: latest install instructions; major changes; restrictions and known problems; migration instructions.
· The Software Release Bulletin lists the SafeKit 8.2 packages, with descriptions of changes and fixed issues.
· List of known SafeKit issues and restrictions. Other KBs are available on the Evidian support site, but are only accessible to registered users. For more details on the support site, see section 8.
· The SafeKit User's Guide: this guide. Please refer to the guide corresponding to your SafeKit version number. It is delivered with the SafeKit package and can be accessed via the web console under /User’s guide. The latest version of this guide is also available online.
3. The SafeKit web console
Section 3.1 “Start the web console”
Section 3.2 “Configure the cluster”
Section 3.3 “Configure a module”
Section 3.4 “Monitor a module”
Section 3.5 “Snapshots or logs of module for debug and support”
Section 3.6 “Secure access to the web console”
The SafeKit 8 web console and API have evolved from earlier versions. As a result, the console delivered with SafeKit 8 can only administer SafeKit 8 servers, which cannot be administered with an older console.
3.1 Start the web console
The web console administers a single SafeKit cluster. A SafeKit cluster is a set of servers on which SafeKit is installed and running. All servers belonging to a given SafeKit cluster share the same cluster configuration (list of servers and networks used) and communicate with each other to have a global view of the SafeKit module configurations. A server cannot belong to more than one SafeKit cluster.
3.1.1 Start a web browser
· The web browser can run on any authorized SafeKit node or workstation that can reach the SafeKit servers over the network.
· Network, firewall and proxy configuration must allow access to all the servers that are administered with the web console
· JavaScript must be available and enabled in the web browser
· Tested browsers are Microsoft Edge, Firefox, and Google Chrome
· To avoid security popups in Microsoft Edge, you may add the SafeKit servers addresses into the Intranet or Trusted zone
· Messages in the web console are displayed in French or English, depending on the language selected in the console
· After SafeKit upgrade, you must clear the browser’s cache to get the new web console pages. A quick way to do this is a keyboard shortcut:
1. Open the browser to any web page and hold CTRL and SHIFT while tapping the DELETE key
2. A dialog box opens to clear browsing data. Set it to clear everything and click Clear Now or Delete at the bottom
3. Close the browser, stop all background processes that may be still running and re-open it fresh to reload the web console
3.1.2 Connect to a SafeKit server
By default, access to the web console requires the user to authenticate with a name and password. During SafeKit install, you initialized it with the admin user and a password. This admin name and password are sufficient to access all the console's features. For more details on this configuration, see section 11.2.1.
1. Start a web browser (Microsoft Edge, Firefox, or Chrome)
2. Connect it to the URL http://host:9010 (where host is the name or IP address of one of the SafeKit servers). If HTTPS is configured, there is an automatic redirection to https://host:9453.
3. The SafeKit server to which the console is connected (host in the URL) is called the connection node. This node acts as a proxy to communicate on behalf of the console with all other SafeKit servers.
You can connect to any node of the cluster, since the console offers a global view and global actions. On connection error with one node, connect to another node.
4. In the login page, enter admin as user’s name and the password you gave on initialization (e.g., pwd).
5. The SafeKit web console is loaded
· When the console is connected to a SafeKit server on which the cluster is configured, the name of the node corresponding to the server (as defined in the cluster configuration) is displayed in the header. This is the connection node (node1 in the example).
If the cluster is not yet configured, no name is displayed.
· (1) Click the menu icon to open the menu, where you can read the SafeKit User’s Guide, select the language, enable/disable the dark mode and log out.
· (2) Click the sidebar icon to collapse or expand the navigation sidebar.
· (3) Click on “Configuration” to configure the cluster and the modules. Configuration is only authorized for users with the Admin role. By default, the admin user has the Admin role.
· (4) Click on “Monitoring” to monitor and control the configured modules. Monitoring is authorized for users with the Admin, Control or Monitor role. With the Monitor role, actions on modules (start, stop…) are prohibited.
The web console offers contextual help by clicking on the help icon.
3.2 Configure the cluster
The SafeKit cluster must be defined before installing, configuring or starting a SafeKit module. A SafeKit cluster is defined by a set of networks and by the addresses, on these networks, of a group of SafeKit servers, named nodes. These nodes implement one or more modules. A server is not necessarily connected to all the networks, but it must be connected to at least one.
The cluster configuration is saved on the servers’ side into the cluster.xml file (see section 12). For correct behavior, the same cluster configuration must be applied on all the nodes.
You must fully define the cluster configuration before installing and configuring modules, since modifying the cluster can affect the configuration or the execution of installed modules.
The cluster configuration home page is available:
· Directly via the URL http://host:9010/console/en/configuration/cluster
Or
· By navigating the console via “Configuration/Cluster configuration”
If the cluster is not yet configured, the cluster configuration wizard is automatically opened.
3.2.1 Cluster configuration wizard
Open the configuration wizard:
· Directly via the URL http://host:9010/console/en/configuration/cluster/config
Or
· By navigating the console via “Configuration/Cluster configuration/Configure the cluster/”
The cluster configuration wizard is a step-by-step guided form:
1. “Edit cluster configuration” described in section 3.2.1.1
2. “Check result” described in section 3.2.1.2
3. “Exit cluster configuration wizard”
3.2.1.1 Edit cluster configuration
· (1) Fill in the form to first assign a user-friendly name for the network. This name is used for configuring heartbeat networks used by a module.
Click the add icon to add another node/lan, or the remove icon to remove the node/lan from the cluster.
When a node/lan is removed from the cluster, all modules that use it in their configuration may become unusable.
· (2) Fill in the IP address of the node and then press the Tab key to check the server connectivity and automatically insert the server hostname.
The icon next to the address reflects the reachability of the node:
· A success icon means that the SafeKit server is available. Its tooltip gives information on the server.
· A failure icon means that there was no reply from the server within the timeout delay. Fix the problem to be able to administer this node. It may be a bad address, a network or host failure, a bad configuration of the web browser or the firewall, or the SafeKit web service stopped on the node. To solve the problem, refer to section 7.1.
· Change the node name if necessary. This name is used by the SafeKit administration service to uniquely identify a SafeKit node. It is also the name displayed in the SafeKit web console.
· (3) If you prefer, click on “Advanced configuration” to switch to XML cluster editing.
Click the help icon to open the section of the SafeKit User’s Guide that describes the configuration in the cluster.xml file.
· Click on “Reload” to discard your current modifications and reload the original configuration.
· (4) Once editing is complete, click on “Save and Apply” to save and apply the edited configuration to all nodes in the cluster.
If required, you can reapply the configuration to all nodes without modifying it.
For examples of cluster configurations with two networks refer to section 15.1.1; with three nodes refer to section 15.2.1.
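When the same cluster must be rebuilt often (test benches, scripted deployments), the cluster.xml content that the wizard produces can also be generated. A hedged sketch: the element layout <cluster><lans><lan name><node name addr/> is an assumption to be checked against section 12 before use, and the lan name "default", node2 and its address are hypothetical. The output could be pasted into the “Advanced configuration” XML editor and then applied with “Save and Apply”.

```python
import xml.etree.ElementTree as ET


def cluster_xml(lans: dict) -> str:
    """Build cluster.xml text from {lan_name: [(node_name, address), ...]}.

    Assumed layout: <cluster><lans><lan name=...><node name=... addr=.../>
    """
    cluster = ET.Element("cluster")
    lans_el = ET.SubElement(cluster, "lans")
    for lan_name, nodes in lans.items():
        lan_el = ET.SubElement(lans_el, "lan", name=lan_name)
        names = [name for name, _ in nodes]
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate node name in lan {lan_name!r}")
        for node_name, addr in nodes:
            ET.SubElement(lan_el, "node", name=node_name, addr=addr)
    return ET.tostring(cluster, encoding="unicode")


if __name__ == "__main__":
    # node1/10.0.0.107 is the connection node of the example; node2 and its address are hypothetical.
    print(cluster_xml({"default": [("node1", "10.0.0.107"), ("node2", "10.0.0.108")]}))
```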
3.2.1.2 Check result
· (1) Read the result of the operation on each node:
· “Success” means the configuration was successful.
· “Failure” means the configuration has failed. Click the icon to read the output of the commands executed on the node and search for the error. You may need to modify the entered parameters or connect to the node to correct the problem. Once the error has been corrected, “Save and apply” again.
· (2) Click on “Configure modules” to exit the cluster configuration wizard and navigate to the modules configuration.
Or
· (3) Click the exit icon to “Exit the cluster configuration wizard” and navigate to the cluster configuration home page.
3.2.2 Cluster configuration home page
When the cluster is configured, the cluster configuration home page is available.
Open it:
· Directly via the URL http://host:9010/console/en/configuration/cluster
Or
· By navigating the console via “Configuration/Cluster configuration”
In this example, the console is loaded from 10.0.0.107, which corresponds to node1 in the existing cluster. This is the connection node.
· (1) Click on “Configuration” in the navigation sidebar
· (2) Click on “Cluster configuration” tab
Nodes configured in the cluster are listed with their configuration date.
· (3) Click the details icon to display details about the node: network names and addresses defined in the cluster configuration, SafeKit version, license key, hostname, OS.
· (4) Click one of the buttons to:
· modify the cluster configuration and/or re-apply it. This opens the cluster configuration wizard and loads the cluster configuration from the connection node.
· download the cluster configuration in XML format from the connection node.
· unconfigure the cluster on one or more nodes
3.3 Configure a module
Once the cluster has been set up, you can configure a new module on the cluster. The module configuration home page is accessible:
· Directly via the URL http://host:9010/console/en/configuration/modules
Or
· By navigating the console via “Configuration/Modules configuration”
If no module has been configured, the console automatically presents the page for configuring a “New module”.
For module configuration examples refer to section 15.
3.3.1 Select the new module to configure
In this example, the console is loaded from 10.0.0.107, which corresponds to node1 in the existing cluster. This is the connection node.
· (1) Click on “Configuration” in the navigation sidebar
· (2) Click on the “Modules configuration” tab
· (3) Click on “New Module”
The page proposes to select a new module among several categories, visible by clicking the expand icon:
· the “Main modules”, including the generic mirror.safe (refer to section 15.1.2) and farm.safe (refer to section 15.2.2) modules for integrating a new application into a mirror or farm architecture. These are the modules stored on the connection node, node1, under SAFE/Application_Modules/generic, SAFE/Application_Modules/demo and SAFE/Application_Modules/published.
· “Backup modules” archived on the connection node, which are saved when a module is uninstalled on this node. They are loaded from node1 under SAFE/Application_Modules/backup.
· “Other modules”, which are examples of SafeKit features used in modules supplied for testing purposes only. Refer to section 15 for the description of some of them. They are loaded from node1 under SAFE/Application_Modules/other.
· A locally stored module accessible from “Upload module”. This feature can be used to configure a module for a given application (e.g., Microsoft SQL Server, PostgreSQL…) downloaded from one of the SafeKit quick installation guides.
· (4) Select a module to configure from those listed above. In the example, mirror.safe.
· (5) Click on the button Configure the new module. A dialog opens to give the new module name.
· (6) Enter the name of the new module.
· (7) Click on “Confirm”
The module configuration wizard opens. It is described below.
3.3.2 Module configuration wizard
The module configuration wizard is a step-by-step guided form:
1. “Edit module configuration” described in section 3.3.2.1
2. “Edit module scripts (Optional)” described in section 3.3.2.2
3. “Enable communication encryption (Optional)” described in section 3.3.2.3
4. “Save and apply” described in section 3.3.2.4
5. “Check result” described in section 3.3.2.5
6. “Exit module configuration wizard”
Note that module reconfiguration can only be applied to nodes on which the module in question is not started. Therefore, stop the module before starting the configuration wizard.
If needed, you can reapply the module configuration on all nodes without modifying it.
3.3.2.1 Edit module configuration
Below is an example of editing the mirror.safe module configuration.
· (1) Fill in the form to assign values to the various components, or to add or remove components. Click the expand icon to open the detailed panel of each component.
This form is used to enter only the main module configuration parameters.
The names of the “Heartbeat networks” proposed are the names of the lans entered during cluster configuration.
· (2) For advanced module configuration, more exhaustive than the form, click on “Advanced configuration”. This switches to editing the module configuration file, userconfig.xml, in XML format.
Click the help icon to open the SafeKit User’s Guide describing the configuration of the various components in the userconfig.xml file.
· If necessary, click on “Reload” to discard your modifications and reload the complete original configuration (including scripts if these were modified in the next step).
· (3) Once you have finished editing the module configuration, click on “Next step”.
For examples of mirror module configuration, refer to section 15.1.2; for examples of farm module configuration, refer to section 15.2.2.
3.3.2.2 Edit module scripts
Below is an example of editing the mirror.safe module scripts.
· (1) Click on “start_prim” or “stop_prim” to edit it and insert the commands that start or stop your application.
Click on to copy the content and edit it with your favorite syntax editor. Once done, paste the modified content into the input field with .
· (2) If necessary, click on “Advanced configuration” to list the module's other scripts and edit them (prestart, poststop, scripts for checkers…).
· Click on to open the SafeKit User’s Guide describing the module scripts.
· If necessary, click on “Reload” to discard your modifications and reload the complete original configuration (including the module configuration if it was modified in the previous step).
· (3) Once you have finished editing the module scripts, click on “Next step”.
For examples of mirror module scripts, refer to section 15.1.3; for examples of farm module scripts, refer to section 15.2.3.
3.3.2.3 Enable communication encryption
Encryption of internal module communications between cluster nodes is enabled by default. For details, see section 10.5.
· (1) Click “Enable” to enable or disable encryption of module communications.
When the module's encryption key is not identical on all nodes, internal communication is impossible. The configuration must be reapplied to all nodes to propagate the same key.
To generate new encryption keys, you need to:
1. disable encryption, then “Save and apply” configuration to all nodes
2. enable encryption, then “Save and apply” configuration to all nodes
· If necessary, click on “Reload” to discard your modifications and reload the complete original configuration (including the module configuration and scripts if these were modified in the previous steps).
· (2) Once this step is complete, click on “Next step”.
3.3.2.4 Save and apply
In this step, select the nodes to which the configuration applies.
· (1) Check/uncheck to select/unselect nodes. Please note that the connection node (node1 in the example) is mandatory.
There are 2 cases where “Save and Apply” is disabled:
· The module on the selected node is started, that is, in a state other than STOP (NotReady).
· There was no reply from the node within the timeout delay. This may be due to a bad address, a network or host failure, a bad configuration of the web browser or the firewall, or the SafeKit web service being stopped on the node. To solve the problem, refer to section 7.1.
In both cases, uncheck the node or click on “Save and check” to apply it later, after stopping the module or solving the communication problem.
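The two conditions above can be read as a single predicate on each node. The Python sketch below is illustrative only: the function `can_apply` and the way states are passed in are hypothetical and are not part of any SafeKit API.

```python
def can_apply(responded: bool, state: str) -> bool:
    """Return True if "Save and apply" could be enabled for a node.

    Hypothetical helper: the node must have replied within the timeout,
    and its module must be stopped, i.e. in state STOP (NotReady).
    """
    if not responded:
        # No reply: bad address, network/host failure, browser or firewall
        # misconfiguration, or SafeKit web service stopped on the node.
        return False
    return state == "STOP (NotReady)"


nodes = {
    "node1": (True, "STOP (NotReady)"),
    "node2": (True, "PRIM (Ready)"),
    "node3": (False, "ERROR"),
}
for name, (responded, state) in nodes.items():
    action = "apply" if can_apply(responded, state) else "uncheck or Save and check"
    print(name, action)
```

Nodes for which the predicate is false are the ones to uncheck, or to handle later with “Save and check”.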
· (2) Click on “Save and check” to save the edited configuration on the connection node and check its consistency. It then proceeds to the next step to display the result of this operation.
Once this operation has been completed, any changes are saved on the connection node. The configuration wizard can be exited and relaunched later to apply the saved configuration. Until the saved configuration is applied, the last applied configuration of the module remains active.
· (3) Click on “Save and apply” to save and apply the edited configuration on selected nodes. It then proceeds to the next step to display the result of this operation.
If this operation is successful, the applied configuration becomes the active one for the module.
On the server side, the module configuration is saved under SAFE/modules/AM (where AM is the module name). When a module is reconfigured, this directory is deleted and rewritten with the changes made in the console. Therefore, on the server side, you must close all editors, file explorers, shells or cmd windows opened under SAFE/modules/AM before applying the configuration (otherwise the apply may fail).
3.3.2.5 Check result
The example below shows the result of the “Save and Apply” operation. The layout for “Save and check” is similar.
· (1) Read the result of the operation on each node:
· “Success” means the operation was successful.
· “Failure” means the operation has failed.
Click to read the output of commands executed on the node and search for the error. You may need to modify the parameters entered or connect to the node to correct the problem. Once the error has been corrected, repeat the operation from the previous step.
· (2) Click on “Monitor modules” to exit the module configuration wizard and navigate to modules monitoring.
Or
· (3) Click on to “Exit the module configuration wizard” and navigate to the modules configuration home page.
3.3.3 Modules configuration home page
Once the first module has been configured, the module configuration home page is available. It allows you to view the modules installed on the cluster and to access the configuration of a new module.
Open it:
· Directly via the URL http://host:9010/console/en/configuration/modules
Or
· By navigating the console via “Configuration/Modules configuration”
Before each reconfiguration, deconfiguration and uninstallation, close on each node all editors, file explorers, shells or cmd windows opened under SAFE/modules/AM (otherwise the operation may fail).
In the following example, the console is loaded from 10.0.0.107, which corresponds to node1 in the existing cluster. This is the connection node.
· (1) Click on “Configuration” in the navigation sidebar.
· (2) Click on “Modules configuration” tab.
· Modules installed on the cluster are listed with the date the configuration was applied and, if applicable, the date the configuration was saved but not yet applied.
· (3) Click on one of the buttons associated with the module:
· to modify its configuration or reapply its current configuration. This opens the module configuration wizard and loads its current configuration from the connection node.
· to download the .safe, consisting of all module files (userconfig.xml, scripts) from the connection node.
· to reconfigure the module from the contents of a locally stored .safe.
· to restore a previous module configuration.
SafeKit keeps a copy of the last three successful configurations (stored under SAFE/modules/lastconfig on the server side). All module configuration files are packaged in a .safe file named AM_<date>_<time> (where AM is the module name).
· to remove internal files for the module on one or more nodes, without uninstalling it. The user configuration files are kept for later re-application.
· to completely uninstall the module on one or more nodes.
All module configuration files are packaged in a .safe file, which is archived on the server side under SAFE/Application_Modules/backup.
· To configure a new module, click on “New module”.
3.3.4 Edit the module configuration locally and then apply it
You may prefer to edit the module's configuration file and scripts with your favorite editor, or you may need to add module scripts, such as custom checkers, to the current configuration of the module.
In this example, a script is developed locally and added to the mirror module.
· (1) Click on “Configuration” in the navigation sidebar.
· (2) Click on “Modules configuration” tab.
· (3) Click on to download the mirror.safe on your workstation.
· (4) Edit mirror.safe, which is a zip file, to add your module script files to the bin directory (checker.ps1 in the example).
· (5) Upload the modified mirror.safe (.zip extension is also accepted).
· (6) Click on to select the file to be uploaded then “Confirm”.
The module configuration wizard is launched with the contents of this file. The new scripts are visible with “Advanced configuration” in step 2. Go to step 4 to “Save and apply” this new configuration.
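Step (4) can also be scripted instead of done with an archive tool. The following is a minimal Python sketch using the standard `zipfile` module, assuming that the .safe is a plain zip archive whose scripts live in a top-level `bin` folder; the helper name `add_script_to_safe` is hypothetical.

```python
import os
import zipfile


def add_script_to_safe(safe_path: str, script_path: str, folder: str = "bin") -> str:
    """Append a local script to a .safe archive under `folder`.

    Assumption: the .safe is an ordinary zip file with module scripts in a
    top-level "bin" folder. Returns the archive member name that was written.
    """
    arcname = f"{folder}/{os.path.basename(script_path)}"
    with zipfile.ZipFile(safe_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        if arcname in zf.namelist():
            # zipfile cannot replace a member in place; appending again
            # would create a duplicate entry, so refuse instead.
            raise FileExistsError(f"{arcname} already exists in {safe_path}")
        zf.write(script_path, arcname)
    return arcname
```

For example, `add_script_to_safe("mirror.safe", "checker.ps1")` adds the member `bin/checker.ps1`; the resulting file is then uploaded as in step (5). The sketch refuses to overwrite an existing script because updating a member requires rebuilding the whole archive.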
3.4 Monitor a module
Once a module is configured, you can monitor its state and run actions on it (start, stop…).
The modules monitoring home page is accessible:
· Directly via http://host:9010/console/en/monitoring
Or
· By navigating the console via “Monitoring”
3.4.1 Monitoring home page
In this example, the console is loaded from 10.0.0.107, which corresponds to node1 in the existing cluster. This is the connection node. Two modules are configured: farm and mirror.
· (1) Click on “Monitoring” in the navigation sidebar. For each installed module, it displays:
· the module name and the names of the nodes on which it is installed
· the module state on each node
· a notification on state changes, if the user has allowed notifications and the URL is https or http://localhost
For a description, see section 3.4.2.
· (2) Click on to open the menu of global actions (start, stop…) on the module, which apply to all nodes (node1, node2 in the example). For a description, see section 3.4.3.1.
· (3) Click on to open the menu of local actions (start, stop…) on the module, which apply only to the node (node1 in the example). For a description, see section 3.4.3.2.
· (4) Click on the node panel (mirror>node1 in the example) to open the details of the module on this node (logs, resources…). Since SafeKit 8.2.2, click instead on to open/close the details. For a description, see section 3.4.4.
· (5) Click on to open/close the module states timeline on all nodes where it is installed. Available since SafeKit 8.2.2. For a description, see section 3.4.5.
3.4.2 Module state
The console displays in real time the synthetic and detailed states of the module in the left and right panels.
3.4.2.1 Synthetic state
The console displays one of the following synthetic states for the module on the node:
· WAIT (Transient) (orange): transient state of the module
· ALONE (Transient) (orange): transient state of a mirror module, primary without secondary
· ALONE (Ready) (green): stable state of a mirror module, primary without secondary
· PRIM (Transient) (orange): transient state of a mirror module, primary with secondary
· PRIM (Ready) (green): stable state of a mirror module, primary with secondary
· SECOND (Transient) (orange): transient state of a mirror module, secondary
· SECOND (Ready) (green): stable state of a mirror module, secondary with primary
· UP (Transient) (orange): transient state of a farm module
· UP (Ready) (green): stable state of a farm module
· WAIT (NotReady) (red): blocked state of the module, waiting for one or more resources
· NOT CONFIGURED (grey): module installed but not configured
· ERROR (red): the node did not respond within the given time limit. This may be due to an incorrect address, a network or server failure, a misconfigured web browser or firewall, or the SafeKit web service being stopped on the node (see section 7.1). It may also be due to the temporary unavailability of the connection node. In this case, reload the console from another SafeKit node.
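A monitoring script that collects these labels may want to classify them the same way the console colors them. The Python sketch below encodes the list above as a dictionary; the dictionary and helper names are hypothetical, and only the labels and colors come from the list.

```python
# Colors of the synthetic states, copied from the list above.
# Hypothetical encoding for an external script, not a SafeKit API.
STATE_COLORS = {
    "WAIT (Transient)": "orange",
    "ALONE (Transient)": "orange",
    "ALONE (Ready)": "green",
    "PRIM (Transient)": "orange",
    "PRIM (Ready)": "green",
    "SECOND (Transient)": "orange",
    "SECOND (Ready)": "green",
    "UP (Transient)": "orange",
    "UP (Ready)": "green",
    "WAIT (NotReady)": "red",
    "NOT CONFIGURED": "grey",
    "ERROR": "red",
}


def is_stable(label: str) -> bool:
    """Green states are the stable ones (ALONE, PRIM, SECOND or UP in Ready)."""
    return STATE_COLORS[label] == "green"


def needs_attention(label: str) -> bool:
    """Red states: module blocked waiting for resources, or node not responding."""
    return STATE_COLORS[label] == "red"
```

An unknown label raises `KeyError` on purpose, so that a state missing from the table is noticed rather than silently classified.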
For details on state changes of a mirror module, see section 5.2.
For details on state changes of a farm module, see section 6.2.
3.4.2.2 Detailed state
The detailed state shows the state of the main resources or of the failover rules.
· uptodate: the replicated directories of the mirror module are up to date
· not uptodate: the replicated directories of the mirror module are not up to date
· degraded: the mirror module is in degraded mode, described in section 7.6
· 50%, 100%: the network load share of the farm module (e.g., 50% or 100% with 2 nodes)
· 0%: no load share taken by the farm module
· c_checkfile: the module applied the failover rule (e.g., the rule named c_checkfile), which triggers the action restart, stop, stopstart or wait on the module because a resource went down. To analyze the issue, read the logs and resource statuses as described below
· connection error: the module is in state ERROR (red); the node did not respond within the given time limit
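The load share values above follow from how many farm nodes are serving. The Python sketch below illustrates the 2-node example under a simplifying assumption: the load is split equally among the nodes whose farm module is in UP (Ready). Real modules may be configured differently, and the helper `load_shares` is hypothetical.

```python
from typing import Dict


def load_shares(states: Dict[str, str]) -> Dict[str, float]:
    """Split 100% of the network load equally among nodes in UP (Ready).

    Simplifying assumption for illustration: all nodes have equal weight,
    and nodes in any other state take no load share (0%).
    """
    up = [node for node, state in states.items() if state == "UP (Ready)"]
    share = 100.0 / len(up) if up else 0.0
    return {node: (share if node in up else 0.0) for node in states}
```

With two nodes in UP (Ready), each takes 50%; if one of them leaves the farm, the remaining node takes 100% and the other shows 0%.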
3.4.3 Module control menus
3.4.3.1 Global menu
The actions of the global menu apply to all nodes where the module is configured.
In the example below, actions apply to the module mirror on node1 and node2.
· (1) Click on to open the module's global actions menu.
· Click on “Start” to start the module on all nodes.
For a mirror module, the node with the up-to-date replicated data is started as primary.
· Click on “Stop” to stop the module on all nodes.
For a mirror module, the secondary node is stopped first to avoid an unnecessary failover.
· Click on “Debug” for debug and support as described in section 3.5.
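The ordering rules of the global “Start” and “Stop” actions for a mirror module can be sketched as follows. This Python code only illustrates the rules stated above; the role strings "PRIM" and "SECOND" as inputs and both helper names are hypothetical, and SafeKit's actual implementation is not shown here.

```python
from typing import Dict, List, Optional


def global_stop_order(roles: Dict[str, str]) -> List[str]:
    """Return node names in the order a global stop would handle them.

    Secondaries come first so that stopping the primary never causes a
    failover to a still-running secondary. sorted() is stable, so nodes
    with the same role keep their original order.
    """
    return sorted(roles, key=lambda node: roles[node] != "SECOND")


def global_start_primary(uptodate: Dict[str, bool]) -> Optional[str]:
    """Return a node holding up-to-date replicated data, to start as primary."""
    for node, is_uptodate in uptodate.items():
        if is_uptodate:
            return node
    return None  # no node is known to hold up-to-date data
```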
3.4.3.2 Local menu
The actions of the local menu apply only to the selected node.
3.4.3.2.1 Control a mirror module
In the example below, actions apply to the module mirror on node1.
· (1) Click on to open the module's local actions menu on the desired node (e.g. node1).
· Click on “Start” to start the module on the node.
For a mirror module, the node is started as primary when its replicated data are up to date. Otherwise, it is started as secondary. For details, see section 5.5.
· Click on “Stop” to stop the module on the node.
· Click on “Restart” to restart the module on the node.
It only executes the stop and then the start scripts, to restart the application locally without causing a failover.
· Use “Force start” submenu when you need to decide if the node should start primary or secondary:
· Select “Force start As Primary” to force the module to start as primary on this node.
For instance, on the 1st start of a mirror module as described in section 5.3, you must “Force start As primary” the node which has the up-to-date replicated folders.
· Select “Force Start As secondary” to force the module to start as secondary on this node.
Data synchronization can be optimized based on the module's last internal state.
· Select “Force Start As secondary with full data synchronization” to start the module on this node as a secondary and to force a complete copy of the replicated data.
· Click on “Disable/enable” to control error detection as described in section 3.4.3.2.3.
· Click on “Debug” to download module logs or snapshots from this node rather than from all nodes as described in section 3.5.
To understand and check the correct behavior of a mirror module, see section 5. To test it, see section 4.
3.4.3.2.2 Control a farm module
In the example below, actions apply to the module farm on node2.
· (1) Click on to open the module's local actions menu on the desired node (e.g. node2).
· Click on “Start” to start the module on the node.
· Click on “Stop” to stop the module on the node.
· Click on “Restart” to restart the module on the node.
It only executes the stop and then the start scripts, to restart the application without causing a failover.
· Click on “Disable/enable” to control error detection as described in section 3.4.3.2.3.
· Click on “Debug” to download module logs or snapshots from this node rather than from all nodes as described in section 3.5.
To understand and check the correct behavior of a farm module, see section 6. To continue the tests, see section 4.
3.4.3.2.3 Control checkers or processes/services monitoring
To avoid false error detection and automatic failover during application maintenance, you can disable the configured checkers (TCP, ping, custom…) or processes/services monitoring. Once the maintenance is completed, they can be safely re-enabled. These actions can be applied whether the module is started or stopped, and they are not reset when the module is stopped and restarted.
In the example below, actions apply to the module mirror on node1.
· (1) Click on to open the module's local actions menu on the desired node (e.g. node1).
· (2) Click on “Disable/enable” to open the submenu.
· (3) Click on “Checkers” or “Processes/services monitoring” to open the submenu.
· (4) Click on “Disable” to disable the error detection.
This disables all checkers (TCP, ping, custom…) or processes/services monitoring configured for the module.
· (4) Click on “Enable” to re-enable error detection by checkers or processes/services monitoring.
3.4.4 Module details
You can display details for a module on one node:
· Directly via the URL http://host:9010/console/en/monitoring/modules/AM/nodes/node (replace AM with the module name and node with the node name)
Or
· By navigating the console via “Monitoring/Click on for the module>node”
The selected module>node is highlighted with a blue color.
In the example, the detail for the module mirror on node1 is displayed.
· Click on to open/close details for the module on this node (logs, resources…).
· Click on “Logs” tab to visualize the module logs.
· Click on “Resources” tab to visualize the module resources.
· Click on the “Information” tab to visualize information on the node: network names and addresses defined in the cluster configuration, SafeKit version, license key, hostname, OS.
3.4.4.1 Module logs
You can display logs of a module on one node:
· Directly via the URL http://host:9010/console/en/monitoring/modules/AM/nodes/node/logs (replace AM with the module name and node with the node name)
Or
· By navigating the console via “Monitoring/Click on the module>node/Logs tab”
The left panel displays in real time the non-verbose module log for the selected module>node.
· Click on to resume/suspend the view in real time of the module log.
Refer to section 7 for an explanation of main messages.
· Click on to download the module log (verbose or not verbose).
· Select the message type to view:
· C(ritical) messages, such as error detection
· E(vent) messages, such as local and remote states
· U(ser) messages, when the user runs an action on the module
· S(cript) messages, when module scripts are executed
· Click on a message to display the verbose module log or the script log (output of scripts) in the log detail in the right panel.
3.4.4.1.1 Script log
To display the script log, click on the S(cript) message whose output you want to view.
· (1) Click the S(cript) message consisting of:
· the date and time of the execution of the script
· the name of the script executed
· the name of the corresponding userlog file
The content of the userlog file is displayed in the right panel. In the example, it is the content of the file SAFEVAR/modules/AM/userlog_2024-02-12T091410_start_prim.ulog (where AM is the module name).
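When collecting userlog files outside the console, the execution date and script name can be recovered from the file name. The Python sketch below assumes that all userlog files follow the pattern of the single example above (userlog_<YYYY-MM-DD>T<hhmmss>_<script>.ulog); this pattern is inferred from that one example, and the helper name is hypothetical.

```python
import re
from datetime import datetime
from typing import Tuple

# Pattern inferred from userlog_2024-02-12T091410_start_prim.ulog (assumption).
USERLOG_RE = re.compile(r"^userlog_(\d{4}-\d{2}-\d{2}T\d{6})_(.+)\.ulog$")


def parse_userlog_name(filename: str) -> Tuple[datetime, str]:
    """Return (execution date and time, script name) from a userlog file name."""
    match = USERLOG_RE.match(filename)
    if match is None:
        raise ValueError(f"not a userlog file name: {filename}")
    when = datetime.strptime(match.group(1), "%Y-%m-%dT%H%M%S")
    return when, match.group(2)
```

For the example file, this returns 2024-02-12 09:14:10 and the script name start_prim.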
3.4.4.1.2 Verbose log
To display the verbose module log, click on a message other than S(cript).
· (1) Click the message consisting of:
· the date and time of the event
· the module message
· All verbose messages between the selected message and the previous one in the table are displayed in the right-hand panel.
3.4.4.2 Module resources
You can display resources of a module on one node:
· Directly via the URL http://host:9010/console/en/monitoring/modules/AM/nodes/node/resources (replace AM with the module name and node with the node name)
Or
· By navigating the console via “Monitoring/Click on the module>node/Resources tab”
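The direct URLs of the monitoring pages follow one pattern, so they can be built programmatically, for example to bookmark the details of each module>node. The Python sketch below reproduces the URLs given in this section; the function `console_url` is hypothetical, and its defaults (http, port 9010) are taken from those URLs and must be adapted if the web service is configured differently (for instance for HTTPS).

```python
from typing import Optional
from urllib.parse import quote


def console_url(host: str, module: Optional[str] = None, node: Optional[str] = None,
                tab: Optional[str] = None, scheme: str = "http", port: int = 9010) -> str:
    """Build a monitoring URL of the SafeKit web console.

    Without module: the monitoring home page. With module and node: the
    module details, optionally on the "logs" or "resources" tab.
    """
    url = f"{scheme}://{host}:{port}/console/en/monitoring"
    if module is None:
        return url
    if node is None:
        raise ValueError("a node name is required with a module name")
    url += f"/modules/{quote(module)}/nodes/{quote(node)}"
    if tab is not None:
        if tab not in ("logs", "resources"):
            raise ValueError(f"unknown tab: {tab}")
        url += f"/{tab}"
    return url
```

For example, `console_url("10.0.0.107", "mirror", "node1", "logs")` gives the log page of the mirror module on node1.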
3.4.4.2.1 Resources state
The left panel displays in real time the current state of the resources for the selected module>node.
· (1) Select the group of resources to view:
· Module status: main resources, especially those of file replication for a mirror module
· Checkers: resources set by checkers
· File replication: file replication-specific resources that show synchronization progress
· All resources
· Click on a resource to display its value over time in the right panel. This history may be empty for some resources (unassigned or cleaned).
Resource states are monitored by the failover machine, which triggers a failover on failures (see section 13.18).
3.4.4.2.2 Resource’s state value history
To display a resource's value history, click on the resource you're interested in.
· (1) Click on the line consisting of:
· the last date the resource was assigned
· the name and category of the resource. The full resource name has the form category.name (custom.checkfile in the example).
The history of resource values is displayed in the right panel. In the example, this is the custom.checkfile resource corresponding to a resource assigned by a custom checker.
3.4.5 Module states timeline
Since SafeKit 8.2.2, you can display the module states timeline:
· By navigating the console via “Monitoring/Click on for the module”
This provides a global view of the module's state on the cluster. Be aware:
· that the clocks of the nodes must be synchronized for the correlation of state changes across nodes to be meaningful
· that it displays a reverse timeline of the module states on all nodes, starting with the newest date.
· Click on to open/close the timeline. The timeline displayed is the one available at the time of loading.
· Click on to refresh the timeline with the latest state changes.
· Click on a state change event to display the module log for the node starting at this date.
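The reverse timeline merges the state changes of all nodes into a single list ordered from newest to oldest. The Python sketch below illustrates that merge on hypothetical input data; it also shows why synchronized clocks matter, since timestamps coming from different nodes are compared directly.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Change = Tuple[datetime, str]        # (timestamp, new state) on one node
Event = Tuple[datetime, str, str]    # (timestamp, node, new state)


def reverse_timeline(changes: Dict[str, List[Change]]) -> List[Event]:
    """Merge per-node state changes into one list, newest first.

    Timestamps from different nodes are compared as-is, so the result is
    only meaningful if the node clocks are synchronized.
    """
    events = [(ts, node, state)
              for node, node_changes in changes.items()
              for ts, state in node_changes]
    events.sort(key=lambda event: event[0], reverse=True)
    return events
```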
3.5 Snapshots or logs of module for debug and support
When the problem is not easily identifiable, it is recommended to download logs or snapshots of the module on all nodes as described below. Snapshots allow an offline, in-depth analysis of the module and node status as described in section 7.16. If this analysis fails, send the snapshots to support as described in section 8.
In the following example, the module mirror is configured on node1 and node2. Note that a snapshot can be downloaded in any state of the module.
· (1) Click on to open the global menu of the module.
· (2) Click on “Debug” to open the debug submenu.
· (3) Click on “Download the snapshots” to create and download the snapshot of the module for each node.
The web console relies on the web browser's download settings to save the snapshot on the workstation. Some browsers may ask for confirmation before downloading multiple files or zip files.
The snapshot generation command generates a new dump and creates a .zip file containing the last 3 dumps and the last 3 module configurations.
In this example, it downloads 2 snapshots: snapshot_node1_mirror.zip and snapshot_node2_mirror.zip.
· Click on “Download the logs” to download the module log (verbose or not) for each node.
· In case of file replication issues, click on “Generate the dump files” at the time the problem occurs.
The dump contains the module logs and information on the system and SafeKit state at the time of the dump. It is generated on the server side in SAFEVAR/snapshot/modules/AM/dump_YYYY_MM_DD_hh_mm_ss.
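Because dump directory names embed their creation date, the most recent dumps can be selected by parsing the names, which is the kind of selection behind "the last 3 dumps" included in a snapshot. The Python sketch below is only an illustration under the naming pattern given above; it is not SafeKit's implementation, and the helper `latest_dumps` is hypothetical.

```python
from datetime import datetime
from typing import List


def latest_dumps(names: List[str], keep: int = 3) -> List[str]:
    """Return the `keep` newest dump directory names, newest first.

    Names are expected to follow dump_YYYY_MM_DD_hh_mm_ss; other entries
    in the directory are ignored.
    """
    dated = []
    for name in names:
        try:
            created = datetime.strptime(name, "dump_%Y_%m_%d_%H_%M_%S")
        except ValueError:
            continue  # not a dump directory name
        dated.append((created, name))
    dated.sort(reverse=True)
    return [name for _, name in dated[:keep]]
```

Sorting on the parsed date rather than on the raw string keeps the result correct even if a name does not use zero-padded fields.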
3.6 Secure access to the web console
· Admin role: this role grants all administrative rights by allowing access to
· Control role: this role grants monitoring and control rights by allowing access only to
· Monitor role: this role grants only monitoring rights, prohibiting actions on modules (start, stop…) in Monitoring in the navigation sidebar.
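The three roles are nested: each one adds rights to the previous one. The Python sketch below models that nesting as permission sets. The permission names "view", "control" and "configure" are hypothetical labels chosen for illustration, not SafeKit identifiers; the actual role setup is described in section 11.

```python
from enum import Enum


class Role(Enum):
    MONITOR = "monitor"
    CONTROL = "control"
    ADMIN = "admin"


# Hypothetical permission sets derived from the role descriptions above.
PERMISSIONS = {
    Role.MONITOR: {"view"},                        # monitoring only
    Role.CONTROL: {"view", "control"},             # plus start, stop...
    Role.ADMIN: {"view", "control", "configure"},  # all administrative rights
}


def is_allowed(role: Role, permission: str) -> bool:
    """Return True if the role includes the given permission."""
    return permission in PERMISSIONS[role]
```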
SafeKit provides different setups for the web service to enhance the security of the SafeKit web console. The predefined setups are listed below from least secure to most secure:
HTTP. Same role for all users without authentication
This solution can be implemented only in HTTP and is not compatible with user authentication methods. It is intended to be used for troubleshooting only.
HTTP/HTTPS with user authentication based on Apache files and optional role management
It relies on Apache files to store username/password for authenticating users and, optionally, to store the associated role for restricting their access. To connect to the console, the user must enter the username and password as configured with the Apache mechanisms.
This is the default active configuration, applied for HTTP and initialized with a single admin user with the Admin role. The default setup can be extended to add users or to switch to HTTPS.
HTTP/HTTPS with user authentication based on LDAP/AD authentication. Optional role management
It relies on an LDAP/AD authentication server to authenticate users and, optionally, to restrict their access based on roles. To connect to the console, the user must enter the username and password as configured in the LDAP/AD server. It supports HTTP or HTTPS.
HTTP/HTTPS with user authentication based on OpenID Connect authentication. Optional role management
It relies on an OpenID Identity Provider server to authenticate users and, optionally, to restrict their access based on roles. To connect to the console, the user must enter the username and password as configured in the Identity Provider server. It supports HTTP or HTTPS.
To implement them, refer to section 11.