Revision as of 19:01, 24 October 2011
csa102 Redundancy and Failover
Objective
This sample demonstrates basic HA (High Availability) and SU (Service Unit) failover functionality. The application has two components, both handling the same workload as csa101: repeatedly printing "Hello World". The difference is that one component is now active and the other is standby, and only the active component performs the printing function.
csa102 is quite similar to csa101, and this section will discuss the areas in which they deviate.
What you will learn
- How to keep track of HA states and respond to callbacks requesting HA state changes.
Code
The code can be found in the following directory:
<project-area_dir>/eval/src/app/csa102Comp
This sample component is implemented in a single C module that is quite similar to the csa101 module. We will discuss the additions in detail.
We define a static variable to keep track of the component's HA state as:
clCompAppMain.h |
---|
|
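A minimal sketch of such a module-level state variable follows. It assumes a stand-in enum, `DemoHaState`, and a helper, `demo_ha_state_name()`; both names are hypothetical and used only for illustration, since the real component would use the HA state type supplied by the SAFplus headers.

```c
/* Stand-in for the platform's HA state type (hypothetical names,
 * not the SAFplus definitions). */
typedef enum {
    DEMO_HA_NONE = 0,   /* no CSI has been assigned yet */
    DEMO_HA_ACTIVE,     /* component performs the workload */
    DEMO_HA_STANDBY     /* component waits to take over */
} DemoHaState;

/* File-scope variable: shared by the callbacks and the worker loop
 * in this module, invisible to other modules. */
static DemoHaState ha_state = DEMO_HA_NONE;

/* Illustrative helper for readable log messages. */
static const char *demo_ha_state_name(DemoHaState s)
{
    switch (s) {
    case DEMO_HA_ACTIVE:  return "ACTIVE";
    case DEMO_HA_STANDBY: return "STANDBY";
    default:              return "NONE";
    }
}
```

Starting in a "none" state mirrors the "Waiting for CSI assignment..." phase seen in the logs: the component does nothing until it is assigned a state.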
This is subsequently used to control the processing of this component's workload.
As with csa101, the clCompAppAMFCSISet() function is called to set the component's HA state, and the following block of code assigns this requested state to the component, while verbosely detailing this process:
clCompAppMain.c |
---|
|
clCompAppMain.c |
---|
|
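The assignment step can be sketched as below. This is not the body of clCompAppAMFCSISet(); it is a simplified stand-in under stated assumptions: `DemoHaState` and `demo_csi_set()` are hypothetical names, the "activating service" message is illustrative, and the acknowledgement a real handler sends back to the availability framework is omitted.

```c
#include <stdio.h>

/* Stand-in HA state type (hypothetical, not the SAFplus definition). */
typedef enum {
    DEMO_HA_NONE = 0,
    DEMO_HA_ACTIVE,
    DEMO_HA_STANDBY,
    DEMO_HA_QUIESCED
} DemoHaState;

static DemoHaState ha_state = DEMO_HA_NONE;

/* Hypothetical handler: log the request, then record the new state.
 * The worker loop reads ha_state to decide whether to do any work. */
static void demo_csi_set(const char *comp_name, DemoHaState requested)
{
    printf("%s: HA state change requested: %d -> %d\n",
           comp_name, (int)ha_state, (int)requested);

    ha_state = requested;

    if (ha_state == DEMO_HA_ACTIVE)
        printf("csa102: New state is ACTIVE; activating service\n");
    else
        printf("csa102: New state is not the ACTIVE; deactivating service\n");
}
```

Calling `demo_csi_set("csa102CompI1", DEMO_HA_STANDBY)` leaves `ha_state` at standby and prints the "deactivating service" line that appears in the standby component's log.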
It is worth noting that the running variable, as used by csa101, is not modified here. Instead, the ha_state variable is used to control the component's workload processing, as displayed in the main worker loop:
clCompAppMain.c |
---|
|
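The interplay of the two variables can be sketched as follows. This is a simplified model, not the component's actual loop: `DemoHaState` and `demo_worker()` are hypothetical names, and the loop is bounded by `max_iterations` and does not sleep so that it can be exercised quickly, whereas the real loop runs until the component is told to stop.

```c
#include <stdio.h>

/* Stand-in HA state type (hypothetical, not the SAFplus definition). */
typedef enum { DEMO_HA_NONE = 0, DEMO_HA_ACTIVE, DEMO_HA_STANDBY } DemoHaState;

static DemoHaState ha_state = DEMO_HA_STANDBY;  /* set by HA state callbacks */
static int running = 1;                         /* cleared by EO state changes */

/* Runs up to max_iterations passes of the worker loop and returns how
 * many times the workload (the print) was actually performed. */
static int demo_worker(int max_iterations)
{
    int printed = 0;

    for (int i = 0; running && i < max_iterations; i++) {
        /* Only the active component does the work; a standby keeps
         * looping but stays silent. */
        if (ha_state == DEMO_HA_ACTIVE) {
            printf("csa102: Hello World!\n");
            printed++;
        }
        /* A real component would pause here between iterations. */
    }
    return printed;
}
```

With `ha_state` at standby the loop spins without printing; switching it to active makes every pass print; clearing `running` ends the loop regardless of the HA state.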
It is also worth noting that the running variable is still used to control the worker loop as in csa101, but it is only controlled by requests to change the EO (Executable Object) state, not the HA state of the component.
How to Run csa102 and What to Observe
As with the csa101 example we will use the SAFplus Platform Console to manipulate the administrative state of the csa102 service group.
- Start the SAFplus Platform Console
# cd /root/asp/bin
# ./asp_console
- Then put the csa102SGI0 service group into lock assignment state using the following commands.
cli[Test]-> setc 1
cli[Test:SCNodeI0]-> setc cpm
cli[Test:SCNodeI0:CPM]-> amsLockAssignment sg csa102SGI0
Because example 102 has two components, there are two application log files to view: /root/asp/var/log/csa102CompI0Log.latest and /root/asp/var/log/csa102CompI1Log.latest. Viewing these application logs using tail -f, you should see the following.
/root/asp/var/log/csa102CompI0Log.latest
Sun Jul 13 22:38:17 2008 (SCNodeI0.13418 : csa102CompEO.---.---.00029 : INFO) Component [csa102CompI0] : PID [13418]. Initializing
Sun Jul 13 22:38:17 2008 (SCNodeI0.13418 : csa102CompEO.---.---.00030 : INFO) IOC Address : 0x1
Sun Jul 13 22:38:17 2008 (SCNodeI0.13418 : csa102CompEO.---.---.00031 : INFO) IOC Port : 0x80
Sun Jul 13 22:38:17 2008 (SCNodeI0.13418 : csa102CompEO.---.---.00032 : INFO) csa102: Instantiated as component instance csa102CompI0.
Sun Jul 13 22:38:17 2008 (SCNodeI0.13418 : csa102CompEO.---.---.00033 : INFO) csa102CompI0: Waiting for CSI assignment...
/root/asp/var/log/csa102CompI1Log.latest
Sun Jul 13 22:38:18 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00028 : INFO) Component [csa102CompI1] : PID [13422]. Initializing
Sun Jul 13 22:38:18 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00029 : INFO) IOC Address : 0x1
Sun Jul 13 22:38:18 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00030 : INFO) IOC Port : 0x81
Sun Jul 13 22:38:18 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00031 : INFO) csa102: Instantiated as component instance csa102CompI1.
Sun Jul 13 22:38:18 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00032 : INFO) csa102CompI1: Waiting for CSI assignment...
- Next, unlock the service group using the following SAFplus Platform Console command.
cli[Test:SCNodeI0:CPM]-> amsUnlock sg csa102SGI0
In the /root/asp/var/log/csa102CompI*Log.latest files we should then see:
/root/asp/var/log/csa102CompI0Log.latest
Sun Jul 13 23:00:18 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00487 : INFO) csa102: Hello World! .
Sun Jul 13 23:00:19 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00488 : INFO) csa102: Hello World! .
Sun Jul 13 23:00:20 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00489 : INFO) csa102: Hello World! .
Sun Jul 13 23:00:21 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00490 : INFO) csa102: Hello World! .
Sun Jul 13 23:00:22 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00491 : INFO) csa102: Hello World! .
Sun Jul 13 23:00:23 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00492 : INFO) csa102: Hello World! .
/root/asp/var/log/csa102CompI1Log.latest
Sun Jul 13 22:43:00 2008 (SCNodeI0.13422 : csa102CompEO.---.---.00043 : INFO) csa102: New state is not the ACTIVE; deactivating service
These can be watched in a separate terminal window using tail -f. csa102CompI0 is the active component in this case, and csa102CompI1 is the standby. Consequently, the "Hello World!" lines appear in csa102CompI0Log.latest and not in csa102CompI1Log.latest. They will continue to be logged to that file until the HA state of that component changes, for example, when the process logging those lines is killed. In the meantime, the standby component, csa102CompI1, just waits until it is told that it should take over the workload.
Changing the HA state of the Client/Server
The easiest way to test component failover is to kill the process associated with the active component using the kill command. For this you need to know the process ID of the active component. To find the process ID, issue the following command from a bash shell.
# ps -eaf | grep csa102
This should produce an output that looks similar to the following.
root 15872 15663 0 13:49 ? 00:00:01 csa102Comp -p
root 16328 15663 0 13:56 ? 00:00:00 csa102Comp -p
root 17304 16145 0 14:11 pts/4 00:00:00 grep csa102
Notice the two entries that end with csa102Comp -p. These are our two component processes. The first one is usually the active process. This is the one that we will kill. In this case the process ID is 15872. So to kill the active component you issue the command:
# kill -9 15872
If this step does not result in the active component being killed then it is likely that the standby component was killed. In this case simply try killing the other process.
After executing the kill command you can see in the csa102CompI1 application log that the standby component is now active.
/root/asp/var/log/csa102CompI1Log.latest |
---|
|
This indicates that the standby component has taken over for the failed active component.
Looking in the csa102CompI0 application log you can see that this component was killed and has been restarted. Since csa102CompI1 took over as the active component this component now goes into the standby state.
/root/asp/var/log/csa102CompI0Log.latest |
---|
|
You can continue to observe this failover by alternately killing the active component.
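The role swap observed above can be modeled with a small toy program. It is only an illustration of the observed behavior, not how the availability framework is implemented: `DemoComp`, `demo_active_index()`, and `demo_kill_active()` are hypothetical names, and detection, restart, and reassignment are collapsed into a single function call.

```c
#include <stdio.h>

/* Stand-in HA state type (hypothetical, not the SAFplus definition). */
typedef enum { DEMO_HA_ACTIVE, DEMO_HA_STANDBY } DemoHaState;

typedef struct {
    const char *name;
    DemoHaState state;
    int restarts;        /* how many times this instance was restarted */
} DemoComp;

/* Returns the index of the active component, or -1 if there is none. */
static int demo_active_index(const DemoComp comps[2])
{
    for (int i = 0; i < 2; i++) {
        if (comps[i].state == DEMO_HA_ACTIVE)
            return i;
    }
    return -1;
}

/* Models "kill -9" on the active component: the standby is promoted,
 * and the killed instance is restarted and rejoins as the standby. */
static void demo_kill_active(DemoComp comps[2])
{
    int active = demo_active_index(comps);
    if (active < 0)
        return;

    int standby = 1 - active;
    comps[standby].state = DEMO_HA_ACTIVE;
    comps[active].state = DEMO_HA_STANDBY;
    comps[active].restarts++;

    printf("%s failed; %s is now active\n",
           comps[active].name, comps[standby].name);
}
```

Starting with csa102CompI0 active and csa102CompI1 standby, each call to `demo_kill_active()` swaps the roles, which matches what repeated kills show in the two application logs.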
To stop csa102, lock the service group for assignment and then for instantiation using the SAFplus Platform Console, and then exit the console.
cli[Test:SCNodeI0:CPM]-> amsLockAssignment sg csa102SGI0
Successfully changed state of csa102SGI0 to LockAssignment
cli[Test:SCNodeI0:CPM]-> amsLockInstantiation sg csa102SGI0
Successfully changed state of csa102SGI0 to LockInstantiation
cli[Test:SCNodeI0:CPM]-> end
cli[Test:SCNodeI0]-> end
cli[Test]-> bye
Summary
This sample application has covered basic HA and failover, including how a component's state changes between active and standby.