Revision as of 19:01, 24 October 2011

csa102 Redundancy and Failover

Objective

This sample demonstrates basic HA (High Availability) and SU (Service Unit) failover functionality. The application has two components that handle the same workload as csa101: repeatedly printing "Hello World". The difference is that there is now an active component and a standby component, and only the active component does the printing.

csa102 is quite similar to csa101, and this section will discuss the areas in which they deviate.

What you will learn

  • How to keep track of HA states and respond to callbacks requesting HA state changes.

Code

The code can be found in the following directory:

<project-area_dir>/eval/src/app/csa102Comp

This sample component is implemented in a single C module that is quite similar to the csa101 module. We will discuss the additions in detail.

The header defines a helper macro that converts an HA state value into a printable name:

clCompAppMain.h
                        
#define STRING_HA_STATE(S)                                                  \
(   ((S) == CL_AMS_HA_STATE_ACTIVE)             ? "Active" :                \
    ((S) == CL_AMS_HA_STATE_STANDBY)            ? "Standby" :               \
    ((S) == CL_AMS_HA_STATE_QUIESCED)           ? "Quiesced" :              \
    ((S) == CL_AMS_HA_STATE_QUIESCING)          ? "Quiescing" :             \
    ((S) == CL_AMS_HA_STATE_NONE)               ? "None" :                  \
                                                  "Unknown" )
  
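As a minimal sketch of how a macro of this shape behaves, the following standalone example applies the same nested conditional pattern to a stand-in enum. The demo_* names and the enum values are assumptions made only so the example compiles on its own; they are not the definitions from the SAFplus headers.

```c
#include <stdio.h>

/* Stand-in for the SAFplus HA state type. The names and values are
 * hypothetical and exist only to make this example self-contained. */
typedef enum {
    DEMO_HA_STATE_NONE,
    DEMO_HA_STATE_ACTIVE,
    DEMO_HA_STATE_STANDBY,
    DEMO_HA_STATE_QUIESCED,
    DEMO_HA_STATE_QUIESCING
} demo_ha_state_t;

/* Same nested-conditional pattern as STRING_HA_STATE above. */
#define DEMO_STRING_HA_STATE(S)                                   \
(   ((S) == DEMO_HA_STATE_ACTIVE)    ? "Active"    :              \
    ((S) == DEMO_HA_STATE_STANDBY)   ? "Standby"   :              \
    ((S) == DEMO_HA_STATE_QUIESCED)  ? "Quiesced"  :              \
    ((S) == DEMO_HA_STATE_QUIESCING) ? "Quiescing" :              \
    ((S) == DEMO_HA_STATE_NONE)      ? "None"      :              \
                                       "Unknown" )

/* Function wrapper: the macro expands its argument several times,
 * so passing a plain variable avoids repeated evaluation. */
const char *demo_ha_state_name(demo_ha_state_t state)
{
    return DEMO_STRING_HA_STATE(state);
}

/* Example use: log a state change in readable form. */
void demo_log_state(demo_ha_state_t state)
{
    printf("demo: new HA state is %s\n", demo_ha_state_name(state));
}
```

Any value that matches none of the listed states falls through to "Unknown", which makes the macro safe to use in log messages even if an unexpected state is received.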

The component's workload processing is controlled by the running flag, which is set in the HA state callback shown next.

As with csa101, AMF calls the clCompAppAMFCSISet() function to set the component's HA state. The following block of code handles each requested state, logging what it does and then acknowledging the request:

clCompAppMain.c
clCompAppAMFCSISet()

        switch ( haState )
        {
            case CL_AMS_HA_STATE_ACTIVE:
            {
                /*
                 * AMF has requested application to take the active HA state 
                 * for the CSI.
                 */

                /*
                 * ---BEGIN_APPLICATION_CODE---
                 */

                clprintf(CL_LOG_SEV_INFO,"csa102: ACTIVE state requested; activating service\n");
                running = 1;

                /*
                 * ---END_APPLICATION_CODE---
                 */

                clCpmResponse(cpmHandle, invocation, CL_OK);
                break;
            }
    
            case CL_AMS_HA_STATE_STANDBY:
            {
                /*
                 * AMF has requested application to take the standby HA state 
                 * for this CSI.
                 */

                /*
                 * ---BEGIN_APPLICATION_CODE---
                 */

                clprintf(CL_LOG_SEV_INFO,"csa102: New state is not the ACTIVE; deactivating service\n");
                running = 0;

                /*
                 * ---END_APPLICATION_CODE---
                 */

                clCpmResponse(cpmHandle, invocation, CL_OK);
                break;
            }

  
            case CL_AMS_HA_STATE_QUIESCED:
            {
                /*
                 * AMF has requested application to quiesce the CSI currently
                 * assigned the active or quiescing HA state. The application 
                 * must stop work associated with the CSI immediately.
                 */

                /*
                 * ---BEGIN_APPLICATION_CODE---
                 */

                clprintf(CL_LOG_SEV_INFO,"csa102: Acknowledging new state\n");
                running = 0;

                /*
                 * ---END_APPLICATION_CODE---
                 */

                clCpmResponse(cpmHandle, invocation, CL_OK);
                break;
            }

            case CL_AMS_HA_STATE_QUIESCING:
            {
                /*
                 * AMF has requested application to quiesce the CSI currently
                 * assigned the active HA state. The application must stop work
                 * associated with the CSI gracefully and not accept any new
                 * workloads while the work is being terminated.
                 */

                /*
                 * ---BEGIN_APPLICATION_CODE---
                 */

                clprintf(CL_LOG_SEV_INFO,"csa102: Signaling completion of QUIESCING\n");
                running = 0;

                /*
                 * ---END_APPLICATION_CODE---
                 */

                clCpmCSIQuiescingComplete(cpmHandle, invocation, CL_OK);
                break;
            }

            default:
            {
                break;
            }
        }
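The mapping from requested state to behaviour in the listing above can be condensed into a standalone sketch. Everything named demo_* is a hypothetical stand-in and not part of the SAFplus API; the return value only records which kind of acknowledgement the real callback sends, clCpmResponse() or clCpmCSIQuiescingComplete().

```c
/* Hypothetical stand-ins for the SAFplus HA states. */
typedef enum {
    DEMO_HA_NONE,
    DEMO_HA_ACTIVE,
    DEMO_HA_STANDBY,
    DEMO_HA_QUIESCED,
    DEMO_HA_QUIESCING
} demo_ha_state_t;

/* Which acknowledgement the handler would send back. */
typedef enum {
    DEMO_REPLY_NONE,               /* default case: nothing sent      */
    DEMO_REPLY_RESPONSE,           /* stands for clCpmResponse()      */
    DEMO_REPLY_QUIESCING_COMPLETE  /* stands for
                                      clCpmCSIQuiescingComplete()     */
} demo_reply_t;

/* Work flag checked by the worker loop. */
static int running = 0;

/* Mirrors the switch above: only ACTIVE turns the workload on. */
demo_reply_t demo_csi_set(demo_ha_state_t ha_state)
{
    switch (ha_state) {
    case DEMO_HA_ACTIVE:
        running = 1;
        return DEMO_REPLY_RESPONSE;
    case DEMO_HA_STANDBY:
    case DEMO_HA_QUIESCED:
        running = 0;
        return DEMO_REPLY_RESPONSE;
    case DEMO_HA_QUIESCING:
        running = 0;
        return DEMO_REPLY_QUIESCING_COMPLETE;
    default:
        return DEMO_REPLY_NONE;
    }
}

/* Accessor so callers do not touch the flag directly. */
int demo_is_running(void)
{
    return running;
}
```

The one asymmetry worth noticing is the quiescing case: it stops work like standby and quiesced do, but it is acknowledged through a different call.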

  

Each case above sets the running flag: 1 for the active state, and 0 for the standby, quiesced, and quiescing states. The main worker loop checks this flag on every pass, so only the active component prints:

clCompAppMain.c
clCompAppInitialize()

        while (!exiting)
        {
            if (running)
            {
                clprintf(CL_LOG_SEV_INFO,"csa102: Hello World! %s\n", show_progress());
            }
            sleep(1);
        }
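The effect of the two flags in this loop can be checked with a bounded variant. This is only a sketch: demo_worker_loop() and its max_ticks parameter are hypothetical, the one-second sleep is left out so the function returns immediately, and the show_progress() output is omitted.

```c
#include <stdio.h>

/* Same roles as in the loop above: running gates the work,
 * exiting ends the loop. */
static int running = 0;
static int exiting = 0;

/* Runs at most max_ticks passes of the worker loop and returns the
 * number of messages printed. The real loop is unbounded and calls
 * sleep(1) at the end of each pass. */
int demo_worker_loop(int max_ticks)
{
    int printed = 0;
    int tick;

    for (tick = 0; tick < max_ticks && !exiting; tick++) {
        if (running) {
            printf("demo: Hello World!\n");
            printed++;
        }
    }
    return printed;
}

/* Setters standing in for the callbacks that change the flags. */
void demo_set_running(int value) { running = value; }
void demo_set_exiting(int value) { exiting = value; }
```

With running cleared the loop still spins but prints nothing, which is the standby behaviour seen in the logs; once exiting is set the loop stops regardless of running.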

  

Note that an HA state change never ends the worker loop. The loop keeps running until exiting is set; running only determines whether the component prints on each pass.

How to Run csa102 and What to Observe

As with the csa101 example, we will use the SAFplus Platform Console to manipulate the administrative state of the csa102 service group.

  1. Start the SAFplus Platform Console
     # cd /root/asp/bin
     # ./asp_console
  2. Then put the csa102SGI0 service group into lock assignment state using the following commands.
     cli[Test]-> setc 1
     cli[Test:SCNodeI0]-> setc cpm
     cli[Test:SCNodeI0:CPM]-> amsLockAssignment sg csa102SGI0
    

    Because csa102 has two components, there are two application log files to view: /root/asp/var/log/csa102CompI0Log.latest and /root/asp/var/log/csa102CompI1Log.latest. Viewing these logs with tail -f, you should see the following.

    /root/asp/var/log/csa102CompI0Log.latest
    Sun Jul 13 22:38:17 2008   (SCNodeI0.13418 : csa102CompEO.---.---.00029 :   INFO) 
     Component [csa102CompI0] : PID [13418]. Initializing
    
    Sun Jul 13 22:38:17 2008   (SCNodeI0.13418 : csa102CompEO.---.---.00030 :   INFO) 
        IOC Address             : 0x1
    
    Sun Jul 13 22:38:17 2008   (SCNodeI0.13418 : csa102CompEO.---.---.00031 :   INFO)
        IOC Port                : 0x80
    
    Sun Jul 13 22:38:17 2008   (SCNodeI0.13418 : csa102CompEO.---.---.00032 :   INFO)
     csa102: Instantiated as component instance csa102CompI0.
    
    Sun Jul 13 22:38:17 2008   (SCNodeI0.13418 : csa102CompEO.---.---.00033 :   INFO)
     csa102CompI0: Waiting for CSI assignment...
      
    /root/asp/var/log/csa102CompI1Log.latest
    Sun Jul 13 22:38:18 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00028 :   INFO)
     Component [csa102CompI1] : PID [13422]. Initializing
    
    Sun Jul 13 22:38:18 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00029 :   INFO)
        IOC Address             : 0x1
    
    Sun Jul 13 22:38:18 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00030 :   INFO)
        IOC Port                : 0x81
    
    Sun Jul 13 22:38:18 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00031 :   INFO)
     csa102: Instantiated as component instance csa102CompI1.
    
    Sun Jul 13 22:38:18 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00032 :   INFO)
     csa102CompI1: Waiting for CSI assignment...
      
  3. Next, unlock the service group using the following SAFplus Platform Console command.
     cli[Test:SCNodeI0:CPM]-> amsUnlock sg csa102SGI0
    

    In the application log files (/root/asp/var/log/csa102CompI*Log.latest) we should then see:

    /root/asp/var/log/csa102CompI0Log.latest
    Sun Jul 13 23:00:18 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00487 :   INFO)
     csa102: Hello World!       .
    
    Sun Jul 13 23:00:19 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00488 :   INFO)
     csa102: Hello World!        .
    
    Sun Jul 13 23:00:20 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00489 :   INFO)
     csa102: Hello World!         .
    
    Sun Jul 13 23:00:21 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00490 :   INFO)
     csa102: Hello World!          .
    
    Sun Jul 13 23:00:22 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00491 :   INFO)
     csa102: Hello World! .
    
    Sun Jul 13 23:00:23 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00492 :   INFO)
     csa102: Hello World!  .
      
    /root/asp/var/log/csa102CompI1Log.latest
    Sun Jul 13 22:43:00 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00043 :   INFO)
     csa102: New state is not the ACTIVE; deactivating service
      

    These logs can be watched in separate terminal windows using tail -f. csa102CompI0 is the active component in this case, and csa102CompI1 is the standby. Consequently, the "Hello World!" lines appear in csa102CompI0Log.latest and not in csa102CompI1Log.latest. They continue to be logged to that file until the HA state of that component changes, for example, when the process logging those lines is killed. In the meantime, the standby component, csa102CompI1, simply waits until it is told to take over the workload.


Changing the HA state of the Client/Server

The easiest way to test component failover is to kill the process associated with the active component using the kill command. For this you need the process ID of the active component. To find it, issue the following command from a bash shell.

# ps -eaf | grep csa102

This should produce an output that looks similar to the following.

root     15872 15663  0 13:49 ?        00:00:01 csa102Comp -p
root     16328 15663  0 13:56 ?        00:00:00 csa102Comp -p
root     17304 16145  0 14:11 pts/4    00:00:00 grep csa102

Notice the two entries that end with csa102Comp -p. These are our two component processes. The first one is usually the active process. This is the one that we will kill. In this case the process ID is 15872. So to kill the active component you issue the command:

# kill -9 15872

Note: If this step does not result in the active component being killed, then it is likely that the standby component was killed. In this case, simply try killing the other process.

After executing the kill command, you can see in the csa102CompI1 application log that the standby component is now active.

/root/asp/var/log/csa102CompI1Log.latest
Sun Jul 13 23:00:18 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00487 :   INFO)
 csa102: Hello World!       .

Sun Jul 13 23:00:19 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00488 :   INFO)
 csa102: Hello World!        .

Sun Jul 13 23:00:20 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00489 :   INFO)
 csa102: Hello World!         .

Sun Jul 13 23:00:21 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00490 :   INFO)
 csa102: Hello World!          .

Sun Jul 13 23:00:22 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00491 :   INFO)
 csa102: Hello World! .

Sun Jul 13 23:00:23 2008   (SCNodeI0.13422 : csa102CompEO.---.---.00492 :   INFO)
 csa102: Hello World!  .     .
  

This indicates that the standby component has taken over for the failed active component.

Looking in the csa102CompI0 application log, you can see that this component was killed and has been restarted. Since csa102CompI1 took over as the active component, csa102CompI0 now goes into the standby state.

/root/asp/var/log/csa102CompI0Log.latest
Sun Jul 13 22:53:02 2008   (SCNodeI0.13712 : csa102CompEO.---.---.00040 :   INFO)
 Component [csa102CompI0] : PID [13712]. Initializing

Sun Jul 13 22:53:02 2008   (SCNodeI0.13712 : csa102CompEO.---.---.00041 :   INFO)
    IOC Address             : 0x1

Sun Jul 13 22:53:02 2008   (SCNodeI0.13712 : csa102CompEO.---.---.00042 :   INFO)
    IOC Port                : 0x80

Sun Jul 13 22:53:02 2008   (SCNodeI0.13712 : csa102CompEO.---.---.00043 :   INFO)
 csa102: Instantiated as component instance csa102CompI0.

Sun Jul 13 22:53:02 2008   (SCNodeI0.13712 : csa102CompEO.---.---.00044 :   INFO)
 csa102CompI0: Waiting for CSI assignment...
  

You can continue to observe this failover by repeatedly killing whichever component is currently active.
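The role swap observed in the logs can be summarised with a toy model. This is a sketch of the observed behaviour only, under the assumption that a killed component is always restarted as standby; it is not how AMF is implemented, and all demo_* names are hypothetical.

```c
/* Role of one component in the two-component service group. */
typedef enum { DEMO_ROLE_ACTIVE, DEMO_ROLE_STANDBY } demo_role_t;

/* Index 0 models csa102CompI0, index 1 models csa102CompI1. */
typedef struct {
    demo_role_t role[2];
    int restarts[2];   /* how many times each component was restarted */
} demo_group_t;

/* Initial assignment seen after unlocking: I0 active, I1 standby. */
void demo_group_init(demo_group_t *g)
{
    g->role[0] = DEMO_ROLE_ACTIVE;
    g->role[1] = DEMO_ROLE_STANDBY;
    g->restarts[0] = 0;
    g->restarts[1] = 0;
}

/* Index of the component that currently prints "Hello World!". */
int demo_active_index(const demo_group_t *g)
{
    return (g->role[0] == DEMO_ROLE_ACTIVE) ? 0 : 1;
}

/* Models "kill -9" on component idx (0 or 1). If the active one dies,
 * its peer takes over; in both cases the killed component is
 * restarted and comes back as standby. */
void demo_kill(demo_group_t *g, int idx)
{
    int peer = 1 - idx;

    if (g->role[idx] == DEMO_ROLE_ACTIVE) {
        g->role[peer] = DEMO_ROLE_ACTIVE;
    }
    g->role[idx] = DEMO_ROLE_STANDBY;
    g->restarts[idx]++;
}
```

Killing the standby in this model leaves the active component untouched, which matches the note above about accidentally killing the wrong process.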

To stop csa102, use the SAFplus Platform Console to lock the service group for assignment and then for instantiation, and then exit the console:

cli[Test:SCNodeI0:CPM]-> amsLockAssignment sg csa102SGI0

Successfully changed state of csa102SGI0 to LockAssignment

cli[Test:SCNodeI0:CPM]-> amsLockInstantiation sg csa102SGI0

Successfully changed state of csa102SGI0 to LockInstantiation

cli[Test:SCNodeI0:CPM]-> end
cli[Test:SCNodeI0]-> end
cli[Test]-> bye

Summary

This sample application covered basic HA and failover: how a component is moved between the active and standby states, and how the standby component takes over when the active component fails.