==Troubleshooting SAFplus Platform==
 
  
 
'''Conventions:'''
 
 
* '''Sandbox'''
 
 
When you extract the SAFplus Platform image (the output of <code>make images</code>) into a particular location, the following directory structure is created:
 
 
** <top-level>
 
*** bin -- Location of all SAFplus Platform as well as application binaries
 
*** etc -- Location of all SAFplus Platform configuration files
 
*** lib -- Location of all the libraries used by SAFplus Platform (except system)
 
*** modules -- Ignore this
 
*** share -- SNMP MIB related stuff. Ignore if you are not using SNMP features provided by SAFplus Platform.
 
*** var -- All the runtime data generated by SAFplus Platform, in the following locations inside var/:

**** all the log files -- var/log

**** all the non-persistent database files and other runtime files -- var/run

**** all the persistent database files -- var/lib/<comp-name>, where the component name is currently ams or cor
 
 
This <top-level> directory is called '''sandbox'''.
 
 
===Environment Variables===
 
 
* '''ASP_SAVE_PREV_LOGS''' -- if set to 1, saves the logs in the sandbox from the previous run of SAFplus Platform (if any) during SAFplus Platform startup, to the location specified by the environment variable ASP_PREV_LOG_DIR or to /tmp/asp_saved_logs, creating the directory if necessary. If set to 0, the logs will be deleted. Default value is 1.
 
 
* '''ASP_PREV_LOG_DIR''' -- place where logs of previous SAFplus Platform run will be stored. Default value is /tmp/asp_saved_logs
 
 
* '''ASP_LOG_FILTER_DIR''' -- directory where the log filter file should be present. The file must be named logfilter.txt. Default value is /tmp.
 
 
* '''CL_LOG_SEVERITY''' -- Environment variable used to set the log filter. You can export this env variable to any log severity like DEBUG, INFO, NOTICE, ERROR, etc. Note that setting of this env variable has higher priority than the log filter file explained above.
 
 
* '''CL_LOG_CODE_LOCATION_ENABLE''' -- if set to 1, the file name, function name and line number are shown when displaying logs. Default value is 0.
 
 
* '''ASP_RESTART_SAFplus''' -- if set to 1, restarts SAFplus Platform in case the AMF crashed or SAFplus Platform was not able to come up for some reason; if set to 0, SAFplus Platform is not restarted. Default value is 1.
 
 
* '''ASP_APP_BINDIR''' -- can be used to specify an alternate location if the application binaries are stored in a place other than the <sandbox>/bin directory.
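As a minimal sketch, a startup wrapper might export some of these variables before launching SAFplus Platform. The sandbox path and the chosen values below are assumptions for illustration, not defaults:

```shell
#!/bin/sh
# Hypothetical startup wrapper: adjust SANDBOX to your own install location.
SANDBOX=/opt/safplus-sandbox

# Keep logs from the previous run, in a custom directory.
export ASP_SAVE_PREV_LOGS=1
export ASP_PREV_LOG_DIR=/var/tmp/asp_saved_logs

# Show DEBUG and higher-severity logs, including file/function/line info.
export CL_LOG_SEVERITY=DEBUG
export CL_LOG_CODE_LOCATION_ENABLE=1

# Do not auto-restart SAFplus Platform if the AMF crashes (useful while debugging).
export ASP_RESTART_SAFplus=0

# Start SAFplus Platform via the supported init script (uncomment to use):
# "$SANDBOX"/etc/init.d/asp start
```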
 
 
===Logging===
 
 
* The logs generated by SAFplus Platform are present in two places:
 
** Standard out/standard error & startup/shutdown: -- stored in <sandbox>/var/log/<node-name.log>
 
*** Output can be controlled by log filter
 
** By Log infrastructure -- stored in <sandbox>/var/log/<stream-name>.latest (by default)
 
*** sys and app are two streams created by default. The "sys" logs come from the SAFplus Platform; the "app" logs come from your programs. For example, var/log/sys.latest will contain log messages written to the system stream.
 
*** Output cannot be controlled by log filter
 
* Log filter file
 
 
The log filter file lets you selectively filter logging at log generation time, so that you can focus on debugging a subset of components. The log filter file is a text file located at /tmp/logfilter.txt OR $(ASP_LOG_FILTER_DIR)/logfilter.txt (i.e. you may set this env variable to your preferred directory).
 
 
The log filter file should contain one rule per line, where the format of each rule is as follows:
 
 
<pre>
 
(<node-name>:<component-eo-name>.<area>.<context>)[file_name]=<severity>
 
</pre>
 
 
The '*' matches anything and can be used in any field. Any line starting with # is a comment.
 
 
<code><node-name></code> is the name of the SAFplus Platform node as found in etc/asp.conf
 
 
<code><component-eo-name></code> can be found by looking in etc/clEoConfig.xml
 
 
 
<code><area></code> and <code><context></code> are strings that denote a particular subcomponent of the component.
 
 
 
<code><file_name></code> refers to the name of the file in which the message was logged.
 
 
Examples:
 
 
* show debugging (and greater severity) messages from all the components, useful when debugging.
 
<code><pre>
 
(*:*.*.*)[*]=DEBUG
 
</pre></code>
 
 
* show debugging messages only from AMF and error messages from all the other components (note that both lines are required)
 
<code><pre> 
 
(*:*.*.*)[*]=ERROR
 
(*:AMF.*.*)[*]=DEBUG
 
</pre></code>
 
 
* show debugging messages of AMF and CKPT
 
<code><pre>
 
(*:*.*.*)[*]=ERROR
 
(*:AMF.*.*)[*]=DEBUG
 
(*:CKP.*.*)[*]=DEBUG
 
</pre></code>
 
 
 
Please refer to section 5.1.1.8 of OpenClovis 3.1 SDK guide for more information.
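Putting the pieces together, here is a sketch of setting up a log filter in a custom directory. The use of a temporary directory is an assumption for illustration; any writable directory exported via ASP_LOG_FILTER_DIR works:

```shell
#!/bin/sh
# Hypothetical example: place a log filter somewhere other than /tmp.
export ASP_LOG_FILTER_DIR=$(mktemp -d)

# Default to errors only, but show full debug output from the AMF.
cat > "$ASP_LOG_FILTER_DIR/logfilter.txt" <<'EOF'
# default: errors only
(*:*.*.*)[*]=ERROR
# but full debug output from the AMF
(*:AMF.*.*)[*]=DEBUG
EOF
```

SAFplus Platform would then be started with this environment variable in effect, as described in the Environment Variables section.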
 
 
* Enabling openais logs
 
** In etc/clGmsConfig.xml file change the value of tag "openAisLoggingOption" from "none" to "stderr".
 
 
===Possible errors during startup of SAFplus Platform and solutions/workarounds===
 
 
As you troubleshoot using the logs, note that when an error occurs the system shuts down and emits a lot of logging describing its shutdown actions. These logs are not going to help diagnose the problem; instead you need to find the error that caused the shutdown. A general rule of thumb is to look for the first error in the file.

Look for messages at the ERROR and CRITICAL log levels; if they match any of the following, follow the steps in the corresponding Solution section.
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Fri Jan 2 04:50:09 1970 (SCMI0.256802883 : AMF.---.---.00035 : NOTICE) AMF
 
server fully up
 
<br>
 
Fri Jan 2 04:50:09 1970 (SCMI0.256802883 : AMF.CPM.LCM.00037 : NOTICE)
 
Launching binary image [asp_logd] as component [logServer_SCMI0]...
 
<br>
 
Fri Jan 2 04:50:09 1970 (SCMI0.256921623 : LOG.---.---.00015 : ERROR) Error
 
finding opening shared library 'libClSQLiteDB.so': No such file or directory
 
<br>
 
Fri Jan 2 04:50:09 1970 (SCMI0.256921623 : LOG. EO.INI.00016 : CRITIC)
 
Failed to initialize basic library [Dbal], error [0x30011
 
<br>
 
Fri Jan 2 04:50:09 1970 (SCMI0.256921623 : LOG. EO.INI.00017 : CRITIC)
 
Failed to initialize all basic libraries, error [0x30011]
 
<br>
 
Fri Jan 2 04:50:09 1970 (SCMI0.256921623 : LOG. EO.INI.00018 : CRITIC)
 
Exiting : EO setup failed, error [0x30011]"
 
</code>
 
</p>
 
<p> '''Solution:'''
 
<br>
 
This error means that the proper database engine has not been specified in the file etc/clDbalConfig.xml. Please uncomment the proper "Engine" tag in this file (for QNX it is libBerkeleyDB.so).
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Wed Nov 19 10:20:35 2008 (SysCtrlI0.29013 : GMS.GEN.---.00032 : NOTICE) GMS
 
Server registering with debug service
 
<br>
 
Wed Nov 19 10:20:35 2008 (SysCtrlI0.29013 : GMS.GEN.---.00033 : DEBUG)
 
LINK_NAME env is exported. Value is eth1
 
<br>
 
Wed Nov 19 10:20:36 2008 (SysCtrlI0.29013 : GMS.GEN.---.00034 : EMRGN)
 
ioctl() system call failed while getting details of interface [eth1]
 
<br>
 
Wed Nov 19 10:20:36 2008 (SysCtrlI0.29013 : GMS.GEN.---.00035 : EMRGN) -
 
ioctl() returned error [Cannot assign requested address]
 
<br>
 
Wed Nov 19 10:20:36 2008 (SysCtrlI0.29013 : GMS.GEN.---.00036 : EMRGN) -
 
This could be because the interface [eth1] is not configured on the system
 
<br>
 
Wed Nov 19 10:21:25 2008 (SysCtrlI0.28990 : AMF.TSK.POL.00042 : INFO)
 
Creating new task
 
<br>
 
Wed Nov 19 10:21:25 2008 (SysCtrlI0.28990 : AMF.---.---.00043 : ERROR)
 
Component [gmsServer_SysCtrlI0] did not instantiated within the specified
 
limit
 
<br>
 
Wed Nov 19 10:21:25 2008 (SysCtrlI0.28990 : AMF.---.---.00044 : ERROR)
 
Component gmsServer_SysCtrlI0 did not start within the specified limit
 
</code>
 
</p>
 
<p> '''Solution:'''
 
This error occurs because the Ethernet interface specified in asp.conf (as the value of the LINK_NAME environment variable) is not present on the system.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Thu Jan 1 02:25:00 1970 (SCMI1.2818074 : AMF.CPM.GMS.00075 : ERROR) Failed
 
to receive GMS cluster track callback, error [0x10014]
 
</code>
 
</p>
 
<p> '''Solution:'''
 
Please make sure that a firewall is not enabled on the machine. Try increasing the value of "bootElectionTimeout" in etc/clGmsConfig.xml (the default value is 5; try making it 10 or 15). If this doesn't help, try enabling the openais logs as indicated in the Logging section, to see if any other unwanted node is communicating with this node.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Wed Nov 19 09:33:39 2008  (SysControllerI0.6139 : GMS.LEA.---.00068 :
 
CRITIC) No nodes in the cluster view to run leader election.
 
<br>
 
Wed Nov 19 09:33:39 2008  (SysControllerI0.6139 : GMS.LEA.---.00069 :
 
CRITIC) - This could be because:
 
<br>
 
Wed Nov 19 09:33:39 2008  (SysControllerI0.6139 : GMS.LEA.---.00070 :
 
CRITIC) - Firewall is enabled on your machine which is restricting multicast
 
messages
 
<br>
 
Wed Nov 19 09:33:39 2008  (SysControllerI0.6139 : GMS.LEA.---.00071 :
 
CRITIC) - Use 'iptables -F' to disable firewall
 
<br>
 
Wed Nov 19 09:33:39 2008  (SysControllerI0.6139 : GMS.LEA.---.00072 :
 
ERROR) Error in boot time leader election. rc [0x11]
 
</code>
 
</p>
 
<p> '''Solution:'''
 
This error occurs because GMS uses IP multicast, and if a firewall is configured on your machine it may be blocking the multicast messages. Hence GMS is not able to see any node join the cluster, as described in the above error message.

To resolve this problem you can run the "iptables -F" command on the machine, which will flush all the firewall rules and allow IP multicast. Note that this setting only lasts until the next reboot of the node, so you need to run this command every time you reboot your machine. Otherwise, contact your system administrator to disable the firewall or configure it to allow IP multicast.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Thu Jan 1 02:26:55 1970 (SCMI1.2818074 : AMF.AMS.BOO.00116 : CRITIC)
 
Inconsistency between GMS and TIPC configuration detected, master address as
 
per GMS is [0x1], but master address as per TIPC is [0x2]
 
</code>
 
</p>
 
 
<p> '''Solution:'''
 
Please make sure that the value of "multicastPort" does not clash with that of any other nodes, except the ones that are configured in the cluster. Similarly, make sure that the TIPC netid (exported as TIPC_NETID in etc/asp.conf) does not clash with other unwanted nodes.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Thu Jan 1 01:26:28 1970 (SCMI0.13049881 : AMF.---.---.01234 : CRITIC)
 
Duplicate entry SCMI0 in AMS configuration"
 
</code>
 
</p>
 
 
<p> '''Solution:''' Same as above. </p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Error in parsing XML tag <tag-name> in file [clAmfDefinitions.xml]
 
<br>
 
Fri May 23 20:25:17 2008 (SCMI0.9282 : AMF.---.---.00071 : CRITIC) Fn
 
[clAmsParserMain (DEFN_FILE_NAME,CONFIG_FILE_NAME)] returned [0x220105]
 
<br>
 
Fri May 23 20:25:17 2008 (SCMI0.9282 : AMF.---.---.00072 : ERROR) ALERT
 
[clAmsInitialize:293] : Fn
 
[clAmsParserMain(DEFN_FILE_NAME,CONFIG_FILE_NAME)] returned [0x220105]
 
<br>
 
Fri May 23 20:25:17 2008 (SCMI0.9282 : AMF.CPM.AMS.00073 : CRITIC) Unable
 
to initialize AMS, error = [0x220105]
 
</code>
 
</p>
 
<p> '''Solution:'''
 
This means that there was an error in parsing the XML configuration file. This error should not occur unless you have hand-edited the XML file. If the error does occur without hand editing, it indicates either a SAFplus Platform IDE modeling error or a bug in SAFplus Platform.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
DBAL: .: No such file or directory
 
<br>
 
DBAL: PANIC: No such file or directory
 
<br>
 
DBAL: environment open: DB_RUNRECOVERY: Fatal error, run database recovery
 
</code>
 
</p>
 
 
<p> '''Solution:'''
 
Unfortunately this error occurs only on QNX, and we are still investigating it. Simply restarting SAFplus Platform may make the problem go away.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Tue Jun 10 21:45:41 2008 (SCMI1.13591 : AMF.CPM.AMS.00372 : CRITIC)
 
Node failfast called on [standby] node [ SCMI0] !!!
 
This indicates that the cluster has become unstable. Doing self
 
shutdown...
 
</code>
 
</p>
 
 
<p> '''Solution:'''
 
This error indicates that either:

* There was a change in the network configuration of the cluster, or
* Some other nodes are interfering with the cluster. Please make sure the TIPC netid and GMS port are the same on all the nodes of the cluster and unique per cluster.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Tue Jul 22 10:33:43 2008 (SCMI0.5935 : AMF.---.---.00192 : ERROR) Component
 
[<comp-name>] did not instantiated within the specified limit
 
<br>
 
Thu Jan 1 01:00:55 1970 (SCMI0.3211296 : AMF.---.---.00376 : ERROR)
 
Component [<comp-name>] instantiate error [0x220014]. Will cleanup
 
</code>
 
</p>
 
 
<p> '''Solution:'''
 
These error messages indicate that SAFplus Platform was not able to start the component within the timeout value configured in the clAmfDefinitions.xml file. Possible reasons are:

* The component is taking too long to start; for example, it may be spending a lot of time in initialization. Increasing the instantiation timeout value (using the IDE or, if you know what you are doing, directly in the clAmfDefinitions.xml file) should help in this case.
* The component crashed before registering with the AMF. Look for a core dump in <sandbox>/var/run/ (for Linux) or in /root (for QNX) to see if the component has dumped core.
 
</p>
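A quick check for the second case might look like the following sketch. The sandbox path is an assumption, and core file naming depends on your kernel's core_pattern setting:

```shell
#!/bin/sh
# Hypothetical helper: list possible core dumps left by a crashed component.
# Usage: find_cores <directory>
find_cores() {
    dir="$1"
    # Core files are typically named "core" or "core.<pid>"; the exact name
    # depends on /proc/sys/kernel/core_pattern on your system.
    find "$dir" -maxdepth 1 -name 'core*' 2>/dev/null
}

# Example (adjust the path to your own sandbox):
# find_cores /opt/safplus-sandbox/var/run
```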
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Thu Jan 1 01:25:43 1970 (SCMI0.13049881 : AMF.---.---.00814 : ERROR) SI
 
[<si-name>] assignment to SU [<su-name>] returned Error [0x220014]
 
</code>
 
</p>
 
 
<p> '''Solution:'''
 
This error message indicates that SAFplus Platform was not able to assign work (an SI) to a particular SU due to a timeout. Possible reasons for this error are:

* The component crashed in the work assignment processing callback, i.e. the CSI set/remove callback.
* The component is not responding with the saAmfResponse (clCpmResponse) function within the CSI set/remove timeout value configured in the clAmfDefinitions.xml file.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Wed Nov 19 09:50:49 2008  (SysControllerI0.20584 : AMF.TRIGGER.INI.00073
 
WARN) Trigger XML parse error. Running with defaults
 
</code>
 
</p>
 
<p> '''Solution:'''
The above warning message is related to the AMF entity trigger framework and can be ignored.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
12:25:33:Mon Nov 17 15:32:59 2008  (SysControllerI0.11049 : GMS.GEN.---.
 
00201 :  ERROR) This sync message is not intended for this node
 
</code>
 
</p>
 
 
<p> '''Solution:'''
 
This error message is related to the GMS process group service and can be ignored.
 
</p>
 
 
<hr>
 
<p> '''Error:'''
 
<br>
 
<code>
 
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.ACU.IDG.39051 : ERROR) Failed to get the alarm index for probable cause [2] and specific problem [0]
 
<br>
 
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.ALM.ALE.39052 : ERROR) Failed to get the alarm index for the probable cause [2]:Specific Problem [0]. rc[0x4]
 
<br>
 
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.ALM.ALR.39053 : ERROR) Failed while processing the alarm . rc[0x4]
 
<br>
 
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.ALM.ALR.39054 : ERROR) Failed while Sending RMD to the client owning the alarm resource. rc[0x4]
 
<br>
 
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.---.---.39055 : ERROR) Error in raising alarm, rc [0x4]
 
</code>
 
</p>
 
 
<p> '''Solution:'''
 
This error message shows that the user is trying to raise an alarm with an invalid probable cause and specific problem combination that is not modeled. Please check the list of alarm profiles attached to that resource, inside the alarm owner's cl<comp-name>alarmMetaStruct.c file.
 
</p>
 
 
===Informational messages to observe in SAFplus Platform log files===
 
<hr>
 
<p> '''Message:'''
 
<br>
 
<code>
 
Mon Nov 17 15:35:08 2008  (SysControllerI0.11049 : GMS.OPN.AIS.00206 : DEBUG) GMS CONFIGURATION CHANGE
 
<br>
 
Mon Nov 17 15:35:08 2008  (SysControllerI0.11049 : GMS.OPN.AIS.00207 : DEBUG) GMS Configuration:
 
<br>
 
Mon Nov 17 15:35:08 2008  (SysControllerI0.11049 : GMS.OPN.AIS.00208 : DEBUG)        r(0) ip(192.168.11.11)
 
<br>
 
Mon Nov 17 15:35:08 2008  (SysControllerI0.11049 : GMS.OPN.AIS.00209 : DEBUG)        r(0) ip(192.168.90.44)
 
<br>
 
Mon Nov 17 15:35:08 2008  (SysControllerI0.11049 : GMS.OPN.AIS.00210 : DEBUG) Members Left:
 
<br>
 
Mon Nov 17 15:35:08 2008  (SysControllerI0.11049 : GMS.OPN.AIS.00211 : DEBUG)        r(0) ip(192.168.90.43)
 
<br>
 
Mon Nov 17 15:35:08 2008  (SysControllerI0.11049 : GMS.OPN.AIS.00212 : DEBUG) Members Joined:
 
</code>
 
</p>
 
<p> '''Description:'''
 
The messages above indicate the join/leave states of the nodes in the cluster. In the above sample output, you can see that the node with IP 192.168.90.43 has left the cluster, and at this point the cluster configuration has nodes 192.168.11.11 and 192.168.90.44.

This information is useful when debugging issues where a node is repeatedly going down or not behaving as expected.
 
</p>
 
 
<hr>
 
<p> '''Message:'''
 
<br>
 
<code>
 
Mon Nov 17 15:35:10 2008  (SysControllerI0.11014 : AMF.CPM.AMS.00827 :
 
CRITIC) CPM/G active got IOC/TIPC notification for node [3] --
 
<br>
 
Mon Nov 17 15:35:10 2008  (SysControllerI0.11014 : AMF.CPM.AMS.00828 :
 
CRITIC) - Possible reasons for this are on node [3] :
 
<br>
 
Mon Nov 17 15:35:10 2008  (SysControllerI0.11014 : AMF.CPM.AMS.00829 :
 
CRITIC) - 1. AMF crashed.
 
<br>
 
Mon Nov 17 15:35:10 2008  (SysControllerI0.11014 : AMF.CPM.AMS.00830 :
 
CRITIC) - 2. AMF was killed.
 
<br>
 
Mon Nov 17 15:35:10 2008  (SysControllerI0.11014 : AMF.CPM.AMS.00831 :
 
CRITIC) - 3. Critical component failed.
 
<br>
 
Mon Nov 17 15:35:10 2008  (SysControllerI0.11014 : AMF.CPM.AMS.00832 :
 
CRITIC) - 4. Kernel panicked.
 
<br>
 
Mon Nov 17 15:35:10 2008  (SysControllerI0.11014 : AMF.CPM.AMS.00833 :
 
CRITIC) - 5. Communication was lost.
 
<br>
 
Mon Nov 17 15:35:10 2008  (SysControllerI0.11014 : AMF.CPM.AMS.00834 :
 
CRITIC) - 6. AMF was shutdown.
 
</code>
 
</p>
 
<p> '''Description:'''
 
The above message means that this node has received a TIPC node-death notification for node 3, and the message also lists probable reasons for this to happen. The user needs to determine whether this node death was due to admin operations or due to an error.
 
</p>
 
 
<hr>
 
<p> '''Message:'''
 
<br>
 
<code>
 
Mon Nov 17 15:35:44 2008  (SysControllerI0.11014 : AMF.CPM.GMS.00844 : DEBUG) Received cluster track callback from GMS on node [SysControllerI0]
 
...
 
<br>
 
Mon Nov 17 15:35:44 2008  (SysControllerI0.11014 : AMF.CPM.GMS.00845 : DEBUG) - Leader : [1]
 
<br>
 
Mon Nov 17 15:35:44 2008  (SysControllerI0.11014 : AMF.CPM.GMS.00846 : DEBUG) - Deputy : [3] (-1 -> No deputy)
 
</code>
 
</p>
 
 
<p> '''Description:'''
 
The above message shows the result of leader election for the system controllers. As described, after this leader election node 1 has become active (leader) and node 3 is standby (deputy). If the Deputy value is -1, there is only one system controller running in the cluster.
 
</p>
 
 
<hr>
 
<p> '''Message:'''
 
<br>
 
<code>
 
Wed Nov 19 10:11:49 2008  (PayloadNodeI0.23101 : GMS.LEA.---.00100 :
 
WARN) Could not elect any leader from the current cluster view.
 
<br>
 
Wed Nov 19 10:11:49 2008  (PayloadNodeI0.23101 : GMS.LEA.---.00101 :  WARN)
 
- Possibly no system controller is running
 
<br>
 
Wed Nov 19 10:11:49 2008  (PayloadNodeI0.23101 : GMS.CLM.---.00102 :  ERROR)
 
Leader election failed. rc = 0x0
 
<br>
 
Wed Nov 19 10:12:11 2008  (PayloadNodeI0.23068 : AMF.CPM.CPM.00081 :  WARN)
 
CPM/G standby/Worker blade waiting for CPM/G active to come up...
 
</code>
 
</p>
 
 
<p> '''Description:'''
 
The above warning messages indicate that a payload node was started without any system controller node running in the cluster.
 
</p>
 
 
<hr>
 
<p> '''Message:'''
 
<br>
 
<code>
 
Tue Nov 18 16:24:48 2008  (ctrlI0.17729 : AMF.CPM.LCM.00339 : NOTICE)
 
[clCpmComponent.c:1885] Launching binary image [ClusterMgr] as component
 
[ClusterMgrI0]...
 
<br>
 
Tue Nov 18 16:24:48 2008  (ctrlI0.17729 : AMF.CPM.LCM.00340 :  INFO)
 
[clCpmComponent.c:1952] Component [ClusterMgrI0] started, PID [17822]
 
</code>
 
</p>
 
 
<p> '''Description:'''
 
These logs indicate that the AMF attempted to start your component.
 
</p>
 
 
<hr>
 
<p> '''Message:'''
 
<br>
 
<code>
 
Tue Nov 18 16:24:48 2008 (ctrlI0.17729 : AMF. SU.INST.00343 : INFO)
 
[clAmsPolicyEngine.c:6615] SU [ClusterMgrSUI0] instantiated [1] components
 
at level [1]
 
</code>
 
</p>
 
 
<p> '''Description:'''
 
This log indicates that your component registered with the AMF.
 
</p>
 
 
===Essential <code>asp_console</code> commands to understand the system state of SAFplus Platform===
 
 
The asp_console command in the <sandbox>/bin directory is a very useful utility for understanding the SAFplus Platform state at any point in time. After starting SAFplus Platform, you can launch it using:
 
 
<code><pre>
 
$ cd <sandbox>
 
$ ./bin/asp_console
 
</pre></code>
 
 
You will see a welcome message and a prompt like this:
 
 
<code><pre>
 
To get started, type 'help intro'
 
 
cli[Test]->
 
</pre></code>
 
 
If you are a first-time user, please give the command "help intro" as indicated and read the brief help message.

You can press TAB to autocomplete any command or to show the list of all commands, and press ? to see the list of available commands.
 
 
Typing list at the above prompt (or li; if a command has a unique prefix, typing that prefix and pressing Enter is equivalent to giving the full command) gives:
 
 
<code><pre>
 
cli[Test]-> list
 
 
Slot    Node
 
1      SCNode0
 
 
 
cli[Test]->
 
</pre></code>
 
 
This means that the cluster is currently running with one node, named SCNode0. You MUST see in this list as many nodes as you have started that are meant to be part of this cluster. Otherwise the nodes are not communicating, because of either GMS or TIPC issues as indicated above; make sure that the GMS multicast port as well as the TIPC netid is the same on all the nodes involved.
 
 
Type the command setc ("set context") followed by master to change context to the "master", i.e. the active node in the cluster:
 
 
<code><pre>
 
cli[Test]-> setc master
 
</pre></code>
 
 
and you should see a prompt like this:
 
 
<code><pre>
 
cli[Test:SCNode0]->
 
</pre></code>
 
 
(please remember that you can type either 'help' or ? to see list of
 
available commands)
 
   
 
You can also "setc" to a particular slot by passing a number:
 
 
<code><pre>
 
cli[Test]-> setc 1
 
</pre></code>
 
 
 
but this is used rarely.
 
 
Type the command setc again, this time to change context to a particular component within node 1, i.e. SCNode0. You can type the command 'list' to see the list of available components. You can also register your own application component with this console using the debug CLI client library.
 
 
<code><pre>
 
cli[Test:SCNode0]-> list
 
cpm
 
corServer_SCNode0
 
faultServer_SCNode0
 
eventServer_SCNode0
 
nameServer_SCNode0
 
txnServer_SCNode0
 
gmsServer_SCNode0
 
alarmServer_SCNode0
 
logServer_SCNode0
 
ckptServer_SCNode0
 
 
 
cli[Test:SCNode0]->
 
</pre></code>
 
 
To change context to the AMF, for example, give the following command:
 
 
<code><pre>
 
cli[Test:SCNode0]-> setc cpm
 
 
cli[Test:SCNode0:CPM]->
 
</pre></code>
 
 
You can type help to see the available commands followed by a brief descriptive message about what the command does.
 
 
To move up a context (i.e. from component context to node context, or from node context to the top-level context), type the command 'end'. For example:
 
 
<code><pre>
 
cli[Test:SCNode0:CPM]-> end
 
 
cli[Test:SCNode0]->
 
</pre></code>
 
 
The SAFplus Platform components, except for the AMF, are suffixed by the node name.
 
 
To check whether the cluster is in a consistent state, give these commands:
 
 
<code><pre>
 
cli[Test:SCNode0]->  setc gmsServer_SCNode0
 
 
cli[Test:SCNode0:gms]-> memberList 0
 
 
--------------------------------------------------------------------------------------------------------
 
 
Cluster/Group Name : cluster0
 
bootTime          : Wed Nov 12 17:36:56 2008
 
View Number        : 2
 
--------------------------------------------------------------------------------------------------------
 
NodeId NodeName        HostAddr Port Leader Credentials PrefLead LeadshipSet BootTime
 
--------------------------------------------------------------------------------------------------------
 
1      SCNode0        1        9    Yes    100        No      No          Wed Nov 12 17:36:56 2008
 
2      SCNode1        2        9    No    100        No      No          Wed Nov 12 19:28:42 2008
 
 
cli[Test:SCNode0:gms]->
 
</pre></code>
 
 
Again, you should see as many nodes as are configured in the cluster. In the above output, the node with id 1 is the leader.
 
 
To see if a node has come up properly:
 
(don't forget:
 
setc master
 
setc cpm
 
to get into the correct context!)
 
 
<code><pre>
 
cli[Test:SCNode0:CPM]-> amsentityprint node SCNode0
 
 
Name                                          | SCNode0
 
Configuration ------------------------------- | ---------------------------
 
Start time                                    | Wed Nov 12 17:37:02 2008
 
Admin State                                  | Unlocked
 
Computed Admin State                          | Unlocked
 
Id                                            | 0
 
Class Type                                    | Class B
 
SubClass Type                                | 
 
Is Swappable                                  | True
 
Is Restartable                                | True
 
Is SAFplus Platform Aware                                  | True
 
Auto Repair on Join                          | True
 
SU Failover Probation Time                    | 10000 ms
 
SU Failover Count                            | 10
 
Node Dependents                              | Count[0]
 
                                              |    <None>
 
Node Dependencies                            | Count[0]
 
                                              |    <None>
 
SUs in Node                                  | Count[6]
 
                                              |    SC0_twoNAdmin_SU0
 
                                              |    SC0_amsTestGenTC_SU0
 
                                              |    SC0_noRedundancy_SU0
 
                                              |    SC0_twoNRank_SU0
 
                                              |    SC0_twoNRank_Spare_SU1
 
                                              |    SC0_twoNRank_Spare_SU0
 
Status -------------------------------------- | ---------------------------
 
Presence State                                | Instantiated
 
Oper State                                    | Enabled
 
Is Instantiable                              | True
 
Is Cluster Member                            | True
 
Was Cluster Member Before                    | False
 
Last Recovery                                | None
 
Num Instantiated SUs                          | 0
 
Num Assigned SUs                              | 0
 
SU Failover Probation Timer                  | Inactive
 
SU Failover Count                            | 0
 
--------------------------------------------- | ---------------------------
 
Timers Running                                | 0
 
Debug Flags                                  | 0x1
 
</pre></code>
 
 
===Miscellaneous===
 
 
* Please note that by default, SAFplus Platform assumes that the shared memory location (tmpfs) is mounted at the standard location /dev/shm, and cleans up whatever shared memory segments were created by SAFplus Platform during the previous run. If you have mounted your tmpfs somewhere else, you have two options:
 
** Mount the tmpfs to /dev/shm so that SAFplus Platform will take care of cleaning up shared memory segments. ('''Recommended''')
 
** During packaging of the model, add a cleanup script in model-name/src/extras/asp.d/ (creating this location if it does not exist), which will be called with the 'start' argument before SAFplus Platform starts. This cleanup script can then remove the shared memory segments from whichever location the tmpfs is currently mounted on.
 
* Please note that starting SAFplus Platform "by hand" is not supported, i.e. starting the asp_amf executable from the command line is not supported. SAFplus Platform startup depends on the startup script for some setup/cleanup actions, and hence SAFplus Platform should always be started using the etc/init.d/asp script.
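A cleanup script for the second option above might look like the following sketch. The tmpfs mount point and the CL_* segment name prefix are assumptions for illustration; check the actual segment names created on your system (e.g. by listing the mount point after a run) before adapting this:

```shell
#!/bin/sh
# Hypothetical asp.d cleanup script: removes stale SAFplus Platform shared
# memory segments from a non-standard tmpfs mount point before startup.
# The mount point and the CL_* name prefix are assumptions, not confirmed names.
TMPFS_DIR=${TMPFS_DIR:-/mnt/tmpfs}

asp_cleanup() {
    case "$1" in
        start)
            # Delete leftover segments from a previous run, if any.
            rm -f "$TMPFS_DIR"/CL_* 2>/dev/null
            ;;
        *)
            # Other arguments: nothing to do.
            ;;
    esac
}

asp_cleanup "$@"
```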
 