==Troubleshooting SAFplus Platform==

'''Conventions:'''

* '''Sandbox'''

When you extract the SAFplus Platform image (the output of "make images") into a particular location, the following directory structure is created:

** <top-level>
*** bin -- location of all SAFplus Platform and application binaries
*** etc -- location of all SAFplus Platform configuration files
*** lib -- location of all the libraries used by SAFplus Platform (except system libraries)
*** modules -- ignore this
*** share -- SNMP MIB related files; ignore these if you are not using the SNMP features provided by SAFplus Platform
*** var -- all the runtime data generated by SAFplus Platform, organized inside var/ as follows:
*** all the log files -- var/log
*** all the non-persistent database files and other runtime files -- var/run
*** all the persistent database files -- var/lib/<comp-name>, where the component name is currently ams or cor

This <top-level> directory is called the '''sandbox'''.

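A quick way to sanity-check an extracted sandbox is to verify that the expected top-level entries exist. The snippet below is a sketch; it fabricates a minimal sandbox in a temporary directory purely for illustration, so point SANDBOX at your real extraction location in practice.

```shell
# Sketch: verify a sandbox has the expected top-level layout. A fake
# sandbox is created in a temp dir here just so the example is runnable.
SANDBOX=$(mktemp -d)
mkdir -p "$SANDBOX/bin" "$SANDBOX/etc" "$SANDBOX/lib" \
         "$SANDBOX/modules" "$SANDBOX/share" "$SANDBOX/var/log"
missing=0
for d in bin etc lib modules share var; do
  # count any expected directory that is absent
  [ -d "$SANDBOX/$d" ] || missing=$((missing + 1))
done
echo "missing=$missing"
```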
===Environment Variables===

* '''ASP_SAVE_PREV_LOGS''' -- if set to 1, the logs in the sandbox from the previous run of SAFplus Platform (if any) are saved during SAFplus Platform startup into the location specified by the environment variable ASP_PREV_LOG_DIR, or to /tmp/asp_saved_logs, creating the directory if necessary. If set to 0, the logs will be deleted. Default value is 1.

* '''ASP_PREV_LOG_DIR''' -- place where the logs of the previous SAFplus Platform run will be stored. Default value is /tmp/asp_saved_logs.

* '''ASP_LOG_FILTER_DIR''' -- place where the log filter file should be present. It should be named logfilter.txt. Default value is /tmp.

* '''CL_LOG_SEVERITY''' -- sets the log filter. You can export this environment variable to any log severity such as DEBUG, INFO, NOTICE, ERROR, etc. Note that setting this environment variable takes priority over the log filter file mentioned above.

* '''CL_LOG_CODE_LOCATION_ENABLE''' -- if set to 1, the file name, function name, and line number are shown when displaying logs. Default value is 0.

* '''ASP_RESTART_SAFplus''' -- if set to 1, SAFplus Platform is restarted in case the AMF crashed or SAFplus Platform was not able to come up for some reason; if set to 0, SAFplus Platform is not restarted. Default value is 1.

* '''ASP_APP_BINDIR''' -- can be used to specify that the application binaries are stored in a place other than the <sandbox>/bin directory.

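These are ordinary environment variables exported in the shell (or in asp.conf) before the platform is started. A minimal sketch, with purely illustrative values:

```shell
# Example: tune SAFplus Platform logging/restart behavior via environment
# variables before startup. Values are illustrative, not recommendations.
export ASP_SAVE_PREV_LOGS=1                  # keep logs from the previous run
export ASP_PREV_LOG_DIR=/tmp/asp_saved_logs  # where the saved logs go
export CL_LOG_SEVERITY=DEBUG                 # overrides the log filter file
export CL_LOG_CODE_LOCATION_ENABLE=1         # show file/function/line in logs
export ASP_RESTART_SAFplus=1                 # restart if the AMF goes down
echo "$ASP_SAVE_PREV_LOGS $CL_LOG_SEVERITY"
```

SAFplus Platform would then be started as usual via the etc/init.d/asp script so that it inherits these settings.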
===Logging===

* The logs generated by SAFplus Platform are present in two places:
** Standard out/standard error & startup/shutdown -- stored in <sandbox>/var/log/<node-name>.log
*** Output can be controlled by the log filter
** By the log infrastructure -- stored in <sandbox>/var/log/<stream-name>.latest (by default)
*** sys and app are the two streams created by default. The "sys" logs come from the SAFplus Platform; the "app" logs come from your programs. For example, var/log/sys.latest will contain log messages written to the system stream.
*** Output cannot be controlled by the log filter
* Log filter file

The log filter file lets you selectively filter logging at log generation time, so that you can focus on debugging a subset of components. The log filter file is a text file located at /tmp/logfilter.txt OR $(ASP_LOG_FILTER_DIR)/logfilter.txt (i.e. you may set this environment variable to your preferred directory).

The log filter file should contain one rule per line, where the format of a rule is as follows:

<pre>
(<node-name>:<component-eo-name>.<area>.<context>)[file_name]=<severity>
</pre>

The '*' matches anything and can be used in any field. Any line starting with # is a comment.

<code><node-name></code> is the name of the SAFplus Platform node as found in etc/asp.conf.

<code><component-eo-name></code> can be found by looking in etc/clEoConfig.xml.

<code><area></code> and <code><context></code> are strings that denote a particular sub-component of the component.

<code><file_name></code> refers to the name of the file in which the message was logged.

Examples:

* Show debugging (and greater severity) messages from all components; useful when debugging.
<code><pre>
(*:*.*.*)[*]=DEBUG
</pre></code>

* Show debugging messages only from the AMF and error messages from all other components (note that both lines are required).
<code><pre>
(*:*.*.*)[*]=ERROR
(*:AMF.*.*)[*]=DEBUG
</pre></code>

* Show debugging messages of the AMF and CKPT.
<code><pre>
(*:*.*.*)[*]=ERROR
(*:AMF.*.*)[*]=DEBUG
(*:CKP.*.*)[*]=DEBUG
</pre></code>

Please refer to section 5.1.1.8 of the OpenClovis 3.1 SDK guide for more information.

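Putting a filter in place is just a matter of writing the rules into the directory SAFplus reads them from. A sketch, assuming ASP_LOG_FILTER_DIR defaults to /tmp as described above:

```shell
# Sketch: install a log filter that shows AMF debug logs but only errors
# from every other component.
FILTER_DIR="${ASP_LOG_FILTER_DIR:-/tmp}"   # default location is /tmp
cat > "$FILTER_DIR/logfilter.txt" <<'EOF'
# errors from every component
(*:*.*.*)[*]=ERROR
# full debug output from the AMF only
(*:AMF.*.*)[*]=DEBUG
EOF
grep -c '=' "$FILTER_DIR/logfilter.txt"    # count of active rules
```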
* Enabling openais logs
** In the etc/clGmsConfig.xml file, change the value of the tag "openAisLoggingOption" from "none" to "stderr".

===Possible errors during startup of SAFplus Platform and solutions/workarounds===

As you troubleshoot using the logs, note that when an error occurs the system shuts down and issues a lot of logging describing its shutdown actions. These logs are not going to help diagnose the problem. Instead you need to find the error that caused the shutdown. A general rule of thumb is to look for the first error in the file.

Please look for messages at the log levels ERROR and CRITICAL, and if they match any of the following, follow the steps described in the corresponding Solution section.

<hr>
<p> '''Error:'''
<br>
<code>
Fri Jan 2 04:50:09 1970 (SCMI0.256802883 : AMF.---.---.00035 : NOTICE) AMF server fully up
<br>
Fri Jan 2 04:50:09 1970 (SCMI0.256802883 : AMF.CPM.LCM.00037 : NOTICE) Launching binary image [asp_logd] as component [logServer_SCMI0]...
<br>
Fri Jan 2 04:50:09 1970 (SCMI0.256921623 : LOG.---.---.00015 : ERROR) Error finding opening shared library 'libClSQLiteDB.so': No such file or directory
<br>
Fri Jan 2 04:50:09 1970 (SCMI0.256921623 : LOG. EO.INI.00016 : CRITIC) Failed to initialize basic library [Dbal], error [0x30011]
<br>
Fri Jan 2 04:50:09 1970 (SCMI0.256921623 : LOG. EO.INI.00017 : CRITIC) Failed to initialize all basic libraries, error [0x30011]
<br>
Fri Jan 2 04:50:09 1970 (SCMI0.256921623 : LOG. EO.INI.00018 : CRITIC) Exiting : EO setup failed, error [0x30011]
</code>
</p>
<p> '''Solution:'''
<br>
This error means that the proper database has not been specified in the file etc/clDbalConfig.xml. Please uncomment the proper "Engine" tag in this file. (For QNX it is libBerkeleyDB.so.)
</p>

<hr>
<p> '''Error:'''
<br>
<code>
Wed Nov 19 10:20:35 2008 (SysCtrlI0.29013 : GMS.GEN.---.00032 : NOTICE) GMS Server registering with debug service
<br>
Wed Nov 19 10:20:35 2008 (SysCtrlI0.29013 : GMS.GEN.---.00033 : DEBUG) LINK_NAME env is exported. Value is eth1
<br>
Wed Nov 19 10:20:36 2008 (SysCtrlI0.29013 : GMS.GEN.---.00034 : EMRGN) ioctl() system call failed while getting details of interface [eth1]
<br>
Wed Nov 19 10:20:36 2008 (SysCtrlI0.29013 : GMS.GEN.---.00035 : EMRGN) - ioctl() returned error [Cannot assign requested address]
<br>
Wed Nov 19 10:20:36 2008 (SysCtrlI0.29013 : GMS.GEN.---.00036 : EMRGN) - This could be because the interface [eth1] is not configured on the system
<br>
Wed Nov 19 10:21:25 2008 (SysCtrlI0.28990 : AMF.TSK.POL.00042 : INFO) Creating new task
<br>
Wed Nov 19 10:21:25 2008 (SysCtrlI0.28990 : AMF.---.---.00043 : ERROR) Component [gmsServer_SysCtrlI0] did not instantiated within the specified limit
<br>
Wed Nov 19 10:21:25 2008 (SysCtrlI0.28990 : AMF.---.---.00044 : ERROR) Component gmsServer_SysCtrlI0 did not start within the specified limit
</code>
</p>
<p> '''Solution:'''
This error occurs because the ethernet interface specified in asp.conf (as the value of the LINK_NAME environment variable) is not present on the system.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
Thu Jan 1 02:25:00 1970 (SCMI1.2818074 : AMF.CPM.GMS.00075 : ERROR) Failed to receive GMS cluster track callback, error [0x10014]
</code>
</p>
<p> '''Solution:'''
Please make sure that a firewall is not enabled on the machine. Try increasing the value of "bootElectionTimeout" in etc/clGmsConfig.xml to a higher value (the default is 5; try making it 10 or 15). If this doesn't help, try enabling the openais logs as indicated in the logging section, to see if any other unwanted node is communicating with this node.
</p>

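The timeout change can be done with a quick sed edit. The element syntax below is an assumption (check the actual markup in your etc/clGmsConfig.xml first), and a stand-in file is used so the sketch is self-contained:

```shell
# Sketch: raise bootElectionTimeout from the default 5 to 15.
# A temp stand-in file is used here; point sed at etc/clGmsConfig.xml
# in your sandbox for the real edit.
CFG=$(mktemp)
echo '<bootElectionTimeout>5</bootElectionTimeout>' > "$CFG"
# portable in-place edit (avoids GNU-only `sed -i` semantics)
sed 's|<bootElectionTimeout>5<|<bootElectionTimeout>15<|' "$CFG" > "$CFG.new" \
  && mv "$CFG.new" "$CFG"
cat "$CFG"
```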
<hr>
<p> '''Error:'''
<br>
<code>
Wed Nov 19 09:33:39 2008 (SysControllerI0.6139 : GMS.LEA.---.00068 : CRITIC) No nodes in the cluster view to run leader election.
<br>
Wed Nov 19 09:33:39 2008 (SysControllerI0.6139 : GMS.LEA.---.00069 : CRITIC) - This could be because:
<br>
Wed Nov 19 09:33:39 2008 (SysControllerI0.6139 : GMS.LEA.---.00070 : CRITIC) - Firewall is enabled on your machine which is restricting multicast messages
<br>
Wed Nov 19 09:33:39 2008 (SysControllerI0.6139 : GMS.LEA.---.00071 : CRITIC) - Use 'iptables -F' to disable firewall
<br>
Wed Nov 19 09:33:39 2008 (SysControllerI0.6139 : GMS.LEA.---.00072 : ERROR) Error in boot time leader election. rc [0x11]
</code>
</p>
<p> '''Solution:'''
This error occurs because GMS uses IP multicast, and a firewall configured on your machine may be blocking the multicast messages. Hence GMS is not able to see any node joins in the cluster, as described in the error message above.

To resolve this problem you can run the "iptables -F" command on the machine, which will flush all the firewall rules and allow IP multicast. Note that this setting only lasts until the next reboot of the node, so you need to run this command every time you reboot your machine. Otherwise, contact your system administrator to disable the firewall or configure it to allow IP multicast.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
Thu Jan 1 02:26:55 1970 (SCMI1.2818074 : AMF.AMS.BOO.00116 : CRITIC) Inconsistency between GMS and TIPC configuration detected, master address as per GMS is [0x1], but master address as per TIPC is [0x2]
</code>
</p>

<p> '''Solution:'''
Please make sure that the value of "multicastPort" does not clash with that of any other nodes, except the ones that are configured in the cluster. Similarly, make sure that the TIPC netid (exported as TIPC_NETID in etc/asp.conf) does not clash with other unwanted nodes.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
Thu Jan 1 01:26:28 1970 (SCMI0.13049881 : AMF.---.---.01234 : CRITIC) Duplicate entry SCMI0 in AMS configuration
</code>
</p>

<p> '''Solution:''' Same as above. </p>

<hr>
<p> '''Error:'''
<br>
<code>
Error in parsing XML tag <tag-name> in file [clAmfDefinitions.xml]
<br>
Fri May 23 20:25:17 2008 (SCMI0.9282 : AMF.---.---.00071 : CRITIC) Fn [clAmsParserMain (DEFN_FILE_NAME,CONFIG_FILE_NAME)] returned [0x220105]
<br>
Fri May 23 20:25:17 2008 (SCMI0.9282 : AMF.---.---.00072 : ERROR) ALERT [clAmsInitialize:293] : Fn [clAmsParserMain(DEFN_FILE_NAME,CONFIG_FILE_NAME)] returned [0x220105]
<br>
Fri May 23 20:25:17 2008 (SCMI0.9282 : AMF.CPM.AMS.00073 : CRITIC) Unable to initialize AMS, error = [0x220105]
</code>
</p>
<p> '''Solution:'''
This means that there was an error in parsing the XML configuration file. This error should not occur unless you have hand-edited the XML file. If the error does occur anyway, it indicates either a SAFplus Platform IDE modelling error or a bug in SAFplus Platform.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
DBAL: .: No such file or directory
<br>
DBAL: PANIC: No such file or directory
<br>
DBAL: environment open: DB_RUNRECOVERY: Fatal error, run database recovery
</code>
</p>

<p> '''Solution:'''
Unfortunately this error occurs only on QNX and we are still investigating it. Simply restarting SAFplus Platform may make the problem go away.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
Tue Jun 10 21:45:41 2008 (SCMI1.13591 : AMF.CPM.AMS.00372 : CRITIC) Node failfast called on [standby] node [ SCMI0] !!! This indicates that the cluster has become unstable. Doing self shutdown...
</code>
</p>

<p> '''Solution:'''
This error indicates that:
* There was a change in the network configuration of the cluster.
* Some other nodes are interfering with the cluster. Please make sure the TIPC netid and the GMS port are the same on all nodes of the cluster and unique per cluster.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
Tue Jul 22 10:33:43 2008 (SCMI0.5935 : AMF.---.---.00192 : ERROR) Component [<comp-name>] did not instantiated within the specified limit
<br>
Thu Jan 1 01:00:55 1970 (SCMI0.3211296 : AMF.---.---.00376 : ERROR) Component [<comp-name>] instantiate error [0x220014]. Will cleanup
</code>
</p>

<p> '''Solution:'''
These error messages indicate that SAFplus Platform was not able to start the component within the timeout value configured in the clAmfDefinitions.xml file. Possible reasons are:
* The component is taking too long to start, e.g. it may be spending a lot of time in initialization. Increasing the instantiation timeout value (using the IDE or, if you know what you are doing, directly in the clAmfDefinitions.xml file itself) should help in this case.
* The component crashed before registering with the AMF. Look for a core dump in <sandbox>/var/run/ (on Linux) or in /root (on QNX) to see whether the component dumped core.
</p>

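The core-dump check from the second bullet can be scripted. The sandbox path here is an assumption, and a temporary directory stands in for it so the sketch is self-contained:

```shell
# Sketch: look for a core file from a component that died before it could
# register with the AMF (Linux layout; look in /root on QNX instead).
SANDBOX=$(mktemp -d)       # stand-in for your real sandbox path
mkdir -p "$SANDBOX/var/run"
# newest core first; fall back to a clear message when none exist
result=$(ls -t "$SANDBOX"/var/run/core* 2>/dev/null || echo "no core files found")
echo "$result"
```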
<hr>
<p> '''Error:'''
<br>
<code>
Thu Jan 1 01:25:43 1970 (SCMI0.13049881 : AMF.---.---.00814 : ERROR) SI [<si-name>] assignment to SU [<su-name>] returned Error [0x220014]
</code>
</p>

<p> '''Solution:'''
This error message indicates that SAFplus Platform was not able to assign work (an SI) to a particular SU due to a timeout. Possible reasons for this error are:
* The component crashed in the work assignment processing callback, i.e. the CSI set/remove callback.
* The component is not responding with the saAmfResponse (clCpmResponse) function within the CSI set/remove timeout value configured in the clAmfDefinitions.xml file.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
Wed Nov 19 09:50:49 2008 (SysControllerI0.20584 : AMF.TRIGGER.INI.00073 : WARN) Trigger XML parse error. Running with defaults
</code>
</p>
<p> '''Solution:'''
The warning message above is related to the AMF entity trigger framework and can be ignored.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
12:25:33:Mon Nov 17 15:32:59 2008 (SysControllerI0.11049 : GMS.GEN.---.00201 : ERROR) This sync message is not intended for this node
</code>
</p>

<p> '''Solution:'''
This error message is related to the GMS process group service and can be ignored.
</p>

<hr>
<p> '''Error:'''
<br>
<code>
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.ACU.IDG.39051 : ERROR) Failed to get the alarm index for probable cause [2] and specific problem [0]
<br>
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.ALM.ALE.39052 : ERROR) Failed to get the alarm index for the probable cause [2]:Specific Problem [0]. rc[0x4]
<br>
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.ALM.ALR.39053 : ERROR) Failed while processing the alarm . rc[0x4]
<br>
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.ALM.ALR.39054 : ERROR) Failed while Sending RMD to the client owning the alarm resource. rc[0x4]
<br>
Thu Jan 1 04:25:48 1970 (SCMI0.39657513 : AlarmMgr_EO.---.---.39055 : ERROR) Error in raising alarm, rc [0x4]
</code>
</p>

<p> '''Solution:'''
This error message shows that the user is trying to raise an alarm with an invalid probable cause and specific problem combination which has not been modeled. Please check the list of alarm profiles attached to that resource, inside the alarm owner's cl<comp-name>alarmMetaStruct.c file.
</p>

===Informational messages to observe in SAFplus Platform log files===
<hr>
<p> '''Message:'''
<br>
<code>
Mon Nov 17 15:35:08 2008 (SysControllerI0.11049 : GMS.OPN.AIS.00206 : DEBUG) GMS CONFIGURATION CHANGE
<br>
Mon Nov 17 15:35:08 2008 (SysControllerI0.11049 : GMS.OPN.AIS.00207 : DEBUG) GMS Configuration:
<br>
Mon Nov 17 15:35:08 2008 (SysControllerI0.11049 : GMS.OPN.AIS.00208 : DEBUG) r(0) ip(192.168.11.11)
<br>
Mon Nov 17 15:35:08 2008 (SysControllerI0.11049 : GMS.OPN.AIS.00209 : DEBUG) r(0) ip(192.168.90.44)
<br>
Mon Nov 17 15:35:08 2008 (SysControllerI0.11049 : GMS.OPN.AIS.00210 : DEBUG) Members Left:
<br>
Mon Nov 17 15:35:08 2008 (SysControllerI0.11049 : GMS.OPN.AIS.00211 : DEBUG) r(0) ip(192.168.90.43)
<br>
Mon Nov 17 15:35:08 2008 (SysControllerI0.11049 : GMS.OPN.AIS.00212 : DEBUG) Members Joined:
</code>
</p>
<p> '''Description:'''
The messages above indicate the join/leave states of the nodes in the cluster. In this sample output you can see that the node with IP 192.168.90.43 has left the cluster, and at this point the cluster configuration has nodes 192.168.11.11 and 192.168.90.44.

This information is useful when debugging issues where a node repeatedly goes down or does not behave as expected.
</p>

<hr>
<p> '''Message:'''
<br>
<code>
Mon Nov 17 15:35:10 2008 (SysControllerI0.11014 : AMF.CPM.AMS.00827 : CRITIC) CPM/G active got IOC/TIPC notification for node [3] --
<br>
Mon Nov 17 15:35:10 2008 (SysControllerI0.11014 : AMF.CPM.AMS.00828 : CRITIC) - Possible reasons for this are on node [3] :
<br>
Mon Nov 17 15:35:10 2008 (SysControllerI0.11014 : AMF.CPM.AMS.00829 : CRITIC) - 1. AMF crashed.
<br>
Mon Nov 17 15:35:10 2008 (SysControllerI0.11014 : AMF.CPM.AMS.00830 : CRITIC) - 2. AMF was killed.
<br>
Mon Nov 17 15:35:10 2008 (SysControllerI0.11014 : AMF.CPM.AMS.00831 : CRITIC) - 3. Critical component failed.
<br>
Mon Nov 17 15:35:10 2008 (SysControllerI0.11014 : AMF.CPM.AMS.00832 : CRITIC) - 4. Kernel panicked.
<br>
Mon Nov 17 15:35:10 2008 (SysControllerI0.11014 : AMF.CPM.AMS.00833 : CRITIC) - 5. Communication was lost.
<br>
Mon Nov 17 15:35:10 2008 (SysControllerI0.11014 : AMF.CPM.AMS.00834 : CRITIC) - 6. AMF was shutdown.
</code>
</p>
<p> '''Description:'''
The message above means that this node has received a TIPC notification of node death for node 3; the message also lists the probable reasons. The user needs to determine whether this node death was due to an admin operation or due to an error.
</p>

<hr>
<p> '''Message:'''
<br>
<code>
Mon Nov 17 15:35:44 2008 (SysControllerI0.11014 : AMF.CPM.GMS.00844 : DEBUG) Received cluster track callback from GMS on node [SysControllerI0]
...
<br>
Mon Nov 17 15:35:44 2008 (SysControllerI0.11014 : AMF.CPM.GMS.00845 : DEBUG) - Leader : [1]
<br>
Mon Nov 17 15:35:44 2008 (SysControllerI0.11014 : AMF.CPM.GMS.00846 : DEBUG) - Deputy : [3] (-1 -> No deputy)
</code>
</p>

<p> '''Description:'''
The message above shows the result of leader election for system controllers. As described, after this leader election node 1 has become active (leader) and node 3 is standby (deputy). If the Deputy value is -1, there is only one system controller running in the cluster.
</p>

<hr>
<p> '''Message:'''
<br>
<code>
Wed Nov 19 10:11:49 2008 (PayloadNodeI0.23101 : GMS.LEA.---.00100 : WARN) Could not elect any leader from the current cluster view.
<br>
Wed Nov 19 10:11:49 2008 (PayloadNodeI0.23101 : GMS.LEA.---.00101 : WARN) - Possibly no system controller is running
<br>
Wed Nov 19 10:11:49 2008 (PayloadNodeI0.23101 : GMS.CLM.---.00102 : ERROR) Leader election failed. rc = 0x0
<br>
Wed Nov 19 10:12:11 2008 (PayloadNodeI0.23068 : AMF.CPM.CPM.00081 : WARN) CPM/G standby/Worker blade waiting for CPM/G active to come up...
</code>
</p>

<p> '''Description:'''
The messages above indicate that a payload node was started without any system controller node running in the cluster.
</p>

<hr>
<p> '''Message:'''
<br>
<code>
Tue Nov 18 16:24:48 2008 (ctrlI0.17729 : AMF.CPM.LCM.00339 : NOTICE) [clCpmComponent.c:1885] Launching binary image [ClusterMgr] as component [ClusterMgrI0]...
<br>
Tue Nov 18 16:24:48 2008 (ctrlI0.17729 : AMF.CPM.LCM.00340 : INFO) [clCpmComponent.c:1952] Component [ClusterMgrI0] started, PID [17822]
</code>
</p>

<p> '''Description:'''
These logs indicate that the AMF attempted to start your component.
</p>

<hr>
<p> '''Message:'''
<br>
<code>
Tue Nov 18 16:24:48 2008 (ctrlI0.17729 : AMF. SU.INST.00343 : INFO) [clAmsPolicyEngine.c:6615] SU [ClusterMgrSUI0] instantiated [1] components at level [1]
</code>
</p>

<p> '''Description:'''
This log indicates that your component registered with the AMF.
</p>

===Essential <code>asp_console</code> commands to understand the system state of SAFplus Platform===

The asp_console command in the <sandbox>/bin directory is a very useful utility for understanding the SAFplus Platform state at any point in time. After starting SAFplus Platform, you can start it using:

<code><pre>
$ cd <sandbox>
$ ./bin/asp_console
</pre></code>

You will see a message and a prompt like this:

<code><pre>
To get started, type 'help intro'

cli[Test]->
</pre></code>

If you are a first-time user, please give the command "help intro" as indicated and read the brief help message.

You can press TAB to autocomplete any command or to show the list of all commands.

Press ? to see the list of available commands.

Typing list (or li; if a command has a unique prefix, then just typing that prefix and pressing enter is equivalent to giving the full command) at the above prompt gives:

<code><pre>
cli[Test]-> list

Slot Node
1 SCNode0


cli[Test]->
</pre></code>

This means that the cluster is currently running with one node, named SCNode0. You MUST see in this list as many nodes as you have started that are meant to be part of this cluster. Otherwise the nodes are not communicating, because of either GMS or TIPC issues as indicated above. Make sure that the GMS multicast port as well as the TIPC netid is the same on all the nodes involved.

Type the command setc ("set context") followed by a slot number or "master" to change context to the master, i.e. the active node in the cluster:

<code><pre>
cli[Test]-> setc master
</pre></code>

and you should see a prompt like this:

<code><pre>
cli[Test:SCNode0]->
</pre></code>

(Remember that you can type either 'help' or ? to see the list of available commands.)

You can also "setc" to a particular slot by passing a number:

<code><pre>
cli[Test]-> setc 1
</pre></code>

but this is rarely used.

Type the command setc again, this time to change context to a particular component within node 1, i.e. SCNode0. You can type the command 'list' to see the list of available components. You can also register your own application component with this console, using the debug CLI client library.

− | <code><pre>
| + | |
− | cli[Test:SCNode0]-> list
| + | |
− | cpm
| + | |
− | corServer_SCNode0
| + | |
− | faultServer_SCNode0
| + | |
− | eventServer_SCNode0
| + | |
− | nameServer_SCNode0
| + | |
− | txnServer_SCNode0
| + | |
− | gmsServer_SCNode0
| + | |
− | alarmServer_SCNode0
| + | |
− | logServer_SCNode0
| + | |
− | ckptServer_SCNode0
| + | |
− | | + | |
− | | + | |
− | cli[Test:SCNode0]->
| + | |
− | </pre></code>
| + | |
− | | + | |
− | to change context to AMF for e.g. give the following command
| + | |
− | | + | |
− | <code><pre>
| + | |
− | cli[Test:SCNode0]-> setc cpm
| + | |
− | | + | |
− | cli[Test:SCNode0:CPM]->
| + | |
− | </pre></code>
| + | |
− | | + | |
− | You can type help to see the available commands followed by a brief descriptive message about what the command does.
| + | |
− | | + | |
− | To move up a context (i.e. from component context to node context or from node context to top level context) type the command 'end'. For e.g.
| + | |
− | | + | |
− | <code><pre>
| + | |
− | cli[Test:SCNode0:CPM]-> end
| + | |
− | | + | |
− | cli[Test:SCNode0]->
| + | |
− | </pre></code>
| + | |
− | | + | |
− | The SAFplus Platform components except for the AMF, are suffixed by the node name.
| + | |
− | | + | |
− | To view if the cluster is in a consistent state give this command:
| + | |
− | | + | |
<code><pre>
cli[Test:SCNode0]-> setc gmsServer_SCNode0

cli[Test:SCNode0:gms]-> memberList 0

--------------------------------------------------------------------------------------------------------

Cluster/Group Name : cluster0
bootTime : Wed Nov 12 17:36:56 2008
View Number : 2
--------------------------------------------------------------------------------------------------------
NodeId NodeName HostAddr Port Leader Credentials PrefLead LeadshipSet BootTime
--------------------------------------------------------------------------------------------------------
1 SCNode0 1 9 Yes 100 No No Wed Nov 12 17:36:56 2008
2 SCNode1 2 9 No 100 No No Wed Nov 12 19:28:42 2008

cli[Test:SCNode0:gms]->
</pre></code>

Again, you should see as many nodes as are configured in the cluster. In the above output, the node with id 1 is the leader.

To see if a node has come up properly (don't forget to run "setc master" and then "setc cpm" to get into the correct context first):

<code><pre>
cli[Test:SCNode0:CPM]-> amsentityprint node SCNode0

Name | SCNode0
Configuration ------------------------------- | ---------------------------
Start time | Wed Nov 12 17:37:02 2008
Admin State | Unlocked
Computed Admin State | Unlocked
Id | 0
Class Type | Class B
SubClass Type |
Is Swappable | True
Is Restartable | True
Is SAFplus Platform Aware | True
Auto Repair on Join | True
SU Failover Probation Time | 10000 ms
SU Failover Count | 10
Node Dependents | Count[0]
 | <None>
Node Dependencies | Count[0]
 | <None>
SUs in Node | Count[6]
 | SC0_twoNAdmin_SU0
 | SC0_amsTestGenTC_SU0
 | SC0_noRedundancy_SU0
 | SC0_twoNRank_SU0
 | SC0_twoNRank_Spare_SU1
 | SC0_twoNRank_Spare_SU0
Status -------------------------------------- | ---------------------------
Presence State | Instantiated
Oper State | Enabled
Is Instantiable | True
Is Cluster Member | True
Was Cluster Member Before | False
Last Recovery | None
Num Instantiated SUs | 0
Num Assigned SUs | 0
SU Failover Probation Timer | Inactive
SU Failover Count | 0
--------------------------------------------- | ---------------------------
Timers Running | 0
Debug Flags | 0x1
</pre></code>

===Miscellaneous===

* Please note that by default, SAFplus Platform assumes that the shared memory location (tmpfs) is mounted at the standard location /dev/shm, and cleans up whatever shared memory segments were created by SAFplus Platform during the previous run. If you have mounted your tmpfs somewhere else, you have two options:
** Mount the tmpfs at /dev/shm so that SAFplus Platform will take care of cleaning up the shared memory segments. ('''Recommended''')
** During packaging of the model, add a cleanup script in model-name/src/extras/asp.d/ (creating this location if it is not there), which will be called with the 'start' argument before SAFplus Platform starts. This cleanup script can then remove the shared memory segments from whichever location the tmpfs is currently mounted on.
* Please note that starting SAFplus Platform "by hand" is not supported, i.e. starting the asp_amf executable from the command line is not supported. SAFplus Platform startup relies on the SAFplus Platform startup script for some setup/cleanup actions, and hence SAFplus Platform should always be started using the etc/init.d/asp script.
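The cleanup-script option above might look like the following sketch. The shared-memory segment naming pattern (CL_*) is an assumption, so confirm what your SAFplus version actually creates under the mount point; a temporary directory stands in for the tmpfs mount so the example is self-contained.

```shell
# Sketch of a model-name/src/extras/asp.d/ cleanup script for a tmpfs that
# is NOT mounted at /dev/shm. Replace SHM_DIR with your real mount point.
SHM_DIR=$(mktemp -d)                   # stand-in for the tmpfs mount point
touch "$SHM_DIR/CL_ckpt_seg0"          # fake leftover segment for the demo
ACTION="${1:-start}"                   # asp.d scripts are invoked with 'start'
case "$ACTION" in
  start)
    # drop segments left over from the previous SAFplus Platform run
    rm -f "$SHM_DIR"/CL_* 2>/dev/null
    ;;
esac
ls -A "$SHM_DIR" | wc -l               # remaining entries after cleanup
```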