Description of the Chassis Management.
The OpenClovis Chassis Manager (CM) is part of the OpenClovis ASP middleware and acts as the gatekeeper between ASP and the underlying hardware platform, such as an AdvancedTCA chassis or another intelligent, manageable hardware platform. CM interfaces with the underlying platform through the Hardware Platform Interface (HPI), a standardized C API specified by the Service Availability Forum (SA Forum).
The OpenClovis CM complies with the B.01.01 version of the HPI specification. CM is integrated with and verified against two well-known implementations of this standard, namely:
- The OpenHPI implementation. OpenHPI is an open-source project and supports various low-level hardware interfaces including the IPMI and direct-IPMI used in most AdvancedTCA chassis. We recommend the use of OpenHPI for all AdvancedTCA chassis that are integrated with Pigeon Point Systems' (PPS) shelf manager cards (ShMM500 or ShMM300). A large subset of the commercially available AdvancedTCA chassis use PPS shelf manager cards. The current version of OpenHPI used and tested by OpenClovis, Inc. is version 2.8.1.
- The proprietary implementation of HPI found in the Radisys Promentum SYS-60x0 chassis, made and distributed by Radisys Corporation.
Note that on unmanaged clusters that do not support HPI-based system management, such as a cluster of ordinary desktop computers, the OpenClovis CM cannot communicate with the hardware platform and therefore is not expected to be started. This is arranged by a configuration option used during the "configure" step (see the SDK User's Guide).
CM runs only on the system controller node(s) in ASP. It can run either in non-redundant mode (when there is only one system controller node in the system) or in dual (1+1) redundant mode, with one active and one standby CM running.
CM depends on the Shelf Manager of the underlying hardware to monitor the platform and to issue control operations to it. Specifically, CM provides the following services to ASP:
- It receives Hot Swap events from HPI and notifies AMF about the arrival, imminent departure, or abrupt departure of entities that AMF cares about. Note that this involves mapping between two addressing domains: the FRU resource ids used by HPI, and the node ids or physical location information (e.g., slot number) used by ASP to identify ASP nodes. This mapping is not necessarily 1:1, as the removal of a single FRU may affect multiple ASP/system nodes.
- It receives sensor and other events from HPI and forwards them to the AMF or Fault Manager components of ASP, depending on the severity of the event. If the event is service-threatening or service-impacting, such as a major or critical alarm, it is forwarded to AMF to trigger service removal and failover from the impacted device(s). For minor alarms, the event is passed to Fault Manager to allow optional, customized handling of the event. Note that this step also involves mapping from an FRU address to the node id(s) and physical locator of the affected node(s).
- It provides an interface for ASP to reset a node and to request other hardware-level operations. CM must map the node address back to the FRU resource id.
- It also provides a simple debug CLI command to trigger state changes on any board.
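The address mapping and severity-based routing described in the list above can be illustrated with a minimal C sketch. This is not the CM implementation and does not use the HPI or ASP headers. All type names, function names, and table contents (fru_node_map_t, nodes_for_fru, fru_for_node, route_for_severity, and the sample ids) are hypothetical and chosen only for illustration. The sketch assumes a static table in which one FRU resource id may appear on several rows, which models the non-1:1 mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical severity levels, ordered from least to most severe.
 * A real HPI implementation defines its own severity type. */
typedef enum { SEV_MINOR, SEV_MAJOR, SEV_CRITICAL } severity_t;

/* Hypothetical destinations for a forwarded event. */
typedef enum { ROUTE_FAULT_MANAGER, ROUTE_AMF } route_t;

/* One row of an illustrative FRU-to-node table (assumed layout). */
typedef struct {
    uint32_t fru_resource_id; /* address in the HPI domain */
    uint32_t node_id;         /* address in the ASP domain */
    uint32_t slot;            /* physical location */
} fru_node_map_t;

/* Sample data: FRU 102 carries two nodes, so the mapping is 1:N. */
static const fru_node_map_t map_table[] = {
    { 101, 1, 1 },
    { 102, 2, 2 },
    { 102, 3, 2 },
    { 103, 4, 3 },
};
#define MAP_ROWS (sizeof map_table / sizeof map_table[0])

/* Collect the node ids affected by an event on one FRU.
 * Writes at most cap ids to out and returns the total number found. */
size_t nodes_for_fru(uint32_t fru, uint32_t *out, size_t cap)
{
    size_t found = 0;
    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < MAP_ROWS; i++) {
        if (map_table[i].fru_resource_id == fru) {
            if (found < cap) {
                out[found] = map_table[i].node_id;
            }
            found++;
        }
    }
    return found;
}

/* Reverse lookup, as needed for a node reset request.
 * Returns 0 and stores the FRU id on success, -1 if the node is unknown. */
int fru_for_node(uint32_t node_id, uint32_t *fru_out)
{
    assert(fru_out != NULL);
    for (size_t i = 0; i < MAP_ROWS; i++) {
        if (map_table[i].node_id == node_id) {
            *fru_out = map_table[i].fru_resource_id;
            return 0;
        }
    }
    return -1;
}

/* Major and critical events go to AMF; minor events go to Fault Manager. */
route_t route_for_severity(severity_t sev)
{
    return (sev >= SEV_MAJOR) ? ROUTE_AMF : ROUTE_FAULT_MANAGER;
}
```

In this sketch, a Hot Swap or sensor event on FRU 102 would resolve to nodes 2 and 3, and a reset request for node 3 would resolve back to FRU 102. A linear scan is used for clarity; a real table would likely be built from the chassis configuration rather than hard-coded.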