Doc:Sdk 4.1/awdguide/upgradedirector

Revision as of 16:49, 30 January 2013 by Stone



Upgrade Director

The Upgrade Director is a separate entity that controls the upgrade of the SAFplus Platform middleware and optionally the OS. To upgrade applications, please see the "Software Management" subsection.

The core of the Upgrade Director is a Python program that runs on both system controllers and upgrades the cluster using an in-service (rolling) algorithm. This is difficult because the nodes that the upgrade directors themselves are running on may need to be rebooted. However, since the upgrade director runs redundantly (on both system controllers), there is always one live node to direct operations.

Care was taken during the design of the Upgrade Director to ensure that it continues to operate in adverse conditions and does not leave the cluster in an untenable state. In particular:

  • All required software is deployed on-cluster on both system controllers before the upgrade starts. This ensures that a network outage will not affect the upgrade.
  • Once started, the upgrade is driven by software running on the cluster rather than at the management station. This ensures that an outage between the NOC and the cluster will not affect the upgrade.
  • The Upgrade Director executable is delivered within the upgrade package, ensuring that it can be upgraded itself before executing an upgrade.
  • The Upgrade Director uses the SAFplus Platform middleware as little as possible. Essentially, it only uses it to query the state of the cluster, not to store its own state. This reduces the possibility that a bug in the middleware will cause the upgrade director to fail.

Extensibility

The Upgrade Director is made to be operating system distribution-neutral. To upgrade operating system packages or the operating system itself, you must overload certain virtual functions that actually install the packages on disk. This allows you to control how the packages are organized on disk: "rpm", "dpkg", an A/B disk partitioning system, or something custom to your system.

In return, the Upgrade Director schedules when your operations occur relative to the ongoing upgrade in the rest of the cluster, which simplifies the process considerably.
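As a rough illustration of this extension point, the sketch below shows what an "rpm"-based override might look like. The class and method names (OsUpgradeHooks, install_packages, needs_reboot) are hypothetical and stand in for whatever virtual functions the Upgrade Director actually exposes; please consult the API documentation for the real names.

```python
import subprocess


class OsUpgradeHooks(object):
    """Hypothetical stand-in for the Upgrade Director's overridable install hooks."""

    def install_packages(self, package_files):
        raise NotImplementedError("override to install packages on disk")

    def needs_reboot(self):
        # By default no reboot is requested after the OS upgrade step.
        return False


class RpmUpgradeHooks(OsUpgradeHooks):
    """Example override that installs OS packages with the rpm tool."""

    def __init__(self, kernel_changed=False, runner=subprocess.check_call):
        self.kernel_changed = kernel_changed
        self.runner = runner  # injectable so the class can be tested without rpm

    def build_command(self, package_files):
        # "rpm -Uvh" upgrades (or installs) the listed package files.
        return ["rpm", "-Uvh"] + list(package_files)

    def install_packages(self, package_files):
        self.runner(self.build_command(package_files))

    def needs_reboot(self):
        # A reboot is only requested when the kernel was replaced.
        return self.kernel_changed
```

A "dpkg" or A/B partition variant would differ only in build_command and needs_reboot; the scheduling of when the hook runs stays with the Upgrade Director.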

Operational Walkthrough

This operational walkthrough is written based on the OpenClovis supplied bundle that uses tar files. If you add another bundle file format, some steps may be different.

Development

  1. The engineer first creates a SAFplus Platform model with the IDE, compiles it, makes images, tests it, etc., as is normally done.
    1. The SAFplus Platform image should be made with the "INSTANTIATE_IMAGES=NO" flag in target.conf so a single SAFplus Platform tarball is created for all nodes.
  2. The engineer implements upgrade manager customization including optional callouts to upgrade the operating system and application data.
  3. The engineer creates a bundle definition XML file, as defined below.
  4. The "createUpgradeBundle.py" script is run to generate an "asp.bundle" file.

This file is then copied to a system controller node on the target chassis.

Upgrade

Prerequisites: The SAFplus Platform middleware bundle should be copied to one system controller node.

  1. The user executes the bundle with a single argument, which is the path to the running SAFplus Platform (asp) installation.
    1. for example: chmod a+x asp.bundle; ./asp.bundle /root/asp
  2. The upgrade does its work (see "UpgradeDirector algorithm walkthrough" below).
  3. Now the user has the option to either commit the upgrade, or revert it.
    1. /root/asp.newver/bin /tmp/udtmp1/upgradeDirector/upgradeDirector/upgradeDirector.py --commit
    2. /root/asp.newver/bin /tmp/udtmp1/upgradeDirector/upgradeDirector/upgradeDirector.py --revert

UpgradeDirector "Upgrade" Algorithm Walkthrough

Prerequisites: The SAFplus Platform middleware bundle is executed on one system controller node.

  1. The bundle contains a shell script that executes the following:
    1. It detars itself into /tmp/udtmp1
    2. It starts upgradeDirector.py running under "asp_run"
  2. upgradeDirector.py first verifies that the cluster is healthy
  3. Then it installs itself onto the second system controller, and starts up.
  4. It establishes communications with the second upgrade director and synchronizes state.
  5. It chooses a "master".
  6. The master starts upgrading the secondary system controller:
    1. Installs the new SAFplus Platform
    2. Stops the secondary upgrade director
    3. Stops SAFplus Platform
    4. Copies and upgrades the runtime data from the old SAFplus Platform location to the new.
    5. Calls customer-implemented callbacks to upgrade the OS and, at this time, reboot the system if an OS kernel upgrade requires it (otherwise a reboot is not required).
    6. Restarts SAFplus Platform
    7. Verifies that SAFplus Platform came up successfully.
    8. Restarts the secondary upgrade director.
    9. Resynchronizes with the secondary
    10. Gives up "master" status
  7. The new master now upgrades the other controller node in the same manner described above
  8. At this point both controllers are upgraded, but no payload nodes are upgraded. So software written for the payload nodes can rely on the fact that the controllers are running the latest software, but controllers cannot assume that for payload nodes or for other controller nodes.
  9. The master now upgrades the payload nodes one at a time, using the following algorithm:
    1. Copies the bundle to the node
    2. Installs the new SAFplus Platform
    3. Stops SAFplus Platform
    4. Copies and upgrades the runtime data from the old SAFplus Platform location to the new.
    5. Calls customer implemented callbacks to upgrade the OS, and reboot the system at this time.
    6. Restarts SAFplus Platform
    7. Verifies that SAFplus Platform came up successfully.
  10. Both upgrade director applications exit.
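
The ordering described in the steps above can be summarized in a small sketch. This is not the Upgrade Director's code; the function name and the step labels are illustrative only, and the plan simply records which node is worked on and in what order.

```python
CONTROLLER_STEPS = [
    "install new SAFplus Platform",
    "stop upgrade director on node",
    "stop SAFplus Platform",
    "copy and upgrade runtime data",
    "run OS upgrade callbacks",
    "restart SAFplus Platform",
    "verify SAFplus Platform",
    "restart upgrade director on node",
    "resynchronize",
    "give up master status",
]

PAYLOAD_STEPS = [
    "copy bundle to node",
    "install new SAFplus Platform",
    "stop SAFplus Platform",
    "copy and upgrade runtime data",
    "run OS upgrade callbacks",
    "restart SAFplus Platform",
    "verify SAFplus Platform",
]


def rolling_upgrade_plan(master, standby, payloads):
    """Return (node, step) pairs in the order the walkthrough describes."""
    plan = []
    # The initial master upgrades the other controller first, then is
    # itself upgraded once master status has been handed over.
    for node in (standby, master):
        plan.extend((node, step) for step in CONTROLLER_STEPS)
    # Payload nodes follow, strictly one at a time.
    for node in payloads:
        plan.extend((node, step) for step in PAYLOAD_STEPS)
    return plan
```

For example, rolling_upgrade_plan("ctrl0", "ctrl1", ["pay0", "pay1"]) lists every step for ctrl1, then ctrl0, then pay0, then pay1.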

At this point all nodes are upgraded, but the old SAFplus Platform still remains on disk. The user may now operate in this mode for as long as desired (but may not run another upgrade). When there is confidence in the successful operation of the system, the user may commit or revert the upgrade.

Committing the upgrade is simply a matter of deleting the old SAFplus Platform files and deleting the upgrade bundle out of /tmp. There is no effect on the running SAFplus Platform.

Reverting an upgrade is also made as simple as possible, since it occurs after user confidence in the system is lost. The revert operation simply stops the currently running SAFplus Platform on all payload nodes and starts the old version. Then it does the same for the controller nodes. In fact, in the worst case of upgrade software failure, the operator can do this revert manually on each node (essentially just "<newver>/etc/init.d/asp stop; <oldver>/etc/init.d/asp start"). This makes the revert operation resilient even to failures in the upgrade software itself.

The reversion order ensures that all payload nodes are downgraded before controller nodes.
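
A minimal sketch of the reversion order and of the manual per-node command follows. The helper names and the example directories are hypothetical; only the "payloads first, controllers last" ordering and the init.d command shape come from the description above.

```python
def revert_order(controllers, payloads):
    """Payload nodes are downgraded first, controller nodes last."""
    return list(payloads) + list(controllers)


def manual_revert_command(new_dir, old_dir):
    """Build the shell command an operator could run by hand on one node."""
    return "%s/etc/init.d/asp stop; %s/etc/init.d/asp start" % (new_dir, old_dir)


def manual_revert_script(controllers, payloads, new_dir, old_dir):
    """Pair each node, in revert order, with the command to run on it."""
    command = manual_revert_command(new_dir, old_dir)
    return [(node, command) for node in revert_order(controllers, payloads)]
```

With controllers ["ctrl0", "ctrl1"] and payloads ["pay0"], the resulting list starts with pay0 and ends with the two controllers.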

Delivery

The Upgrade Director requires an "upgrade bundle" file to run. This file contains the Upgrade Director itself and all files necessary to do an upgrade. New upgrade bundle file formats can be used by deriving new classes from the "UpgradeBundle" and "UpgradeRemote" classes.

The Upgrade Director provides a default, self-executing, Linux-distribution-neutral bundle file format based on gzipped tar files. A script is also provided that creates this bundle from a compiled SAFplus Platform tree and an XML configuration file.

XML File Format

The following is an example XML file.

<bundle_config ver="1.0.0.0">

    <asp>
        <image version="4.1.0" arch="i686" os="ubuntu" osVersion="9.04" files='i686-linux-2.6.22.tgz'/>
    </asp>

    <app>
        <virtualIp version="1.4.0.1" arch="i686" os="linux" files='vipapp1.4.0.1.tgz' />
    </app>

    <upgradeDirector>
        <default fromAspVersion="4.0" aspVersion="4.1" files='/code/svn/clustermgt/clustermgt/root/ocms/src/app/asppybinding /code/svn/clustermgt/clustermgt/root/ocms/src/app/upgradeDirector /code/svn/clustermgt/clustermgt/root/ocms/target/i686/linux-2.6.22/lib/' />
    </upgradeDirector>

    <os>
        <Ubuntu fromVersion="8.04" version="9.10" files='fakeubuntu9.10.tgz' />
    </os>

</bundle_config>

This XML file is included in the upgrade bundle for use when upgrading the cluster. Any unknown tag or attribute is simply ignored, so this file format may be extended to supply additional functionality. If the entity is applicable to all possibilities (such as what operating system the code runs on, or what version must be running on the system before the upgrade) then the specifying attribute may be removed.

In this release, only a single entry in each major section is allowed -- for example, the software does not determine which SAFplus Platform image or OS image should be used in the upgrade. There should be only one image specified.
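
The rules above (unknown tags ignored, one entry per major section) can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration of the file format only, not the parser shipped with the Upgrade Director; the function name and the returned dictionary layout are assumptions.

```python
import xml.etree.ElementTree as ET

# Major sections named in the example file; anything else is ignored.
SECTIONS = ("asp", "app", "upgradeDirector", "os")


def load_bundle_config(xml_text):
    """Parse a bundle config and return one entry (or None) per section."""
    root = ET.fromstring(xml_text)
    if root.tag != "bundle_config":
        raise ValueError("root element must be <bundle_config>")
    if root.get("ver") != "1.0.0.0":
        raise ValueError("unsupported config version: %r" % root.get("ver"))
    config = {"ver": root.get("ver")}
    for name in SECTIONS:
        section = root.find(name)
        if section is None:
            continue
        entries = list(section)
        if len(entries) > 1:
            # Only a single entry per major section is allowed in this release.
            raise ValueError("only one entry is allowed in <%s>" % name)
        if entries:
            entry = entries[0]
            config[name] = {"tag": entry.tag, "attrs": dict(entry.attrib)}
        else:
            config[name] = None
    return config
```

Applied to the example file, config["os"] would hold the tag "Ubuntu" together with its fromVersion, version and files attributes.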


Description of tags

  • <bundle_config> This tag defines the start and end of the config file. It has a single attribute, "ver", which is the version of this XML config file format and currently must be "1.0.0.0".
  • <asp> This tag contains the SAFplus Platform images available within this bundle. Currently only a single SAFplus Platform image is supported per bundle.
  • <image> This tag defines the SAFplus Platform image. Its attributes are:
    • version The SAFplus Platform version
    • arch The architecture this SAFplus Platform is compiled for
    • os The OS this image was compiled on
    • osVersion The OS version this image was compiled on
    • files The SAFplus Platform image file. This is the output of "make images" executed in a SAFplus Platform model, with your target.conf set to generate a single binary image and separate .conf files.
  • <app> Simultaneous application upgrade is not supported in this release.
  • <upgradeDirector> This section specifies the upgrade director software available in this bundle
    • fromAspVersion Specifies what version of SAFplus Platform the upgrade director can upgrade from.
    • aspVersion Specifies what version of SAFplus Platform the upgrade director can upgrade to.
    • files A space-separated list of files and directories that must contain the upgrade director application and all required libraries. These files will be copied into the bundle.
  • <os> What operating system upgrades are contained in this bundle
    • fromVersion The node must be running this version for this file to be applicable
    • version The version of the os in this file
    • files A space-separated list of files and directories that contain the operating system. These files/directories will be copied into the bundle.
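
The sketch below illustrates how the "from" attributes and the "files" attribute might be interpreted. It assumes exact string comparison of versions and a node_state dictionary keyed by the same attribute names; both are assumptions made for illustration, and the real matching rules may differ.

```python
def entry_applies(attrs, node_state, constraint_keys=("fromVersion", "fromAspVersion")):
    """Return True when every constraint present in attrs matches the node.

    A constraint attribute that is absent is treated as "applies to all".
    """
    for key in constraint_keys:
        required = attrs.get(key)
        if required is not None and node_state.get(key) != required:
            return False
    return True


def listed_files(attrs):
    """Split the space-separated "files" attribute into a list of paths."""
    return attrs.get("files", "").split()
```

For the example <os> entry, a node running version "8.04" would match, while an entry with no fromVersion attribute would match any node.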


Bundle Creation

A bundle is created by running the "createUpgradeBundle.py" script located in the upgradeDirector directory. It is run like this:

export UPGRADE_IMAGE_PATH=<colon separated list of directories>
python createUpgradeBundle.py <xmlConfigFile>

The UPGRADE_IMAGE_PATH variable specifies the set of directories to search when looking for files referenced in the XML config file. Its purpose is twofold. First, it allows the script to handle changes in the directory hierarchy without requiring the config file to be changed. This is useful for multiple-developer source-controlled environments, since different "checkouts" may use different root trees. Second, it allows the config file to be copied into the bundle itself and still have applicable file names.
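
A search of this kind could look like the following sketch. It is not taken from createUpgradeBundle.py; the function name is hypothetical, and returning the first match in list order is an assumption.

```python
import os


def find_in_image_path(name, search_path=None):
    """Return the first existing match for name, or None if nothing is found."""
    if search_path is None:
        search_path = os.environ.get("UPGRADE_IMAGE_PATH", "")
    # Absolute names do not need the search path at all.
    if os.path.isabs(name):
        return name if os.path.exists(name) else None
    for directory in search_path.split(":"):
        if not directory:
            continue  # skip empty components such as a trailing colon
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None
```

Because the config file then only needs relative names such as 'fakeubuntu9.10.tgz', the same file works from any checkout and from inside the extracted bundle.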

Bundle Execution

The created bundle is a gzipped tar file wrapped in a simple self-extracting script (selfextractor.sh). This script can either extract all files in the archive (i.e. detar them) or extract them and start the upgrade director. The latter is the rolling upgrade operation.

For example:

To extract all files in the archive:

<bundle> extract
specifically:
./asp4.1.0.bdl extract


To start the upgrade director:

<bundle> <current SAFplus Platform dir>
specifically:
./asp4.1.0.bdl /root/asp40

Note that this command should only be run on a single controller node in your cluster. It will automatically identify the other controller node and run the upgrade director redundantly on it.

Please see the "selfextractor.sh" script for more details and for customization.
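
The two invocation forms can be summarized as a small dispatch sketch. This only mirrors the argument handling described above; it is not a translation of selfextractor.sh, and the function name is hypothetical.

```python
def bundle_action(args):
    """Classify the bundle's single argument as an extract or an upgrade request."""
    if len(args) != 1:
        raise ValueError("expected exactly one argument")
    if args[0] == "extract":
        # Only unpack the archive; do not start the upgrade director.
        return ("extract", None)
    # Anything else is taken as the current SAFplus Platform directory.
    return ("upgrade", args[0])
```

For instance, bundle_action(["/root/asp40"]) is classified as an upgrade of the installation in /root/asp40.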

Upgrade Director Command Line

Once the bundle has been extracted it automatically runs the upgrade director. But you can also run it explicitly like this:

  • --resume

To continue an upgrade if it stopped for any reason, run the upgrade director under asp_run with the --resume option: <aspDir>/bin/asp_run python /tmp/udtmp/upgradeDirector/upgradeDirector.py --resume

  • --commit

To commit an upgrade, run the upgrade director under asp_run with the --commit option: <aspDir>/bin/asp_run python /tmp/udtmp/upgradeDirector/upgradeDirector.py --commit

  • --revert

To revert an upgrade, run the upgrade director under asp_run with the --revert option: <aspDir>/bin/asp_run python /tmp/udtmp/upgradeDirector/upgradeDirector.py --revert

  • --restart

To restart the upgrade (losing current state), run the upgrade director under asp_run with the --restart option: <aspDir>/bin/asp_run python /tmp/udtmp/upgradeDirector/upgradeDirector.py --restart <aspDir>

  • --nocopy

To skip the step that pushes the upgrade director to the other controller and starts it, use "--nocopy" along with one of the other flags above.

  • --display=<ip>

If you would like the upgrade director to open progress and debugging windows, use "--display=<ip address>". Note that the X server on the "display" machine must be configured to accept incoming X client requests (xhost +) on the default X port.
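
To show how these options fit together, here is a sketch of an equivalent command-line parser built with Python's standard argparse module. It is not the parser used by upgradeDirector.py; in particular, treating --resume, --commit, --revert and --restart as mutually exclusive is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed above."""
    parser = argparse.ArgumentParser(prog="upgradeDirector.py")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--resume", action="store_true",
                      help="continue an upgrade that stopped")
    mode.add_argument("--commit", action="store_true",
                      help="commit the upgrade")
    mode.add_argument("--revert", action="store_true",
                      help="revert the upgrade")
    mode.add_argument("--restart", metavar="ASPDIR",
                      help="restart the upgrade, losing current state")
    parser.add_argument("--nocopy", action="store_true",
                        help="do not push the director to the other controller")
    parser.add_argument("--display", metavar="IP",
                        help="X display host for progress and debugging windows")
    return parser
```

For example, build_parser().parse_args(["--revert", "--nocopy"]) selects a revert that skips the copy to the other controller.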

Software Overview

Please see the API documentation for details on the software.