Volume 9, Number 27 -- September 2, 2009

Admin Alert: The Road to Live CBU Fail Over, Part 1

Published: September 2, 2009

by Joe Hertvik

One of the companies I work with performed its first live Capacity BackUp (CBU) switch test last month, where they switched over and used their CBU system as their live production system for several days. In the next few issues, I'll use their experience in prepping for a live switch as a possible guide for others trying to ensure that their CBU can substitute for a live system.

CBU 101: Understanding the CBU

A CBU is an i, System i, or iSeries machine that is an exact duplicate of a live production system. CBUs generally contain the same amount of memory, disk, and CPU activations as their source counterparts. With the help of replication software from companies such as Vision Solutions, IBM's DataMirror division, or Bug Busters Software Engineering, production information (including databases, programs, and system objects) is automatically replicated from the source machine to the target CBU. In the event of an emergency where the production system is not available, you can keep your business moving by quickly switching processing over to the target box. See the "Related Stories" section for more articles describing i5/OS high availability and CBUs.

While the CBU is an i5/OS machine in waiting, many companies consider it a big step to actually switch over and run the target machine as a production system substitute for any appreciable amount of time. Our example company took the following steps to reach this goal.

Certifying the CBU Switch-Over Process

A live CBU switch-over doesn't happen overnight. It takes a great deal of planning and testing to gain confidence that if you switch live processing to the CBU, you are not putting business processing at risk. To allay this fear, the staff developed the idea of certifying the CBU for use as a production machine.

CBU certification evolved because switching live production processing to a duplicate machine was a scary thought to both management and IT staff. Imagine what might happen if you were processing orders and a key data library was out of sync, such that thousands of orders were filled, delivered, and invoiced to customers with incorrect pricing? Or what would happen if you switched over and your key application wouldn't work, holding up your production and shipping line for days? Company executives and the IT staff were looking for a comfort level that the business would continue to function efficiently if they lost the production machine.

The certification process encompassed a series of switch tests, with accompanying documentation, that exercised the critical processing features the company relied on every day. To this end, CBU deployment was subdivided into the following certification steps.

  • Initial CBU configuration and infrastructure certification--Determine that the CBU itself is set up correctly to impersonate the production machine. This step tests the basic mechanics of switching over to the CBU.
  • Application certification--Determine whether all the critical third-party and homegrown applications can function on the CBU. This includes obtaining software licensing and license keys, and testing the applications to see whether they work as intended on the machine.
  • User certification--Determine whether the user community can perform its essential business processing on the CBU.
  • Process certification--Determine whether critical automated processing can run on the machine.
  • Audit certification--Confirm with an outside authority that the company's CBU configuration was correct and that no key pieces were missing.
  • Extended switch-over certification--Determine whether the company can actually switch processing over to the CBU and run its business there.

Each completed step led to the next step and cumulatively, all the steps would give the company confidence and documentation that the CBU would perform correctly in a crisis. The group felt that by certifying CBU fitness for duty this way, they could reap the following benefits.

  • Certification by step would slowly build confidence that the CBU would work as intended. IT, management, and users could watch the progress as the CBU was readied for usage.
  • Segmentation would create ownership and comfort that each group's particular needs were being addressed. The system administrators would ensure the infrastructure worked correctly. The applications people would tend to application configuration. The users would directly test that their needs were being met.
  • Documentation after each step would create a reporting system for CBU progress. It would produce accountability and motivation for each group to ensure that they tested thoroughly before they gave the go-ahead to move on.
  • Certification would provide flexibility to reconfigure and retest. The company could identify problems and ensure that each step was perfected as much as possible before moving on to the next step. It also provided structure for how to deploy the CBU.

It's also worth noting that this framework didn't appear overnight. It was the result of two or three earlier switch tests where the company worked with the CBU and determined that this was the best course of action to follow. In particular, most of the initial CBU configuration and infrastructure certification and the entire application certification were completed before the company determined that the other steps were needed. Once all the steps were identified, the rest of the certifications proceeded as presented here.
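The idea that each step must be documented and signed off before the next one begins can be sketched in a few lines of Python. This is only an illustration of the gating logic, not a tool the company used. The step names come from the list above, while the `CertificationTracker` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field

# Step names taken from the certification list in this article.
STEPS = [
    "Initial CBU configuration and infrastructure",
    "Application",
    "User",
    "Process",
    "Audit",
    "Extended switch-over",
]


@dataclass
class CertificationTracker:
    """Hypothetical tracker that refuses to certify steps out of order."""
    signed_off: dict = field(default_factory=dict)  # step -> documentation note

    def next_step(self):
        """Return the first step without a sign-off, or None if all are done."""
        for step in STEPS:
            if step not in self.signed_off:
                return step
        return None

    def certify(self, step, documentation):
        """Record a sign-off; documentation is required and order is enforced."""
        if step != self.next_step():
            raise ValueError(f"Cannot certify {step!r} before {self.next_step()!r}")
        if not documentation.strip():
            raise ValueError("Each step needs documentation before sign-off")
        self.signed_off[step] = documentation


tracker = CertificationTracker()
tracker.certify(STEPS[0], "Two switch tests completed; run book updated.")
print(tracker.next_step())
```

Requiring a documentation note at sign-off mirrors the reporting benefit described above: a step cannot be marked complete without leaving a record behind.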

Initial CBU Configuration and Infrastructure Certification

After the CBU was purchased, the company hired an outside consultant to perform the initial configuration. The company chose Vision Solutions' MIMIX HA software as its high availability solution. The consultant worked with the company to install the software, determine what information (data, programs, and system objects) should be replicated to the CBU, set up the replication configuration, and start replicating information from one machine to the other. He helped the company create its initial "run book," which is the set of instructions the company follows to switch processing from the production machine to the target machine and back again. The consultant also helped set up HA audits that would alert staff by email when libraries or objects were out of sync between the machines and when libraries were added to the production box that were not available on the CBU.
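To make the audit idea concrete, here is a minimal Python sketch of the kind of comparison such an audit performs. It is not MIMIX's implementation or API. The inventories, library and object names, checksums, and the `audit_libraries` function are all hypothetical, and a real audit would gather object lists from each system and email the findings rather than print them.

```python
def audit_libraries(production, cbu):
    """Compare two {library: {object: checksum}} inventories.

    Returns (missing_libraries, out_of_sync), where out_of_sync maps a
    library name to the sorted objects that differ or are absent on the CBU.
    """
    missing = sorted(lib for lib in production if lib not in cbu)
    out_of_sync = {}
    for lib, objects in production.items():
        if lib in missing:
            continue  # already reported as a whole missing library
        target = cbu[lib]
        diffs = sorted(name for name, checksum in objects.items()
                       if target.get(name) != checksum)
        if diffs:
            out_of_sync[lib] = diffs
    return missing, out_of_sync


# Hypothetical inventories for illustration only.
production = {
    "ORDERS": {"ORDHDR": "a1", "ORDDTL": "b2"},
    "PRICING": {"PRICES": "c3"},
    "NEWLIB": {"TESTFILE": "d4"},
}
cbu = {
    "ORDERS": {"ORDHDR": "a1", "ORDDTL": "zz"},
    "PRICING": {"PRICES": "c3"},
}

missing, out_of_sync = audit_libraries(production, cbu)
print("Libraries not on CBU:", missing)
print("Out-of-sync objects:", out_of_sync)
```

The two return values correspond to the two alert conditions in the paragraph above: libraries that exist only on the production box, and objects that differ between the machines.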

When dealing with high availability scenarios, one of the hardest situations is performing the first switch-over test. This test does nothing more than run the procedures for switching processing from the production machine to the CBU and back again. When switching over in this test, the CBU performed little information processing. Rather, this exercise tested the mechanics of switching over and switching back again to see if it was possible to perform the switch using their existing run book.

The first test also helped the company understand if their replication scheme was valid. When processing was temporarily switched over to the CBU, the company shut off all normal information processing functions (interactive jobs, Website updating, remote updates, batch jobs, etc.). The testers had to remember that a CBU switch-over is a fundamentally different animal than a traditional disaster recovery test. In a switch-over, the CBU is functioning as the production machine and any processing that occurs will be replicated back to the source production system at the end of the test (i.e., all CBU testing uses live data).

To check that changed CBU data would be replicated correctly back to the production machine at the end of the test, the testers changed data in only a few insignificant files on the CBU. When the testers switched processing back to the production machine, they checked the test files on the source box to ensure that the changes made on the target machine during the switch test had been replicated back.
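The check on the test files amounts to comparing what was changed on the CBU with what the production machine holds after the switch back. A minimal sketch in Python follows, assuming the file contents can be read as bytes from each side. The file names, contents, and the `verify_switch_back` function are hypothetical and are not part of any replication product.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def verify_switch_back(changed_on_cbu, production_after):
    """Return the sorted test files whose CBU changes did not reach production."""
    failures = []
    for name, cbu_data in changed_on_cbu.items():
        prod_data = production_after.get(name)
        if prod_data is None or digest(prod_data) != digest(cbu_data):
            failures.append(name)
    return sorted(failures)


# Hypothetical test files changed on the CBU during the switch test.
changed_on_cbu = {
    "TESTLIB/FILEA": b"updated row 1",
    "TESTLIB/FILEB": b"updated row 2",
}
# Hypothetical contents read from production after switching back.
production_after = {
    "TESTLIB/FILEA": b"updated row 1",
    "TESTLIB/FILEB": b"stale row 2",
}

print(verify_switch_back(changed_on_cbu, production_after))
```

An empty result would indicate that every test change made on the target machine arrived on the source machine.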

The goal of the first test was to create and test the basic structure of a switch-over, including basic data update and replication. The testers wanted to be comfortable enough with the exercise that they could perform this switch again and again as needed for later tests. The initial test answered a few simple questions:

  • Can a switch-over be performed?
  • Is data replicated correctly from the production system to the target system and back again?
  • What steps should be taken to make succeeding switch-over tests more successful?

The first test was the building block on which all of the other CBU testing would rest. Until the CBU infrastructure was correct, the company couldn't move on to the more complicated CBU functions.

For the example company, it was necessary to run two tests to make sure the basic infrastructure of the CBU was correct. That is, the CBU needed to totally impersonate the production machine so that the outside world (including network equipment, DNS servers, communications partners, printers, etc.) couldn't tell the difference between the two machines. After two tests, the testers felt that they could move on to the next certification step.

Between Tests: Tweaking the Run Book

During each switch test, the testers took detailed notes in the run book as to what went right with each step, what went wrong during the test, and what they did to fix it. After each test, those notes formed the basis of the next run book. The previous run book was archived for reference and a new run book was created.

The new run book contained all the fixes, shortcuts, and expansions needed to make the next test more successful. It became mandatory to update the run book during the first few days after the test completed, while all the events were still fresh in the testers' minds. If the run book sat for a few weeks before being updated, the testers could misunderstand some of their own notes and accidentally omit important changes that were needed for the next test.
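The archive-and-revise cycle can be modeled with a small data structure. The Python sketch below is one possible way to represent it, not the company's actual run book format. The `Step` and `RunBook` classes and the sample instructions and notes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    instruction: str
    notes: list = field(default_factory=list)  # what went right or wrong, and fixes


@dataclass
class RunBook:
    version: int
    steps: list

    def next_version(self, revisions):
        """Build the next run book.

        revisions maps a step index to its rewritten instruction; steps
        without a revision carry over unchanged. Notes start empty again
        because they describe one specific test.
        """
        new_steps = [Step(revisions.get(i, s.instruction))
                     for i, s in enumerate(self.steps)]
        return RunBook(self.version + 1, new_steps)


# Hypothetical run book and test notes.
book = RunBook(1, [Step("End replication apply jobs"), Step("Swap IP addresses")])
book.steps[1].notes.append("Printers lost connection; restart writers after swap")

archive = [book]  # the previous run book is kept for reference
book_v2 = book.next_version({1: "Swap IP addresses, then restart printer writers"})
print(book_v2.version, [s.instruction for s in book_v2.steps])
```

Keeping the old version in the archive with its notes intact preserves the record of each test, while the new version starts clean for the next one.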

More To Come

As I mentioned, this company treated CBU certification as a series of steps. Next week, I'll look at what was required for the next certification steps and how they led up to the ultimate goal of a live switch-over.


RELATED STORIES

Beyond Replication in an i5/OS High-Availability Environment

Common Mistakes When Failing Over to a CBU

Five Benefits of a High Availability System

How System i Boxes Impersonate Each Other, Part 1

How System i Boxes Impersonate Each Other, Part 2

The System i High Availability Roadmap





Senior Technical Editor: Ted Holt
Technical Editor: Joe Hertvik
Contributing Technical Editors: Edwin Earley, Brian Kelly, Michael Sansoterra
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed


Copyright © 1996-2009 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
