Four Hundred Guru
Volume 10, Number 24 -- June 30, 2010

Admin Alert: High Availability Eliminates Disaster Recovery. . . Right?

Published: June 30, 2010

by Joe Hertvik

Imagine you're a flea with one limitation. Whenever you want to go anywhere, you can only jump half the distance to your objective and no further. The second hop halves the remaining distance. The third halves that again, and so on. How long will it take to reach your goal? Administrators dealing with High Availability (HA) and Disaster Recovery (DR) are a lot like that flea.

I thought of the jumping flea after we started revisiting our DR plan. I was feeling pretty smug about things. After all, I have an off-site HA setup for my production Power i machine, I've performed at least 20 HA switchover exercises since 2007, my system has been certified to work correctly by both the Applications staff and the user community, and I've even run on my HA box for a week when we had trouble completing a production system upgrade. My disaster recovery work is done, I thought. I am ready to go if the big one hits. Or was I?

The truth is that for all our HA work, we were still only in a good position, not a great position. There was still work to be done. Here are some of the things I discovered when I started exploring the ground where high availability meets disaster recovery.

Let's Define

Before I start the discussion, it's helpful to have a good working definition of HA and DR.

A High Availability system uses a capacity backup (CBU) system to provide near-continuous availability for an iSeries, System i, or Power i box. A CBU is a system in waiting, continuously receiving replicated data and other system objects from a production partition. In an emergency, the CBU can be quickly activated to stand in for its partner system, minimizing the amount of downtime the company experiences. See the Related Stories section for other articles describing i5/OS HA and CBUs.

A Disaster Recovery plan can take advantage of your HA system, but it contains more than just HA. (Very) loosely speaking, a DR plan is a preset group of instructions for what your IT department does to restore computer capabilities when a disaster takes out your network and i/OS capabilities. It answers the (not so) simple question of what you will do when everything falls apart due to fire, earthquake, tornado, electrical outage, terrorist attack, etc.

Where HA Ends

As I said in the intro, I get a little cocky when HA plans come up. I've done and tested everything I can possibly think of to ensure my HA solution will be up and ready if the unthinkable occurs. However, when I started rewriting our DR plan, I ran into the following additional items that I had never thought about with HA. If you're looking at HA in relation to your DR plan, you might also want to think about these issues.

Where do your HA and DR plans live?--If your HA and DR plans live on your computer network, what happens if the network is destroyed along with your i/OS machine? Have you printed out a paper copy of each plan, and do these copies exist off-site where your DR team can retrieve them in an emergency? It's wise to keep multiple copies of your HA and DR documents both on-site and off-site. You may even want to require that your key personnel keep copies of each document in their cars.

Passwords and contact numbers for retrieving recovery media from an off-site vendor--If the building burns down, do you know whom to call to retrieve backup tapes as needed? This is another item that is worth keeping off-site.

Companion servers--For all our HA technology, we found that a lot of our purchase orders, invoices, and other documents were still being faxed to our customers on a companion fax server. In addition to having a plan for restarting i/OS processing, you also need to plan for companion servers that may or may not be needed during a disaster. Ask yourself: can we do without this capability? If not, what is the backup plan for replacing it? The same goes for other companion servers or capabilities that connect you to valuable customers or business partners. (Anybody still have dial-up modems?) How will you compensate for losing those connections and functionality in a disaster, where recovery may take several days, weeks, or even months?

Development i/OS partitions--In our shop, i/OS software changes are managed and promoted to production through a secondary partition using Aldon Lifecycle Manager. If the entire computer room is taken down for a month or two, do you know how your developers will keep working? This is another issue beyond HA that your DR plan should address before a real disaster occurs.

Going home--In our situation, all of our HA exercises have followed the same sequence:

  • Switch production processing over to the CBU
  • Run production on the CBU for a specified period of time
  • Bring the production machine back up as the target machine and synchronize the data processed on the CBU (now functioning as the source machine) back to the production machine (now functioning as the target, or CBU, machine)
  • Switch processing back to the production Power i machine
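
For readers who like to see the role changes spelled out, here is a minimal Python sketch of the four steps above. It is only a model for reasoning about who is the source and who is the target at each point. The class name HaPair, the machine labels "PROD" and "CBU", and the method names are all hypothetical and do not correspond to any HA vendor's commands. The final reversal that makes production the source again is an assumption implied by, but not stated in, the list.

```python
from dataclasses import dataclass, field


@dataclass
class HaPair:
    """Hypothetical model of a production machine and its CBU."""
    source: str = "PROD"     # machine currently sending replicated data
    target: str = "CBU"      # machine currently receiving replicated data
    users_on: str = "PROD"   # machine the users are running on
    log: list = field(default_factory=list)

    def switch_users(self, machine: str) -> None:
        # Move production processing to the named machine.
        self.users_on = machine
        self.log.append(f"users now on {machine}")

    def reverse_replication(self) -> None:
        # Swap the source and target roles of the two machines.
        self.source, self.target = self.target, self.source
        self.log.append(f"replicating {self.source} -> {self.target}")


def round_trip(pair: HaPair) -> HaPair:
    # Step 1: switch production processing over to the CBU.
    pair.switch_users("CBU")
    # Step 2: run production on the CBU (no role change to model).
    # Step 3: production returns as the target; the CBU acts as the source.
    pair.reverse_replication()
    # Step 4: switch processing back to production, then restore the
    # original replication direction (assumed, see the note above).
    pair.switch_users("PROD")
    pair.reverse_replication()
    return pair


if __name__ == "__main__":
    for entry in round_trip(HaPair()).log:
        print(entry)
```

The sketch covers only this round trip, in which both machines survive and the original production box is still there to come home to.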

This is fine for limited types of disasters, such as an electrical outage or any disaster that takes out user connectivity to the computer room without destroying your i/OS machine. But what happens when a tornado, hurricane, explosion, or any other type of disaster destroys your computer room along with your i/OS machine(s)? Eventually, you will probably replace your wrecked machines with new Power i systems. When going home to a new machine that has never been configured for HA, you would then also need instructions for:

Restoring and resetting your new production system as the CBU--This is tricky because any full system backup tapes you have were most likely taken when your old system was configured for production. So after restoring the production box from tape to a new system, you would also need to flip your new production box from functioning as a production system to functioning as a CBU system before putting it back on the network.

Synchronizing data from the CBU with your production system--This is also trickier than it sounds. If you've been running production on the CBU for several weeks or months, it may be difficult to resynchronize the new production machine with the CBU because a) there may not be a sync point from which to start data resynchronization; and b) replaying every transaction through file replication alone may take too long. To fill this hole in your HA and DR plan, contact your vendor to determine its best practices for returning production to a brand new machine after an extended period without replication, and write the resulting resynchronization procedure into your plan before you need it.
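
To get a rough feel for point b), the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope estimate only, not a vendor formula. The function name and every figure in the example are made-up assumptions, and the model assumes steady rates: the backlog shrinks only as fast as the replication apply rate exceeds the rate at which production keeps creating new transactions.

```python
from typing import Optional


def catch_up_hours(backlog_txns: float,
                   apply_per_hour: float,
                   new_per_hour: float) -> Optional[float]:
    """Estimate hours for replication to drain a transaction backlog.

    All inputs are hypothetical planning figures:
      backlog_txns   -- transactions processed on the CBU during the outage
      apply_per_hour -- transactions replication can apply per hour
      new_per_hour   -- new transactions production creates per hour
    Returns None when replication can never catch up.
    """
    net_rate = apply_per_hour - new_per_hour
    if net_rate <= 0:
        # Like the flea, the backlog never reaches zero.
        return None
    return backlog_txns / net_rate


if __name__ == "__main__":
    # Made-up example: 6 million backlog transactions, 100,000 applied
    # per hour, 40,000 new transactions arriving per hour.
    print(catch_up_hours(6_000_000, 100_000, 40_000))
```

If an estimate like this comes back in days rather than hours, or comes back as None, that is a good sign you need the vendor's alternative to transaction-by-transaction replication.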

To complete my analogy, running HA with DR is similar to our jumping flea. You can continue to refine your procedures, but you may never get 100 percent to your goal. Something will always come up when faced with a real disaster. However, if you continue studying the problem and keep identifying and solving the issues involved, you can get very, very close.


RELATED STORIES

Preparing Your CBU for a Real Emergency

The Road to Live CBU Fail-Over, Part 1

The Road to Live CBU Fail-Over, Part 2

Beyond Replication in an i5/OS High-Availability Environment

Common Mistakes When Failing Over to a CBU

How System i Boxes Impersonate Each Other, Part 2

How System i Boxes Impersonate Each Other, Part 1

Five Benefits of a High Availability System

The System i High Availability Roadmap



Senior Technical Editor: Ted Holt
Technical Editor: Joe Hertvik
Contributing Technical Editors: Erwin Earley, Brian Kelly, Michael Sansoterra
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed

Copyright © 1996-2010 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
