The Four Hundred
    Admin Alert: High Availability Eliminates Disaster Recovery. . . Right?

    June 30, 2010 Joe Hertvik

    Imagine you’re a flea with one limitation. Whenever you want to go anywhere, you can only jump half the distance to your objective and no further. The second hop halves the remaining distance. The third halves that again, and so on. How long will it take to reach your goal? Administrators dealing with High Availability (HA) and Disaster Recovery (DR) are a lot like that flea.

    I thought of the jumping flea after we started revisiting our DR plan. I was feeling pretty smug about things. After all, I have an off-site HA setup for my production Power i machine, I’ve performed at least 20 HA switchover exercises since 2007, my system has been certified to work correctly by both the Applications staff and the user community, and I’ve even run on my HA box for a week when we had trouble completing a production system upgrade. My disaster recovery work was done, I thought. I was ready to go if the big one hit. Or was I?

    The truth is that for all our HA work, we were still only in a good position, not a great position. There was still work to be done. Here are some of the things I discovered when I started exploring the ground where high availability meets disaster recovery.

    Let’s Define Our Terms

    Before I start the discussion, it’s helpful to have a good working definition of HA and DR.

    A High Availability system uses a capacity backup (CBU) system to provide near-continuous availability for an iSeries, System i, or Power i box. A CBU is a system in waiting, continuously receiving replicated data and other system objects from a production partition. In an emergency, the CBU can be quickly activated to stand in for its partner system, minimizing the amount of downtime the company experiences. See the Related Stories section for other articles describing i5/OS HA and CBUs.

    A Disaster Recovery plan can take advantage of your HA system, but it contains more than just HA. (Very) loosely speaking, a DR plan is a preset group of instructions for what your IT department does to restore computer capabilities when a disaster takes out your network and i/OS capabilities. It answers the (not so) simple question of what you will do when everything falls apart due to fire, earthquake, tornado, electrical outage, terrorist attack, etc.

    Where HA Ends

    As I said in the intro, I get a little cocky when HA plans come up. I’ve done and tested everything I can possibly think of to ensure my HA solution will be up and ready if the unthinkable occurs. However, when I started rewriting our DR plan, I ran into the following additional items that I had never even considered when planning for HA. If you’re looking at HA in relation to your DR plan, you might also want to think about these issues.

    • Where do your HA and DR plans live?–If your HA and DR plans live on your computer network, what happens if the network is destroyed along with your i/OS machine? Have you printed out a paper copy of each plan, and do these copies exist off-site where your DR team can retrieve them in an emergency? It’s wise to keep multiple copies of your HA and DR documents both on-site and off-site. You may even want to require that your key personnel keep copies of each document in their cars.

    • Passwords and contact numbers for retrieving recovery media from an off-site vendor–If the building burns down, do you know who to call to retrieve backup tapes, as needed? This is another item that is worth keeping off-site.

    • Companion servers–For all our HA technology, we found that many of our purchase orders, invoices, and other documents were still being faxed to our customers through a companion fax server. In addition to having a plan for restarting i/OS processing, you also need to plan for companion servers that may or may not be needed during a disaster. Ask yourself: Can I do without this capability? If not, what is my backup plan for replacing it? The same goes for other companion servers or capabilities that connect you to valuable customers or business partners. (Anybody still have dial-up modems?) How will you compensate for losing those connections and that functionality in a disaster, where recovery may take several days, weeks, or even months?

    • Development i/OS partitions–In our shop, i/OS software changes are managed and promoted to production through a secondary partition using Aldon Lifecycle Manager. If the entire computer room is taken down for a month or two, do you know how your developers will keep working? This is another issue beyond HA that you should plan for before a real disaster occurs.

    • Going home–In our shop, every HA exercise we have run has followed the same pattern:

    • Switch production processing over to the CBU
    • Run production on the CBU for a specified period of time
    • Bring the production machine back up as the target and synchronize the data processed on the CBU (now functioning as the source machine) back to the production machine (now functioning as the target machine)
    • Switch processing back to the production Power i machine
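    The four-step round trip above boils down to two role swaps. Here is a minimal Python sketch of it. It is a hypothetical illustration only, not any HA vendor’s product or API: the `Box` class, the "source"/"target" role names, and the helper functions are invented for this example. It captures the one rule that makes the cycle work: replicated data always flows from whichever box is currently the source to whichever box is currently the target. It also assumes, as every exercise above does, that the production machine survives to be switched back to.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for one Power i system in an HA pair."""
    name: str
    role: str  # "source" runs production; "target" receives replicated data


def replication_path(a: Box, b: Box) -> str:
    """Describe which way data flows: always from the source to the target."""
    if a.role == "source" and b.role == "target":
        return f"{a.name} -> {b.name}"
    if b.role == "source" and a.role == "target":
        return f"{b.name} -> {a.name}"
    raise ValueError("an HA pair needs exactly one source and one target")


def swap_roles(a: Box, b: Box) -> None:
    """The role swap performed during a switchover or a switchback."""
    a.role, b.role = b.role, a.role


def round_trip(prod: Box, cbu: Box) -> list[str]:
    """Walk the four-step exercise and record the replication path at each stage."""
    log = [f"normal operation: {replication_path(prod, cbu)}"]
    # Step 1: switch production processing over to the CBU.
    swap_roles(prod, cbu)
    # Steps 2 and 3: run production on the CBU, then bring the production
    # machine back up as the target and sync the processed data back to it.
    log.append(f"running on CBU: {replication_path(prod, cbu)}")
    # Step 4: switch processing back to the production machine.
    swap_roles(prod, cbu)
    log.append(f"back home: {replication_path(prod, cbu)}")
    return log


if __name__ == "__main__":
    for line in round_trip(Box("PROD", "source"), Box("CBU", "target")):
        print(line)
```

    Running it prints the path at each stage: PROD -> CBU, then CBU -> PROD, then PROD -> CBU again. The sketch has no move available if the production box itself is destroyed, because there is no machine left to swap back to. That is the gap discussed below.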

    This is fine for limited types of disasters, such as an electrical outage or any disaster that takes out user connectivity to the computer room without destroying your i/OS machine. But what happens when a tornado, hurricane, explosion, or any other type of disaster destroys your computer room along with your i/OS machine(s)? Eventually, you will probably replace your wrecked machines with new Power i systems. When going home to a new machine that has never been configured for HA, you would then also need instructions for:

    • Restoring your new production system and resetting it as the CBU–This is tricky because any full-system backup tapes you have were most likely taken while your old system was configured for production. So after restoring the production box from tape to a new system, you would also need to flip it from functioning as a production system to functioning as a CBU (target) system before putting it back on the network.

    • Synchronizing data from the CBU with your production system–This is also trickier than it sounds. If you’ve been running production on the CBU for several weeks or months, it may be difficult to resynchronize that data back to your new production system because a) there may not be a sync point to start resynchronization from, and b) it may take too long to process every transaction by using file replication alone. To fill this hole in your HA and DR plan, contact your HA vendor to learn their best practices for returning production to a brand-new machine after an extended period without replication, and write that resynchronization procedure into your plan before you need it.
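    A back-of-the-envelope estimate shows why replaying every transaction can take too long. Suppose the replication software must apply a backlog of B transactions to the new machine, and production on the CBU keeps generating new ones at rate r_in. If the target can apply transactions at rate r_apply, the backlog shrinks only at the net rate r_apply - r_in. Catch-up therefore takes B / (r_apply - r_in) hours, and it never finishes if r_apply is not greater than r_in. The Python function below is a hypothetical sketch of that queueing arithmetic, not a vendor formula. The rates in the example are invented for illustration, and real apply rates depend on your replication product and hardware. The sketch also assumes a usable sync point exists; without one, there is nothing to replay from at all.

```python
import math


def catch_up_hours(backlog: float, apply_rate: float, incoming_rate: float) -> float:
    """Estimate the hours a target needs to drain a replication backlog.

    backlog: transactions waiting to be applied to the new machine.
    apply_rate: transactions per hour the target can apply (assumed constant).
    incoming_rate: new transactions per hour production keeps generating.
    Returns math.inf when the target can never catch up.
    """
    if backlog < 0 or apply_rate < 0 or incoming_rate < 0:
        raise ValueError("inputs must be non-negative")
    if backlog == 0:
        return 0.0
    net_rate = apply_rate - incoming_rate  # how fast the backlog actually shrinks
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate


if __name__ == "__main__":
    # Hypothetical figures: 10 million queued transactions, a target that
    # applies 60,000 per hour, and production still adding 20,000 per hour.
    hours = catch_up_hours(10_000_000, 60_000, 20_000)
    print(f"{hours:.0f} hours (about {hours / 24:.1f} days)")
```

    With these made-up numbers, the estimate is 250 hours, about 10.4 days of catch-up once the new machine is already running. Halving the gap between the two rates doubles that time, which is one reason the vendor’s guidance on this step matters.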

    To complete my analogy, running HA with DR is a lot like being our jumping flea. You can keep refining your procedures, but you may never get 100 percent of the way to your goal; something unexpected will always come up in a real disaster. However, if you keep studying the problem and keep identifying and solving the issues involved, you can get very, very close.

    RELATED STORIES

    Preparing Your CBU for a Real Emergency

    The Road to Live CBU Fail-Over, Part 1

    The Road to Live CBU Fail-Over, Part 2

    Beyond Replication in an i5/OS High-Availability Environment

    Common Mistakes When Failing Over to a CBU

    How System i Boxes Impersonate Each Other, Part 2

    How System i Boxes Impersonate Each Other, Part 1

    Five Benefits of a High Availability System

    The System i High Availability Roadmap




Volume 10, Number 24 -- June 30, 2010

Copyright © 2025 IT Jungle