Volume 7, Number 37 -- October 24, 2007

Admin Alert: The System i High Availability Roadmap

Published: October 24, 2007

by Joe Hertvik

I've recently been involved in a high availability (HA) project where the goal is to be able to switch processing from a failed System i box to a Capacity Backup (CBU) machine within one hour of failure. Growing from that experience, this week's column is the first of an occasional series where I'll discuss HA concepts and how i5/OS administrators can expedite continuous availability procedures in their shop.

What Does High Availability Mean To a System i Shop?

Although there are more complicated definitions for high availability, my favorite explanation is that high availability consists of whatever you do to provide continuous access to computer resources in the event of component failures on the system or a total system failure.

Given this broad definition, there are many different ways that i5/OS techniques and procedures can contribute to continuous system processing, many of which you might not consider high availability by today's standards. It's possible to make an administrative roadmap of sorts that provides a rudimentary checklist of common techniques and methods that support high availability functions in System i, iSeries, and AS/400 shops. To that end, I humbly submit the following techniques and methods that System i shops can use to start and then refine their high availability strategies. Some of these techniques are elementary and common to all shops. Others require more planning and investment capital.

Daily and Weekly Backups: Backups provide a way to recover data and applications in the event of file corruption or destruction. Regular valid backups are the core component of a high availability system, and for most organizations, not much can be done to restore a system unless a good backup strategy is in place.

Backup Media and Equipment Maintenance, and Backup System Auditing: The best backup in the world will not do you any good if your tape drive is malfunctioning and needs to be cleaned. In addition to backing up the data, you should audit your backups and perform test restores on a regular basis to make sure that what you're backing up can be restored. If your company falls under Sarbanes-Oxley compliance, backup monitoring and test restores may be an auditing requirement.
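One way to keep test restores from slipping is to track when each backup job was last restored successfully and flag the ones that are overdue. The following is only a minimal Python sketch of that bookkeeping, not an i5/OS facility: the `BackupJob` record, the job names, and the 90-day threshold are all assumptions made for illustration, and your auditors may require a different interval.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class BackupJob:
    name: str                          # hypothetical label for a backup job
    last_test_restore: Optional[date]  # None means it has never been test-restored


def overdue_test_restores(jobs: List[BackupJob], today: date,
                          max_age_days: int = 90) -> List[str]:
    """Return the names of jobs whose last test restore is missing or too old.

    max_age_days is an assumed policy value, not a regulatory requirement.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [
        job.name
        for job in jobs
        if job.last_test_restore is None or job.last_test_restore < cutoff
    ]


if __name__ == "__main__":
    jobs = [
        BackupJob("nightly-libs", date(2007, 10, 1)),
        BackupJob("weekly-full", date(2007, 5, 15)),
        BackupJob("ifs-save", None),
    ]
    # Lists the jobs that need a test restore scheduled.
    print(overdue_test_restores(jobs, today=date(2007, 10, 24)))
```

A report like this can be attached to whatever backup audit evidence your shop already collects.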

Off-Site Backup Media Storage: Backup media should always be physically separated from the systems where their source data resides. And the tapes should be stored in a protected environment with limited access, preferably in a location that is far enough away from the data source so that a disaster can't take out both locations. Secured storage of backup tapes protects your system from total calamity if a disaster takes out your entire computer room. Auditing requirements may also require your company to keep a log of all tapes that are kept off site and record the movement of each tape to and from the storage facility.
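The tape movement log mentioned above can be as simple as an append-only list of moves that is replayed to find where each tape is right now. This is a hedged sketch in Python under assumed names: `TapeMove`, the volume IDs, and the two location labels are hypothetical, and a real shop would likely keep this log in a database file or its media management product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class TapeMove:
    volume_id: str      # hypothetical tape label
    moved_at: datetime  # when the tape changed hands
    destination: str    # "OFFSITE" or "ONSITE" in this sketch


def current_locations(log: List[TapeMove]) -> Dict[str, str]:
    """Replay the log in time order; the most recent move of each tape wins."""
    locations: Dict[str, str] = {}
    for move in sorted(log, key=lambda m: m.moved_at):
        locations[move.volume_id] = move.destination
    return locations


def tapes_still_on_site(log: List[TapeMove]) -> List[str]:
    """Tapes whose latest recorded location is the computer room."""
    return sorted(v for v, loc in current_locations(log).items() if loc == "ONSITE")


if __name__ == "__main__":
    log = [
        TapeMove("WK0001", datetime(2007, 10, 15, 8, 0), "OFFSITE"),
        TapeMove("WK0002", datetime(2007, 10, 22, 8, 0), "OFFSITE"),
        TapeMove("WK0001", datetime(2007, 10, 23, 9, 30), "ONSITE"),
    ]
    print(current_locations(log))
    print(tapes_still_on_site(log))
```

Because moves are only ever appended, the log doubles as the audit trail of each tape's trips to and from the storage facility.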

Save-While-Active Backups (SWA): If you're able to run them, SWA backups save data as it is being used by applications and system processing, so users keep their access to the information while it is being saved. Properly executed, an SWA strategy extends your processing window while still backing up your data. And in a 24x7 environment where Web users and business partners from around the world are looking to access your site at any hour of the day or night, continuous access to data is a key requirement. To make sure that there are always complete backups of your entire system, you may want to supplement regular backups with a full system backup once a quarter or whenever you perform system maintenance.

Uninterruptible Power Supply (UPS) Monitoring Systems on Your System i Box and All Attached Components: UPS systems provide continuous power so that short-term power outages or full-fledged blackouts don't unexpectedly take down your system, possibly damaging data and applications in the process. During a short-term outage, a UPS helps your system keep working during the time that it takes for the power to stabilize. In an extended outage, UPS systems provide administrators with a chance to get to the computer room and take down the system in an orderly fashion to avoid a system crash. These systems can also keep your systems going long enough for a secondary power supply, such as a generator, to kick in and keep the system running for a longer period of time. The last benefit of a good UPS system is that it can absorb and block electrical spikes coming down the power line.
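The UPS reasoning above boils down to a simple decision: keep running while utility or generator power is available, and start an orderly shutdown once the remaining battery runtime gets close to the time the shutdown itself needs. Here is a minimal Python sketch of that rule. The function name, its parameters, and the five-minute safety margin are assumptions for illustration only; this is not how i5/OS or any particular UPS product implements its monitoring.

```python
def ups_action(on_battery: bool,
               runtime_left_min: float,
               shutdown_needs_min: float,
               generator_online: bool = False,
               safety_margin_min: float = 5.0) -> str:
    """Decide what to do during a power event.

    Returns "continue" when power is fine, "monitor" while the battery
    still has comfortable headroom, and "shutdown" when an orderly
    power-down should begin. All thresholds are assumed policy values.
    """
    # Utility power is present, or the generator has taken over the load.
    if not on_battery or generator_online:
        return "continue"
    # Not enough battery left to wait any longer before powering down.
    if runtime_left_min <= shutdown_needs_min + safety_margin_min:
        return "shutdown"
    # Short-term outage: ride it out on battery and keep watching.
    return "monitor"


if __name__ == "__main__":
    print(ups_action(on_battery=True, runtime_left_min=40, shutdown_needs_min=15))
    print(ups_action(on_battery=True, runtime_left_min=18, shutdown_needs_min=15))
    print(ups_action(on_battery=True, runtime_left_min=18, shutdown_needs_min=15,
                     generator_online=True))
```

The safety margin exists because battery runtime estimates drift as batteries age, so it is worth measuring how long your own power-down really takes.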

Computer Room Generator: In an extended outage, a generator can power an entire computer room or section of the building until the power returns, avoiding the disruption that a forced shutdown would cause. Once in place, a generator allows your System i to continue with its daily processing (batch, interactive, and server), even if there aren't any on-site users to take advantage of it.

Disaster Recovery Contracts, Services, and Testing: Detailed plans and contracts to restore company and System i processing in an off-site location when a disaster occurs are staples in many System i shops. A good disaster recovery contract includes an off-site facility where you can restore and restart system processing and where users can access the system until the main facility is available again. Disaster recovery plans should be tested at least once a year. In an ideal situation, they should also be combined with a business continuity plan, which is a logistical plan for how the organization (with and without its computer systems) functions after an extended disruption or a disaster.

Capacity BackUp Systems (CBU): The CBU is the Cadillac of the high availability world. A CBU is basically a system in waiting on your network. A System i CBU communicates with your main production system and replicates system data through the use of high availability software, such as the different HA solutions provided by Vision Solutions (MIMIX HA, iTera HA, and ORION HA) or the DataMirror products offered by IBM. When replication is correctly performed, the database on your CBU is a duplicate of the database on your production system. If your main production system becomes unavailable when you have a CBU, you would manually initiate a switching process to reconfigure your CBU system to impersonate your production system, complete with almost up-to-date replicated data. If configured correctly, users, devices, and companion servers will then be able to log on to and interact with the CBU as if it were the production server. When your production box is ready to come back on line, the CBU resynchronizes its data with production, repopulating the production box with all the database changes that occurred while production was down. After system synchronization, the CBU is quickly reconfigured and restarted as the backup replicated system again, and the production system is restarted and resumes servicing all system users and devices.
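The switch and switchback cycle described above can be pictured as a small state machine with three states: normal replication, the CBU standing in for production, and the resynchronization back to production. The Python sketch below only models the order of those steps. The state and event names are my own labels, not terms from any HA product, and the real work at each transition is done by your HA software and your switching procedures.

```python
from enum import Enum


class HaState(Enum):
    NORMAL = "production active, CBU replicating"
    SWITCHED = "CBU acting as production"
    RESYNC = "CBU resynchronizing changes back to production"


# Allowed transitions, keyed by (current state, event). Event names are
# hypothetical labels for the steps in the cycle.
TRANSITIONS = {
    (HaState.NORMAL, "switch_initiated"): HaState.SWITCHED,
    (HaState.SWITCHED, "production_repaired"): HaState.RESYNC,
    (HaState.RESYNC, "sync_complete"): HaState.NORMAL,
}


def next_state(state: HaState, event: str) -> HaState:
    """Return the state that follows `event`, or raise if the step is out of order."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


if __name__ == "__main__":
    state = HaState.NORMAL
    for event in ("switch_initiated", "production_repaired", "sync_complete"):
        state = next_state(state, event)
        print(f"{event} -> {state.name}: {state.value}")
```

Rejecting out-of-order events mirrors a point worth writing into your runbook: production should not resume serving users until the resynchronization has finished.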

Now while it seems like an expensive proposition to dedicate an entire System i box to doing nothing but wait for your production box to fail, IBM offers several System i Capacity Backup editions that, while not free, are significantly cheaper than its Enterprise Edition products. The cost of these servers must be weighed against what the business would lose in the event of a severe or significant disaster, such as what happened when Hurricane Katrina hit the Gulf Coast in 2005.

It's also worth noting that there are several other costs involved with staging a CBU in addition to the cost of the backup System i box itself. To protect the box from disasters that take out a company's entire computer infrastructure, the CBU should be accessible on the same subnet as your production system but it should be physically housed in another location at a respectable distance from the production system's location. Some people house their CBUs at sister locations and others co-locate them at outside vendor centers that are specifically set up for high availability processing. In either case, there will be additional telecom and infrastructure costs to connect a server in a remote environment as part of your network. Besides co-location costs, you need to purchase high availability software as well as acquire some experienced help to set up and configure your HA replication strategy.
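To weigh these costs against what an outage would cost the business, a back-of-the-envelope calculation is often enough to start the conversation. The Python sketch below adds up the cost categories named above and compares them with the expected outage loss with and without a CBU. Every number in it is invented purely for illustration, the annualized cost buckets are an assumption about how you would budget, and the one-hour figure simply reuses the switch-over goal mentioned at the top of this column.

```python
from dataclasses import dataclass


@dataclass
class CbuCosts:
    """Assumed annualized cost buckets for a CBU (all values hypothetical)."""
    hardware: float
    ha_software: float
    colocation_and_telecom: float
    setup_services: float

    def total(self) -> float:
        return (self.hardware + self.ha_software
                + self.colocation_and_telecom + self.setup_services)


def expected_outage_loss(outages_per_year: float, hours_down: float,
                         loss_per_hour: float) -> float:
    """Expected yearly loss = outage frequency x duration x hourly loss."""
    return outages_per_year * hours_down * loss_per_hour


def net_annual_benefit(costs: CbuCosts, outages_per_year: float,
                       hours_down_without: float, hours_down_with: float,
                       loss_per_hour: float) -> float:
    """Loss avoided by having the CBU, minus what the CBU costs per year."""
    avoided = (expected_outage_loss(outages_per_year, hours_down_without, loss_per_hour)
               - expected_outage_loss(outages_per_year, hours_down_with, loss_per_hour))
    return avoided - costs.total()


if __name__ == "__main__":
    # Invented figures: one serious outage every two years, 48 hours to
    # recover from tape versus 1 hour to switch to the CBU.
    costs = CbuCosts(hardware=60_000, ha_software=40_000,
                     colocation_and_telecom=30_000, setup_services=20_000)
    print(net_annual_benefit(costs, outages_per_year=0.5,
                             hours_down_without=48, hours_down_with=1,
                             loss_per_hour=20_000))
```

A positive result suggests the CBU pays for itself under your assumptions, while a negative one means the business case has to rest on something other than avoided downtime alone.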

In future issues, I'll explain some of the other costs and responsibilities you may incur when setting up a high availability system as well as some of the other unexpected benefits that grow out of these systems. But the main point is this:

While a high availability system will be invaluable to any company that chooses to implement one, it is a relatively expensive undertaking and for most organizations, a good business case must be made before undertaking the project. For banks, insurance companies, and other large organizations that have a critical need for close to zero downtime, it may be an easy sell to implement a CBU. Other smaller organizations will have to make a clear evaluation on whether it is worth it to the business to implement a dedicated solution.

Regardless of where you are on the high availability roadmap, it's important to review your options every so often to ensure that system availability will be as high as possible in your organization. You should also make a point of looking at different alternatives to keep the business going in the event of disaster. While this roadmap is a good start, I'll flesh out the challenges associated with implementing high availability in future issues. For information on some of the other options that IT Jungle has covered in previous issues, see the Related Stories section below.


RELATED STORIES

Creating a Save Changed Objects Backup Tape

Dissecting an Option 21 Save

Five Things That Kill Backups (and What to Do About Them)

Meditations on Full System Backups

Two Ways to Audit Your Backup Strategy

What Happened to My Backup?





Senior Technical Editor: Ted Holt
Technical Editors: Howard Arner, Joe Hertvik, Shannon O'Donnell, Kevin Vandever
Contributing Technical Editors: Joel Cochran, Wayne O. Evans, Raymond Everhart,
Bruce Guetzkow, Brian Kelly, Marc Logemann, David Morris
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.


Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
