Volume 3, Number 34 -- October 4, 2006

Top 10 Requirements for High Availability

Published: October 4, 2006

by Michael Bilancieri

Protecting systems and applications from downtime tops many IT managers' lists of "things that keep me awake at night." With new and more stringent compliance regulations, availability has become more important, yet more confusing, than ever. In this article, we'll take a shot at deciphering the array of definitions and terms used in the field, and then list the 10 most important things you can do to prepare your shop for high availability.

The growing list of availability vendors often claims to provide a comprehensive set of solutions, complete with disaster recovery, high availability, continuous availability, fault tolerance, real-time data protection, business continuity, and the list goes on. You name the term, and nearly all industry vendors will claim they offer it. To make things even more confusing, vendors often define these terms differently. This presents a formidable challenge for customers looking for an availability solution amid tremendous industry noise.

Let's first take a look at the terms mentioned above and articulate their true meaning.

Disaster Recovery (DR) is one of the most widely used buzzwords in the industry, with nearly all vendors claiming they offer a DR solution. DR is important, but it doesn't necessarily mean "availability." Recovery is the process of rebuilding and getting applications back online after a major failure. Consider it the contingency plan when all else fails. A valid DR plan could outline a recovery period of hours or even days, while your availability needs may be much shorter: minutes or less. A DR plan is a way of getting data to a secondary location where it can be quickly accessed for rebuilding and recovery.

High Availability (HA) refers to the capability to keep applications up through most failure situations, but maybe not through some of the more severe outages. Some vendors claim they deliver HA, but in reality they could take many minutes to get applications back online. A few minutes may be acceptable, but most applications probably cannot afford 60 to 90 minutes of downtime. It is important to understand what vendors mean when they say HA, as it may not meet your requirements.
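
To see why the word "HA" alone says little, it can help to turn downtime into the availability percentage that vendors like to quote. The following is a minimal sketch, not a vendor formula; the function name `availability_percent` and the sample downtime figures are our own illustrative assumptions.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_percent(downtime_minutes_per_year: float) -> float:
    """Return yearly availability as a percentage for a given amount of downtime."""
    if not 0 <= downtime_minutes_per_year <= MINUTES_PER_YEAR:
        raise ValueError("downtime must be between 0 and one year of minutes")
    return 100.0 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


# Hypothetical yearly downtime totals, in minutes.
for minutes in (5, 60, 90):
    print(f"{minutes:>3} min/year of downtime -> {availability_percent(minutes):.4f}% available")
```

Even a single 90-minute outage in a year still works out to better than 99.98 percent availability, so a percentage on a datasheet can hide a failover time that is far too long for your application.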

Continuous Availability (CA) is not the same as HA. CA is just that: systems run continuously without stopping. If a failure causes the application to stop and restart, it's not continuous. Very few vendors offer true CA.

Data protection is an important component of most availability solutions. Even if the system and application are restarted, missing or corrupt data will render the application useless. Don't let claims of "real-time data protection" mislead you into believing that your data is fully protected. Claims of real-time data protection refer to the capture of changes, not the protection of those changes. These solutions send changes to secondary storage as schedules and bandwidth permit. If a failure occurs before the data is sent, this data is lost and your secondary data copy is likely to be corrupt, preventing the application from starting.
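
The gap between capturing a change and protecting it can be shown with a toy model. This is only a sketch under our own assumptions: the `AsyncReplicator` class, its method names, and the "order" changes are hypothetical and do not describe any particular vendor's product.

```python
from collections import deque


class AsyncReplicator:
    """Toy model: changes are captured at once but shipped to the secondary later."""

    def __init__(self) -> None:
        self.pending = deque()  # captured locally, not yet on the secondary
        self.secondary = []     # changes that actually reached secondary storage

    def capture(self, change: str) -> None:
        """Record a change on the primary side only."""
        self.pending.append(change)

    def ship(self, max_changes: int) -> None:
        """Send up to max_changes, standing in for limited bandwidth or a schedule."""
        for _ in range(min(max_changes, len(self.pending))):
            self.secondary.append(self.pending.popleft())

    def fail_primary(self) -> list:
        """Simulate losing the primary: whatever was still pending is gone."""
        lost = list(self.pending)
        self.pending.clear()
        return lost


replicator = AsyncReplicator()
for change in ("order-1", "order-2", "order-3", "order-4"):
    replicator.capture(change)

replicator.ship(max_changes=2)   # only part of the backlog gets sent
lost = replicator.fail_primary()  # the failure hits before the rest is shipped

print("on secondary:", replicator.secondary)
print("lost:", lost)
```

All four changes were captured in real time, yet only the ones shipped before the failure exist on the secondary. When evaluating a product, ask how large that pending backlog can grow.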

The Top Ten

Now let's have a look at some of the things you should do to protect your systems and applications from downtime. Downtime affects your users, your customers, sales, revenue, productivity, and just about every other facet of your business.

  1. Take a pragmatic approach and determine what's right for your business

    Availability is not one-size-fits-all. Every business has different objectives for different applications. Look to others for guidance but stay focused on your specific requirements.

  2. Understand your business requirements

    What is it that you actually need to accomplish? Is it DR, availability, or both? Implementing the wrong solution, or an incomplete one, will waste time and money. Check with your users and clients to determine their requirements and any service levels that must be met.

  3. Determine your Recovery Point Objective (RPO)

    RPO is the point back in time to which you must be able to recover systems and data after a failure. In other words, how much data can you afford to lose? None? Five minutes? Five hours? This will vary for different applications. Understanding RPO will help to focus your search on appropriate solutions.

  4. Determine your Recovery Time Objective (RTO)

    RTO is the amount of time a system can be down without major impact to your business. RTO will also vary with each application and is crucial to determine. If a system can be down for many hours or even days, then disk- or tape-based backup may be sufficient. However, most applications are likely to require an RTO somewhere between zero (which is CA) and just a few minutes (HA).

  5. What situations do you need to protect against?

    Failures and disasters come in all types and sizes, and the causes will vary depending on a number of factors. If tornadoes or hurricanes are a concern, you may need a longer-distance solution, while other geographies may have shorter-distance requirements. Items to consider include power (both at the facility and the regional power grid--remember the northeast US blackout of 2003?), system failures, and natural and man-made disasters.

  6. Get support and buy-in from management and the business units

    If you don't, it will be very difficult to get the project off the ground, let alone implemented. Availability and compliance require a corporate focus. If you go it alone, your chances of success will be low.

  7. Understand your options

    Understand the details of how vendors' solutions really work, what they can and can't do, and, most importantly, how they will react in specific failure situations. Don't base your decisions on their product datasheets. Question and understand the details.

  8. Start with easy, basic precautions

    These include separate power circuits, backup power sources, and network cabling that removes any single point of failure, such as a single switch or router.

  9. Accept it, acknowledge it, embrace it

    It can certainly be a daunting project, one that is easy to put off, but once you set some ground rules and get things started it's really not that painful.

  10. Get started
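
Items 3 and 4 above lend themselves to a simple worksheet: write down an RPO and RTO for each application, then see which category of solution each one points to. Here is one way that might look, as a rough sketch only. The application names, the five-minute cutoff for "a few minutes," and the helper names `suggest_approach` and `meets_rpo` are all illustrative assumptions, not industry standards.

```python
from dataclasses import dataclass


@dataclass
class AppRequirement:
    name: str
    rpo_minutes: float  # how much data, measured in time, the business can afford to lose
    rto_minutes: float  # how long the application can be down


def suggest_approach(req: AppRequirement, ha_limit_minutes: float = 5.0) -> str:
    """Map an RTO to a category. ha_limit_minutes is an illustrative cutoff, not a standard."""
    if req.rto_minutes == 0:
        return "continuous availability (CA)"
    if req.rto_minutes <= ha_limit_minutes:
        return "high availability (HA)"
    return "backup-based recovery may be sufficient"


def meets_rpo(req: AppRequirement, copy_interval_minutes: float) -> bool:
    """A copy taken every N minutes can lose up to N minutes of changes."""
    return copy_interval_minutes <= req.rpo_minutes


# Hypothetical applications and objectives.
apps = [
    AppRequirement("order entry", rpo_minutes=0, rto_minutes=0),
    AppRequirement("email", rpo_minutes=5, rto_minutes=5),
    AppRequirement("monthly reporting", rpo_minutes=24 * 60, rto_minutes=48 * 60),
]

for app in apps:
    nightly_ok = meets_rpo(app, copy_interval_minutes=24 * 60)
    print(f"{app.name}: {suggest_approach(app)}; nightly backup meets RPO: {nightly_ok}")
```

The point of the exercise is that the answers differ by application. In this made-up example, a nightly backup satisfies only the reporting system, while the other two need something that moves data and restarts applications much faster.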

Michael Bilancieri is the director of products at Marathon Technologies, a leading provider of high availability software for Windows environments.



Editor: Alex Woodie
Contributing Editors: Dan Burger, Joe Hertvik,
Shannon O'Donnell, Timothy Prickett Morgan
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed

Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
