Admin Alert: Redundancy is Good, Redundancy is Good, Re…
Published: June 18, 2008
by Joe Hertvik
The next time you order a Power i upgrade, pay special attention to the redundant components that are available. As vendors offer more and more redundancy options, many 24x7 i5/OS shops are demanding redundancy in nearly every component they can get. This week, I'll look at some of the redundancy options available for i5/OS machines and how they can help your shop.
Basic Redundancy
The math of redundancy is simple. If it costs 10 percent more to install redundant components, for example, how much will that save the company if it keeps a critical system running during a busy period? Will that premium cost less than an extended outage with no high availability solution in place for rapid recovery? For many shops, the math adds up so clearly that cost isn't even a question. They just install the redundant components.
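To make that math concrete, here's a minimal Python sketch of the break-even calculation. The system price, the 10 percent premium, and the hourly cost of downtime are all hypothetical planning figures, not numbers from IBM or any real shop, so plug in your own.

```python
from dataclasses import dataclass


@dataclass
class RedundancyCase:
    """Hypothetical inputs for a redundancy cost/benefit check."""
    system_cost: float             # price of the upgrade without redundant parts
    premium_pct: float             # extra cost of redundant components, in percent
    downtime_cost_per_hour: float  # estimated loss for each hour the system is down

    def premium(self) -> float:
        """Extra money spent on redundant components."""
        return self.system_cost * self.premium_pct / 100.0

    def break_even_hours(self) -> float:
        """Hours of avoided downtime at which the premium pays for itself."""
        return self.premium() / self.downtime_cost_per_hour

    def pays_off(self, outage_hours_avoided: float) -> bool:
        """True if the downtime avoided is worth at least as much as the premium."""
        return outage_hours_avoided * self.downtime_cost_per_hour >= self.premium()


# Hypothetical figures: a $200,000 upgrade, a 10 percent redundancy premium,
# and $25,000 lost for every hour the system is down.
case = RedundancyCase(system_cost=200_000, premium_pct=10, downtime_cost_per_hour=25_000)
print(case.premium())           # 20000.0
print(case.break_even_hours())  # 0.8
print(case.pays_off(2))         # True
```

With those made-up numbers, the premium pays for itself if the redundant parts prevent just 48 minutes of downtime over the life of the machine.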
But if you're wondering how you can protect your Power i platform from various failures as you upgrade your machine, here are some suggestions for how to put the power of redundancy to work in your organization.
The Hierarchy of Redundant Electricity
With any enterprise computing system, the easiest crash to prevent is the one caused by power failure (although the solutions can be costly). There are a number of techniques for protecting your Power i from sudden power outages that can take down your business in a literal flash. In increasing order of protection, here's a ladder of power protection strategies that you can implement to keep your i5/OS and OS/400 machines running.
- Redundant power supplies on system components--Many Power i system components, such as a 0595 rack or a disk tower, can be ordered either with or without redundant power supplies. When you have dual power supplies, it's a simple matter to plug them into two different circuits or Power Distribution Units (PDUs), which can protect the component from both a power supply failure and a circuit failure.
- Redundant PDUs--In the i world, a PDU is basically a glorified power strip installed in a system tower. Like a power strip, the PDU plugs into a circuit, and your components plug into the PDU outlets. If each system component has a redundant power supply, it's easy to ensure that your system won't crash if a PDU fails. All you have to do is plug one of the two redundant power supplies into one PDU and the other into a second PDU. This way, if one of the PDUs goes down or has a problem, your production system will continue working because the other PDU will still be powering your components.
- Redundant circuits--Once you have redundant power on your entire box through redundant power supplies and PDUs, the next thing you should do is to plug each half of your redundant electrical connections into different isolated circuits. By doing this, if you have a problem with one of the circuits (say the electrician trips a circuit as he's working), power will keep flowing through the redundant circuit into your PDUs and power supplies and your system will keep working.
- A UPS system--The next level of power protection involves installing a battery backup uninterruptible power supply (UPS) system to power your Power i/System i machines in the event of a complete power failure. Many organizations are able to put their entire computer room on a larger battery backup system. Keep in mind, however, that battery backup systems aren't designed to run your systems for long periods of time. Most common UPS systems can run equipment for only a short time after an outage (usually less than half an hour). That is enough to handle one of two situations: either the outage is a brief blip lasting less than a minute and the UPS keeps the machine running so that it doesn't crash while power is being restored; or, during a longer outage, the UPS gives your administrators enough time to get into the computer room and physically turn off your machines before the batteries give out.
- A generator--Depending on the effect that power outages can have on your business, your company may decide to purchase a generator to provide power during an extended power outage. You can usually program the generator to come on a few minutes after the power outage starts, to prevent the generator from firing up during a brown-out or an extremely short outage. The idea is to save energy by letting the UPS system power your computers during a short outage. If the outage continues, then it's time to get the generator involved.
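The first three rungs of that ladder come down to one rule: a component stays up as long as at least one complete path of power supply, PDU, and circuit is still working. Here's a toy Python model of that rule that lists the single points of failure in a wiring plan. The supply, PDU, and circuit names are hypothetical, and the model deliberately ignores UPS and generator backup.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class PowerPath:
    """One way power reaches a component (all names are hypothetical)."""
    supply: str   # power supply inside the component, e.g. "PS1"
    pdu: str      # PDU that supply is plugged into
    circuit: str  # electrical circuit feeding that PDU


def is_powered(paths: List[PowerPath], failed: Set[str]) -> bool:
    """The component stays up if at least one path contains no failed element."""
    return any(not ({p.supply, p.pdu, p.circuit} & failed) for p in paths)


def single_points_of_failure(paths: List[PowerPath]) -> List[str]:
    """Elements whose failure alone takes the component down."""
    elements = {e for p in paths for e in (p.supply, p.pdu, p.circuit)}
    return sorted(e for e in elements if not is_powered(paths, {e}))


# Dual power supplies, but both plugged into the same PDU on the same circuit.
same_pdu = [PowerPath("PS1", "PDU-A", "CKT-1"), PowerPath("PS2", "PDU-A", "CKT-1")]
# Two PDUs, but both fed from one circuit.
same_circuit = [PowerPath("PS1", "PDU-A", "CKT-1"), PowerPath("PS2", "PDU-B", "CKT-1")]
# Fully split: separate PDUs on separate isolated circuits.
fully_split = [PowerPath("PS1", "PDU-A", "CKT-1"), PowerPath("PS2", "PDU-B", "CKT-2")]

print(single_points_of_failure(same_pdu))      # ['CKT-1', 'PDU-A']
print(single_points_of_failure(same_circuit))  # ['CKT-1']
print(single_points_of_failure(fully_split))   # []
```

Because the model identifies failures by name, names must be unique across supplies, PDUs, and circuits.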
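The UPS and generator rungs work as a team, and their timing has to line up. The generator's programmed start delay must be shorter than the UPS battery runtime, or the batteries can run dry before the generator comes online. This rough Python sketch classifies an outage under those assumptions. All times are hypothetical planning figures in minutes, and the sketch ignores details such as generator warm-up and load transfer.

```python
from typing import Optional


def outage_outcome(outage_min: float,
                   ups_runtime_min: float,
                   generator_delay_min: Optional[float] = None,
                   shutdown_min: float = 10.0) -> str:
    """Classify how a system fares during a utility power outage.

    outage_min: how long utility power is out
    ups_runtime_min: how long the UPS batteries can carry the load
    generator_delay_min: programmed delay before the generator starts (None = no generator)
    shutdown_min: time admins need to reach the room and power down cleanly
    All values are hypothetical planning numbers, in minutes.
    """
    generator_starts = generator_delay_min is not None and outage_min > generator_delay_min
    # Short blip: power returns before the batteries drain and before the generator fires.
    if outage_min <= ups_runtime_min and not generator_starts:
        return "UPS rides through"
    # Longer outage: the generator only helps if it starts before the batteries die.
    if generator_delay_min is not None and generator_delay_min < ups_runtime_min:
        return "generator takes over"
    # No usable generator: the UPS buys time for an orderly shutdown, if there's enough of it.
    if shutdown_min <= ups_runtime_min:
        return "orderly shutdown"
    return "unplanned crash"


print(outage_outcome(0.5, 15, generator_delay_min=3))  # UPS rides through
print(outage_outcome(120, 15, generator_delay_min=3))  # generator takes over
print(outage_outcome(120, 15))                         # orderly shutdown
print(outage_outcome(120, 15, generator_delay_min=20)) # orderly shutdown
print(outage_outcome(120, 5))                          # unplanned crash
```

The fourth call shows the trap: a generator whose start delay exceeds the battery runtime adds nothing, so the shop is back to shutting down by hand.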
Disk Drive Redundancy
Like many other systems, Power i, System i, and iSeries machines offer RAID 5 and mirroring to protect against a disk drive crash. RAID 5 rebuilds the data from a failed disk drive using parity information stored on the other drives in the set, while mirroring keeps an exact copy of your disk set that can be used in an emergency. For i5/OS V5R3 and above, IBM now offers new DASD I/O adapters (IOAs) and features that support RAID 6 protection. Like RAID 5, RAID 6 protects data from being lost through disk failure. However, RAID 6 can survive the failure of two disk drives in the same RAID set, while RAID 5 can survive the failure of only one.
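To see how RAID 5 can rebuild a failed drive, here's a minimal Python sketch of single-parity reconstruction using XOR, along with a rough count of how many drives' worth of space each scheme leaves you. It shows the general textbook technique, not IBM's IOA firmware. The tiny byte blocks stand in for disk stripes, and the capacity figures ignore hot spares and how i5/OS actually lays out parity sets. RAID 6 typically adds a second, independently computed parity (commonly a Reed-Solomon code) so that two missing blocks can be solved for; the sketch doesn't attempt that part.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (textbook parity, not IBM's implementation)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_for(data_blocks: List[bytes]) -> bytes:
    """Parity block written alongside the data in a single-parity (RAID 5 style) stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: List[bytes]) -> bytes:
    """XOR of every surviving block (data and parity) recreates the one lost block."""
    return xor_blocks(surviving_blocks)


def usable_drives(total_drives: int, scheme: str) -> int:
    """Rough usable capacity, in drives, for an array of identical drives."""
    if scheme == "raid5":
        return total_drives - 1   # one drive's worth of space goes to parity
    if scheme == "raid6":
        return total_drives - 2   # two drives' worth goes to the two parities
    if scheme == "mirror":
        return total_drives // 2  # every drive has a full copy on its partner
    raise ValueError(f"unknown scheme: {scheme}")


# A three-drive stripe of hypothetical data plus its parity block.
stripe = [b"ORDR", b"INVC", b"CUST"]
parity = parity_for(stripe)

# Drive 2 fails: rebuild its block from the other data blocks and the parity.
rebuilt = rebuild_missing([stripe[0], stripe[2], parity])
print(rebuilt)  # b'INVC'

print([usable_drives(8, s) for s in ("raid5", "raid6", "mirror")])  # [7, 6, 4]
```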
Data Redundancy in a Cluster
In multiple-system environments, IBM lets you bind a group of systems together to form a cluster. Clustered systems work together as a single system to provide almost 100 percent availability for critical applications and data, simplify systems management, and increase system scalability. A cluster can contain up to 128 systems. To learn more about clusters, check out the IBM Information Center topic Availability for multiple systems: Clusters.
To enhance high availability in a clustered environment, IBM offers several different data resilience technologies that protect data and allow data access at all times. According to IBM, data resilience refers to "the ability for the data to remain accessible to the application even if the system that originally hosted the data failed." These technologies include data replication, switchable disk pools, cross-site mirroring, and the Copy Services function provided with IBM System Storage DS products. To learn more about IBM's data resilience options, check out the IBM Information Center topic Data resilience solutions for i5/OS clusters.
CBU Redundancy
If a clustering environment is too expensive to implement and it's critical that your system always remain up, you can investigate purchasing and installing a Capacity BackUp (CBU) Power i box. (See my article Five Benefits of a High Availability System for more on CBUs.) A CBU is basically a system in waiting on your network. Using high availability software from companies such as Vision Solutions, IBM's DataMirror, and Bug Busters Software Engineering, the CBU maintains an almost up-to-date copy of your applications and data. When disaster hits and the production box isn't available (such as during a hurricane, flood, or fire), you can fail over to the CBU, which impersonates your downed production box, and restart processing.
The CBU can be located on-site, or it may be hosted at a remote location. Some companies host their CBUs at sister locations, since those locations are already hooked up to the internal network. Others hire a hosting provider to co-locate their CBU.
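"Almost up-to-date" is the key trade-off with a CBU. The high availability software ships changes from production to the backup continuously, but any change that hasn't arrived when production goes down is lost. The toy Python model below illustrates that gap. It loosely assumes a journal-based scheme in which each change gets a sequence number. The class and method names are hypothetical and don't reflect how Vision Solutions, DataMirror, or any other product actually works.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

JournalEntry = Tuple[int, str, str]  # (sequence number, key, new value)


@dataclass
class Node:
    """A system holding a simple key/value copy of the application data."""
    name: str
    data: Dict[str, str] = field(default_factory=dict)
    applied_seq: int = 0  # last journal sequence number applied here


@dataclass
class ToyReplicator:
    """Hypothetical journal-based replication from production to a CBU."""
    production: Node
    backup: Node
    journal: List[JournalEntry] = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        """Apply a change on production and record it in the journal."""
        seq = len(self.journal) + 1
        self.journal.append((seq, key, value))
        self.production.data[key] = value
        self.production.applied_seq = seq

    def pending(self) -> List[JournalEntry]:
        """Journal entries the backup has not applied yet."""
        return [e for e in self.journal if e[0] > self.backup.applied_seq]

    def replicate(self, max_entries: int) -> None:
        """Send up to max_entries pending changes to the backup (simulates lag)."""
        for seq, key, value in self.pending()[:max_entries]:
            self.backup.data[key] = value
            self.backup.applied_seq = seq

    def fail_over(self) -> Tuple[Node, List[JournalEntry]]:
        """Production is gone: promote the backup and report changes it never got.

        The lost list is computed with hindsight for illustration; in a real
        disaster those entries would simply be missing from the CBU.
        """
        return self.backup, self.pending()


repl = ToyReplicator(production=Node("PROD"), backup=Node("CBU"))
repl.write("order-1", "shipped")
repl.write("order-2", "open")
repl.write("order-3", "open")
repl.replicate(max_entries=2)  # the CBU has applied two entries and is one behind

cbu, lost = repl.fail_over()
print(cbu.data)  # {'order-1': 'shipped', 'order-2': 'open'}
print(lost)      # [(3, 'order-3', 'open')]
```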
RELATED RESOURCES AND STORIES
Availability for multiple systems: Clusters
Data resilience solutions for i5/OS clusters
Five Benefits of a High Availability System