Admin Alert: Beyond Replication in an i5/OS High-Availability Environment
June 3, 2009 Joe Hertvik
One company I deal with has two different high-availability setups that mimic production partitions for its main systems. They’ve been running this configuration for two years, and it’s amazing how many high availability issues occur that have little or nothing to do with basic replication. This week, I’ll look beyond basic replication in building a Capacity BackUp (CBU) system and how these issues can affect failover processing.
The CBU in One Paragraph
A CBU is a specially configured iSeries/System i/Power i machine that communicates with your main production partition to replicate production data and applications by using high availability software installed on both machines. The CBU duplicates a production system; the production system is sometimes referred to as the source or production box or machine, and the CBU is sometimes referred to as the target box or machine. In the event of a disaster, the CBU can be switched over to “impersonate” the production box with very little downtime, servicing users, devices, and companion servers. When the main production machine comes back up, the CBU relinquishes its role and production is switched back to the regular system. See the articles in the Related Stories section for more information about i5/OS CBU boxes.
When Basic Replication Isn’t Working Out as Planned
In a high availability environment, all relevant data must be replicated from the production box to the CBU as it is created, changed, or deleted. However, when replicating objects between a source box and a target CBU, I keep seeing the same two mistakes, and either one can and will bite you when you attempt to fail over to your backup box.
The first mistake is taking replication for granted. Don’t assume that new libraries or folders on the source system will automatically be added to the target system. They won’t. Replicating data between systems is an ongoing process that must be reviewed weekly, if not daily. So your first duty beyond basic replication is to set up daily auditing reports that tell you when a library is present on the source system but not on the target system. When you find a new library that should be added to your CBU, start replicating it immediately, so that you don’t get a nasty surprise the next time you fail over. Many popular replication packages offer comparison reports that show which libraries are present on your production system but missing from your target system. Run and review these reports every single day to keep your libraries in sync.
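If your replication package doesn't provide such a report, the comparison itself is simple to sketch. The following Python is a minimal illustration only: it assumes you have already pulled the library names from each system into plain lists (how you export them is up to you), and the function name `missing_on_target`, the `excluded` parameter, and the sample library names are all hypothetical.

```python
def missing_on_target(source_libs, target_libs, excluded=()):
    """Return library names found on the source but not on the target.

    Names are trimmed and upper-cased before comparing. `excluded` holds
    libraries you have deliberately chosen not to replicate.
    """
    def normalize(name):
        return name.strip().upper()

    source = {normalize(n) for n in source_libs}
    target = {normalize(n) for n in target_libs}
    skip = {normalize(n) for n in excluded}
    return sorted(source - target - skip)


# Hypothetical library lists exported from each system.
source = ["PRODDATA", "PRODPGM", "NEWAPP", "WORKLIB"]
target = ["proddata", "PRODPGM "]

print(missing_on_target(source, target, excluded=["WORKLIB"]))
```

Run daily, an empty result means no new library has appeared on production without a counterpart on the CBU; anything else is a library to review.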
The second mistake occurs when administrators don’t make sure that replicated data stays in sync. Before a failover, perform further auditing on your data groups to make sure that someone hasn’t accidentally removed a library from the replication scheme. My shop ran a test last month where we found a critical library was present on both the target and source systems, but its contents hadn’t been replicated in six months. Replication had accidentally been turned off; the programs worked but the data was old. So in addition to making sure that you have the same libraries on both systems, make sure that the data is being kept in sync. Otherwise, you may have replicated the file structure perfectly but your data may not be up to date.
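A staleness check along these lines could have caught the six-month gap described above. This is a rough sketch under stated assumptions, not a feature of any replication product: it assumes you can collect a last-changed timestamp per library from each system, and the function name `stale_libraries`, the one-hour threshold, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def stale_libraries(source_changed, target_changed, max_lag=timedelta(hours=1)):
    """Return {library: lag} for libraries whose target copy trails the source.

    Both arguments map a library name to the time its data last changed.
    Libraries missing from the target are skipped; the presence audit
    covers those.
    """
    stale = {}
    for lib, src_time in source_changed.items():
        tgt_time = target_changed.get(lib)
        if tgt_time is None:
            continue
        lag = src_time - tgt_time
        if lag > max_lag:
            stale[lib] = lag
    return stale


# Hypothetical timestamps: PRODDATA stopped replicating months ago.
source_changed = {
    "PRODDATA": datetime(2009, 6, 1, 8, 0),
    "PRODPGM": datetime(2009, 6, 1, 8, 0),
}
target_changed = {
    "PRODDATA": datetime(2008, 12, 1, 8, 0),
    "PRODPGM": datetime(2009, 6, 1, 7, 45),
}

for lib, lag in stale_libraries(source_changed, target_changed).items():
    print(f"{lib}: target is {lag.days} days behind")
```

The right threshold depends on how busy each library is; a library that rarely changes will legitimately show an old timestamp on both systems, which is why the check compares the two sides rather than looking at the target alone.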
Are All Your TCP/IP Settings in Sync?
Chances are good that your CBU failover routines already contain provisions to activate the same TCP/IP interfaces on your target CBU system that you already have on your source system. But make sure that your target system also contains any other TCP/IP entries that your source system uses to communicate with other systems.
Subsystem Descriptions, Job Queues, and Job Descriptions
To run your target system as an exact duplicate of your production partition, all of your CBU subsystem descriptions, job queues, and job descriptions must match their production system counterparts. For example, some batch processes may be set up to submit specific jobs to specific job queues that are attached to specific subsystems. If the job queues used on the production system aren’t present on the CBU, the job will not be submitted. Similarly, if the job queues on the CBU aren’t associated with the same subsystems as on the production system, a job may be submitted to a job queue on the target CBU system but not run in its intended subsystem, or it may not run at all.
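One way to audit this is to compare which subsystem each job queue is attached to on both machines. The sketch below assumes you have already gathered that mapping from each system into a dictionary; the function name `compare_job_queues` and the job queue and subsystem names in the sample data are hypothetical.

```python
def compare_job_queues(source_map, target_map):
    """Compare job queue -> subsystem mappings and return problem messages.

    Each argument maps a job queue name to the name of the subsystem
    that the job queue is attached to on that system.
    """
    problems = []
    for jobq, src_sbs in sorted(source_map.items()):
        if jobq not in target_map:
            problems.append(f"{jobq}: missing on CBU")
        elif target_map[jobq] != src_sbs:
            problems.append(
                f"{jobq}: attached to {target_map[jobq]} on CBU, "
                f"{src_sbs} on production"
            )
    return problems


# Hypothetical mappings gathered from each system.
production = {"NIGHTLY": "BATCHSBS", "EDIJOBQ": "EDISBS", "RPTJOBQ": "BATCHSBS"}
cbu = {"NIGHTLY": "BATCHSBS", "EDIJOBQ": "BATCHSBS"}

for line in compare_job_queues(production, cbu):
    print(line)
```

A missing job queue corresponds to the "job will not be submitted" case, and a mismatched subsystem corresponds to the job running in the wrong place or not at all.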
Job descriptions are slightly different but the idea is the same. Submitted jobs rely on job descriptions to retrieve their job priority, output priority, initial library list, and other job parameters. If the job descriptions on the CBU aren’t exactly the same as the job descriptions on the production box, submitted jobs may fail or run with the wrong parameters.
Here are the rules of thumb for making sure that all the correct job descriptions, job queues, and subsystem descriptions are the same on both systems.
IBM Licensed Programs
CBU failover scenarios can become complicated if different IBM licensed programs are loaded on your source system than on your CBU system. Some software packages may need various IBM licensed programs or options to work (such as the Portable Applications Solution Environment, the CCA Cryptographic Service Provider, or the Java Developer Kit). If these licensed programs aren’t available during a failover, critical applications could refuse to run or run incorrectly.
As you’re setting up your CBU, audit the IBM licensed programs on the source system against the IBM licensed programs on the CBU. You can do this by running the Display Software Resources (DSPSFWRSC) command on both machines and comparing the results. In general, any IBM licensed program or option that is present on the source machine should also be present on the target machine. If one isn’t, make arrangements to load it on the target machine (along with relevant PTFs) and call IBM for a license key for each package, as needed. The success of your CBU failover scenarios may depend on having the correct IBM products loaded.
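Comparing two long DSPSFWRSC listings by eye is error-prone, so the comparison step is worth scripting. The sketch below assumes you have already reduced each listing to (product ID, option) pairs; it does not parse DSPSFWRSC output itself, and the function name `missing_licensed_programs` and the sample product IDs and option numbers are illustrative assumptions only.

```python
from collections import defaultdict


def missing_licensed_programs(source, target):
    """Return {product_id: [options]} present on the source but not the target.

    Both arguments are iterables of (product_id, option) pairs.
    """
    target_set = set(target)
    missing = defaultdict(list)
    for product_id, option in sorted(set(source)):
        if (product_id, option) not in target_set:
            missing[product_id].append(option)
    return dict(missing)


# Illustrative values only; substitute the pairs from your own listings.
source = [("5761SS1", "0000"), ("5761SS1", "0033"), ("5761JV1", "0000")]
target = [("5761SS1", "0000")]

for product_id, options in missing_licensed_programs(source, target).items():
    print(f"{product_id}: load option(s) {', '.join(options)} on the CBU")
```

Grouping the result by product makes it easier to see which installs to schedule and which license keys to request in one pass.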
Beyond Basic Replication
While the tips in this article won’t solve all your high availability failover problems, they will alert you to some issues that may not have been readily apparent when you first set up your CBU. Remember, you’ll learn something new every time you fail over production processing to a CBU. Use these tips and your own experience to improve your high availability solution, even if you never have to use it in a disaster.