    Admin Alert: Preparing Your CBU For a Real Emergency

    March 3, 2010 Joe Hertvik

    Alongside human tragedies, computer tragedies can also occur when a critical i/OS production system stops working and a shop must unexpectedly rely on its Capacity BackUp (CBU) system as its production box. Because a hard CBU cutover can pose more challenges than a planned switch test, here are some additional configuration tasks that can help you account for the unexpected challenges that go along with an emergency CBU switchover.

    When Those Who Stand and Wait No Longer Wait

    Like a good insurance policy, your CBU is something you hope you never have to use. Unfortunately, there are a number of situations where the CBU can be pressed into service for real, not just for a test. Here are just a few of the scenarios that have caused organizations to suddenly lose their production iSeries, System i, or Power i box, forcing an emergency CBU activation.

    • Floods, which include traditional flooding such as happened in the Midwest in 2009, as well as non-traditional flooding such as happened in 1992, when buildings in the Chicago Loop flooded after a utility tunnel that ran under the Chicago River was breached.
    • Other acts of nature such as earthquakes and hurricanes.
    • Major power outages. If the utility company isn’t able to get power up and running for a few days, the CBU may need to be turned on to take over processing.
    • Power i hardware problems, such as a backplane error that takes the machine down unexpectedly.
    • A collapsed roof in the computer room due to snow or rain buildup.
    • A fire in the computer room.

    It may not take much to put your System i out of commission, forcing your CBU into service. To prepare for an unexpected change, you should modify your CBU run book to include the following measures that need to occur during an emergency switchover. Some of these steps may not be obvious when performing routine switch exercises, but they can become critical during a real emergency.

    Call Your HA Vendor

    For an emergency fail-over, be sure to include a run book step to contact your high availability software vendor as soon as possible. If your CBU suddenly takes over processing from a crashed production box, there may be replication updates that haven’t yet been applied to the CBU. Your vendor can help you work through any issues involving corrupted or incompletely replicated data. They should also be able to counsel you on how to recover from an interrupted replication.

    What Was Running When Production Crashed?

    The first few moments after your production system disappears can be critical in getting your CBU system correctly back up and running. If the main system went down quickly, it may also have abnormally ended critical batch or interactive jobs. This may result in corrupted data or unfinished jobs that need to be restarted to process customer orders correctly.

    To get information on which jobs were running when your main system crashed, you will need to take frequent snapshots of your system and make them instantly available on the CBU. On our production box, we set up a snapshot job that runs the following Work with Active Jobs (WRKACTJOB) command every 30 seconds and immediately sends that output to a replicated output queue on the CBU system.

    WRKACTJOB OUTPUT(*PRINT)
    

    If you have this information and your production box crashes, the tech staff can bring up the latest WRKACTJOB printout and see what was running within 30 seconds of when the machine went down. Once you have the automated snapshot process set up, you can place a step in the run book to remind the recovery team to check the latest WRKACTJOB printout.
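
    The snapshot job described above could be sketched in CL along the following lines. This is a minimal illustration, not the program we actually run. The output queue QGPL/CBUSNAP is a hypothetical name standing in for whatever output queue your HA software replicates to the CBU, and the sketch assumes the printer file used by WRKACTJOB sends its output to the job's output queue.

```cl
PGM
  /* ENDSTS is returned as '1' once a controlled end of this job,   */
  /* its subsystem, or the system has been requested.               */
  DCL VAR(&ENDSTS) TYPE(*CHAR) LEN(1) VALUE('0')

  /* Hypothetical replicated output queue. Assumes printer files    */
  /* that specify OUTQ(*JOB) will spool to this queue.              */
  CHGJOB OUTQ(QGPL/CBUSNAP)

  DOWHILE COND(&ENDSTS *EQ '0')
    /* Take the snapshot; ignore errors so the loop keeps running.  */
    WRKACTJOB OUTPUT(*PRINT)
    MONMSG MSGID(CPF0000)

    /* Wait 30 seconds, then check whether we were asked to end.    */
    DLYJOB DLY(30)
    RTVJOBA ENDSTS(&ENDSTS)
  ENDDO
ENDPGM
```

    A program like this would typically be submitted to batch at IPL time and left running. Checking the end status on each pass lets the job finish cleanly during a controlled subsystem or system end instead of being cancelled in mid-snapshot.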

    What Was Waiting To Run When Production Crashed?

    In addition to knowing what was running, you also need to know what was waiting to run when the production system went down. This is important for several reasons, including knowing how many orders were waiting to process; restarting automated job streams that contain programs that are dependent on other programs completing properly; and knowing how many and which jobs need to be resubmitted.

    Just as you can track which jobs were running on your production system when the failure occurred, you can write a program that keeps track of any jobs that were waiting to run when the system crashed. This program needs to cover the following steps.

    1. Use the Work with Job Queue (WRKJOBQ) command to create a spooled file that lists out how many entries were in each of your job queues. Run this command every 30 seconds to get a complete view of what your job queues looked like before the crash. To get this information, run the following WRKJOBQ command.

    WRKJOBQ JOBQ(*ALL) OUTPUT(*PRINT)
    

    Like the WRKACTJOB command discussed above, place this report into a replicated output queue so that it will immediately be transferred and saved on your CBU.

    2. Cycle through the list of job queue entries created in step 1 and for every job queue that had entries in it that were waiting to run, take a snapshot of the jobs that were waiting in that job queue. Again, place each spooled file output in an output queue that is automatically replicated to your CBU.

    By keeping track of which jobs were running when the production system stopped working and which jobs were waiting to be run in job queues, you can give your recovery team some basic tools to determine how much damage may have been done to your system integrity as they restart production on the CBU.
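
    As a rough sketch of the two steps above, the following CL takes the job queue summary and then prints the waiting jobs for a fixed list of job queues. It deliberately simplifies step 2: rather than reading the step 1 report to find which queues have entries, it assumes you already know which queues matter and prints each one on every pass. QGPL/QBATCH is the IBM-supplied batch job queue; PRODLIB/NIGHTLY and the output queue QGPL/CBUSNAP are hypothetical names used only for illustration.

```cl
PGM
  /* ENDSTS is returned as '1' once a controlled end is requested.  */
  DCL VAR(&ENDSTS) TYPE(*CHAR) LEN(1) VALUE('0')

  /* Hypothetical replicated output queue. Assumes printer files    */
  /* that specify OUTQ(*JOB) will spool to this queue.              */
  CHGJOB OUTQ(QGPL/CBUSNAP)

  DOWHILE COND(&ENDSTS *EQ '0')
    /* Step 1: summary of all job queues.                           */
    WRKJOBQ JOBQ(*ALL) OUTPUT(*PRINT)
    MONMSG MSGID(CPF0000)

    /* Step 2 (simplified): list the jobs waiting in each queue     */
    /* you care about. Add one WRKJOBQ per critical job queue.      */
    WRKJOBQ JOBQ(QGPL/QBATCH) OUTPUT(*PRINT)
    MONMSG MSGID(CPF0000)
    WRKJOBQ JOBQ(PRODLIB/NIGHTLY) OUTPUT(*PRINT) /* hypothetical */
    MONMSG MSGID(CPF0000)

    /* Wait 30 seconds, then check whether we were asked to end.    */
    DLYJOB DLY(30)
    RTVJOBA ENDSTS(&ENDSTS)
  ENDDO
ENDPGM
```

    A hard-coded list is easier to maintain than a program that parses the summary report, but it produces a spooled file for each listed queue on every pass, even when the queue is empty. If that generates too much output, the program would need to read the step 1 report and print only the queues that have entries.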

    You should also note that when you run automated jobs to produce printouts of your active jobs and jobs waiting to run every 30 seconds, you will slowly start to fill your production and CBU systems up with spooled file output. To avoid this situation, you should have another automated program in place to delete spooled files that are more than x days old. In an earlier column, I demonstrated how to create a job that automatically deletes spooled files in an output queue that meet certain deletion criteria. You can use that program or another spooled file deletion system to clear out excess WRKACTJOB and WRKJOBQ spooled files so that they don’t clog up your system.

    Synchronizing Your Production Scheduling System With Your CBU

    If you haven’t already done so, you should implement two procedures involving your automated job scheduling software.

    1. On a regular basis, preferably several times a day, save and transfer your production job schedule from your production machine to your CBU. Depending on how your CBU is set up, you may even be able to directly replicate your automated job schedule to the CBU machine. In addition to the run schedule, be sure to also transfer over any processing history involved with the schedule (which includes dates and times that each package ran) so that the recovery team will be able to determine which production jobs did and did not run when the system crashed. By replicating the schedule and its history on a regular basis (the more often the better), you should be able to capture all the relevant scheduling information that occurred during the day.

    2. Incorporate steps into your run book so that if necessary, you can recreate and restart your production job schedule and run history on the CBU. If your run book has been thoroughly tested, this procedure may already be available.

    For information on how to replicate or restore your production job schedule and its attendant history, contact a third-party software vendor.
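
    If your shop uses the native i/OS job scheduler (the entries you see with WRKJOBSCDE) rather than a third-party package, the save half of step 1 might look something like the following sketch. The native job schedule is stored in the QDFTJOBSCD object in library QUSRSYS. The library CBUREPL and the save file JOBSCDSAVF are hypothetical names, and the sketch assumes that CBUREPL already exists and that your HA software replicates it (including save files) to the CBU. It does not address third-party schedulers or their run history.

```cl
PGM
  /* Create the save file on first use. CBUREPL/JOBSCDSAVF is a     */
  /* hypothetical name in a library assumed to be replicated.       */
  CHKOBJ OBJ(CBUREPL/JOBSCDSAVF) OBJTYPE(*FILE)
  MONMSG MSGID(CPF9801) EXEC(CRTSAVF FILE(CBUREPL/JOBSCDSAVF))

  /* A save file must be empty before it can be reused.             */
  CLRSAVF FILE(CBUREPL/JOBSCDSAVF)

  /* Save the native job schedule object.                           */
  SAVOBJ OBJ(QDFTJOBSCD) LIB(QUSRSYS) DEV(*SAVF) +
         OBJTYPE(*JOBSCD) SAVF(CBUREPL/JOBSCDSAVF)
ENDPGM
```

    You could run a program like this several times a day. The matching restore on the CBU belongs in the run book as a manual step, so that the recovery team decides when the production schedule becomes active on the CBU.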

    RELATED STORY

    Selectively Deleting OS/400 Spool Files




Volume 10, Number 8 -- March 3, 2010
Copyright © 2025 IT Jungle