    Admin Alert: Elements Of An IBM i Incident Management Plan, Part 2

    April 16, 2014 Joe Hertvik

    Last issue, I started outlining how to set up an IBM i incident management plan, going through four of the seven elements that are crucial for IBM i monitoring and response. This issue, let’s finish up and discuss the final elements an IBM i incident management template should provide.

    The Elements Of IBM i Incident Management, Revisited

    As presented last time, here are the critical elements every IBM i incident management plan should include.

    1. What type of monitoring are you doing: Manual, automatic, or hybrid?
    2. What are you monitoring for?
    3. Call trees: Who should be alerted when a problem occurs?
    4. Call tree protocol: How do you contact responders?
    5. Redundancy: What happens if your response protocol breaks down?
    6. Who handles damage control and keeping users/management informed?
    7. Recovery: What happens after the problem is over?

    Last issue, I covered items 1 through 4. Today, let’s look at what you can do to add redundancy, damage control, and recovery planning to the list (items 5 through 7).

    Part 5: Redundancy: What happens if your response protocol breaks down?

    It’s important to plan for what happens if your notification system breaks down.

    Let’s say your notification protocol calls for your IBM i server to send an email message and a text message to your responders when a problem occurs, and it uses the company’s email server to deliver those messages. (Check out this article for how to deliver text messages via email.) But what if the email system is down, or the TCP/IP network hosting your IBM i is unavailable? How do your responders receive alert messages in those cases?

    One way to answer this question is with a two-pronged approach that takes advantage of email and an old-fashioned modem. With this approach, every IBM i alert is sent out through two different transmission methods.

    • The first alert is sent out through the company email system as both an email and a text message.
    • The second alert is sent out as a text message through an analog modem and a phone line.

    This setup takes advantage of TAP (Telocator Alphanumeric Protocol) paging terminal phone numbers. Many telecommunications companies still supply their own dial-up number for sending out text messages. This means you can send all your alerts out through standard email and through an analog phone line to your cell phone provider’s TAP number. Doing this, you can ensure that your automated IBM i text alerts still go out, even if your email service or TCP/IP network is down.

    See this article I previously wrote for more information on setting up an IBM i modem to use TAP in conjunction with email messages for IBM i system monitoring.
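
The two-pronged approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the setup from the referenced article: the names `send_alert`, `send_via_email`, and `send_via_tap_modem` are hypothetical, and the two transport functions are stand-ins that simulate a dead mail server and a working modem line rather than talking to real hardware.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DeliveryReport:
    delivered: List[str]     # channels that accepted the alert
    failed: Dict[str, str]   # channel name -> error text

    @property
    def any_delivered(self) -> bool:
        return bool(self.delivered)


def send_alert(message: str,
               channels: Dict[str, Callable[[str], None]]) -> DeliveryReport:
    """Send the same alert through every channel.

    Each channel is tried independently, so a failure on one path
    (for example, the mail server) never blocks the other path.
    """
    report = DeliveryReport(delivered=[], failed={})
    for name, send in channels.items():
        try:
            send(message)
            report.delivered.append(name)
        except Exception as exc:
            report.failed[name] = str(exc)
    return report


# Hypothetical stand-in: simulates the company email system being down.
def send_via_email(message: str) -> None:
    raise ConnectionError("mail server unreachable")


# Hypothetical stand-in: simulates a successful dial-out over the modem.
sent_by_modem: List[str] = []


def send_via_tap_modem(message: str) -> None:
    sent_by_modem.append(message)


report = send_alert(
    "PRODSYS: message waiting for reply in QSYSOPR",
    {"email": send_via_email, "tap_modem": send_via_tap_modem},
)
print("delivered:", report.delivered)
print("failed:", report.failed)
```

The point of the design is that both paths are always attempted. The modem is not a fallback that only fires after email fails, which matches the idea of sending every alert through two transmission methods.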

    Part 6: Who handles damage control and keeping users/management informed?

    When you’re in the middle of handling an IT emergency, it’s easy to forget there are people who may be unable to work because the system is down. In addition, other parts of the system may not be working because of the IBM i problem you’re working on.

    So any good IBM i incident management plan should also specify people who play the following roles:

    User liaison–Keeps your users informed about what’s happening and how soon a fix will be implemented. Ideally, this should be an IT manager or someone else who isn’t involved with solving the actual issue. The help desk manager is also a good candidate for user liaison.

    The user liaison’s job is to get the latest information on progress for an incident fix and to notify affected users how the fix is going. The user liaison’s other job is to keep the pressure off the responders, so they have time to troubleshoot and fix the issue. Depending on how widespread the issue is, the user liaison may need to notify the following groups when a problem occurs.

    • Management–Depending on proximity and company preferences, notification can be accomplished through an email, but you may also have to make a personal phone call or visit.
    • Users–Can generally be notified by email. If the issue only affects one department or a small set of users, you may also want to discuss by phone call or personal visit.
    • Business partners–Call or email.
    • Customers–May need to be contacted either by the IT department or the business owner of the customer relationship.

    It can be helpful to use a form email that can be updated as problem resolution proceeds. Any email notification you send out should include time of notification; a short description of the problem; the expected fix; the expected time the fix will be implemented; and the expected time you’ll send out the next notification email. An hour is a reasonable amount of time between updated notifications, and it’s important to keep users updated on a regular basis for an extended issue.
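
As one way to picture such a form email, here is a minimal Python sketch that holds the five fields listed above and renders them as plain text. The class name `IncidentUpdate`, its field names, and the sample incident are all hypothetical; the one-hour default simply mirrors the update interval suggested above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentUpdate:
    notified_at: datetime            # time of notification
    description: str                 # short description of the problem
    expected_fix: str                # what the fix is expected to be
    expected_fix_time: datetime      # when the fix should be in place
    update_interval: timedelta = timedelta(hours=1)

    @property
    def next_update_at(self) -> datetime:
        """When users should expect the next notification email."""
        return self.notified_at + self.update_interval

    def render(self) -> str:
        """Return the notification body as plain text."""
        fmt = "%Y-%m-%d %H:%M"
        return "\n".join([
            f"Time of notification: {self.notified_at.strftime(fmt)}",
            f"Problem: {self.description}",
            f"Expected fix: {self.expected_fix}",
            f"Expected time of fix: {self.expected_fix_time.strftime(fmt)}",
            f"Next update by: {self.next_update_at.strftime(fmt)}",
        ])


# Sample values are made up for illustration.
update = IncidentUpdate(
    notified_at=datetime(2014, 4, 16, 9, 0),
    description="Order entry is unavailable on the production partition.",
    expected_fix="Restart the affected subsystem after clearing the failed job.",
    expected_fix_time=datetime(2014, 4, 16, 10, 30),
)
print(update.render())
```

For each follow-up message, the liaison would create a new `IncidentUpdate` with the current time and any revised estimates, so every email carries the same five fields in the same order.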

    Damage control–A production outage may also affect other IT processing or production functions. Aside from your responders, you may need someone to gather a team to devise workarounds for the affected systems. Again, this should be someone besides the people working on the problem, though you might have the user liaison perform this function.

    Part 7: Recovery: What happens after the problem is over?

    After the problem is finished, you need to perform the following functions:

    • Final notification to users that the problem is fixed–This notification should include any special instructions the users need to follow or items they need to be aware of.
    • Clean-up–Determine who needs to perform follow-up work to correct any additional issues that occurred because of the original problem. If the issue happened during off-hours, you may need to call in a crew to perform the cleanup. You may also use the damage control crew from Part 6 for this function.
    • Setting things straight–Reverse any temporary changes that were put in during the fix period, such as holding reactive jobs or limiting customer or employee access to affected functions.
    • Lessons learned–Analyze the root cause of the problem and determine whether additional items need to be changed to prevent the issue from occurring again. Both the responders and IT management should participate in this phase. New projects may need to be created and approved because of this phase.
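
The "setting things straight" item is easier to carry out if every temporary change is written down, together with how to reverse it, at the moment it is made. The Python sketch below shows one possible way to do that. The class `TemporaryChangeLog`, the `state` dictionary, and the two sample changes are hypothetical; in practice, the undo actions would be whatever commands release the held jobs or restore access.

```python
from typing import Callable, List, Tuple


class TemporaryChangeLog:
    """Records each temporary change with the action that reverses it."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Callable[[], None]]] = []

    def record(self, description: str, undo: Callable[[], None]) -> None:
        self._entries.append((description, undo))

    def reverse_all(self) -> List[str]:
        """Undo changes newest-first; return descriptions in the order undone."""
        undone: List[str] = []
        while self._entries:
            description, undo = self._entries.pop()
            undo()
            undone.append(description)
        return undone


# Made-up system state standing in for real job queues and access controls.
state = {"job_queue_held": True, "customer_access_limited": True}

log = TemporaryChangeLog()
log.record("Held batch job queue",
           lambda: state.update(job_queue_held=False))
log.record("Limited customer access",
           lambda: state.update(customer_access_limited=False))

undone = log.reverse_all()
print(undone)
print(state)
```

Reversing in newest-first order is a design choice, on the assumption that later changes may depend on earlier ones. The returned list also gives you a record of what was put back, which can feed the lessons-learned review.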

    This completes my template on setting up an IBM i incident management plan. If you have any comments or other items to add to the plan, email me at joe@joehertvik.com and I may use them in a future column.

    Joe Hertvik is an IBM i subject matter expert (SME) and the owner of Hertvik Business Services, a service company that provides written marketing content and presentation services for the computer industry, including white papers, case studies, and other marketing material. Email Joe for a free quote for any upcoming projects. He also runs a data center for two companies outside Chicago, featuring multiple IBM i ERP systems. Joe is a contributing editor for IT Jungle and has written the Admin Alert column since 2002. Check out his blog where he features practical information for tech users at joehertvik.com.

    RELATED STORIES

    Admin Alert: Elements Of An IBM i Incident Management Plan, Part 1

    Admin Alert: Adding Redundancy to Power i SMS Monitoring

    Configuring Messaging Software for Overnight Monitoring




Volume 14, Number 9 -- April 16, 2014

Copyright © 2025 IT Jungle