• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Admin Alert: A Checklist For Monitoring Your IBM i Environment

    April 3, 2013 Joe Hertvik

    IBM i administration has elements of old-time system operations and real-time monitoring. You need to ensure that everything is working correctly and that problems aren’t silently developing that can: a) interfere with customer processing; and b) violate customer service-level agreement (SLA) requirements or create audit violations. This week, let’s flesh out a checklist of items to help IBM i admins achieve these goals.

    The Essential Piece

    In order to catch trouble before it occurs, I highly recommend that you set up an IBM i monitoring system to catch developing situations and alert you via email or text when a problem is occurring. A monitoring system is critical for lights-out monitoring. Without it, it’s very difficult to catch many items I’ll be mentioning here. There are several common IBM i system monitoring products you can use for automated error messaging, including:

    • Bytware MessengerConsole
    • CCSS QSystem Monitor
    • Halcyon Software IBM i (i5/OS, System i, iSeries, AS/400) Monitoring, Scheduling & Automation Software
    • Help/Systems Robot/ALERT
    • SEA absMessage

    Contact these vendors to determine the best products for your system.

    The Checklist

    In some of my previous articles, I discussed how to set up an IBM i monitoring system, as well as items you should be automatically monitoring on your system. A list of these articles is included in the Related Stories section at the bottom of this article. In one article, I suggested that your monitoring system should send out alerts when the following seven situations occur.

    1. Long-running batch jobs.
    2. Excessive number of jobs in job queues.
    3. Jobs that should be running, but aren’t.
    4. Critical lines, controllers, or devices that aren’t active.
    5. IP interfaces not active.
    6. Interactive users using a large amount of CPU.
    7. Interactive response time spiking.

    Building on this list, I also recommend that you monitor your IBM i partitions for these six additional items I’m reviewing today.

    1. Disk space utilization above 90 percent.
    2. Software problems reported in the Work with Problems display (WRKPRB).
    3. Monitoring QSYSOPR and other message queues for inquiry messages related to application programs.
    4. Ensuring that your daily, weekly, and monthly backup jobs complete normally.
    5. Monitoring and reordering consumable items, including special forms, printer cartridges, and ribbons for critical system printers.
    6. Monitoring for replication errors on your high availability solution.

    Some of these items can be automatically monitored, with alerts sent out by your system monitoring package. Others you may have to monitor the old-fashioned way: by physically checking each item and keeping a log. Together, these 13 items form a good starter checklist for any shop implementing an IBM i monitoring system.

    Let’s look at each of the six new items and see why you should be monitoring for them.

    Situation #8: Disk space utilization above 90 percent

    IBM i systems traditionally do best when disk space utilization is under 90 percent. Once utilization breaches 90 percent, the system can start behaving erratically. In a worst case scenario, your disk can fill up and crash the system. Passing this threshold may also signal that an interactive or batch job is looping and filling up disk space with excessive file records or spooled files. No matter what the cause, you’ll definitely want to know when this situation is occurring.

    By default, IBM i sets a storage threshold value of 90 percent for auxiliary storage pools. When disk utilization passes 90 percent, the following CPF0907 message is sent to the system operator message queue (QSYSOPR).

    CPF0907 - Serious storage condition may exist. Press HELP
    

    Depending on how serious the storage overflow condition is, you may also see these messages show up in QSYSOPR.

    CPF0908 - Machine ineligible condition threshold reached 
    CPF0909 - Ineligible condition threshold reached for pool &1
    

    These are all serious messages that I recommend you set up your monitoring software to look for. These messages are defined in the QCPFMSG message description file in the QSYS library.
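
    As a rough illustration of the kind of rule you would configure, the following Python sketch picks the three storage messages out of a list of message queue entries. It is not the interface of any monitoring product: the QueueMessage record and the storage_alerts function are hypothetical names, and the sketch assumes your monitoring tool can hand you messages as (ID, text) pairs.

```python
from typing import Iterable, List, NamedTuple


class QueueMessage(NamedTuple):
    """Hypothetical record for one QSYSOPR entry (not an IBM i API)."""
    msg_id: str
    text: str


# The three storage-related message IDs named in the article.
STORAGE_MSG_IDS = frozenset({"CPF0907", "CPF0908", "CPF0909"})


def storage_alerts(messages: Iterable[QueueMessage]) -> List[QueueMessage]:
    """Return only the messages whose ID is on the storage watch list."""
    return [m for m in messages if m.msg_id.upper() in STORAGE_MSG_IDS]


if __name__ == "__main__":
    sample = [
        QueueMessage("CPF0907", "Serious storage condition may exist. Press HELP"),
        QueueMessage("ABC0001", "Made-up message that should be ignored"),
    ]
    for hit in storage_alerts(sample):
        print(f"ALERT {hit.msg_id}: {hit.text}")
```

    Matching on exact message IDs keeps the rule narrow, so unrelated messages do not trigger a storage alert.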

    In my shop, we changed the ASP storage threshold from 90 percent to 85 percent. We did this to give us more time to react before a runaway job fills up disk storage and crashes the system. You can change your ASP storage threshold values in the Start System Service Tools (STRSST) menu. To find the process for changing ASP storage thresholds, check out this older article on protecting your system from critical storage errors.
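
    To see how the choice of threshold changes when you are warned, here is a minimal Python sketch that classifies a utilization reading against a threshold. The asp_status function, its status labels, and the 5-point warning margin are all assumptions made for illustration; the real threshold lives in SST and the real alert is the CPF0907 message.

```python
def asp_status(percent_used: float, threshold: float = 90.0,
               warn_margin: float = 5.0) -> str:
    """Classify ASP utilization against an alert threshold.

    Hypothetical helper: the 90 percent default mirrors the system default
    described in the article; warn_margin is an arbitrary illustration value.
    """
    if not 0.0 <= percent_used <= 100.0:
        raise ValueError("percent_used must be between 0 and 100")
    if percent_used > threshold:
        return "ALERT"   # utilization has passed the threshold
    if percent_used > threshold - warn_margin:
        return "WARN"    # getting close to the threshold
    return "OK"


if __name__ == "__main__":
    reading = 86.0
    print("default threshold:", asp_status(reading))
    print("85 percent threshold:", asp_status(reading, threshold=85.0))
```

    With the same 86 percent reading, the lower threshold raises the alert while the default only warns, which is the extra reaction time the lower setting buys you.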

    Situation #9: Software problems reported in the Work with Problems (WRKPRB) display

    When a system issue occurs, the IBM i operating system will usually issue a message with a severity of 80 or above to the QSYSOPR message queue. You should configure your system monitoring software to automatically send out an alert when it sees one of these messages.

    But in some situations, a system problem report or resolution can also be written to the system problem log without necessarily sending out an alert. Some items such as an automatic PTF download may be reported in the problem log without a message written to QSYSOPR.

    So on a monthly basis, you may want to check whether any items in your IBM i problem log need attention. You can view the problem log by typing in the following Work with Problems (WRKPRB) command.

    WRKPRB

    Situation #10: Monitoring QSYSOPR and other message queues for inquiry messages related to application programs

    You definitely want to catch any inquiry messages requiring a response that are sent to the QSYSOPR message queue. To do this, you can generally set up your monitoring software to look for QSYSOPR inquiry messages with a message severity of 99.

    Severity 99 will catch all inquiry messages in QSYSOPR, but you will want to refine it to ignore certain classes of severity 99 messages. This includes any severity 99 messages that come from jobs running in the QSPL subsystem, as these are printer messages for when a printer is out of paper, when forms need to be loaded, etc. Printer messages are not critical messages that need to be sent to a technician monitoring the system. So ignore them.

    But printer messages may not be the only messages you’ll want to ignore. As you’re setting up your monitoring system, you’ll quickly discover which severity 99 messages can be safely ignored and which ones need to be tended to.
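
    The filtering logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Message record, its field names, and the sample message IDs are made up, and a real monitoring package would express the same rule through its own configuration screens.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Message:
    """Hypothetical view of one message queue entry (not an IBM i API)."""
    msg_id: str
    msg_type: str    # assumed values such as "INQUIRY" or "INFO"
    severity: int
    subsystem: str   # subsystem of the job that sent the message


# Subsystems whose inquiry messages are treated as noise; extend as you learn.
IGNORED_SUBSYSTEMS = {"QSPL"}


def inquiries_needing_attention(messages: Iterable[Message]) -> List[Message]:
    """Keep severity 99 inquiry messages, except those from ignored subsystems."""
    return [
        m for m in messages
        if m.msg_type == "INQUIRY"
        and m.severity == 99
        and m.subsystem.upper() not in IGNORED_SUBSYSTEMS
    ]


if __name__ == "__main__":
    sample = [
        Message("APP0001", "INQUIRY", 99, "QBATCH"),  # made-up application inquiry
        Message("PRT0001", "INQUIRY", 99, "QSPL"),    # made-up printer inquiry
    ]
    for m in inquiries_needing_attention(sample):
        print(f"Needs a reply: {m.msg_id} from {m.subsystem}")
```

    Keeping the ignore list in one place makes it easy to add entries as you discover which inquiry messages are safe to skip on your system.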

    For programming errors on systems with a lot of RPG programs, you may want to start monitoring for certain classes of inquiry messages with a severity level greater than 50. These message IDs start with the following characters.

    • RN*
    • LBE*
    • RPG*
    • CBE*

    Again, you’ll want to experiment with which messages to monitor for and which messages to ignore for your particular system. But these are valuable inquiry messages to monitor for in a traditional IBM i environment.
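
    The generic message ID patterns listed above, combined with the severity cutoff, can be sketched as a simple matching rule in Python. The function name and the sample IDs in the usage section are assumptions for illustration only; the patterns and the cutoff of 50 come from the text.

```python
from fnmatch import fnmatchcase

# Generic message ID patterns from the list above.
PROGRAM_ERROR_PATTERNS = ("RN*", "LBE*", "RPG*", "CBE*")

# Only messages with a severity greater than this value are considered.
MIN_SEVERITY = 50


def is_program_error(msg_id: str, severity: int) -> bool:
    """Return True when the ID matches a watched pattern and severity exceeds 50."""
    if severity <= MIN_SEVERITY:
        return False
    normalized = msg_id.upper()
    return any(fnmatchcase(normalized, p) for p in PROGRAM_ERROR_PATTERNS)


if __name__ == "__main__":
    # Sample IDs are illustrative, not a list of messages you will actually see.
    for msg_id, sev in [("RNQ0001", 99), ("CPF0907", 99), ("RPG0001", 40)]:
        verdict = "monitor" if is_program_error(msg_id, sev) else "skip"
        print(f"{msg_id} (severity {sev}): {verdict}")
```

    Because the patterns sit in a single tuple, experimenting with which message classes to watch means editing one line.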

    Situation #11: Ensuring that your daily, weekly, and monthly backup jobs complete normally

    You’re probably already doing this, but you’ll want to double-check that your backups are completing normally and that all objects are properly backed up. Depending on how your monitoring system is configured, it may send up a flag if an object is skipped because it’s in use.

    In certain regulatory and auditing environments, there may be a requirement to document that backups completed normally. So also consider whether you need to monitor and document completed backups.
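
    If you do need to document completed backups, one simple approach is to keep one log line per backup run. The Python sketch below shows the idea; the BackupResult record, its fields, and the log line layout are hypothetical, and you would fill them from whatever your backup program or job log actually reports.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class BackupResult:
    """Hypothetical summary of one backup run (fields are assumptions)."""
    run_date: date
    job_name: str
    completed_normally: bool
    objects_not_saved: int = 0   # for example, objects skipped because in use


def needs_review(result: BackupResult) -> bool:
    """A run needs review if it ended abnormally or skipped any objects."""
    return (not result.completed_normally) or result.objects_not_saved > 0


def audit_line(result: BackupResult) -> str:
    """Format one line suitable for a backup audit log."""
    status = "REVIEW" if needs_review(result) else "OK"
    return (f"{result.run_date.isoformat()} {result.job_name} {status} "
            f"not_saved={result.objects_not_saved}")


def backups_needing_review(results: Iterable[BackupResult]) -> List[BackupResult]:
    """Return the runs that someone should look at."""
    return [r for r in results if needs_review(r)]


if __name__ == "__main__":
    runs = [
        BackupResult(date(2013, 4, 1), "DAILYBKP", True),
        BackupResult(date(2013, 4, 2), "DAILYBKP", True, objects_not_saved=3),
    ]
    for run in runs:
        print(audit_line(run))
```

    Treating a run with skipped objects as needing review, even when the job itself ended normally, reflects the point above that all objects should be properly backed up.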

    Situation #12: Monitoring and reordering consumable items, including special forms, printer cartridges, and ribbons for critical system printers

    Outside of electronic monitoring, you may want to set up a system to ensure that you order consumable items before they run out. Examples might be specially printed forms for invoices and shipping tickets, packing labels, printer cartridges and ribbons, and other items needed for critical processes, such as sending orders to customers. And if you’re still using tape media, don’t forget to inventory your tape library and order more tapes when you get low.
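
    A reorder check does not need to be elaborate. As a minimal sketch, assuming you keep a count on hand and a reorder point for each item, the Python below lists what to order. The Consumable record, the item names, and the quantities are all made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class Consumable(NamedTuple):
    """Hypothetical inventory record for one consumable item."""
    name: str
    on_hand: int
    reorder_point: int   # order more when on_hand falls to this level


def items_to_reorder(stock: Iterable[Consumable]) -> List[str]:
    """Return the names of items at or below their reorder point, sorted."""
    return sorted(c.name for c in stock if c.on_hand <= c.reorder_point)


if __name__ == "__main__":
    stock = [
        Consumable("Invoice forms (boxes)", on_hand=2, reorder_point=3),
        Consumable("Printer ribbons", on_hand=10, reorder_point=4),
        Consumable("Backup tapes", on_hand=5, reorder_point=5),
    ]
    for name in items_to_reorder(stock):
        print("Reorder:", name)
```

    Set each reorder point high enough to cover the supplier's lead time, so the order arrives before the item runs out.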

    Situation #13: Monitoring for replication errors on your high availability solution

    If you’re running certain types of IBM i high availability software, you may have to define which libraries and objects are replicated to your target system. In this case, you will want to audit for replication errors and for IBM i libraries that are not being replicated to your target. Many of these packages offer audit features that allow you to quickly locate replication errors and omissions. If you’re not auditing your replication environment on a regular basis, you may find you are missing key objects when it’s time to switch over to your backup machine.
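
    At its core, checking for libraries that are not being replicated is a comparison between two lists. The following Python sketch shows that comparison as a set difference. It assumes you can export the library names from your source system and from your replication definitions; the function name, parameters, and sample library names are hypothetical, and your high availability package's own audit feature remains the authoritative check.

```python
from typing import Iterable, List


def replication_gaps(source_libraries: Iterable[str],
                     replicated_libraries: Iterable[str],
                     excluded: Iterable[str] = ()) -> List[str]:
    """Return source libraries that are neither replicated nor deliberately excluded.

    Names are compared case-insensitively. The excluded list holds libraries
    you have decided not to replicate.
    """
    source = {name.upper() for name in source_libraries}
    replicated = {name.upper() for name in replicated_libraries}
    skipped = {name.upper() for name in excluded}
    return sorted(source - replicated - skipped)


if __name__ == "__main__":
    # Library names below are made up for illustration.
    missing = replication_gaps(
        source_libraries=["ORDERS", "INVOICES", "SCRATCH"],
        replicated_libraries=["ORDERS"],
        excluded=["SCRATCH"],
    )
    for lib in missing:
        print("Not replicated:", lib)
```

    Keeping an explicit exclusion list separates intentional omissions from accidental ones, so the report only shows libraries that genuinely need a decision.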

    More To Come?

    Keep in mind this is a starter monitoring list that you will need to add items to for your specific situation. I tried to hit the most common items, but if you find something else that should be added to the list, please feel free to email me with your suggestions.

    Reader Request: BRMS Expertise Needed for IFS Incremental Backup

    After publishing my recent article on incremental IFS backups, reader Michael Lindley checked in on the joehertvik.com website with the following question about expanding my incremental backup routine to make it usable with IBM’s Backup, Recovery, and Media Services (BRMS) licensed program.

    Good article on the IFS incremental backup. Any suggestions on how to incorporate this idea within BRMS? I have looked for this option with the BRMS service within OPS Navigator, but I cannot see anything within my back control groups.

    Since my shop uses custom-written backup programs along with the occasional GO SAVE option 21 Full System Backup, I don’t use BRMS. So to help Michael out, I’m throwing this question out to my readers (i.e., you). If anyone knows how to apply the Time period for last change (CHGPERIOD) parameter from the green-screen Save Object (SAV) command to BRMS processing, please email me and I’ll publish any valid replies in a future Admin Alert column.

    Follow Me On My Blog, On Twitter, And On LinkedIn

    Check out my blog at joehertvik.com, where I focus on computer administration and news (especially IBM i); vendor, marketing, and tech writing news and materials; and whatever else I come across.

    You can also follow me on Twitter @JoeHertvik and on LinkedIn.

    Joe Hertvik is the owner of Hertvik Business Services, a service company that provides written marketing content and presentation services for the computer industry, including white papers, case studies, and other marketing material. Email Joe for a free quote for any upcoming projects. He also runs a data center for two companies outside Chicago. Joe is a contributing editor for IT Jungle and has written the Admin Alert column since 2002.

    RELATED STORIES

    Admin Alert: Seven Things You Should Be Monitoring On Your System

    Admin Alert: Basic i/OS Error Monitoring and Response, Part 2

    Admin Alert: Basic i/OS Error Monitoring and Response, Part 1

    Admin Alert: Protecting Your System from Critical Storage Errors




Volume 13, Number 7 -- April 3, 2013