• The Four Hundred
  • Admin Alert: Looking for i5/OS Trouble, Part II

    January 14, 2009 Joe Hertvik

    Last week, I discussed the best ways to automatically monitor iSeries, System i, and Power i systems for hidden signs of trouble, and I focused on monitoring the QSYSMSG and QSYSOPR message queues for developing problems. This week, I’m changing direction to discuss several specific i5/OS monitoring situations that can also help you detect system problems as they occur.

    Trouble-Finding Tools

    As I mentioned last week, the following are the best places to look for trouble in your OS/400, i5/OS, and i operating systems:

    • The QSYSMSG message queue
    • The System Operator (QSYSOPR) message queue
    • Disk drive statistics
    • Active job entries
    • Job queues and output queues

    I covered QSYSMSG and QSYSOPR monitoring last time, so today I’ll look at what to monitor for in the other areas of my list. IMHO, the best way to monitor for and be alerted to problem situations on your partition is to use an automated system monitoring tool, such as Bytware MessengerConsole, Help/Systems’ Robot/ALERT, or CCSS’ QSystem Monitor package. Each of these products can look for system problems as they occur and let you know via pages, emails, text messages, etc., when something is going awry. They also allow you to set up custom monitors that are specific to your own environment. While it’s true that you can build your own monitoring system without purchasing a package, I passionately believe that it’s generally cheaper in the long run and more effective to build a monitoring platform based on the established products listed here.

    When I’m using one of these tools, these are the items I am most likely to monitor for.

    Sudden Changes In Disk Storage Usage

    In addition to looking for the serious storage overflow conditions that I talked about last time, you can also use a monitoring package to detect more subtle shifts in storage usage. On my systems, I usually set up a monitor to alert the staff when there are spikes in disk storage usage.

    Disk utilization spike detection (such as when ASP disk utilization goes up 5 percent in an hour) is valuable because it can indicate unusual activity on the system, which may require someone to investigate. The disk space may be going up because a program is in a loop and it’s chewing up drive space. Or you could have a job gone wild that’s spitting out thousands of joblog entries. Either way, early detection of large changes in drive usage can be valuable in locating developing problems.
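
    As a minimal sketch of the spike rule described above, the following Python function flags any reading that sits at least 5 percentage points above the lowest reading taken in the preceding hour. The sample readings, the function name find_spikes, and the way the readings are collected are all assumptions made for illustration; a commercial monitoring package would supply its own equivalent.

```python
from datetime import datetime, timedelta

def find_spikes(samples, threshold_pct=5.0, window=timedelta(hours=1)):
    """Return (low_time, now_time, rise) for each reading that is at least
    threshold_pct points above the lowest reading in the preceding window.
    samples is a list of (datetime, percent_used) in time order."""
    spikes = []
    for i, (t_now, pct_now) in enumerate(samples):
        # Earlier readings that fall inside the look-back window.
        recent = [(t, p) for t, p in samples[:i] if t_now - t <= window]
        if not recent:
            continue
        t_low, pct_low = min(recent, key=lambda s: s[1])
        rise = pct_now - pct_low
        if rise >= threshold_pct:
            spikes.append((t_low, t_now, rise))
    return spikes

# Hypothetical readings: (sample time, percent of ASP storage used)
samples = [
    (datetime(2009, 1, 14, 9, 0), 61.0),
    (datetime(2009, 1, 14, 9, 30), 61.4),
    (datetime(2009, 1, 14, 10, 0), 66.5),
    (datetime(2009, 1, 14, 10, 30), 66.0),
]

for low_time, now_time, rise in find_spikes(samples):
    print(f"ASP usage rose {rise:.1f} points between "
          f"{low_time:%H:%M} and {now_time:%H:%M}")
```

    Comparing against the lowest reading in the window, rather than only the reading exactly one hour earlier, is a design choice that also catches a spike that starts partway through the hour.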

    Where’s the Damage?

    It’s also helpful to set up your monitoring tool to look for damaged objects as they occur. A damaged object can not only corrupt other data, it can also stop objects from being backed up. Damaged objects may also prevent an object from being correctly replicated to a Capacity BackUp (CBU) system.

    Whenever the system detects a damaged object, it sends a system message out to the QSYSOPR message queue with a message ID between CPF8100 and CPF8299. If you set up your monitoring system to alert your staff when one of these messages enters the QSYSOPR message queue, you will immediately know when a damaged object has been detected and you can take measures to find and fix the object.
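
    The message ID test above can be sketched in a few lines of Python. This is only an illustration of the range check: the list of messages is made-up data, the function name is hypothetical, and reading the QSYSOPR queue itself is left to whatever tool you use. The sketch compares the four-character suffix as hexadecimal, on the assumption that message ID suffixes may contain the characters A through F.

```python
import re

# "CPF" followed by a four-character suffix of hexadecimal digits.
MSG_ID = re.compile(r"^CPF([0-9A-F]{4})$")

def is_damage_message(msg_id, low=0x8100, high=0x8299):
    """True if msg_id falls in the CPF8100 through CPF8299 range."""
    match = MSG_ID.match(msg_id.strip().upper())
    if match is None:
        return False
    return low <= int(match.group(1), 16) <= high

# Hypothetical queue contents: (message ID, placeholder text)
queue = [
    ("CPF1234", "placeholder text"),
    ("CPF8113", "placeholder text"),
    ("CPF8299", "placeholder text"),
    ("CPF8300", "placeholder text"),
]

for msg_id, text in queue:
    if is_damage_message(msg_id):
        print(f"ALERT: possible damaged object reported by {msg_id}")
```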

    Jobs and Subsystems That Must Always Be Running

    For a typical environment, certain subsystems and jobs must always be running or severe problems can occur with your data. Examples might include the QINTER subsystem, as well as server-based subsystems and jobs that provide critical round-the-clock functions (such as validating credit cards). If you have an automated monitoring program, I recommend that you find all your critical jobs and subsystems and set up monitors to alert the staff when any of them are not running. By setting up these monitors, your staff will immediately know when a critical application job has stopped working.
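
    At its core, a "must always be running" monitor compares a list of required names against the list of what is currently active. The Python sketch below shows that comparison only. CARDSRV is a hypothetical subsystem name standing in for a credit card validation server, and the active list is sample data rather than output from a real system.

```python
def missing_required(required, active):
    """Return the required names that do not appear in the active list."""
    active_set = {name.upper() for name in active}
    return sorted(name for name in required if name.upper() not in active_set)

# CARDSRV is a hypothetical subsystem used only for this example.
required_subsystems = ["QINTER", "QBATCH", "CARDSRV"]
active_subsystems = ["QINTER", "QBATCH", "QSYSWRK"]

for name in missing_required(required_subsystems, active_subsystems):
    print(f"ALERT: required subsystem {name} is not running")
```

    The same function works for critical jobs if you pass job names instead of subsystem names.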

    Problems with Interactive Jobs

    When monitoring interactive jobs, I usually set up an automated monitor to detect the following two conditions.

    1. An interactive job that is using more than 30 percent of the available CPU–This could be indicative of an interactive job that has disconnected from the system (as sometimes happens with scanners and other devices that connect to the operating system through wireless connections). This may also indicate an interactive job that has problems or that is running in a loop.

    2. Interactive jobs running at less than priority 20–This monitor might detect a case where a user has changed his run priority to a lower number (which is a higher priority on i5/OS) in order to process his job faster.
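
    The two interactive job conditions above could be expressed as simple rules over a list of active jobs. The following Python is a sketch under stated assumptions: the Job record, its field names, and the sample jobs are invented for illustration, and the job statistics would come from your monitoring tool rather than from this code.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    user: str
    cpu_pct: float     # share of available CPU the job is using
    run_priority: int  # lower number means higher priority

def check_interactive(jobs, cpu_limit=30.0, min_priority=20):
    """Return (job name, reason) pairs for jobs that break either rule."""
    alerts = []
    for job in jobs:
        if job.cpu_pct > cpu_limit:
            alerts.append((job.name, f"using {job.cpu_pct:.0f}% CPU"))
        if job.run_priority < min_priority:
            alerts.append((job.name, f"run priority {job.run_priority}"))
    return alerts

# Hypothetical interactive jobs
jobs = [
    Job("QPADEV0001", "ALICE", 2.5, 20),
    Job("SCANNER07", "WHSE", 45.0, 20),
    Job("QPADEV0002", "BOB", 1.0, 10),
]

for name, reason in check_interactive(jobs):
    print(f"ALERT: interactive job {name} is {reason}")
```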

    Problems with Batch Jobs

    For batch jobs, the following situations might indicate problems with the system and should be monitored for.

    1. Batch jobs running at priority 20 or less–This may indicate a problem in batch submission, scheduling, or even a user who doesn’t appreciate why batch jobs should never run at interactive priorities.

    2. Long-running batch jobs–Depending on how long it takes critical processes to run in your shop, you may want to set up monitors that alert you when a batch job runs longer than a set amount of time. In my shop, one of our monitors alerts staff when application-oriented batch jobs run longer than one hour. Long-running batch jobs can indicate a programming issue, an inquiry job that is processing all the records in an absurdly large file, or a poorly written query running out of control.

    When monitoring for long-running batch jobs, however, you have to be careful to monitor only application-oriented batch jobs that are expected to end, not server jobs that run more or less indefinitely. Application-oriented jobs usually run in batch subsystems, such as QBATCH or QPGMR, and complete in a relatively short amount of time. Server-oriented jobs usually run in their own subsystem or a system subsystem, such as QSYSWRK, and they usually remain active as long as the machine is running. To effectively set up this type of monitor, you have to exclude server jobs from examination.
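
    Here is one way the long-running batch job rule might look once server jobs are excluded. It is a sketch only: it treats QBATCH and QPGMR as the application subsystems to watch, as in the text above, and the job names and start times are made-up sample data.

```python
from datetime import datetime, timedelta

# Only jobs in these subsystems are examined; jobs in server subsystems
# such as QSYSWRK are skipped because they are expected to stay active.
APPLICATION_SUBSYSTEMS = {"QBATCH", "QPGMR"}

def long_running(jobs, now, limit=timedelta(hours=1)):
    """Return (job name, elapsed) for application batch jobs over the limit.
    jobs is a list of (job name, subsystem, start time)."""
    flagged = []
    for name, subsystem, started in jobs:
        if subsystem not in APPLICATION_SUBSYSTEMS:
            continue  # server job, excluded from examination
        elapsed = now - started
        if elapsed > limit:
            flagged.append((name, elapsed))
    return flagged

# Hypothetical active jobs
now = datetime(2009, 1, 14, 15, 0)
jobs = [
    ("INVUPDATE", "QBATCH", datetime(2009, 1, 14, 13, 15)),
    ("DAILYRPT", "QBATCH", datetime(2009, 1, 14, 14, 40)),
    ("SRVJOB", "QSYSWRK", datetime(2009, 1, 1, 6, 0)),
]

for name, elapsed in long_running(jobs, now):
    print(f"ALERT: batch job {name} has been running for {elapsed}")
```

    Using an explicit list of subsystems to include, rather than a list to exclude, means a newly added server subsystem is ignored by default instead of generating false alerts.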

    Backed Up Job Queues

    Many batch job queues are single-threaded, meaning that the job queue’s associated subsystem will only accept one job at a time for processing from that queue. You can sometimes detect long-running jobs or jobs with error messages by looking at the number of jobs waiting to be run in a single-threaded job queue. If the job queue has an unusually high number of jobs waiting to run, it may indicate a long-running job was submitted from that job queue or the current job is stuck waiting for a reply to an error message. Like the long-running job monitors, a judiciously used job queue monitor can also spotlight potential trouble.
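
    A job queue monitor of this kind boils down to comparing the number of waiting jobs against a limit for each queue. In the Python sketch below, the queue names, counts, and limits are illustrative values; what counts as "unusually high" depends on your own workload.

```python
def backed_up_queues(waiting_counts, limits, default_limit=5):
    """Return the queues whose waiting job count exceeds their limit."""
    return {
        queue: count
        for queue, count in waiting_counts.items()
        if count > limits.get(queue, default_limit)
    }

# Hypothetical counts of jobs waiting in each job queue
waiting_counts = {"QBATCH": 2, "NIGHTQ": 14}
# Per-queue limits; queues not listed fall back to default_limit
limits = {"NIGHTQ": 10}

for queue, count in backed_up_queues(waiting_counts, limits).items():
    print(f"ALERT: job queue {queue} has {count} jobs waiting")
```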

    Timing Issues

    Depending on how you process work, you may have set up one or more job streams that always start and end at certain times of the day. A good example is a backup job stream where the backup generally starts at midnight and can usually be counted on to complete at 2:00 a.m. You could set up a monitor to send an alert if the backup job is still running at 2:30 a.m., which may indicate a problem with the backup. These types of monitors require a lot of analysis and awareness of how your job streams run, but if the monitor is set up correctly, it can show you where a problem is occurring before it turns into a full-blown crisis.
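
    The backup example can be reduced to a small deadline check, sketched here in Python. It assumes the job stream starts at or after midnight, so the 2:30 a.m. deadline falls on the same calendar date as the start time. The function name and the sample times are hypothetical.

```python
from datetime import datetime, time

def is_overdue(started, now, still_active, deadline=time(2, 30)):
    """True if the job is still active at or after the deadline.
    Assumes the deadline falls on the same date the job started."""
    deadline_at = datetime.combine(started.date(), deadline)
    return still_active and now >= deadline_at

# Hypothetical backup that started at midnight
started = datetime(2009, 1, 14, 0, 0)

print(is_overdue(started, datetime(2009, 1, 14, 1, 50), still_active=True))
print(is_overdue(started, datetime(2009, 1, 14, 2, 35), still_active=True))
```

    A job stream that starts before midnight would need the deadline moved to the following date, which this sketch does not handle.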

    Excessive Numbers of Spooled Files In An Output Queue

    Similar to a backed-up job queue, an output queue with too many spooled files waiting to be printed may indicate a printer problem. Many packages offer output queue monitors where an alert can be sent out if there are a high number of released spooled files waiting to print from a queue. This situation may indicate a printer that is out of paper, not running, or waiting for someone to answer a printer message.

    Places to Start

    During the last two weeks, I’ve offered up a number of situations where you can monitor for real and potential problems on your iSeries, System i, and Power i machines. These ideas are meant to provide suggestions for items that you may want to monitor for on your system. System monitoring isn’t a static process that you set up once and never touch again. Rather, monitors need tending to in order to weed out unreliable alerts and to add newer items that should be monitored. The key is to get started with a few good ideas and then adjust your monitors to provide the maximum benefit to your system.

    RELATED STORIES

    Looking for i5/OS Trouble, Part I

    When Batch Meets Interactive




Volume 9, Number 2 -- January 14, 2009

Copyright © 2025 IT Jungle