  • Admin Alert: Seven Things You Should Be Monitoring On Your System

    October 3, 2012 Joe Hertvik

    Last year, I wrote a two-part article outlining a basic plan for monitoring and answering IBM i error messages. But while it’s important to detect and answer error messages that require a response right now, it’s equally important to detect developing situations that will cause system problems if left alone. This week, I’ll discuss seven other things besides error messages that you should be monitoring for on your IBM i systems.

    The Basics

    For this article, let’s assume you are already using a system monitoring product to send out pager, email, or text alerts whenever an error message shows up on your system. You can perform this monitoring by using one of the more common IBM i system monitoring products, including:

    • Bytware MessengerConsole
    • CCSS QSystem Monitor
    • Halcyon Software IBM i (i5/OS, System i, iSeries, AS/400) Monitoring, Scheduling & Automation Software
    • Help/Systems Robot/ALERT
    • SEA absMessage

    If you’re new to system monitoring, check out part 1 and part 2 of my earlier articles on i/OS error monitoring and response strategies for a primer on setting up basic monitoring.

    Once your monitoring system is set up, you’ll want to go beyond basic error message monitoring and use your monitoring software to look for developing problems on your IBM i partitions. You’ll want to find silent, non-obvious issues that can cause disruptions if not resolved.

    To detect silent trouble in your system, here are seven of the most common situations that you should be monitoring for on your IBM i partition.

    1. Long-running batch jobs.
    2. Excessive number of jobs in job queues.
    3. Jobs that should be running, but aren’t.
    4. Critical lines, controllers, or devices that aren’t active.
    5. IP interfaces not active.
    6. Interactive users using a large amount of CPU.
    7. Interactive response time spiking.

    I’ll look at each situation in turn, and explain why you should be monitoring for them.

    Situation #1: Long-running batch jobs.

    It’s a good idea to set up monitors to look for jobs that are running much longer than usual. In my shop, I set up my job performance monitors to notify a tech on duty when any of the following situations occur.

    • Jobs that are running more than 30 minutes. I use this sparingly, looking for specific critical jobs that usually finish in a short amount of time, rather than having it monitor all jobs on the system that have been running more than 30 minutes.
    • Jobs that are running more than four hours. Some legitimate jobs, such as file reorganizations, may trigger this monitor while still running within guidelines. But overall, if any batch job is running more than four hours, that job may be looping or running into another problem.
    • Jobs that are running more than eight hours. This is a definite red alert and should be investigated.

    Long-running batch jobs by themselves aren’t necessarily a problem. But they can represent an unusual situation and should be investigated to ensure that everything is running correctly on your system.
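As a rough illustration, the three tiers described above could be expressed as a small classification function. This is a minimal sketch in Python, not the API of any monitoring product: the job names in the watch list and the function name are hypothetical, and only the 30-minute, four-hour, and eight-hour limits come from the article.

```python
from datetime import timedelta

# Limits taken from the article; the watch list below is made up.
CRITICAL_SHORT_JOBS = {"ORDPOST", "INVSYNC"}  # hypothetical job names
SHORT_JOB_LIMIT = timedelta(minutes=30)
WARNING_LIMIT = timedelta(hours=4)
RED_ALERT_LIMIT = timedelta(hours=8)


def classify_batch_job(job_name: str, elapsed: timedelta) -> str:
    """Return 'red', 'warning', 'watch', or 'ok' for one running batch job."""
    if elapsed > RED_ALERT_LIMIT:
        return "red"
    if elapsed > WARNING_LIMIT:
        return "warning"
    # The 30-minute rule applies only to hand-picked critical jobs,
    # not to every job on the system.
    if job_name in CRITICAL_SHORT_JOBS and elapsed > SHORT_JOB_LIMIT:
        return "watch"
    return "ok"
```

Checking the longest limit first means each job gets exactly one alert level, so a job that has run nine hours is reported as a red alert and not also as a four-hour warning.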

    Situation #2: Excessive number of jobs in a job queue.

    If you have single-threaded job queues feeding work to a batch subsystem, an excessive number of jobs waiting to run may be an indicator that a problem is occurring. I like to set my paging system to alert me when more than seven or eight jobs are lined up in a batch job queue waiting to run. It may indicate that the jobs are stuck behind a long-running batch job.
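The queue-depth rule above amounts to a simple threshold check. The sketch below assumes you can already obtain a count of waiting jobs per job queue from your monitoring tool; the function name, the dictionary input, and the queue names in the usage note are all hypothetical.

```python
QUEUE_DEPTH_LIMIT = 7  # the article suggests alerting above seven or eight


def queues_over_limit(waiting_jobs: dict, limit: int = QUEUE_DEPTH_LIMIT) -> list:
    """Return the names of job queues with more than `limit` jobs waiting."""
    return sorted(name for name, depth in waiting_jobs.items() if depth > limit)
```

For example, calling `queues_over_limit({"QBATCH": 9, "NIGHTLY": 2})` would flag only the hypothetical `QBATCH` queue.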

    Situation #3: Jobs that should be running, but aren’t.

    For our nightly end-of-day batch jobs, we use monitoring to alert us when a job has not started by its usual start time. We have a stable schedule, and if a critical job doesn’t kick off by its target time, it may be an indication that a system problem is preventing it from running. These monitors are a good heads-up that the system may not be functioning the way it’s supposed to.

    If you’re planning on monitoring for late jobs, beware of the following issues when setting up timing monitors.

    1. Put a little play in your start job monitoring. If your target job usually starts running at 1:30 a.m., you may want to set your timing monitor to go off at 2:15 a.m. or 2:30 a.m. if the job hasn’t started by then. This is because there may be a valid reason the job didn’t kick off on time, and you don’t want to set off a false alarm. Give your system some time to right itself before you wake up a technician.
    2. Be careful setting up timing monitors for jobs that usually run between 11 p.m. and midnight. A delay could push these jobs to run into the next day and that might set off a false alarm. IMHO, it’s best to stay away from start time monitoring for jobs that usually begin close to midnight.
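Both cautions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular monitoring product works: the function names are hypothetical, and the caller is assumed to supply the job's actual start time (or `None` if it has not started).

```python
from datetime import date, datetime, time, timedelta
from typing import Optional


def late_start_alert(expected_start: time, grace: timedelta,
                     now: datetime, actual_start: Optional[datetime]) -> bool:
    """True when the job has not started and the grace period has expired."""
    if actual_start is not None:
        return False
    deadline = datetime.combine(now.date(), expected_start) + grace
    return now >= deadline


def deadline_crosses_midnight(expected_start: time, grace: timedelta) -> bool:
    """True when start time plus grace period lands on the next calendar day."""
    anchor = datetime.combine(date(2000, 1, 1), expected_start)  # arbitrary day
    return (anchor + grace).date() != anchor.date()
```

With a 1:30 a.m. start and a 45-minute grace period, `late_start_alert` stays quiet until 2:15 a.m. A job expected at 11:30 p.m. with the same grace period makes `deadline_crosses_midnight` return `True`, which is a signal that a same-day timing monitor like this one is a poor fit for that job.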

    Situation #4: Critical lines, controllers, or devices that aren’t active.

    If your monitoring software allows it, monitor for critical resources such as printers, controllers, or lines being off-line. You can easily do this by monitoring for when these devices are varied off, not active, or in recovery pending state. This can give you an early warning that a critical system resource, such as a shipping label printer or an Ethernet line, is not available.

    Situation #5: IP interfaces not active.

    Most monitoring packages have an option to ping an IP address and send out an alert if that address doesn’t answer. You can use this to test whether your partitions’ IP addresses are working, whether companion servers are on the network, or whether other IBM i partitions are active. Be careful with these monitors, however, as you can sometimes get a false alarm on a ping test. You may want to fiddle with your ping monitor parameters, such as number of pings to send or the wait time for a return response, to cut down on the number of false alarms.
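One common way to cut down on ping false alarms is to declare an address down only after several probes in a row have failed. The sketch below shows that idea in Python. The `probe` callable is an assumption: it stands in for whatever ping mechanism your monitoring package provides, takes a wait time in seconds, and returns `True` when the address answers.

```python
from typing import Callable


def host_is_down(probe: Callable[[float], bool],
                 attempts: int = 3,
                 timeout_seconds: float = 2.0) -> bool:
    """Report a host as down only if every one of `attempts` probes fails."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for _ in range(attempts):
        if probe(timeout_seconds):
            return False  # one answer is enough to call the host up
    return True
```

Raising `attempts` or `timeout_seconds` makes the monitor slower to alert but less likely to page someone over a single dropped packet.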

    Situation #6: Interactive users using a large amount of CPU.

    Interactive CPU monitors can help you determine when an interactive job is experiencing a problem or when a user is doing something that he’s not supposed to (like running batch work in QINTER when he should be submitting it to batch). I like to set up a monitor to detect when an interactive job is using more than 25 percent of the available CPU. I find I get too many false alarms if I set the monitor to detect jobs that are using less than 25 percent of the CPU.
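The interactive CPU rule can be sketched as a filter over job samples. The `JobSample` record, its field names, and the `"INT"`/`"BCH"` job type labels are assumptions made for illustration; only the 25 percent threshold comes from the article.

```python
from typing import List, NamedTuple

CPU_THRESHOLD_PERCENT = 25.0  # lower values produced too many false alarms


class JobSample(NamedTuple):
    job_name: str
    job_type: str       # assumed labels: "INT" interactive, "BCH" batch
    cpu_percent: float  # share of available CPU used by this job


def heavy_interactive_jobs(samples: List[JobSample],
                           threshold: float = CPU_THRESHOLD_PERCENT) -> List[JobSample]:
    """Return interactive jobs using more than `threshold` percent CPU."""
    return [s for s in samples
            if s.job_type == "INT" and s.cpu_percent > threshold]
```

Batch jobs are deliberately ignored here, since a batch job using a lot of CPU is often normal while an interactive job doing so usually is not.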

    Situation #7: Interactive response time spiking.

    Trouble can sometimes be detected when interactive users start experiencing increased response time. A spike may indicate that a runaway job is taking necessary resources away from other jobs on the system. Monitoring for response time spikes can alert you to a developing situation.

    More Than Just Error Messages

    There’s more to effective system monitoring than just looking for error messages. Once you get your monitoring software in place, be sure to start creating effective system monitoring that can detect developing problems, even when it doesn’t look like your system has an issue.

    Follow Me On My Blog, On Twitter, And On LinkedIn

    Check out my blog at joehertvik.com, where I focus on computer administration and news (especially IBM i); vendor, marketing, and tech writing news and materials; and whatever else I come across.

    You can also follow me on Twitter @JoeHertvik and on LinkedIn.

    Joe Hertvik is the owner of Hertvik Business Services, a service company that provides written marketing content and presentation services for the computer industry, including white papers, case studies, and other marketing material. Email Joe for a free quote for any upcoming projects. He also runs a data center for two companies outside Chicago. Joe is a contributing editor for IT Jungle and has written the Admin Alert column since 2002.

    RELATED STORIES

    Basic i/OS Error Monitoring and Response, Part 2

    Basic i/OS Error Monitoring and Response, Part 1




Volume 12, Number 24 -- October 3, 2012

Copyright © 2025 IT Jungle