• The Four Hundred
  • Admin Alert: Seven Things You Should Be Monitoring On Your System

    October 3, 2012 Joe Hertvik

    Last year, I wrote a two-part article outlining a basic plan for monitoring and answering IBM i error messages. But while it’s important to detect and answer error messages that require a response right now, it’s equally important to detect developing situations that will cause system problems if left alone. This week, I’ll discuss seven other things besides error messages that you should be monitoring for on your IBM i systems.

    The Basics

    For this article, let’s assume you are already using a system monitoring product to send out pager, email, or text alerts whenever an error message shows up on your system. You can perform this monitoring by using one of the more common IBM i system monitoring products, including:

    • Bytware MessengerConsole
    • CCSS QSystem Monitor
    • Halcyon Software IBM i (i5/OS, System i, iSeries, AS/400) Monitoring, Scheduling & Automation Software
    • Help/Systems Robot/ALERT
    • SEA absMessage

    If you’re new to system monitoring, check out part 1 and part 2 of my earlier articles on i/OS error monitoring and response strategies for a primer on setting up basic monitoring.

    Once your monitoring system is set up, you’ll want to go beyond basic error message monitoring and use your monitoring software to look for developing problems on your IBM i partitions. You’ll want to find silent, non-obvious issues that can cause disruptions if not resolved.

    To detect silent trouble in your system, here are seven of the most common situations that you should be monitoring for on your IBM i partition.

    1. Long-running batch jobs.
    2. Excessive number of jobs in job queues.
    3. Jobs that should be running, but aren’t.
    4. Critical lines, controllers, or devices that aren’t active.
    5. IP interfaces not active.
    6. Interactive users using a large amount of CPU.
    7. Interactive response time spiking.

    I’ll look at each situation in turn, and explain why you should be monitoring for them.

    Situation #1: Long-running batch jobs.

    It’s a good idea to set up monitors to look for jobs that are running much longer than usual. In my shop, I set up my job performance monitors to notify a tech on duty when any of the following situations occur.

    • Jobs that are running more than 30 minutes. I use this sparingly, looking for specific critical jobs that usually finish in a short amount of time, rather than having it monitor all jobs on the system that have been running more than 30 minutes.
    • Jobs that are running more than four hours. Some legitimate jobs, such as file reorganizations, may trigger this monitor while still running within guidelines. But overall, if any batch job is running more than four hours, that job may be looping or running into another problem.
    • Jobs that are running more than eight hours. This is a definite red alert and should be investigated.

    Long-running batch jobs by themselves aren’t necessarily a problem. But they can represent an unusual situation and should be investigated to ensure that everything is running correctly on your system.
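
    As a rough illustration, the three tiers above can be written as a small classification routine. This is a minimal Python sketch, not a feature of any monitoring product named in this article. The BatchJob record, the classify function, and the job names in CRITICAL_JOBS are all hypothetical, and the sketch assumes your monitoring tool or a script can report how long each batch job has been running.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Thresholds taken from the three tiers described in the article.
CRITICAL_JOB_LIMIT = timedelta(minutes=30)
WARNING_LIMIT = timedelta(hours=4)
RED_ALERT_LIMIT = timedelta(hours=8)

# Hypothetical watch list: short-running critical jobs that get the 30-minute check.
CRITICAL_JOBS = {"INVPOST", "ORDEXPORT"}


@dataclass
class BatchJob:
    name: str
    elapsed: timedelta  # how long the job has been running so far


def classify(job: BatchJob) -> Optional[str]:
    """Return an alert level for a running batch job, or None if it looks normal."""
    if job.elapsed > RED_ALERT_LIMIT:
        return "red alert"
    if job.elapsed > WARNING_LIMIT:
        return "warning"
    # The 30-minute check applies only to jobs on the watch list, not to every job.
    if job.name in CRITICAL_JOBS and job.elapsed > CRITICAL_JOB_LIMIT:
        return "critical job overdue"
    return None
```

    Checking the longest threshold first means a job that has run nine hours is reported once, as a red alert, rather than tripping all three tiers.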

    Situation #2: Excessive number of jobs in a job queue.

    If you have single-threaded job queues feeding work to a batch subsystem, an excessive number of jobs waiting to run may indicate that a problem is occurring. I like to set my paging system to alert me when more than seven or eight jobs are lined up in a batch job queue waiting to run. A backlog like that may mean the jobs are stuck behind a long-running batch job.

    Situation #3: Jobs that should be running, but aren’t.

    For our nightly end-of-day batch jobs, we use monitoring to alert us when a job has not started by its usual start time. We have a stable schedule, and if a critical job doesn’t kick off by its target time, it may be an indication that a system problem is preventing it from running. These monitors are a good heads-up that the system may not be functioning the way it’s supposed to.

    If you’re planning on monitoring for late jobs, beware of the following issues when setting up timing monitors.

    1. Put a little play in your start job monitoring. If your target job usually starts running at 1:30 a.m., you may want to set your timing monitor to go off at 2:15 a.m. or 2:30 a.m. if the job hasn’t started by then. This is because there may be a valid reason the job didn’t kick off on time, and you don’t want to set off a false alarm. Give your system some time to right itself before you wake up a technician.
    2. Be careful setting up timing monitors for jobs that usually run between 11 p.m. and midnight. A delay could push these jobs to run into the next day and that might set off a false alarm. IMHO, it’s best to stay away from start time monitoring for jobs that usually begin close to midnight.
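
    The two cautions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 45-minute grace period (1:30 a.m. plus 45 minutes gives the 2:15 a.m. alert time mentioned above), and the 11 p.m. cutoff are choices made for the example, and the has_started flag stands in for whatever your monitoring tool reports about the job.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional

GRACE = timedelta(minutes=45)   # play added to the usual start time
NO_MONITOR_AFTER = time(23, 0)  # skip jobs that usually start at 11 p.m. or later


def alert_deadline(run_date: date, usual_start: time,
                   grace: timedelta = GRACE) -> Optional[datetime]:
    """Return the moment a late-start alert should fire, or None to skip monitoring."""
    if usual_start >= NO_MONITOR_AFTER:
        # A delay could push the job past midnight and cause a false alarm.
        return None
    return datetime.combine(run_date, usual_start) + grace


def is_late(now: datetime, has_started: bool, deadline: Optional[datetime]) -> bool:
    """True when the job is monitored, has not started, and the deadline has passed."""
    return deadline is not None and not has_started and now >= deadline
```

    For a job that usually starts at 1:30 a.m. on October 3, 2012, alert_deadline returns 2:15 a.m. that morning. For a job that usually starts at 11:30 p.m., it returns None and no alert is ever raised.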

    Situation #4: Critical lines, controllers, or devices that aren’t active.

    If your monitoring software allows it, monitor for critical lines, controllers, and devices (such as printers) being off-line. You can easily do this by monitoring for when these objects are varied off, not active, or in a recovery pending state. This can give you an early warning that a critical system resource, such as a shipping label printer or an Ethernet line, is not available.
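
    A minimal Python sketch of that check follows. It assumes you can obtain a mapping of object names to status text from your monitoring tool. The status labels in TROUBLE_STATES simply mirror the three states named above and are placeholders, so substitute the exact strings your tool reports. The device names and the unavailable_devices function are hypothetical.

```python
from typing import Dict, List

# Placeholder labels mirroring the states named in the article.
TROUBLE_STATES = {"VARIED OFF", "NOT ACTIVE", "RECOVERY PENDING"}


def unavailable_devices(critical: List[str], status: Dict[str, str]) -> List[str]:
    """Return the critical objects that are in a trouble state.

    An object missing from the status report is treated as not active,
    so a printer that has disappeared entirely still raises an alert.
    """
    return [name for name in critical
            if status.get(name, "NOT ACTIVE").upper() in TROUBLE_STATES]
```

    Passing only the short list of critical objects, rather than every device on the partition, keeps the alert volume down.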

    Situation #5: IP interfaces not active.

    Most monitoring packages have an option to ping an IP address and send out an alert if that address doesn’t answer. You can use this to test whether your partitions’ IP addresses are working, whether companion servers are on the network, or whether other IBM i partitions are active. Be careful with these monitors, however, as you can sometimes get a false alarm on a ping test. You may want to fiddle with your ping monitor parameters, such as number of pings to send or the wait time for a return response, to cut down on the number of false alarms.
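
    One common way to cut down on false alarms is to require several failed pings in a row before alerting. The Python sketch below shows that idea only. The probe argument is a hypothetical callable that wraps whatever ping test your environment provides and returns True when the address answers. The wait time for a response is assumed to be handled inside that callable, and the default of three attempts is an arbitrary choice for the example.

```python
from typing import Callable


def host_is_down(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True only if every one of `attempts` probes fails."""
    for _ in range(attempts):
        if probe():
            return False  # a single answer is enough to call the address reachable
    return True
```

    With this approach, one dropped ping followed by a good reply does not page anyone. Only an address that stays silent for the full number of attempts triggers an alert.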

    Situation #6: Interactive users using a large amount of CPU.

    Interactive CPU monitors can help you determine when an interactive job is experiencing a problem or when a user is doing something that he’s not supposed to (like running batch work in QINTER when he should be submitting it to batch). I like to set up a monitor to detect when an interactive job is using more than 25 percent of the available CPU. I find I get too many false alarms if I set the threshold lower than 25 percent CPU.

    Situation #7: Interactive response time spiking.

    Trouble can sometimes be detected when interactive users start experiencing increased response time. A spike may indicate that a runaway job is taking necessary resources away from other jobs on the system. Monitoring for these spikes can help alert you to a developing situation.
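
    This article does not give a specific threshold for a response time spike, so the Python sketch below makes its own assumptions: it compares the newest response time sample against the average of the ten samples before it and flags a spike when the newest value is more than double that baseline. The is_spiking function, the window size, and the factor of two are all hypothetical and would need tuning for your workload.

```python
from statistics import mean
from typing import Sequence


def is_spiking(samples: Sequence[float], baseline_size: int = 10,
               factor: float = 2.0) -> bool:
    """Flag a spike when the newest sample exceeds `factor` times the recent average.

    `samples` holds response times in seconds, oldest first.
    """
    if len(samples) <= baseline_size:
        return False  # not enough history to form a baseline
    # Average the `baseline_size` samples that come just before the newest one.
    baseline = mean(samples[-(baseline_size + 1):-1])
    return samples[-1] > factor * baseline
```

    Comparing against a recent baseline rather than a fixed number of seconds lets the same check work on partitions with different normal response times.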

    More Than Just Error Messages

    There’s more to effective system monitoring than just looking for error messages. Once you get your monitoring software in place, be sure to start creating monitors that can detect developing problems, even when your system doesn’t appear to have an issue.

    Follow Me On My Blog, On Twitter, And On LinkedIn

    Check out my blog at joehertvik.com, where I focus on computer administration and news (especially IBM i); vendor, marketing, and tech writing news and materials; and whatever else I come across.

    You can also follow me on Twitter @JoeHertvik and on LinkedIn.

    Joe Hertvik is the owner of Hertvik Business Services, a service company that provides written marketing content and presentation services for the computer industry, including white papers, case studies, and other marketing material. Email Joe for a free quote for any upcoming projects. He also runs a data center for two companies outside Chicago. Joe is a contributing editor for IT Jungle and has written the Admin Alert column since 2002.

    RELATED STORIES

    Basic i/OS Error Monitoring and Response, Part 2

    Basic i/OS Error Monitoring and Response, Part 1




Volume 12, Number 24 -- October 3, 2012

Copyright © 2025 IT Jungle