Four Hundred Guru
Volume 9, Number 1 -- January 7, 2009

Admin Alert: Looking for i5/OS Trouble, Part I

Published: January 7, 2009

by Joe Hertvik

I love stories with Cassandra characters, the slightly crazed player who accurately foresees oncoming doom, only to be mocked or ignored. Like Cassandra, i5/OS systems can also see events that portend system problems, clear omens that can easily be missed or ignored. This week and next, I'll discuss avoiding doom by monitoring several i5/OS situations that should be checked early and often. Ignore these warnings at your own peril!

Finding Doom On Your Local i Partition

Omen hunting on iSeries, System i, and Power i machines is a lot easier if you know where to look. I've generally found the following locations to be the best places to look for trouble.

  • The QSYSMSG message queue
  • The System Operator message queue (QSYSOPR)
  • Disk drive statistics
  • Active job entries
  • Job queues and output queues

This information forms the raw material for finding developing (or developed) problems on your system. The trouble is that it's an incredible hassle to manually monitor these areas for developing issues. That's why I highly recommend using a system monitoring tool such as Bytware's MessengerConsole, Help/Systems' Robot/ALERT, or CCSS' QSystem Monitor package. All of these tools can automate problem monitoring and immediately alert you via email, page, etc., when an issue occurs. You can also set up custom monitors to watch the system for situations that are specific to your own environment.

Now that we know which general areas to watch, let's look at the specifics of what we should be looking for.

QSYSMSG Monitoring

In the i5/OS, i, and OS/400 operating systems, IBM allows you to create a critical message queue called QSYSMSG. QSYSMSG does not come preconfigured with your operating system. It is an optional message queue that must be created in the QSYS library. According to IBM, you can create QSYSMSG by using the following Create Message Queue (CRTMSGQ) command.

CRTMSGQ MSGQ(QSYS/QSYSMSG) TEXT('Optional MSGQ to receive specific system messages')

Once QSYSMSG exists on your system, the operating system will automatically copy critical system messages directly to that queue for analysis. In general, only system messages are sent to QSYSMSG. More routine messages such as programming and printer alert messages are usually sent to the System Operator message queue, QSYSOPR.

You can deal with QSYSMSG messages in several different ways. First, you can set your monitoring software to alert you whenever any new critical messages are sent to QSYSMSG. When monitoring QSYSMSG, you may also want to create filters to exclude certain messages from triggering alerts. After I created QSYSMSG on a development system, the queue started receiving the following message each time a user disabled his sign-on device by typing in the wrong password three times.

CPF1397 - Subsystem QINTER varied off work station device_name for user user_name.

Since I usually only look for urgent messages and we have other procedures for handling this situation, I set up my monitoring software to ignore this message. As you set up your QSYSMSG alert infrastructure, you may also choose to ignore less urgent messages.
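The exclusion idea can be pictured independently of any monitoring product. The Python fragment below is only a sketch of the logic, not the configuration syntax of any tool named in this article. The names `IGNORED_IDS` and `qsysmsg_should_alert` are hypothetical, and CPF0907 appears only as an example of an ID that is not on the ignore list.

```python
# Hypothetical ignore list: message IDs that should never trigger an alert.
# CPF1397 is the "device varied off" message discussed above.
IGNORED_IDS = {"CPF1397"}


def qsysmsg_should_alert(msg_id: str) -> bool:
    """Return True when a message arriving in QSYSMSG should raise an alert."""
    # Normalize case and whitespace so " cpf1397 " matches "CPF1397".
    return msg_id.strip().upper() not in IGNORED_IDS


if __name__ == "__main__":
    for msg_id in ("CPF1397", "CPF0907"):
        action = "alert" if qsysmsg_should_alert(msg_id) else "ignored"
        print(msg_id, action)
```

Keeping the ignore list as data rather than as code means that less urgent messages can be added to it later without touching the alerting logic.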

The second way to monitor the queue is to set up a program to read new QSYSMSG messages and to automatically perform certain actions when a new message arrives. A third way is to monitor QSYSMSG in break mode on either the system console or on a designated machine that someone is always watching. You would do this by running the following Change Message Queue (CHGMSGQ) command on the workstation that you will be monitoring the queue from.

CHGMSGQ MSGQ(QSYSMSG) DLVRY(*BREAK)

Whenever a new message appears in the queue, it will pop up (break) on the display session where the CHGMSGQ command was run. The only downside to having QSYSMSG messages pop up in break mode is that this solution depends on someone always being near the breaking terminal when the message occurs. If the problem happens in off-hours or when no one is near the monitoring terminal, the message will be missed.
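The second approach, a program that reads each new QSYSMSG message and acts on it, usually comes down to a lookup from message ID to action. The sketch below shows only that dispatch logic, written in Python for illustration. On a real partition the program would more likely be written in CL or RPG and would receive the messages from the queue itself, which this sketch does not attempt. The `Message` class, both handler functions, and the `HANDLERS` table are hypothetical names, and the default of paging an administrator for unknown IDs is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Message:
    msg_id: str  # for example "CPF1397"
    text: str    # first-level message text


def log_only(msg: Message) -> str:
    """Low-urgency action: record the message and move on."""
    return f"LOG {msg.msg_id}: {msg.text}"


def page_admin(msg: Message) -> str:
    """High-urgency action: stand-in for sending an email or page."""
    return f"PAGE {msg.msg_id}: {msg.text}"


# Hypothetical routing table. Any ID not listed falls through to page_admin,
# on the assumption that QSYSMSG mostly holds critical messages.
HANDLERS: Dict[str, Callable[[Message], str]] = {
    "CPF1397": log_only,
}


def handle(msg: Message) -> str:
    """Pick the action for one message and run it."""
    handler = HANDLERS.get(msg.msg_id, page_admin)
    return handler(msg)


if __name__ == "__main__":
    print(handle(Message("CPF1397", "Device varied off")))
    print(handle(Message("CPF0907", "Serious storage condition may exist")))
```

Because unknown message IDs fall through to the most urgent action, a message nobody planned for is still noticed, which addresses the off-hours weakness of break mode.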

QSYSOPR Monitoring

The QSYSOPR message queue is trickier to monitor than QSYSMSG because it contains generic messages alongside any critical messages that may occur. This makes it tougher to either write a program to process new QSYSOPR messages or to put QSYSOPR in break mode, because you will get a lot of irrelevant messages along with the ones you need to know about. The other issue is that since many shops no longer have dedicated system operators, it's not as feasible as it used to be to have someone watch QSYSOPR all day.

As with the QSYSMSG message queue, your best bet is to use a system monitoring tool to look for critical messages. My starting recommendations for automatically monitoring the QSYSOPR message queue to alert you to problem situations are the following:

1. Monitor for any inquiry messages that require someone to type in a response before the program will continue. Inquiry messages require an operator to enter a reply (such as 'C', 'D', 'I', or 'R') when a job needs specific information to keep processing. It goes without saying that these messages need to be attended to as soon as possible.

2. Filter out any inquiry messages that are associated with the QSPLJOB user. These messages are for items such as loading different form types on a printer or aligning a form. Form alignment and loading messages are fairly common and they are usually handled by the user who is working with the printer, not by staff that are monitoring for system problems.

3. Monitor for any QSYSOPR messages that have a severity code of 80 or above (excluding messages generated by the QSPLJOB user). Severity code 80 generally distinguishes messages that must be dealt with immediately or messages that signify that something is going wrong with the system.

4. Monitor for jobs that did not complete normally. Watch for any message indicating that a job ended abnormally or was ended by a user. Some of these messages include:

CPF1240 - Job &3/&2/&1 ended abnormally
CPC1234 - Job &3/&2/&1 ended from job queue by user &4
CPC1125 - Job &3/&2/&1 was ended by user &4
CPC1126 - Job &3/&2/&1 was ended by user &4

5. Monitor for serious storage conditions. Watch for the following storage error message:

CPF0907 - Serious storage condition may exist. Press HELP.

This message appears when the ASP storage threshold has been breached. The ASP threshold is a user-defined setting for each storage pool. It indicates what percentage of ASP storage must be filled before you consider the ASP's storage to be full. The CPF0907 message is sent out after ASP system storage exceeds the threshold value. For more information on ASP threshold values, see my previous article on protecting your system from critical storage errors.
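Taken together, the five recommendations above amount to a small rule set. The following Python sketch shows one way to express those rules. It is an illustration of the logic only, not how any particular monitoring product is configured. The `OprMessage` class, its field names, and the message type labels "INQUIRY" and "INFO" are assumptions made for the example. Checking the named message IDs before the QSPLJOB exclusion is also an assumed ordering.

```python
from dataclasses import dataclass

# Message IDs listed in recommendations 4 and 5.
JOB_FAILURE_IDS = {"CPF1240", "CPC1234", "CPC1125", "CPC1126"}
STORAGE_IDS = {"CPF0907"}
PRINTER_USER = "QSPLJOB"  # printer-related messages are left to printer users
MIN_SEVERITY = 80


@dataclass
class OprMessage:
    msg_id: str
    msg_type: str  # "INQUIRY" or "INFO" in this sketch
    severity: int  # 0 through 99
    user: str      # user profile associated with the message


def qsysopr_should_alert(msg: OprMessage) -> bool:
    """Apply the five starting rules to one QSYSOPR message."""
    # Rules 4 and 5: specific job-failure and storage message IDs.
    if msg.msg_id in JOB_FAILURE_IDS or msg.msg_id in STORAGE_IDS:
        return True
    # Rule 2, and the exclusion in rule 3: skip QSPLJOB messages.
    if msg.user == PRINTER_USER:
        return False
    # Rule 1: any remaining inquiry message is waiting for a reply.
    if msg.msg_type == "INQUIRY":
        return True
    # Rule 3: severity 80 or above.
    return msg.severity >= MIN_SEVERITY


if __name__ == "__main__":
    # TST0001 is a made-up placeholder ID, not a real system message.
    sample = OprMessage("TST0001", "INQUIRY", 99, "QSPLJOB")
    print(sample.msg_id, qsysopr_should_alert(sample))
```

Writing the rules as ordered checks makes the precedence explicit, which matters once a single message matches more than one rule.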

There are other QSYSOPR messages that you can monitor for, but these particular situations will cover many of the general problems that occur in most shops and get you started on automated monitoring. As you become more comfortable with the tools and the processes involved, you can add more monitors to your system. In the next issue, I'll also review a few other messages to monitor.

Coming Next Issue

Besides monitoring QSYSMSG and QSYSOPR, next week I'll look at some other critical areas to monitor on an i5/OS partition, including:

  • Sudden changes in disk space used
  • Damaged objects
  • Problems with running interactive jobs
  • Problems with running batch jobs, including long-running batch jobs
  • Ensuring that critical subsystems are running
  • Ensuring that critical server jobs are running
  • Problems with spooled file printouts

See you in seven.


RELATED STORY

Protecting Your System from Critical Storage Errors




Senior Technical Editor: Ted Holt
Technical Editor: Joe Hertvik
Contributing Technical Editors: Edwin Earley, Brian Kelly, Michael Sansoterra
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed


Copyright © 1996-2009 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
