Volume 9, Number 11 -- March 25, 2009

Catching Robot/SCHEDULE Job Failures As They Happen

Published: March 25, 2009

Hey, Joe:

We're using Help/Systems Robot/SCHEDULE version 9.0 to schedule batch jobs on an i5/OS V5R4 machine, and it works great. However, I have a new requirement to alert my staff whenever a Robot job fails to run or terminates unexpectedly. Do you know of any utilities that can monitor and send out email pages when this happens?

--Bob


Given that Robot/SCHEDULE is one of the most widely used utilities in iSeries, System i, and Power i shops, I've run into this problem before and wrote a utility to do exactly what you're talking about. I've run this utility on Robot/SCHEDULE version 9.0 and version 10 installations and it works well under both versions without any changes. I offer my utility here as a service to the i5/OS community without any implied warranty or proof of fitness for your application (end legalese).

The utility uses the Robot/SCHEDULE file named RBTMSG, which records the results of all Robot/SCHEDULE operations. This file is stored in the ROBOTLIB library on my system (it may be stored in another library on your system). RBTMSG contains a lot of information for each job but to determine when a SCHEDULE job fails, I'm really only interested in four pieces of info.

  1. The name of each Robot job that SCHEDULE ran
  2. The severity code of the job after it ran. SCHEDULE jobs that run successfully have a severity code of 'C' (Normal completion). For our purposes, we will be looking for jobs that have a severity code of 'T' (Abnormal termination) or 'E' (Error in setup so job submission failed). Both codes indicate that the target job did not run as intended
  3. The date the Robot/SCHEDULE job ended
  4. The time the Robot/SCHEDULE job ended

To avoid interfering with Robot/SCHEDULE processing, I created a work file that only contains these RBTMSG fields. I call this file JHRBTMSG. Here's the layout.

JHRBTMSG

A          R RBTMSG
A            CMRNAM        10A         TEXT('JOB NAME')
A            CMMSEV         1A         TEXT('SEVERITY')
A            CMSTIM         7P 0       TEXT('TIME')
A            CMSDAT         7P 0       TEXT('DATE')
A          K CMSDAT
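
As a rough sketch, the work file could be created from this DDS with a Create Physical File (CRTPF) command along these lines. The source file QGPL/QDDSSRC and the member name are assumptions made for illustration; substitute wherever you keep your DDS source.

```cl
/* Assumption: the DDS above is stored in member JHRBTMSG of */
/* source file QGPL/QDDSSRC. Adjust the names to your shop.  */
CRTPF      FILE(QGPL/JHRBTMSG) SRCFILE(QGPL/QDDSSRC) +
             SRCMBR(JHRBTMSG) +
             TEXT('Work file of failed Robot/SCHEDULE jobs')
```

The + continuation characters are only needed inside a CL source member. On a command line, the same command can be keyed as a single entry.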

Once I had my work file, I wrote the following program to alert me whenever a Robot/SCHEDULE job fails with a severity code of either 'T' or 'E'. I call this program CHKRBTMSG. After the code, you can find an explanation of what each code line does, along with instructions for setting up this program and a necessary supporting program as scheduled jobs in Robot/SCHEDULE.

0001.00  PGM
0002.00  DCLF       FILE(QGPL/JHRBTMSG)
0003.00  DCL        VAR(&TDATE7) TYPE(*CHAR) LEN(7)
0004.00  DCL        VAR(&TDATE7S) TYPE(*DEC) LEN(7 0) VALUE(0)
0005.00  DCL        VAR(&TDATE) TYPE(*CHAR) LEN(6)
0006.00  DCL        VAR(&TDAYMON) TYPE(*CHAR) LEN(4)
0007.00  DCL        VAR(&TDAY) TYPE(*CHAR) LEN(2)
0008.00  DCL        VAR(&TMONTH) TYPE(*CHAR) LEN(2)
0009.00  DCL        VAR(&TYEAR) TYPE(*CHAR) LEN(2)
0010.00  DCL        VAR(&FAILTIME) TYPE(*CHAR) LEN(8) VALUE(' ')
0011.00  DCL        VAR(&CMDTIMCH) TYPE(*CHAR) LEN(7) VALUE(' ')
0012.00  DCL        VAR(&FAILHR) TYPE(*CHAR) LEN(2) VALUE(' ')
0013.00  DCL        VAR(&FAILMIN) TYPE(*CHAR) LEN(2) VALUE(' ')
0014.00  DCL        VAR(&FAILSEC) TYPE(*CHAR) LEN(2) VALUE(' ')
0015.00  DCL        VAR(&LASTTIME) TYPE(*CHAR) LEN(7) VALUE(' ')
0016.00
0017.00  DCL        VAR(&MSG44) TYPE(*CHAR) LEN(44) VALUE(' ')
0018.00
0019.00  CHKOBJ     OBJ(QGPL/LASTRBTCHK) OBJTYPE(*DTAARA)
0020.00  MONMSG     MSGID(CPF9801) EXEC(DO)
0021.00  CRTDTAARA  DTAARA(QGPL/LASTRBTCHK) TYPE(*CHAR) LEN(7) +
0022.00                TEXT('Data area for Robot job failure +
0023.00                testing') /* Data area to hold time of +
0024.00                last terminated object */
0025.00  ENDDO
0026.00
0027.00  RTVDTAARA  DTAARA(QGPL/LASTRBTCHK) RTNVAR(&LASTTIME) /* +
0028.00                Last time that a terminated robot job +
0029.00                occurred */
0030.00
0031.00  CPYF       FROMFILE(ROBOTLIB/RBTMSG) +
0032.00                TOFILE(QGPL/JHRBTMSG) MBROPT(*REPLACE) +
0033.00                CRTFILE(*NO) INCREL((*IF CMMSEV *EQ 'T') +
0034.00                (*OR CMMSEV *EQ 'E')) FMTOPT(*MAP *DROP) +
0035.00                /* Copy only the failed (T or E) records */
0036.00
0037.00  RTVSYSVAL  SYSVAL(QDATE) RTNVAR(&TDATE)
0038.00  CHGVAR     VAR(&TDAYMON) VALUE(%SST(&TDATE 1 4))
0039.00  CHGVAR     VAR(&TYEAR) VALUE(%SST(&TDATE 5 2))
0040.00  CHGVAR     VAR(&TDATE7) VALUE('1' *CAT &TYEAR *CAT +
0041.00                &TDAYMON)
0042.00  CHGVAR     VAR(&TDATE7S) VALUE(&TDATE7)
0043.00
0044.00  LOOP:       RCVF
0045.00  MONMSG     MSGID(CPF0864) EXEC(GOTO CMDLBL(ENDPGM))
0046.00
0047.00  IF         COND(&CMSDAT *EQ &TDATE7S) THEN(DO)
0048.00     CHGVAR     VAR(&CMDTIMCH) VALUE(&CMSTIM)
0049.00     IF         COND(&CMDTIMCH *GT &LASTTIME) THEN(DO)
0050.00        CHGDTAARA  DTAARA(QGPL/LASTRBTCHK) VALUE(&CMDTIMCH) /* +
0051.00            Put the last time an error occurred in +
0052.00            the data area */
0053.00        CHGVAR     VAR(&FAILMIN) VALUE(%SST(&CMDTIMCH 4 2))
0054.00        CHGVAR     VAR(&FAILHR) VALUE(%SST(&CMDTIMCH 2 2))
0055.00        CHGVAR     VAR(&FAILSEC) VALUE(%SST(&CMDTIMCH 6 2))
0056.00
0057.00        CHGVAR     VAR(&FAILTIME) VALUE(&FAILHR *CAT ':' +
0058.00            *CAT &FAILMIN *CAT ':' *CAT &FAILSEC)
0059.00        CHGVAR     VAR(&MSG44) VALUE('Robot job' *BCAT +
0060.00            &CMRNAM *BCAT 'failed at' *BCAT &FAILTIME)   
0061.00        SNDDST     TYPE(*LMSG) +
0062.00            TOINTNET(('email_address@emaildomain.com')) +
0063.00              DSTD(&MSG44) LONGMSG('Please check the +
0064.00              status of this Robot job and take +
0065.00              corrective action to rerun')
0066.00     ENDDO
0067.00  ENDDO
0068.00
0069.00  GOTO       CMDLBL(LOOP)
0070.00
0071.00  ENDPGM:
0072.00
0073.00
0074.00
0075.00              ENDPGM

Here's how the program works.

Line 2 declares that I am using my JHRBTMSG work file to drive this program.

Lines 3-17 declare the work variables that I use throughout the program.

Lines 19-25 check for the existence of a data area called LASTRBTCHK in QGPL. If the data area doesn't exist, the program creates it. LASTRBTCHK contains the time of day of the last failed SCHEDULE job that was emailed to the staff.

Lines 27-29 retrieve the contents of the LASTRBTCHK data area for comparison purposes. The program puts the time into a variable called &LASTTIME, which denotes the last time today that a Robot/SCHEDULE job failed on the system and the operations staff was alerted.

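Lines 31-35 copy all the RBTMSG records that have a severity code of 'T' (Abnormal termination) or 'E' (Error in setup so job submission failed) into my JHRBTMSG work file, replacing whatever the work file held before.

Lines 37-42 retrieve today's date and format it into a comparison value to be used against the records in the JHRBTMSG file. The code takes the first four characters of QDATE as the month and day and the last two as the year, so it assumes a system date format of MDY, and it builds a CYYMMDD value to match CMSDAT. For example, March 25, 2009 (032509) becomes 1090325.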
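
Slicing QDATE by position depends on the system date format. For shops that don't use MDY, one possible alternative is to let the Convert Date (CVTDAT) command build the comparison value. The following is a sketch only, not part of the original program. It reuses the program's variable names and assumes CMSDAT is stored as CYYMMDD, as the '1' *CAT &TYEAR *CAT &TDAYMON logic on lines 40-41 implies. It does not try to handle a Julian system date format.

```cl
PGM
DCL        VAR(&TDATE) TYPE(*CHAR) LEN(6)
DCL        VAR(&TDATE7) TYPE(*CHAR) LEN(7)
DCL        VAR(&TDATE7S) TYPE(*DEC) LEN(7 0) VALUE(0)

/* Today's date, in whatever format QDATFMT specifies */
RTVSYSVAL  SYSVAL(QDATE) RTNVAR(&TDATE)

/* Convert to CYYMMDD with no separators */
CVTDAT     DATE(&TDATE) TOVAR(&TDATE7) FROMFMT(*SYSVAL) +
             TOFMT(*CYMD) TOSEP(*NONE)

/* Numeric copy for comparison against CMSDAT */
CHGVAR     VAR(&TDATE7S) VALUE(&TDATE7)
ENDPGM
```

If something like this were used in CHKRBTMSG, the CVTDAT command would stand in for lines 38-41, and the &TDAYMON and &TYEAR variables would no longer be needed.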

Lines 44-45 read the next record from the JHRBTMSG file and end the program when there are no records left.

Line 47 compares the incoming date on the 'T' or 'E' record with today's date. If the job referenced by this record did not run today, the program gets the next record. The program is only interested in failures that happened today.

Lines 48 and 49 convert the time on the incoming record to character form and compare it to the &LASTTIME variable (the time of the last Robot/SCHEDULE failure that was reported to the staff). The program processes the record if its time is later than the last failure time. If the time is equal to or earlier than the last failure time, the program assumes the failure has already been reported and goes to the next record.

Lines 50-52 update the LASTRBTCHK data area with the time that the latest Robot/SCHEDULE failure occurred (the time of the current record).

Lines 53-67 format the email message with the name of the Robot/SCHEDULE job that failed and the time it failed. This code then sends the message out to its intended recipients by using the Send Distribution (SNDDST) command.
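
Before scheduling the program, it may be worth confirming that the system can deliver mail through SNDDST at all. The command below is only a test sketch built from the same parameters the program uses. The job name TESTJOB and the message text are made up, and the sketch assumes the system's mail setup for SNDDST is already in place, which this article doesn't cover.

```cl
/* Test message only: TESTJOB is a made-up job name. Replace the */
/* address with a real recipient. DSTD is limited to 44          */
/* characters, which is why the program builds &MSG44.           */
SNDDST     TYPE(*LMSG) +
             TOINTNET(('email_address@emaildomain.com')) +
             DSTD('Robot job TESTJOB failed at 10:15:00') +
             LONGMSG('Test message for the CHKRBTMSG setup')
```

If the test message arrives, the alerts sent by lines 61-65 should be delivered to the same address in the same way.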

Line 69 continues the loop to retrieve another record.

Lines 71-75 end the program.
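
To turn the source into a runnable program, something like the following Create CL Program (CRTCLPGM) command should do. The source file QGPL/QCLSRC and the member name are assumptions, and QGPL is used as the program library only because the rest of the code uses it. Because of the DCLF on line 2, the JHRBTMSG work file has to exist before the program is compiled.

```cl
/* Assumption: the CL source is in member CHKRBTMSG of QGPL/QCLSRC, */
/* entered without the sequence numbers shown in the listing.       */
CRTCLPGM   PGM(QGPL/CHKRBTMSG) SRCFILE(QGPL/QCLSRC) +
             SRCMBR(CHKRBTMSG) +
             TEXT('Check for failed Robot/SCHEDULE jobs')
```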

Scheduling CHKRBTMSG

To run this job and be alerted as soon as possible when a Robot/SCHEDULE job fails, I set up two jobs inside (where else) Robot/SCHEDULE.

  • The first job runs the checking program every 10 minutes to quickly alert the staff when a Robot/SCHEDULE job fails. I call this job CHKRBTMSG.
  • The second job deletes the LASTRBTCHK data area at 12:01 a.m. every morning. Because the data area holds only a time of day and not a date, clearing it at the start of each day ensures that all new failed jobs will be reported when CHKRBTMSG runs throughout the day.
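
A minimal sketch of the supporting program for the second job might look like this. The name DLTRBTCHK is a placeholder chosen for illustration.

```cl
/* DLTRBTCHK (hypothetical name): clear the failure marker at the */
/* start of each day so that CHKRBTMSG starts fresh.              */
PGM
DLTDTAARA  DTAARA(QGPL/LASTRBTCHK)
/* CPF2105 = object not found; nothing to delete, so ignore it */
MONMSG     MSGID(CPF2105)
ENDPGM
```

The next run of CHKRBTMSG recreates the data area as blanks (lines 19-25). A time made up of digits compares as greater than blanks, so the first failure of the new day is always reported.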

And that's my routine for detecting and emailing staff whenever a Robot/SCHEDULE failure occurs. Feel free to modify this code however you see fit. Let me know how it works for you.

--Joe




Senior Technical Editor: Ted Holt
Technical Editor: Joe Hertvik
Contributing Technical Editors: Edwin Earley, Brian Kelly, Michael Sansoterra
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed

Copyright © 1996-2009 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
