Detecting A “Job End” Condition in DB2 for i
December 1, 2015 Hey, Mike
An ETL process extracts data from our ERP system and places it in our Business Intelligence systems. It runs 24/7 except for the End of Day process that shuts down the subsystem that it’s in. The SQL process we run ends up dying violently and the parent process fails (without recovering on startup). A graceful exit would be as simple as issuing a RETURN statement! Can SQL detect a shutdown process?
The ANSI SQL standard offers no specific guideline for implementing some kind of host shutdown process, so you’ll have to rely on the other special OS related views and functions IBM has given us in DB2 for i to find out about host OS activity. This includes requests for a controlled shutdown of a job or subsystem.
For those who don’t know, an SQL job (or its parent subsystem) can be requested to end and also given a certain duration to shut itself down before the OS forces the job to end. In IBM i, this is referred to as a “controlled” end. For example, when an ENDJOB *CNTRLD command is issued, the default time for the job to shut itself down is 30 seconds. For a PWRDWNSYS *CNTRLD the default time is 3600 seconds (1 hour).
Thinking about the DB2 for i services views and user-defined table functions (UDTF), there are two OS related services that present a hope that there is some way to accomplish this task entirely with SQL. You should be on IBM i 7.1 or 7.2 with the latest database group PTFs to use these services.
The first service that came to mind is the GET_JOB_INFO UDTF which can be used to return the current job’s status:
SELECT V_ACTIVE_JOB_STATUS INTO @STATUS FROM TABLE(GET_JOB_INFO('*')) job;
I was hoping the V_ACTIVE_JOB_STATUS column would return an “END” during a controlled job end, but alas it continued to return the value “RUN” even when a controlled end was requested. If the job’s status does ever change to END, I fear it will be too unreliable to detect and act on a controlled end request.
Knowing that when a controlled end job or end subsystem is requested, a message is placed in the job log, my next attempt used the QSYS2.JOBLOG_INFO UDTF to read the job log for a specific “end job” message when a controlled end is requested.
The following code snippet is a dynamic compound statement that continually reads the job log in a loop until it detects a message ID related to the job ending. The specific message ids the code looks for include:
Once the code detects one of these messages in the current job’s job log, it sends a message to QSYSOPR and exits:
BEGIN DECLARE @STATUS CHAR(4); DECLARE @EXIT CHAR(1) NOT NULL DEFAULT 'N'; WHILE @EXIT='N' DO IF EXISTS ( /* Check job log for controlled end requests */ SELECT * FROM TABLE(QSYS2.JOBLOG_INFO('*')|) JL WHERE MESSAGE_ID IN ('CPC1206','CPC1126')-- Subsystem Ending, Job Ended AND MESSAGE_TYPE='COMPLETION') THEN CALL QCMDEXC( 'SNDMSG MSG(''WARNING: SQL Code is taking a dive!'') TOUSR(*SYSOPR)'); SET @EXIT='Y'; ELSE CALL QSYS2.QCMDEXC ('DLYJOB (1)'); /* Wait for shutdown */ END IF; END WHILE; END;
The asterisk parameter (shown in red) signifies the UDTF should read the job log for the current job. I expect that finding message IDs CPC1206 and CPC1126 should cover normal controlled shutdowns by the following commands: ENDSBS, ENDSYS, PWRDWNSYS and ENJOB. There may be other “end job” conditions that your SQL code can check for such as:
When the controlled end request is detected, the following appears in the system operator message queue:
Additional Message Information From . . . . . . . . . : MIKE Date sent . . . . . . : 10/01/15 Time sent . . . . . . : 19:33:51 Message . . . . : WARNING: SQL Code is taking a dive!
The caveat to all this is that for a procedure to achieve a graceful shutdown, the code to check for the shutdown condition must be executed during the allotted interval or it was all in vain. Note the precaution with this technique: If your SQL job happens to have a huge job log, it could take a while for this UDTF to run. If the job has 10 seconds to shut itself down, you’ll be in trouble if the JOBLOG_INFO UDTF takes 17 seconds!
With respect to using an RPG solution, I believe implementing a non-deterministic RPG external function to return the value of the %SHTDN BIF is probably a much lighter weight solution than the sample shown above and it is easy to implement. For those unfamiliar with %SHTDN, it simply returns a ‘0’ if the job is active and a ‘1’ if the job has a pending request to end in a controlled manner by any of these commands ENDSBS, ENDSYS, PWRDWNSYS and ENJOB.
An RPG service program to do this is easy to create. This sample program is named JOBEND:
// Compile as module // CRTSRVPGM SRVPGM(zzz/JOBEND) EXPORT(*ALL) HNoMain DIsJobEnding PR 1 PIsJobEnding B Export DIsJobEnding PI 1 /Free Return %ShtDn; /End-Free PIsJobEnding E
Once the service program is built, create an external UDF wrapper (replace zzz with your library):
CREATE OR REPLACE FUNCTION zzz.IsJobEnd() RETURNS CHAR(1) LANGUAGE RPGLE EXTERNAL NAME 'zzz/JOBEND(ISJOBENDING)' PARAMETER STYLE GENERAL RETURNS NULL ON NULL INPUT NOT DETERMINISTIC NO EXTERNAL ACTION DISALLOW PARALLEL
Note that the function should be marked as non-deterministic. And now you have a UDF that returns a ‘1’ (current job ending) or ‘0’ (current job is active):
An additional benefit to this UDF is that it can be used to interrupt a long-running query (such as a BI ETL query) by using it in conjunction with the RAISE_ERROR built-in function:
SELECT COL1, COL2... FROM IMMENSE_QUERY WHERE CASE WHEN zzz.IsJobEnd()='1' THEN RAISE_ERROR('38E00', 'Controlled Shutdown Request was Received') ELSE 'OK' END='OK'
RAISE_ERROR is a strange function that is designed to throw an error! When wrapped in the CASE statement above, it will only throw an error if the IsJobEnd UDF evaluates to ‘1’. In this example, RAISE_ERROR will signal an error with SQLSTATE 38E00 (a user defined/custom SQL state) and cause the statement to stop. Whatever process is running the statement (stored procedure, Java connection, etc.) will receive a “hard” error with SQLSTATE 38E00 and should handle the error gracefully.
However, even this technique may not be completely effective. If DB2 is doing its JOIN processing during the duration of the controlled end (prior to the evaluation of the WHERE clause), the statement may never shut itself down as intended. When I first tested the above statement, I created an immense CROSS JOIN query and even though I gave the job 30 seconds to end, the job was ended by the OS prior to the IsJobEnd/Raise_Error expression’s evaluation.
To remedy this, I ended up changing the CROSS JOIN to a LEFT JOIN and duplicating the CASE statement in the JOIN’s correlation:
LEFT JOIN MY_BIG_TABLE ON CASE WHEN dev.IsJobEnd()='1' THEN RAISE_ERROR('38E00', 'Controlled Shutdown Request was Received') ELSE 'OK' END='OK'
Although this adds a tax to the query, it gives DB2 another opportunity to evaluate the function throughout the various stages of query processing.
It is important for IsJobEnd to be defined as NOT DETERMINISTIC so that DB2 for i will continually re-evaluate it while running in a statement like the one shown above. One last thing, if a job or subsystem is ended *IMMED (immediately) don’t count on this code working correctly!