Admin Alert: Ending Subsystems Properly
January 17, 2007 Joe Hertvik
It’s unavoidable: System i5 administrators occasionally have to end a subsystem with the End Subsystem command (ENDSBS). Ending a subsystem kills its jobs, and that can wreak havoc with system processing because jobs may not have time to end cleanly before the subsystem terminates. The result can be partially updated data and possibly damaged objects. Fortunately, there are some simple things you can do to cushion, though not completely eliminate, the blow as your jobs end.
Why Interactive Subsystems Must End
Killing a subsystem is a fairly common event. It’s necessary in several situations, including:
Regardless of why you need to end a subsystem, there’s a right way to do it that minimizes the damage to your files and data. Here are some of the ins and outs I consider when I need to pull the plug on user jobs in my subsystems.
Issuing Early Warnings
During an emergency, you can’t really prepare system users for an unscheduled shutdown, but there are plenty of options for cleanly removing users before a scheduled one. If the shutdown will happen within a day, you can email all affected users, telling them when to finish their work and sign off cleanly before the shutdown occurs. An email notification is particularly effective for weekend shutdowns, because people who normally come in on a Saturday or Sunday get notice that the system may not be available that day.
Closer to the actual shutdown, you can also send i5/OS break messages to the users who are still signed on, letting them know that the system is coming down soon. Under most circumstances, a break message appears immediately on the user’s 5250 session: when it arrives, the system uses the Display Messages command (DSPMSG) to pop it up on the terminal.
For scheduled outages, my strategy is to use the Send Break Message command (SNDBRKMSG) to send a series of break messages alerting my 5250 users that my subsystems will soon be terminated. About half an hour before I end the system, I use SNDBRKMSG to send the following break message:
SNDBRKMSG MSG('***ATTENTION*** The system will be coming down for maintenance in about a half-hour at 9:00 PM. Please start finishing your work and sign off.') TOMSGQ(*ALLWS)
Placing the *ALLWS special value in the To work station message queue parameter (TOMSGQ) tells SNDBRKMSG to send the message to all work station message queues, so everyone who is currently signed on sees it immediately. The message may not break on some users’ screens if their job’s Break message handling parameter (BRKMSG) is set to *HOLD, but most people will receive it.
As shutdown time draws nearer, I usually send two more break messages to all users: one about 10 minutes before the shutdown and another right before I shut down the system. If the shutdown happens during working hours and most users are in one location where I can reach them easily, I may also use the company paging system to announce it.
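Purely as an illustration, the warning cadence above (about 30 minutes out, about 10 minutes out, and right before shutdown) can be worked out from the shutdown time. The C sketch below does only that arithmetic; warning_time(), format_clock(), and print_warning_schedule() are hypothetical helper names, and SNDBRKMSG is still what actually sends each message.

```c
#include <stdio.h>

/* Hypothetical helper: minutes after midnight at which a warning should go
   out, given the shutdown time (minutes after midnight) and the minutes of
   notice the warning gives. Wraps around midnight for late-night windows. */
static int warning_time(int shutdown_min, int lead_min) {
    int t = (shutdown_min - lead_min) % 1440;
    return t < 0 ? t + 1440 : t;
}

/* Hypothetical helper: format minutes after midnight as a 12-hour clock
   time such as "8:30 PM". */
static void format_clock(int minutes, char *buf, size_t len) {
    int h24 = minutes / 60;
    int m = minutes % 60;
    int h12 = (h24 % 12 == 0) ? 12 : h24 % 12;
    snprintf(buf, len, "%d:%02d %s", h12, m, h24 < 12 ? "AM" : "PM");
}

/* Print the three warnings described in the article: 30 and 10 minutes
   ahead, then one right before the shutdown itself. */
static void print_warning_schedule(int shutdown_min) {
    static const int leads[] = {30, 10, 0};
    char when[16];
    for (size_t i = 0; i < sizeof leads / sizeof leads[0]; i++) {
        format_clock(warning_time(shutdown_min, leads[i]), when, sizeof when);
        printf("%2d min before shutdown: send break message at %s\n",
               leads[i], when);
    }
}
```

For the 9:00 PM shutdown in the message above, print_warning_schedule(21 * 60) lists send times of 8:30 PM, 8:50 PM, and 9:00 PM. The wraparound in warning_time() matters only for maintenance windows that start shortly after midnight.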
In a smaller shop, I can also run the Work with Active Jobs command (WRKACTJOB) with the Subsystem parameter (SBS) to get a list of the users who are still signed on. Specifying OUTPUT(*PRINT) writes that list to a spooled file instead of displaying it:
WRKACTJOB OUTPUT(*PRINT) SBS(QINTER)
If only a few signed-on users remain and I know how to reach them, I may even call them or go to their desks and sign them off myself. The idea is to give my users as much warning as possible before I forcibly end their jobs with the ENDSBS command.
Doing the Deed–ENDSBS
When it’s time to end an individual subsystem, the End Subsystem command (ENDSBS) is the i5 tool of choice. ENDSBS is easy to use, but there are a few tricks. First, you need to decide whether to end your subsystem immediately or on a delayed basis. To end QINTER immediately, run ENDSBS this way:
ENDSBS SBS(QINTER) OPTION(*IMMED)
In this case, the subsystem starts terminating its jobs immediately, and no new jobs can enter the subsystem. However, the individual jobs may not end instantly; they may take several minutes, depending on how the application is written and on the value of the Time limit during immediate ending of a job system value (QENDJOBLMT). Here’s how this feature works.
Before explaining QENDJOBLMT, I should note that this system value is available only in i5/OS V5R3 and V5R4. In i5/OS V5R1 and V5R2, the same function is activated by creating a QENDJOBLMT data area; instructions for creating it can be found in IBM APAR SE07797. You may also need to install PTFs to add this processing to those releases.
When you run ENDSBS with OPTION(*IMMED) and a subsystem job is running an application written with a signal handling procedure for the asynchronous signal SIGTERM, i5/OS generates a SIGTERM signal for that job. SIGTERM tells the application that the subsystem will soon end the job and that it should immediately start its end-of-job processing. Once SIGTERM is sent, i5/OS starts a timer for the maximum amount of time (in seconds) the system will wait for the job’s signal handling procedure to finish ending the job. The timer’s value comes from the QENDJOBLMT system value (or data area), and if the timer runs out before the signal handler finishes, the system ends the job immediately.
The default value for QENDJOBLMT is 120 seconds (two minutes), but it can be set anywhere from 30 seconds up to 3600 seconds (one hour). This is why subsystem jobs sometimes take a long time to terminate after an immediate subsystem end. Applications that don’t handle SIGTERM are ended immediately.
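For readers who write the applications, here is a minimal sketch of the kind of SIGTERM handling procedure described above. It uses the standard POSIX sigaction() call; i5/OS offers POSIX-style signal APIs to ILE programs, but this sketch was written for a generic POSIX C compiler and is not tested i5/OS code. The names on_sigterm(), install_sigterm_handler(), and end_of_job_requested() are hypothetical. The handler only records the request; the application’s normal code does the actual end-of-job work, which must finish within QENDJOBLMT.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() on strict compilers */
#include <signal.h>
#include <string.h>

/* Set by the handler. volatile sig_atomic_t is the type C guarantees can be
   written safely from an asynchronous signal handler. */
static volatile sig_atomic_t end_requested = 0;

/* Keep the handler tiny: just record that the job is being ended. The real
   cleanup (commit or rollback, closing files) runs in normal program code. */
static void on_sigterm(int signo) {
    (void)signo;
    end_requested = 1;
}

/* Register on_sigterm() for SIGTERM. Returns 0 on success, -1 on error. */
static int install_sigterm_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Polled by the application's main loop between units of work. */
static int end_of_job_requested(void) {
    return end_requested != 0;
}
```

A program would call install_sigterm_handler() at startup and check end_of_job_requested() between units of work (for example, after each record or transaction), then commit or roll back, close its files, and end. Keeping the handler this small avoids calling functions that are unsafe inside an asynchronous signal handler.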
It’s also worth noting that QENDJOBLMT and SIGTERM processing come into play whenever you use an i5/OS command that ends jobs with a *IMMED option, such as the End Job command (ENDJOB) or the End System command (ENDSYS), or when you use the End Subsystem command to end all subsystems on your i5 partition, like this:
ENDSBS SBS(*ALL) OPTION(*IMMED)
The other thing to keep in mind about the QENDJOBLMT system value is that it should be coordinated with the Maximum time for PWRDWNSYS *IMMED system value (QPWRDWNLMT). QPWRDWNLMT defines how long (in seconds) the system will wait for all jobs to end before powering down. Your QENDJOBLMT value should always be less than your QPWRDWNLMT value. Otherwise, QENDJOBLMT is effectively ignored if you follow an ENDSBS command with a Power Down System command (PWRDWNSYS) that immediately ends your system, like this:
PWRDWNSYS OPTION(*IMMED) RESTART(*YES)
When QENDJOBLMT is greater than QPWRDWNLMT, the system can power down (and, with RESTART(*YES), IPL) before your jobs’ SIGTERM processing has had a chance to complete.
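The interaction between the two limits can be modeled in a few lines. This C sketch is a simplification, not IBM’s algorithm: it assumes PWRDWNSYS OPTION(*IMMED) is issued at the moment the jobs receive SIGTERM, and the function names are hypothetical. It encodes the article’s rule (QENDJOBLMT within its 30 to 3600 second range and below QPWRDWNLMT) and shows how much cleanup time a handler actually gets.

```c
#include <stdbool.h>

/* Simplified model, not IBM's algorithm: assumes PWRDWNSYS OPTION(*IMMED)
   is issued at the same moment the jobs receive SIGTERM. Seconds throughout. */

/* The article's guidance: QENDJOBLMT in its 30..3600 range and below QPWRDWNLMT. */
static bool limits_coordinated(int endjoblmt, int pwrdwnlmt) {
    return endjoblmt >= 30 && endjoblmt <= 3600 && endjoblmt < pwrdwnlmt;
}

/* Cleanup time a job really gets: none without a SIGTERM handler; otherwise
   QENDJOBLMT, cut short by QPWRDWNLMT if that expires first. */
static int usable_cleanup_time(bool has_handler, int endjoblmt, int pwrdwnlmt) {
    if (!has_handler)
        return 0;
    return endjoblmt < pwrdwnlmt ? endjoblmt : pwrdwnlmt;
}

/* True if a handler needing `needed` seconds finishes before being cut off. */
static bool cleanup_completes(bool has_handler, int needed,
                              int endjoblmt, int pwrdwnlmt) {
    return has_handler &&
           needed <= usable_cleanup_time(has_handler, endjoblmt, pwrdwnlmt);
}
```

With illustrative values of QENDJOBLMT = 1200 and QPWRDWNLMT = 900, usable_cleanup_time(true, 1200, 900) returns 900, so a handler that needs 1,000 seconds is cut off by the power down, which is the situation described above. The real values on a system can be checked with DSPSYSVAL SYSVAL(QENDJOBLMT) and DSPSYSVAL SYSVAL(QPWRDWNLMT).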
To end your subsystem in a controlled manner, where the system gives user jobs a specific amount of time to end, you would run the ENDSBS command this way.
ENDSBS SBS(QINTER) OPTION(*CNTRLD) DELAY(time_in_seconds)
A controlled subsystem end is specified by entering *CNTRLD in the How to end parameter (OPTION). A controlled end gives your jobs a specific amount of time to end before they are terminated. As with an immediate end, jobs that have signal handling procedures are sent a SIGTERM signal, and every job in the subsystem gets the number of seconds specified in DELAY to end before the subsystem terminates it.
The default value for DELAY is *NOLIMIT, which means that no new jobs can enter the subsystem and the subsystem will terminate after all of its jobs have ended. While this is a nice, clean way to end both your jobs and your subsystem, it could result in a subsystem that never goes down if you have a job that refuses to end after receiving a SIGTERM signal. However, *NOLIMIT is perfect for ending batch subsystems after all their current jobs finish processing.
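The difference between a numeric DELAY and *NOLIMIT can be sketched the same way. In this simplified C model (hypothetical names; not IBM’s algorithm), each job has the number of seconds after ENDSBS it would take to end on its own, and jobs still active when DELAY expires are assumed to be ended at that moment, ignoring any QENDJOBLMT time that follows.

```c
/* Sentinels for this model only. */
#define DELAY_NOLIMIT  (-1)  /* stands in for DELAY(*NOLIMIT) */
#define JOB_NEVER_ENDS (-1)  /* a job that will not end on its own */

/* Simplified model, not IBM's algorithm. job_end[i] is how many seconds
   after ENDSBS ... OPTION(*CNTRLD) job i would end by itself, or
   JOB_NEVER_ENDS. Jobs still active when DELAY expires are treated as ended
   at that moment (any QENDJOBLMT time after that point is ignored).
   Returns the second at which the subsystem has no jobs left, or
   JOB_NEVER_ENDS if it would wait forever. */
static int controlled_end_time(const int *job_end, int njobs, int delay) {
    int finish = 0;
    for (int i = 0; i < njobs; i++) {
        int t = job_end[i];
        if (delay == DELAY_NOLIMIT) {
            if (t == JOB_NEVER_ENDS)
                return JOB_NEVER_ENDS;  /* *NOLIMIT waits on this job forever */
        } else if (t == JOB_NEVER_ENDS || t > delay) {
            t = delay;                  /* ended when DELAY expires */
        }
        if (t > finish)
            finish = t;
    }
    return finish;
}
```

For jobs that would end after 10 and 45 seconds plus one that never ends, the model says DELAY(*NOLIMIT) never finishes, while DELAY(300) finishes at 300 seconds. That is why a numeric DELAY is the safer choice when a job might hang, while *NOLIMIT suits batch subsystems whose jobs you want to run to completion.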
About Our Testing Environment
All configurations described in this article were tested on an i5 box running i5/OS V5R3. The QENDJOBLMT system value is available only in i5/OS V5R3 and V5R4, but it can be implemented with a data area in i5/OS V5R1 and V5R2.