Joblogapalooza, Its Possible Causes, and a Call for Input
August 23, 2006
Hey, Joe:
Last week the number of joblogs in our QEZJOBLOG output queue increased by thousands of spooled files. Auxiliary storage pool (ASP) usage also increased from 74 percent to 96 percent. Even IBM can’t tell me what’s wrong. The only clues I have are that I keep getting a QSYSOPR CPF22E3 message that reads ‘User profile EXPLOIT is disabled’ and that there are lots of QRWTSRVR joblogs in QEZJOBLOG. Do you have any idea what’s happening?
I don’t have a clear answer for what’s going on but I do have a few suspicions. Here’s what I think could be happening based on something that occurred in my shop last week.
We had a user job that was stuck in a loop as it was trying to write duplicate records with the same unique key. As the job kept processing, it literally created thousands of joblog files, all with multiple instances of the same CPF5026 error message (Duplicate Key Not Allowed for member &6). This job created so many joblogs that it eventually hit the upper limit on how many spooled files the system would let it produce. I later found out that a job’s spooled file limit is defined in the Maximum spooled files system value, QMAXSPLF. My QMAXSPLF value was set very high at 20000 spooled files per job, and, sure enough, there were 21220 spooled files in my QEZJOBLOG output queue that day.
So in my case, the job blew the upper range on the number of spooled files that it could create. Because it was producing so many spooled files, it was slowly filling up storage but not so much storage that it threatened to crash the system as is happening in your situation. I’m not sure if your problem fits into this exact same scenario (where runaway jobs create excessive joblogs that are filling up system storage) but here are a few items to check for dealing with your own renegade spooled file situation.
1. Check your QMAXSPLF system value to see how many spooled files a job is allowed to produce. The default value is 9999 (which is high enough, all by itself) but QMAXSPLF can be set anywhere from 9999 to 999999 spooled files per job in i5/OS V5R3. If this number is set too high, it’s possible that a runaway job could be driving system storage upward with excessive joblogs. Correct QMAXSPLF as you see fit, but unfortunately IBM doesn’t let you set this value below 9999.
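To make this check concrete, here is a minimal sketch of looking at the limit and lowering it from a command line. The 9999 in the second command is only the minimum mentioned above, used for illustration; choose whatever value suits your shop.

```cl
/* Display the current per-job spooled file limit */
DSPSYSVAL SYSVAL(QMAXSPLF)

/* Lower the limit (9999 is the minimum; shown here only as an example) */
CHGSYSVAL SYSVAL(QMAXSPLF) VALUE(9999)
```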
2. Check to see which devices or server jobs are trying to sign on to your system with the EXPLOIT user profile. Investigate why you have a user profile called EXPLOIT and what type of processing the system uses it for. I find it curious that you keep getting CPF22E3 messages for that user (User profile &1 is disabled). It’s possible there’s a server job out there that’s trying to sign on to the system with the EXPLOIT user profile, and that the signons are being rejected because the profile is disabled. If that job keeps retrying, the repeated failures could be producing the many thousands of QEZJOBLOG printouts that you are seeing (similar to what I saw when my looping job kept trying to add a duplicate key to a file, over and over again).
You can find any jobs associated with your EXPLOIT user by running the Work with User Jobs command (WRKUSRJOB) with EXPLOIT specified as the user profile.
In particular, check to see how many spooled files are associated with any jobs you find.
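Something along these lines should do it. I’m assuming STATUS(*ALL) here so that active jobs, queued jobs, and ended jobs that still have output all show up, and I’ve added WRKSPLF as one more way to look at the spooled files that belong to that user.

```cl
/* List every job for the EXPLOIT user profile             */
/* (STATUS(*ALL) is an assumption; narrow it if you like)  */
WRKUSRJOB USER(EXPLOIT) STATUS(*ALL)

/* List the spooled files owned by the EXPLOIT user */
WRKSPLF SELECT(EXPLOIT)
```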
Another possibility is that someone could be trying to hack into your system by using the EXPLOIT user profile and an automatic password guessing program. This may have caused EXPLOIT to be disabled according to the Maximum sign-on attempts allowed (QMAXSIGN) and the Action to take for failed signon attempts (QMAXSGNACN) system values. If it’s an automatic attack, however, the program might be persistent and that may be why you keep seeing CPF22E3 messages in QSYSOPR.
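As a rough sketch, you could review those two system values and the profile itself with the commands below. This only displays the current settings and the profile’s status; it doesn’t prove or rule out an attack.

```cl
/* How many invalid sign-on attempts are allowed */
DSPSYSVAL SYSVAL(QMAXSIGN)

/* What the system does when that limit is reached */
DSPSYSVAL SYSVAL(QMAXSGNACN)

/* Show the profile's status and its invalid sign-on attempt count */
DSPUSRPRF USRPRF(EXPLOIT)
```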
Regardless of whether one of these scenarios is happening, I would also take a look at the QRWTSRVR server jobs on your system to see if they are repeatedly trying to use EXPLOIT. If EXPLOIT turns out to be a legitimate profile that those jobs need, the solution might be to simply re-enable it.
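A minimal sketch of that check follows. The WRKACTJOB filter by job name is my suggestion for finding the server jobs, and the CHGUSRPRF command should only be run once you’re satisfied that EXPLOIT is a profile you actually want in use.

```cl
/* Show the active QRWTSRVR server jobs */
WRKACTJOB JOB(QRWTSRVR)

/* Re-enable the profile, but only if it proves to be legitimate */
CHGUSRPRF USRPRF(EXPLOIT) STATUS(*ENABLED)
```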
3. If the problem continues, be sure to clear QEZJOBLOG on a regular basis to keep excessive joblogs from overwhelming your system. Unfortunately, this treats the symptoms rather than the cause, but it may be what’s needed to keep your system going until you find out what’s wrong. You can clear QEZJOBLOG by running the Clear Output Queue command (CLROUTQ) against that output queue.
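Here is a sketch of the cleanup, assuming QEZJOBLOG lives in library QUSRSYS, which is where it is normally found. The WRKOUTQ step is my addition so you can look over the queue and save any joblogs you need for diagnosis, because CLROUTQ deletes what is there.

```cl
/* Review what is in the queue before deleting anything */
WRKOUTQ OUTQ(QUSRSYS/QEZJOBLOG)

/* Clear the joblog output queue (QUSRSYS is an assumption) */
CLROUTQ OUTQ(QUSRSYS/QEZJOBLOG)
```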
So that’s my take on the situation: it could be caused by a runaway job (or jobs) that, with your system’s blessing, is producing too many joblogs. This may not be the answer you’re looking for, but it seems odd that your shop came up with a problem that was eerily similar to what my shop experienced.
Beyond this diagnosis, I would also like to throw this question out to our Four Hundred Guru readers to see what they think. Since I have now encountered two situations where joblogs can start filling up the system, there may be more out there. If you have ever seen this situation occur before or you know how to solve it, please send me an email from the IT Jungle contact page. The best solutions may be featured in a future column.