Admin Alert: The Case of the Mysterious CPI0999 Storage Error
May 10, 2006 Joe Hertvik
Most i5 administrators know that system problems can occur when i5 disk units fill up. Performance can degrade once system storage reaches 90 percent, and the system can crash after storage usage passes 95 percent. But while disk capacity problems are fairly straightforward to handle, they are not the only storage problems that can occur on OS/400 boxes.
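To make these rule-of-thumb levels concrete, the following Python sketch maps a storage percentage onto the 90 and 95 percent guidelines above. The function names, status strings, and sample readings are hypothetical. The thresholds are the article's guideline figures, not values the operating system reports. Obtaining the real percentage (for example, from the % system ASP used field on the WRKSYSSTS display) is not shown.

```python
def percent_used(used_gb: float, capacity_gb: float) -> float:
    """Return used storage as a percentage of total capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return used_gb * 100.0 / capacity_gb


def storage_status(used_pct: float) -> str:
    """Classify storage usage against the 90%/95% guidelines.

    Hypothetical helper: the cut-offs are rule-of-thumb values from the
    article, not limits read from the system.
    """
    if not 0.0 <= used_pct <= 100.0:
        raise ValueError("used_pct must be between 0 and 100")
    if used_pct > 95.0:
        return "critical: crash risk"
    if used_pct >= 90.0:
        return "warning: possible degradation"
    return "ok"


if __name__ == "__main__":
    # Hypothetical readings in GB (used, capacity).
    for used, cap in ((300.0, 500.0), (450.0, 500.0), (481.0, 500.0)):
        pct = percent_used(used, cap)
        print(f"{pct:5.1f}% -> {storage_status(pct)}")
```

With the sample readings, 60 percent is reported as ok, 90 percent as a warning, and 96.2 percent as critical.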
There is a second storage issue, identified when the system starts issuing CPI0999 error messages, that is not as well understood as traditional storage capacity problems. A CPI0999 issue can also create severe performance problems on your system, and it should be addressed as soon as it is discovered. The difficulty lies in recognizing when the condition legitimately occurs and knowing what to do about it.
CPI0999 can be issued by an i5/OS V5 system or by an iSeries or AS/400 system running OS/400 V4R5 or earlier. The message text reads “Storage directory threshold reached.” CPI0999 can be issued together with a CPF0907 message (Serious storage condition may exist. Press HELP), or it can be issued entirely on its own.
According to most sources, CPI0999 is issued when one of the following two situations occurs:
In either situation, when storage usage is exceeding its threshold values, it is imperative that you quickly reduce the amount of storage used on the system. There is a catch, however: when storage usage is approaching critical mass and threatening to crash your system, there usually isn’t time for measured thought. Storage overflow emergencies are usually caused by one or two rogue jobs that are rapidly filling up disk space, and your job is to pinpoint and correct the problem before it damages the system.
So the first step in either situation is to find any job that is filling up system storage and end it before it ends the system. If the job was filling up temporary storage or work files in QTEMP, ending the rogue job should be enough to stop the storage run-up. If the job was filling up storage by writing excessive records to an application file or by copying several large production files to a test environment, you will need to reverse that situation, either by deleting records and reorganizing files or by deleting the unnecessary files that caused your storage to peak.
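One way to picture finding the job that is filling up storage is to take two snapshots of per-job storage a few minutes apart and rank jobs by how much they grew. The Python sketch below assumes you already have those snapshots as dictionaries keyed by qualified job name (number/user/name) with sizes in megabytes. How you collect the snapshots is not shown, and the function name and job names in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Growth:
    job: str                # qualified job name, e.g. number/user/name
    delta_mb: float         # storage added between the two snapshots
    rate_mb_per_min: float  # average growth rate over the interval


def rank_storage_growth(
    before: dict[str, float],
    after: dict[str, float],
    minutes: float,
    top: int = 5,
) -> list[Growth]:
    """Return the jobs whose storage grew most between two snapshots.

    Jobs that appear only in `after` are treated as starting from zero.
    Jobs missing from `after` (ended) are ignored, and jobs that shrank
    or stayed flat are skipped because they are not driving the run-up.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    rows = []
    for job, mb_after in after.items():
        delta = mb_after - before.get(job, 0.0)
        if delta > 0:
            rows.append(Growth(job, delta, delta / minutes))
    rows.sort(key=lambda g: g.delta_mb, reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # Hypothetical snapshots taken five minutes apart.
    before = {"100001/QPGMR/NIGHTCOPY": 1200.0, "100002/QUSER/QZDASOINIT": 300.0}
    after = {
        "100001/QPGMR/NIGHTCOPY": 9200.0,
        "100002/QUSER/QZDASOINIT": 310.0,
        "100003/JSMITH/QPADEV0001": 50.0,
    }
    for g in rank_storage_growth(before, after, minutes=5.0):
        print(f"{g.job:28} +{g.delta_mb:8.1f} MB  ({g.rate_mb_per_min:.1f} MB/min)")
```

With the sample data, the hypothetical NIGHTCOPY job tops the list with 8,000 MB of growth (1,600 MB per minute). That is the kind of rogue job you would end first.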
While recovering storage works well for a CPF0907 problem (situation 1), the remedy is different for a straight CPI0999 scenario (situation 2), because it is a fragmentation problem as well as a disk storage problem. Increasing unused disk space by deleting, clearing, or reorganizing files will help resolve situation 2, because fewer fragmented files will be filling up your disk drives, but the underlying fragmentation and storage directory issues may remain.
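The distinction between the two situations can be expressed as a small decision rule. A CPI0999 message accompanied by CPF0907 points to a capacity problem, while a CPI0999 message on its own points to the storage directory and fragmentation issue. The Python sketch below assumes you have already collected the IDs of messages logged over the same time window (for example, from QSYSOPR or the history log). The function name, the window approximation, and the returned advice strings are hypothetical.

```python
from collections.abc import Iterable


def classify_cpi0999(message_ids: Iterable[str]) -> str:
    """Map message IDs seen in one time window to the two situations.

    Hypothetical helper: "issued together" is approximated as "both IDs
    appear in the same window".
    """
    ids = {m.strip().upper() for m in message_ids}
    if "CPI0999" not in ids:
        return "no CPI0999"
    if "CPF0907" in ids:
        # Situation 1: capacity problem; recovering disk space should clear it.
        return "situation 1: reduce storage usage"
    # Situation 2: directory/fragmentation problem; recovering space helps,
    # but only an IPL will confirm whether it is resolved.
    return "situation 2: reduce storage usage, confirm after IPL"


if __name__ == "__main__":
    print(classify_cpi0999(["CPI0999", "CPF0907"]))
    print(classify_cpi0999(["CPI0999"]))
```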
To make matters worse, no version of i5/OS or OS/400 can tell whether a situation 2 issue has been corrected until after an IPL occurs. So the operating system will continue to send you CPI0999 messages every hour until the problem is corrected and the system is IPLed. And if you run your i5/OS box like most shops, where a partitioned system can run for months without an IPL, that is a long time to go without knowing whether your problem is solved.
So what can you do if you find yourself in a CPI0999 situation? Here’s my checklist for solving this problem and heading off critical issues, such as system degradation or a lack of contiguous storage that could prevent your system from IPLing.
If you have done everything you can to correct this situation short of an IPL, you may simply have to live with hourly CPI0999 messages until you can schedule one. Many shops aren’t able to IPL frequently, and since it’s impossible to know whether a straight CPI0999 situation is resolved without an IPL, running with these messages is sometimes your only choice.