Volume 6, Number 19 -- May 10, 2006

Admin Alert: The Case of the Mysterious CPI0999 Storage Error

Published: May 10, 2006

by Joe Hertvik

Most i5 administrators know that system problems can occur when i5 disk units fill up. System degradation can occur when system storage reaches 90 percent, and the system can crash after storage usage passes 95 percent. But while disk drive capacity problems are fairly straightforward to handle, they are not the only storage problems that can occur on OS/400 boxes.
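As a rough illustration of those two thresholds, here is a minimal Python sketch that classifies a storage-usage percentage. The function name and the labels are hypothetical; the 90 and 95 percent cut-offs are simply the figures quoted above, and the percentage itself would come from wherever you read the system's storage usage.

```python
def classify_storage_usage(percent_used):
    """Map a system storage percentage to a rough risk level (illustrative labels)."""
    if not 0 <= percent_used <= 100:
        raise ValueError("percent_used must be between 0 and 100")
    if percent_used > 95:
        return "critical"   # past 95 percent: the system can crash
    if percent_used >= 90:
        return "degraded"   # 90 percent or more: degradation can occur
    return "normal"
```

A monitoring script could call this on a schedule and page someone as soon as the result is anything other than "normal".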

There is a second storage issue--identified when the system starts issuing CPI0999 error messages--that is not as clearly understood as traditional storage capacity problems. A CPI0999 issue can also create severe performance problems on your system and it should be attended to immediately after it is discovered. The problem lies in identifying when the situation legitimately occurs and what to do about it.

CPI0999 can be issued by an i5/OS V5 system or by an iSeries or AS/400 system running OS/400 V4R5 and below. The message text reads "Storage directory threshold reached." CPI0999 can be triggered in conjunction with a CPF0907 message (Serious storage condition may exist. Press HELP), or it can be sent out as a message entirely by itself.

According to most sources, CPI0999 is issued when one of the following two situations occurs:

  1. System storage is filling up and hard drive usage has zoomed past its storage threshold values. Here, the system sends out a CPF0907 message in conjunction with CPI0999. In this situation, there's a good chance (90 percent or more) that the CPI0999 message was a byproduct of an accidental run-up in storage usage. Any situations associated with CPI0999 will probably resolve themselves when total storage usage is once again reduced below its threshold levels.
  2. A disk fragmentation problem has occurred and i5/OS is indicating that the system has passed what IBM calls "a point of significant concern." Besides fragmentation, a CPI0999 error may also signal that the system directory contains too many entries and needs to be expanded. These conditions affect system performance, and IBM lists CPI0999 as a potentially serious condition that must be corrected immediately. In addition, a piece of older IBM documentation that I found states that a totally fragmented system may not IPL.
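The distinction between the two situations comes down to whether CPF0907 accompanies CPI0999. A minimal sketch of that check, assuming you have already collected the message IDs logged in some recent window (the function name and return labels are hypothetical, not anything i5/OS provides):

```python
def classify_cpi0999(message_ids):
    """Return which situation a set of recently logged message IDs suggests."""
    ids = {m.strip().upper() for m in message_ids}
    if "CPI0999" not in ids:
        return "no CPI0999"
    if "CPF0907" in ids:
        return "situation 1"  # storage run-up: CPF0907 accompanies CPI0999
    return "situation 2"      # CPI0999 alone: fragmentation or directory issue
```

This only mirrors the rule of thumb in the list above; it does not replace looking at actual storage usage before deciding how to respond.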

In either situation, when storage usage is exceeding its threshold values, it is imperative that you quickly reduce how much storage is used on the system. However, there is a problem. When storage usage is approaching critical mass and threatening to crash your system, there usually isn't time for measured thought. Storage overflow emergencies are usually caused by one or two rogue jobs that are quickly filling up hard drive space, and your job is to pinpoint and correct the problem before it damages the system.

So the first step in either situation is to find any job that is filling up system storage and end it before it ends the system. If the job was filling up temporary storage or work files in QTEMP, ending the rogue job should be enough to stop your storage run-up. If the job was filling up storage by writing excessive records to an application file or copying several large production files to a test environment, you will need to reverse that situation, either by deleting records and reorganizing files or by deleting unnecessary files that caused your storage to peak. For tips on reorganizing files with deleted records, see "Tips for Dealing with Deleted Records in AS/400 Files" in the Related Stories list at the end of this article.
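One way to spot a rogue job is to compare two snapshots of per-job storage usage taken a few minutes apart and rank the jobs by growth. The Python sketch below assumes you have already captured those figures into dictionaries; the job names and numbers are made-up sample data, and how you collect the snapshots is left open.

```python
def fastest_growing_jobs(earlier, later, top=3):
    """Rank jobs by growth in storage (MB) between two snapshots.

    earlier, later: dicts mapping a job name to storage used in MB.
    Jobs missing from the earlier snapshot are treated as starting at 0.
    """
    growth = {job: used - earlier.get(job, 0) for job, used in later.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    # Keep only jobs that actually grew.
    return [(job, delta) for job, delta in ranked[:top] if delta > 0]


# Hypothetical sample data: two snapshots taken a few minutes apart.
before = {"NIGHTLY": 120, "QZDASOINIT": 40, "COPYTEST": 300}
after = {"NIGHTLY": 125, "QZDASOINIT": 40, "COPYTEST": 2300}
print(fastest_growing_jobs(before, after))
```

With this sample data, COPYTEST stands out as the job to investigate first.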

While storage recovery works well for a CPF0907 problem (situation 1), the diagnosis is different for a straight CPI0999 scenario (situation 2) because it is a fragmentation problem as well as a disk storage problem. Increasing unused disk space by deleting, clearing, or reorganizing files will help resolve situation 2, because there will be fewer fragmented files filling up your hard drive, but the fragmentation and system directory issues may still be there.

To make matters worse, no version of i5/OS or OS/400 can tell whether a situation 2 issue has been corrected until after an IPL occurs. So the operating system will continue to send you CPI0999 messages every hour until the problem is corrected and the system is IPLed. And if you run your i5/OS box like most shops, where your partitioned system can run for months without re-IPLing, that is a long time to go without knowing whether your problem is solved.

So what can you do if you find yourself in a CPI0999 situation? Here's my checklist for solving the problem and heading off critical issues, such as system degradation or a lack of contiguous storage that could prevent your system from IPLing.

  1. If you are able to, schedule a system IPL as soon as possible. An IPL may delete a large number of temporary objects that are filling up the storage directory and contributing to disk fragmentation. IBM's documentation on the CPI0999 problem is sketchy, but IPLing your box seems to be one of the more effective ways to solve this problem. It's also the only way to turn off the CPI0999 message, which, as I mentioned before, will be sent out every hour on the hour until an IPL occurs.
  2. Think about running the Reclaim Storage command (RCLSTG) on your system. RCLSTG attempts to validate and reclaim orphaned, damaged, or incompletely updated objects on your system. It also deletes unusable objects and fragments. The downside of RCLSTG is that it requires your system to be in a restricted state where no work can be done, and, if RCLSTG has not been run in a long time, it may take many hours to run, preventing your system from processing work during that time.
  3. Take a system inventory and delete or clear out any unnecessary files and members on your system. As I mentioned before, this will increase free disk space that can then be reorganized after an IPL or by the STRDSKRGZ command mentioned in the next point.
  4. i5/OS also provides the Start Disk Reorganization command (STRDSKRGZ), which allows you to start a disk reorganization function for one or more storage pools on your system. Similar to the Windows defragmentation tool, STRDSKRGZ consolidates unused disk space in order to reduce fragmentation and to allow future disk allocation requests to be performed more efficiently. What's nice about STRDSKRGZ is that it has a Time Limit parameter (TIMLMT) that allows you to run the command only for a set amount of time. Once the time limit is passed, the command finishes its current processing and ends. This allows you to perform a partial reorganization over time, rather than trying to reorganize your entire ASP at one time. The other nice thing about STRDSKRGZ is that it can be run anytime without the system being in restricted state.
  5. You can also try running the Reclaim Temporary Storage command (RCLTMPSTG), which automatically reclaims storage used by temporarily decompressed copies of panel groups, menus, display files, and printer files. The kicker here is that the temporary storage is not fully reclaimed until the next time the system is IPLed.
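To make the time-boxed reorganization in step 4 concrete, here is a small Python sketch that formats a STRDSKRGZ command string for a given time limit. The helper name is hypothetical. TIMLMT is the parameter described above; the ASP keyword and its *ALL value are written as I recall them from the command description, so treat them as assumptions and confirm them by prompting the command on your own release before using the result.

```python
def build_strdskrgz(time_limit_minutes, asp="*ALL"):
    """Format a STRDSKRGZ command string for a time-boxed reorganization pass.

    time_limit_minutes: whole number of minutes the pass may run.
    asp: storage pool selection (assumed keyword and value; verify on your system).
    """
    if isinstance(time_limit_minutes, bool) or not isinstance(time_limit_minutes, int):
        raise TypeError("time_limit_minutes must be a whole number of minutes")
    if time_limit_minutes < 1:
        raise ValueError("time_limit_minutes must be at least 1")
    return f"STRDSKRGZ TIMLMT({time_limit_minutes}) ASP({asp})"
```

For example, build_strdskrgz(60) produces a command string for a one-hour pass that could be submitted during a quiet period each night, chipping away at fragmentation without a restricted state.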

If all else fails and you have done everything else to correct this situation short of an IPL, you may just have to run with hourly CPI0999 messages until you can get around to IPLing. Many shops aren't able to IPL frequently, and since it's impossible to know if a straight CPI0999 situation is resolved without an IPL, running with these messages is sometimes the only choice you have.


RELATED STORIES AND ITEMS

Getting In and Out of Restricted State

IBM APAR MA14464 for OS/400 V3R6M0 Dealing with DASD Fragmentation

iSeries Information Center, RCLTMPSTG Command Description

iSeries Information Center, STRDSKRGZ Command Description

Protecting Your System from Critical Storage Errors

Tips for Dealing with Deleted Records in AS/400 Files

Tips on Running RCLSTG






Senior Technical Editor: Ted Holt
Technical Editors: Howard Arner, Joe Hertvik, Shannon O'Donnell, Kevin Vandever
Contributing Technical Editors: Joel Cochran, Wayne O. Evans, Raymond Everhart,
Bruce Guetzkow, Brian Kelly, Marc Logemann, David Morris
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed


Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
