Admin Alert: Dealing with i5 Critical Storage Errors
April 25, 2007 Joe Hertvik
Critical storage errors can occur whenever i5 hard drive usage passes 90 percent of available storage space. Above 90 percent, system performance starts to degrade. When storage usage breaches 95 percent, the system can become unstable and turn itself off or spontaneously reboot. While you can’t predict when critical storage errors will occur, there are several things you can do to detect and clean up storage problems before they crash your system.
What’s the Disk Facts, Jack?
This week and next week, I’ll deal with the issues involved in monitoring and correcting critical storage errors or storage overflow conditions. Today, I’ll look at some tricks for monitoring your system for developing storage problems. Next week, I’ll discuss some common reasons why your i5 partition storage may be filling up and show you some simple things that you can do to reduce storage usage.
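The first step in handling unexpected disk usage spikes is to monitor for them. The i5/OS operating system provides two monitoring functions to detect hazardous disk overflow situations: a disk drive storage threshold value, and the auxiliary storage lower limit system values, QSTGLOWLMT and QSTGLOWACN.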
Besides the information in this article, you can find more information on setting and using these values in a previous article I wrote called Protecting Your System from Critical Storage Errors. Today’s article supplements that one with additional information about setting QSTGLOWLMT and QSTGLOWACN. These two storage monitoring settings are effective in helping you identify when problems are starting to occur, but they are more effective when combined with a monitoring and paging program such as Bytware’s Messenger Plus product, which can read system logs and message queues and notify you as a problem is occurring.
Compensating for the Defaults
The big problem with IBM’s critical storage notification system is that the i5/OS defaults for these values trigger so late that administrators can do little more than panic when storage is starting to fill up. So my first line of defense for monitoring disk usage is to set these values to more appropriate levels that give me more time to react when a problem occurs. In my shops, I usually set the disk drive threshold value to 85 percent and my QSTGLOWLMT limit value (auxiliary storage lower limit) to 15 percent. This provides additional time to analyze and react to storage usage issues before a crisis appears, which creates a more proactive environment for taking care of these issues as they occur.
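To make the arithmetic behind these two settings concrete, here is a minimal Python sketch. It is an illustration only and not something that runs on i5/OS; the function names are hypothetical, and the strict greater-than and less-than comparisons are my assumption about where each limit trips. The point it shows is that the disk threshold is expressed as percent used while QSTGLOWLMT is expressed as percent available, so 85 percent used and 15 percent available describe the same point on the disk.

```python
def percent_available(percent_used: float) -> float:
    """Convert a percent-used figure into percent available."""
    return 100.0 - percent_used


def crossed_limits(percent_used: float,
                   threshold_pct: float = 85.0,
                   lower_limit_pct: float = 15.0) -> list[str]:
    """Return the names of the limits that a given usage figure has crossed.

    threshold_pct   -- disk drive threshold, expressed as percent USED
    lower_limit_pct -- QSTGLOWLMT-style limit, expressed as percent AVAILABLE
    The defaults are the settings described in the article (85 and 15).
    """
    crossed = []
    if percent_used > threshold_pct:
        crossed.append("storage threshold")
    if percent_available(percent_used) < lower_limit_pct:
        crossed.append("auxiliary storage lower limit")
    return crossed


if __name__ == "__main__":
    for used in (80.0, 88.0, 96.0):
        print(used, crossed_limits(used))
```

With the tighter settings, a system at 88 percent used has already crossed both limits, which is the extra reaction time the 85/15 settings are meant to buy.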
However, there is one thing I need to consider when setting QSTGLOWLMT to a higher value, such as 15 percent available storage. By default, QSTGLOWLMT is set to 5 percent available storage, which tells the system to detect and react to the most critical situation, where the system can become unstable and spontaneously crash. When you set your auxiliary storage lower limit value higher than 5 percent, you are changing the rules of the game. Now you are telling the system to look for possible storage overflow errors as the system approaches critical storage, not after available storage has already dropped below 5 percent. Depending on the value you choose for QSTGLOWLMT, your critical storage monitoring can change from a reactive situation (looking for storage problems when available storage is less than 5 to 10 percent of all hard drive space) to a proactive situation (where available hard drive space is approaching the critical 5 to 10 percent value but has not yet passed that mark).
So if I change my QSTGLOWLMT value from 5 percent to 15 percent, I should also reconsider the setting of my auxiliary storage lower limit action system value (QSTGLOWACN). At this higher value, I should have more time before I hit the crisis point of 5 percent available storage (95 percent full), and therefore I shouldn’t set QSTGLOWACN to any of its panic-mode values, such as automatically ending the system to a restricted state (*ENDSYS) or immediately powering down the system and restarting it (*PWRDWNSYS). Theoretically, I should have more time to react and correct the problem at 15 percent available storage (85 percent full) than I have at 5 percent available storage. At higher values (15 percent and above), QSTGLOWACN should only be set to one of its milder settings, which give me the chance to investigate and correct the problem before it gets much worse.
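The rule of thumb above can be written down as a small check. This Python sketch is only a way of stating the reasoning, not a tool that reads or changes system values; the function name and the `early_warning_pct` parameter are hypothetical, and the 15 percent cutoff is taken from the guideline in the preceding paragraph.

```python
# The two panic-mode QSTGLOWACN values named in the article.
PANIC_ACTIONS = {"*ENDSYS", "*PWRDWNSYS"}


def action_fits_limit(lower_limit_pct: float,
                      action: str,
                      early_warning_pct: float = 15.0) -> bool:
    """Return False when a panic-mode action is paired with an early-warning limit.

    lower_limit_pct   -- the QSTGLOWLMT setting, in percent available storage
    action            -- the QSTGLOWACN setting being considered
    early_warning_pct -- assumed cutoff at or above which the limit acts as an
                         early warning rather than a last-ditch alarm
    """
    if lower_limit_pct >= early_warning_pct and action.upper() in PANIC_ACTIONS:
        return False
    return True


if __name__ == "__main__":
    print(action_fits_limit(15.0, "*PWRDWNSYS"))  # panic action at an early-warning limit
    print(action_fits_limit(5.0, "*ENDSYS"))      # panic action at the default limit
```

The check deliberately says nothing about limits between 5 and 15 percent; that range is a judgment call that depends on how quickly your disk tends to fill.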
About Our Testing Environment
All configurations described in this article were tested on an i5 550 box running i5/OS V5R3. Most of the commands used here are also available in earlier versions of the i5/OS and OS/400 operating systems, so the configurations should be usable in prior releases. The QSTGLOWLMT and QSTGLOWACN system values are only available in OS/400 V4R2 and later operating systems, including the newer i5/OS V5Rx systems. However, you may notice minor variations in pre-V5R3 copies of these commands. These differences may be due to incremental command improvements that have occurred from release to release.