Admin Alert: A Starter Program to Find Damaged Objects in i OS 6.1.1
Published: July 27, 2011
by Joe Hertvik
After a damaged object prevented a full system production backup in my shop, I was asked to devise a technique for detecting damaged objects on the system. This week, I'm presenting my starter CL program that looks for damaged objects as a template for approaching the issue, in the hopes that others can add to and expand on this code.
How Objects Get Damaged
iSeries, System i, and Power i objects can get damaged in a variety of ways, including:
- Electrical outages where the power is quickly cut off before the hard drive has a chance to finish writing all its information to disk.
- Hard drive failures and pending hard drive failures, where a disk drive that is starting to fail sends out random noise to the disk. This noise can affect the way data is written into an object.
To a certain extent, you can protect yourself from corruption caused by power failures by attaching an uninterruptible power supply (UPS) and a generator to your Power i box. There usually isn't much, however, that you can do to prevent a hard drive failure. So your options for prevention are fairly limited, forcing you to look instead toward detecting damaged objects soon after the damage occurs.
Methods for Finding Damaged Objects
There are a few tried and true methods for finding damaged objects. However, each method has its own benefits and drawbacks. After talking with IBM about the issue, I came up with the following options.
1. Run an option 21, Full System Backup--A full system backup (FSB) checks all objects for damage before backup, and it is a reliable damaged objects detector. Many production systems, however, run 24x7x365 and are seldom taken down for backup. That makes it difficult to use an FSB for damage detection.
2. Run the Retrieve Disk Information (RTVDSKINF) command--RTVDSKINF gathers information about your i OS library objects and stores it in the QCURRENT member of the QAEZDISK file in the QUSRSYS library. RTVDSKINF performs the following actions when it runs across object damage:
- For a damaged or unusable object, the object is deleted and a system message is sent to the QSYSOPR message queue.
- For objects that are not addressable through an i OS library, those objects are moved to the proper library or to the QRCL library (the reclaim storage library).
RTVDSKINF will write one record for every library object that it finds. For damaged objects, it will place a '2' in the Status Indicator field of the QAEZDISK record for that object.
The biggest weakness I've found with RTVDSKINF is that it only checks library objects. It doesn't check system objects or objects in the AS/400 Integrated File System (AS/400 IFS).
3. Run the Reclaim Object Links (RCLLNK) command over the root directory (/) of the AS/400 IFS--RCLLNK searches all IFS objects for damage and allows you to specify what actions to take when a damaged object is found.
Interestingly enough, when I called IBM on this issue, they told me that they no longer necessarily recommend using the Reclaim Storage (RCLSTG) command as the routine way to correct or delete inaccurate, unusable objects and fragments. Their main reason was that RCLSTG should be reserved for the specific situations that call for it. Another consideration is that RCLSTG is a long-running function, and there are several other ways to perform damaged object detection and repair. Both the RTVDSKINF and RCLLNK commands run many of the same diagnostics and object fixes that RCLSTG runs. Besides the run-time difference, the other big difference between RTVDSKINF/RCLLNK and RCLSTG is that you don't have to put your system in restricted state to run RTVDSKINF and RCLLNK.
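Before automating anything, the RTVDSKINF results from option 2 can be spot-checked by hand. The command below is a minimal sketch, assuming RTVDSKINF has already completed and populated the QCURRENT member; it prints only the QAEZDISK records whose Status Indicator field (DISTIN) is '2'. Sending the output to a spooled file with TOFILE(*PRINT) is my choice for illustration, not part of the program presented in this article.

```cl
/* List only the records flagged as damaged (DISTIN = '2').      */
/* Assumes RTVDSKINF has already filled QUSRSYS/QAEZDISK.        */
CPYF       FROMFILE(QUSRSYS/QAEZDISK) TOFILE(*PRINT) +
             FROMMBR(QCURRENT) INCREL((*IF DISTIN *EQ '2'))
```

If the resulting spooled file lists any records, each one describes a library object that RTVDSKINF flagged as damaged.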
Putting It Together
Given the techniques in the last section, I put together the following program called FINDDAMAGE to detect object damage on an iSeries, System i, or a Power i system.
0001.00 PGM
0003.00           DCL VAR(&COUNT) TYPE(*DEC) LEN(15 5) VALUE(0) /* +
0004.00             Count of the number of records in +
0005.00             QTEMP/QAEZDISK that indicate damaged +
0006.00             objects */
0008.00           DCLF FILE(QUSRSYS/QAEZDISK)
0010.00           RTVDSKINF /* Get information on non-IFS disk status */
0012.00           CPYF FROMFILE(QUSRSYS/QAEZDISK) +
0013.00             TOFILE(QTEMP/QAEZDISK) MBROPT(*REPLACE) +
0014.00             CRTFILE(*YES) INCREL((*IF DISTIN *EQ +
0015.00             '2')) /* check to see if there are any +
0016.00             records indicating damaged objects */
0018.00           OVRDBF FILE(QAEZDISK) TOFILE(QTEMP/QAEZDISK)
0021.00           CHGVAR VAR(&COUNT) VALUE(0)
0023.00 GETIT:    RCVF /* Read the next damaged object record */
0025.00           MONMSG MSGID(CPF0864) EXEC(GOTO CMDLBL(NEXTSTEP))
0027.00           CHGVAR VAR(&COUNT) VALUE(&COUNT + 1)
0029.00           GOTO CMDLBL(GETIT)
0031.00 NEXTSTEP: IF COND(&COUNT > 0) THEN(DO) /* If a damaged +
0032.00             object record appears in QAEZDISK, send a +
0033.00             message to the QSYSOPR message queue */
0034.00           SNDPGMMSG MSG('Damaged object found on system') +
0035.00             TOMSGQ(*SYSOPR) MSGTYPE(*INQ)
0036.00           ENDDO
0038.00           RCLLNK OBJ('/') SUBTREE(*ALL) DMGOBJOPT(*KEEP *KEEP) +
0039.00             /* Check the IFS to see if there are any +
0040.00             damaged objects */
0042.00 ENDPGM
FINDDAMAGE performs the following functions.
Lines 3.00 through 9.00 define a counter to be used in the program and declare that the program will use the QAEZDISK output file from the RTVDSKINF command.
Line 10.00 runs the RTVDSKINF command, which places object information about every i OS library object into the QCURRENT member of the QAEZDISK file.
Lines 11.00 through 17.00 use the Copy File (CPYF) command to create a subset copy of the newly created QAEZDISK file. This copy only contains records that contain a '2' in the Status Indicator field of the record (DISTIN). Remember that Status Indicator '2' for a QAEZDISK record indicates that the record describes a damaged library object. So what I'm doing here is creating a QAEZDISK file that only contains records for damaged objects. This QAEZDISK version resides in the temporary library QTEMP.
Line 18.00 contains an Override Database File command (OVRDBF) that specifies that any references to the full QAEZDISK file in QUSRSYS should be redirected to the damaged object version of QAEZDISK that I just created in the QTEMP library.
Lines 21.00 through 37.00 set up a read loop to look at all the records in the damaged objects version of the QAEZDISK file in QTEMP. If the program finds any records in QTEMP/QAEZDISK, that indicates there is a damaged object on the system and the program writes an inquiry message to the QSYSOPR message queue in lines 34.00 to 35.00, indicating that a damaged object was found. The inquiry message will be picked up by any process or employee monitoring the system, and my staff will know that there's a damaged object on the system.
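The read loop only needs to know whether QTEMP/QAEZDISK holds at least one record, so the same test could also be made without reading the file. The fragment below is a sketch of that alternative, not the code I'm running: it uses the Retrieve Member Description (RTVMBRD) command to fetch the current record count. The variable name &NBRRCD is my own, and the fragment assumes the CPYF step has already created QTEMP/QAEZDISK.

```cl
PGM
             /* &NBRRCD is a hypothetical name; NBRCURRCD     */
             /* returns a 10-digit packed decimal value.      */
             DCL        VAR(&NBRRCD) TYPE(*DEC) LEN(10 0)

             /* Count the records in the damaged-objects copy */
             RTVMBRD    FILE(QTEMP/QAEZDISK) NBRCURRCD(&NBRRCD)

             IF         COND(&NBRRCD *GT 0) THEN(DO)
             SNDPGMMSG  MSG('Damaged object found on system') +
                          TOMSGQ(*SYSOPR) MSGTYPE(*INQ)
             ENDDO
ENDPGM
```

With this approach, the DCLF, OVRDBF, and RCVF statements would no longer be needed, at the cost of not being able to examine individual records inside the program.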
Lines 38.00 to 40.00 run the RCLLNK command described above over the entire AS/400 IFS. This command runs RCLLNK in detect mode where any damaged usable or unusable objects are detected on the system without deleting them (Damaged object option field, DMGOBJOPT, is set to *KEEP *KEEP). If all of the AS/400 IFS objects have been reclaimed and all problems have been corrected, RCLLNK sends out a completion message. If RCLLNK finds one or more damaged objects in the AS/400 IFS, it will issue an escape message that will show up in my QSYSOPR message queue.
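As written, FINDDAMAGE does not monitor for the RCLLNK escape message, so the program ends abnormally when one arrives; my understanding is that this unmonitored failure is what surfaces in QSYSOPR for a batch job. If you would rather have the program end normally and send its own notification, a sketch like the following could replace the RCLLNK step. The generic CPF0000 monitor and the message text are my assumptions, since I have not pinned down the specific message ID that RCLLNK sends.

```cl
             RCLLNK     OBJ('/') SUBTREE(*ALL) DMGOBJOPT(*KEEP *KEEP)
             /* CPF0000 catches any CPF escape message; narrow   */
             /* it once the specific RCLLNK message ID is known. */
             MONMSG     MSGID(CPF0000) EXEC(DO)
             SNDPGMMSG  MSG('RCLLNK reported a problem in the IFS') +
                          TOMSGQ(*SYSOPR) MSGTYPE(*INFO)
             ENDDO
```

The trade-off is that a generic monitor also hides unrelated failures, such as an authority problem, behind the same notification, so the job log still has to be checked.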
This program can be run any time of day or night. The system doesn't have to be in restricted state to use this code. These advantages provide a fairly reliable way of automatically looking for and identifying damaged objects before they can cause a problem on your system.
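Because no restricted state is required, the program lends itself to the job scheduler. The entry below is a sketch only: the library name MYLIB, the Sunday schedule, and the 2 a.m. start time are placeholders I chose for illustration.

```cl
/* Run FINDDAMAGE every Sunday at 02:00.                   */
/* MYLIB is a placeholder for the library holding the      */
/* compiled program.                                       */
ADDJOBSCDE JOB(FINDDAMAGE) CMD(CALL PGM(MYLIB/FINDDAMAGE)) +
             FRQ(*WEEKLY) SCDDATE(*NONE) SCDDAY(*SUN) +
             SCDTIME(0200)
```

Pick a time when the extra disk activity from RTVDSKINF and RCLLNK won't compete with your heaviest workloads.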
I've been running this program for a few weeks now and it's been fairly reliable. However, I'd like to know what you think and whether you think this program will actually find any damaged objects. Send me your feedback and ideas on how this program worked in your environment and whether it needs any modifications or tweaks. You can email me at the IT Jungle Web site Contact page.