Fix Decimal Data Errors

June 13, 2007 Bruce Guetzkow

Note: The code accompanying this article is available for download here, and it has been updated since it originally ran [Updated 06/13/07]

I have seen requests many times in the forums from programmers asking how best to handle data with invalid decimal data. Typically this type of error is first discovered when a program ends abnormally. I have found this problem occurs most often with data received from outside sources: customers and vendors. Many times this data comes from systems other than a System i. I’ve developed a command to identify and (optionally) “fix” errant data.

What’s in a Number?

First, what is invalid decimal data? This is non-numeric data in a numeric field. For example, a field defined as “5s 0” should contain a number in the range of -99999 to +99999. The data is invalid if it contains values such as ABCDE or 123A6. Of course in the case of a zoned-decimal field such as this one, it is acceptable for the last (and only the last) position to contain a value that appears to be alphabetic. This is because the last position of the field identifies the sign.

To consider numeric values properly you must think in terms of the hexadecimal (hex) representation of the field. The first four bits of each byte are called the “zone” and the last four bits the “digit”. Values for each zone and digit can range from 0 to 9 and A to F for the values 0 to 15 in the base-16 (hexadecimal). The base-10 (decimal) numbers 0 to 9 are defined for zoned-decimal fields in hex as F0-F9, respectively. The last byte identifies the sign of the value. For this byte only, the zone portion can be either a C or F for positive values and a D for negative. The hex representation of the zoned-decimal value 123 stored in a “5s 0” field is: F0F0F1F2F3 or F0F0F1F2C3.

Packed fields are stored differently. Normally packed fields have an odd length, such as “7p 0” or “9p 2”. The physical length of the stored data is: (scale of field + 1) / 2. For example, a “7p 2” field is stored in (7 + 1) / 2 = 4 bytes. For packed fields each zone and digit contains a position of the value except for the right-most digit which contains the sign. If the number 123 is stored in a “5p 0” field, the hex value is 00123F or 00123C. Valid values for all zones and digits must be in the range 0-9, except for the last digit which must be F or C for positive values or D for negative.

Identifying Errors

How do we identify invalid data? This depends on the definition of the file fields. If a byte contains the hex value C1 (the letter “A”) it would be valid for any alphanumeric field, but may be valid or invalid for part of a numeric field, depending on the position of the value and the definition of the field. In order to determine validity you need to know the field definitions for a file and the hex representation of the data.

This is where the FIXDECDTA command comes in. The source consists of five members:

FIXDECDTA–type CMD
FIXDECDTA–type CLLE (Command Processing Program)
FIXDECDTA1–type REXX
FIXDECDTA2– REXX
FIXDECDTA3–type REXX

As you can see I’ve chose to use REXX for some of the steps as it has functions that easily convert data from character to hex. It also doesn’t have any fixed-length restrictions concerning data files so I don’t need to know the layout of any file prior to executing a REXX procedure.

The command FIXDECDTA has seven parameters:

INPUT–the file being interrogated
INPUTMBR–the input file member
OUTPUT–the resulting “fixed” file
OUTPUTMBR–the output file member
FORMAT–the file whose field definitions are used to determine validity
MBROPT–are records added to the output file or is the file replaced
ERRRPT–is an error report created

The only parameter with no default value is the input file name. The default value for the input member is *FIRST, but you can change that to a specific member name if you wish. You can specify a file to write to (output), use the input file as your output (replacing the original data) or specify *NONE if you don’t want any records written to a file. Data is “fixed” by replacing it with zeros in the format based on the field definition. The output member can either be *FIRST, the same as the input file member, or you can indicate a member name. Regardless, the output file and member must exist before executing this command.

The format file can be the input file, the output file or any other existing file of your choice. This file is used for its field definitions only. The remaining parameters identify whether output is added to the output file or if it replaces it, and whether you want to create an error report. You must specify either an output file (not *NONE) and/or *YES for an error report. You also cannot specify *OUTPUT as the format if the output file name is *NONE. If either of these conditions occurs you will receive the generic error message CPD0150 (Requirements between parameters not satisfied.). If you would like more descriptive messages you can change the DEP (Dependent Definition) statements to reference specific message IDs and indicate the message file when creating the command (CRTCMD).

Command Processing

The command processing CLLE program FIXDECDTA begins with the command parameters and additional field definitions. Before compiling this member you will need to change the last DCL (Declare CL Variable) statement to contain the name of the library where your REXX source resides. The input file and member parameters are validated to make sure that it exists and that it is not empty. Next the output file and member defaults are replaced with actual values (if specified) and validated. The same is also done for the format file.

The format file field definitions are then placed into a temporary file. Now REXX takes over. The REXX External Data Queue is cleared to make sure there are no leftover entries from other processes. This External Data Queue is similar to other data queues, but like the *LDA (Local Data Area) it is unique to each job. It is a useful tool for communication between REXX procedures and APIs exist for other HLL languages to use it as well.

The first REXX procedure (FIXDECDTA1) uses the field definitions as input and writes them to the External Data Queue. Procedure FIXDECDTA2 has several inputs. First it reads through the External Data Queue entries and saves the field definitions for later use. It then reads through the input file validating numeric fields. At this time the command only validates zoned-decimal and packed fields. Data is written to the output file and/or to the error report depending on flags passed into this procedure based on the appropriate command parameters.

The last REXX procedure, FIXDECDTA3, is used to generate a simple error report if one was requested. The report identifies the input file and lists the relative record number (RRN) and field name of each field that had invalid numeric data. The final step appends or replaces data in the input file if OUTPUT(*INPUT) was specified.

The command issues two fatal error messages which can be monitored for with the MONMSG (Monitor Message) command. The first, CPF9897, is sent when invalid parameters are encountered, such as specifying a file or member that is not found. When this message is sent processing is terminated before checking for invalid data. Message CPF9898 is sent when invalid decimal data is found.

Processing the Data

Now you can receive data from an outside source and verify that it contains valid numeric data. You should then be able to process it with confidence.

                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot