The New Basics
Published: June 13, 2012
by Jon Paris
Every time I teach the more recent additions to RPG, such as XML parsing or Open Access, I find that I need to include some "remedial" education on the D-spec enhancements that have been made to the language over recent releases.
Most of these enhancements came into the language many years ago, but if you had no immediate need for them, they may have passed you by. After all, even the most avid reader of this newsletter has probably forgotten most of what they read here 10 years ago if they didn't use it. And yes, it really was that long ago!
So, I decided that it might be a good time to take a step back and review a few RPG basics in light of the changes that have occurred to the language over the years. Let's call these the "New Basics." Here we'll concentrate on some basics about data storage in RPG.
Where Does Data Live?
Most of us have been using RPG for so many years that we sometimes forget (or perhaps never learned) the fundamentals of how RPG organizes field storage. So let's start there with a few basic points.
1. Within your program there is no such thing as a record. If you think about it, this makes sense. After all, you can have the same field name in multiple files. If a field were part of a "record" (think data structure), then this could not work. In practice, when we read a record, the compiler generates code to move the content of the fields from the record buffer into their individual storage locations. Similarly, when a record is written, the compiler gathers the data and places it into the buffer. You have almost certainly noticed this behavior if you've ever stepped through a program in debug. When you hit, for example, a READ operation, you have to press F10 multiple times to move to the next statement. (Just as a brief aside, you can avoid this behavior by coding the keyword OPTION(*NODEBUGIO) on your program's H-spec.)
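For reference, the H-spec mentioned in the aside might look like the line below. This is only a sketch; many shops combine it with other OPTION values, which are left out here.

```rpgle
      * Suppress debug breakpoints on the generated input/output field moves
     H Option(*NoDebugIO)
```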
2. Fields that are consecutive in the record may not be consecutive in memory. I have seen people use a pointer and a based structure so that they could treat consecutive fields, such as monthly totals, as an array. This is a really, really bad idea. Sometimes it will work. . . for a while. Then it may suddenly stop working just as a result of a re-compile after a completely unrelated change. Yes, I have seen this happen. And because the data that ended up being part of the array was all numeric, there was not even a "boom" to indicate a problem. Only when the end user noticed that the data in the report was garbage was the problem discovered.
See point four below if you want to know how to do this the safe way.
3. Fields that are signed/zoned numeric in the database will normally be packed in your program. This occurs because the compiler defaults to packed decimal for all internally defined numeric fields for historical reasons. So when it moves a field's data from the buffer to its internal memory location, it also converts it to packed.
A similar thing happens to date fields. They are always stored internally in the default format for the program. This will be *ISO if nothing else is specified on the H-spec. Note: The H-spec controls this, NOT a system value. It is important to understand this if you make extensive use of "real" dates in your programs.
For instance, suppose I have defined all the date fields on my database as having a *USA format. Date fields are actually stored on disk as a binary day count and are converted to the specified format (*USA in this case) before the data is handed over to the program. If no H-spec date format is in play, then as each date field is moved into its memory location it will end up being converted again, this time from *USA to *ISO. Two conversions for every date! It is hardly surprising that some people get the idea that real date fields perform poorly. If you understand that this is going to happen, you can avoid it by judicious use of an H-spec DATFMT entry.
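A minimal sketch of such an entry, assuming (as in the example above) that the database dates are all defined as *USA:

```rpgle
      * Make *USA the default internal date format for this program so
      * that *USA database dates are not converted a second time to *ISO
     H DatFmt(*USA)
```

Bear in mind that the H-spec DATFMT also governs how date literals in the program must be written, so existing D'...' literals may need to be checked after adding it.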
4. You can guarantee that a field maintains its data type and occupies a known location in memory by specifying it in a data structure. Any data structure. It doesn't have to be externally defined. It is also not necessary to specify the field's length or data type. Just specify the field name and the compiler will sort it all out.
If you need to redefine a series of fields in a table row as an array, this is the way to do it, and it will work even if they are not consecutive in the record! The small code sample below shows how this works.
Here's the DDS for the record layout:
Q1SALES 7S 2
Q1QTY 5S 0
Q2SALES 7S 2
Q2QTY 5S 0
Q3SALES 7S 2
Q3QTY 5S 0
Q4SALES 7S 2
Q4QTY 5S 0
And here are the D-specs:
     D SalesData       DS
     D   Q1Sales
     D   Q2Sales
     D   Q3Sales
     D   Q4Sales
     D   SalesForQtr                       Overlay(SalesData)
     D                                     Like(Q1SALES) Dim(4)
The compiler listing for a program containing these definitions shows that the QnSALES fields have retained their data type (S), whereas the QnQTY fields have been redefined internally as packed.
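To illustrate how the array might then be used, here is a small sketch of a complete program. The file name SALESHIST and the YearTotal field are hypothetical and exist only for this example; the data structure is the one described above.

```rpgle
      * SALESHIST is a hypothetical file containing the QnSALES/QnQTY fields
     FSALESHIST IF   E             DISK
     D SalesData       DS
     D   Q1Sales
     D   Q2Sales
     D   Q3Sales
     D   Q4Sales
     D   SalesForQtr                       Overlay(SalesData)
     D                                     Like(Q1Sales) Dim(4)
     D YearTotal       S              9P 2
      /free
        Read SalesHist;
        If Not %Eof(SalesHist);
          // The four quarterly values are now addressable as an array
          YearTotal = %XFoot(SalesForQtr);
        EndIf;
        *InLR = *On;
      /end-free
```

Because the subfields are laid out by the data structure and not by the record, the array stays valid even though the QnQTY fields sit between the sales fields in the file.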
This tendency for signed/zoned fields to change to packed often causes problems for folks when they first start prototyping program and procedure calls. Which brings us to point number five.
5. Always be explicit; define the data types and sizes of parameters in prototypes. Generally speaking, if you use the LIKE keyword to define individual fields, you are asking for problems. Suppose that when I originally wrote the prototype I specified LIKE(Q1QTY). That would result in the parameter being defined as packed. But what if a change was made to the program so that Q1QTY was placed in a data structure? Because the prototype used the LIKE keyword it now dictates that the parameter be signed. Think the programmer would anticipate this change? My experience is that they would not. In fact, such changes often end up with questions being posed on Internet lists. Luckily this is exactly the kind of thing that prototypes defend us against so at least we know about the problem. In the "bad old days" of CALL/PARM we wouldn't find out about the problem until run time. And sometimes not even then.
I should add at this point that I do make frequent use of the LIKEDS and LIKEREC keywords in prototypes since they do not suffer from these problems.
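As a sketch of the difference, the prototype below spells out the parameter definition. The program name UPDQTY and the prototype name UpdateQty are hypothetical, and the sketch assumes the called program expects a packed 5,0 value.

```rpgle
      * Fragile version (shown as a comment): the parameter's type would
      * follow whatever Q1QTY happens to be in the calling program
      *   D   Qty                               Like(Q1QTY)
      *
      * Explicit version: packed 5,0 no matter how Q1QTY is defined
     D UpdateQty       PR                  ExtPgm('UPDQTY')
     D   Qty                          5P 0
```

With the explicit form, moving Q1QTY into a data structure changes the field but not the prototype, so the compiler flags the mismatch at the call and not somewhere downstream.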
6. Be even more explicit--specify initialization for numeric and varying length character fields in data structures. The default initialization for a data structure is blanks. This will include any numeric or varying length subfields within that DS. The problem with numeric fields is fairly obvious: blanks are not the same as zeros and can cause errors.
The issue with varying length fields is less obvious because blanks are perfectly valid in a varying length field. The problem arises because the character portion of the field is preceded by a 2-byte (or 4-byte) binary length, and blanks in that portion of the field result in an invalid field length.
The easiest way to deal with this is to specify the INZ keyword on the data structure definition line. This will cause the compiler to initialize all fields in the data structure to their appropriate default values. If for some reason you don't want to do this, then code an explicit INZ on each numeric and varying length field.
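Both approaches are sketched below. All of the data structure and subfield names are made up for the illustration.

```rpgle
      * INZ on the DS line: every subfield gets its type's default value
     D OrderInfo       DS                  Inz
     D   OrderTotal                   9S 2
     D   Comment                     50A   Varying

      * Alternative: INZ only on the numeric and varying length subfields
     D OrderInfo2      DS
     D   OrderTotal2                  9S 2 Inz
     D   Comment2                    50A   Inz Varying
```

In either case the numeric subfields start out as zero and the varying length subfields start out empty, with a valid length prefix.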
7. Avoid decimal data errors by using data structure I/O. Back in V5R2, IBM introduced the ability to perform I/O operations on externally described files into data structures. It has always been possible to perform such I/O operations with program-described files, but V5R2 was the first time it could be done for externally described files. I won't go into the basic mechanics, as Ted Holt covered them in this newsletter in 2007, but Ted did not mention this aspect of the feature.
As we noted earlier, when a record is read, the individual fields are copied from the buffer to their storage location. But that's not the case when we use data structure I/O. When we use data structure I/O, then the entire record is copied as a stream of bytes from the buffer to the DS. Since the data is being processed at the byte level, no decimal data error occurs during the read. Now that is nice!
Of course, should you subsequently reference the field in question all bets are off. But, at least you will know exactly which field is in error and you can use MONITOR to trap the error and take corrective action. That way, the user never sees the old green screen of death, and that's the way it should be. In my experience, most of the time when debugging a decimal data error problem, you inevitably discover that the field wasn't going to be used anyway, and that made the error even more frustrating.
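The following sketch shows one way the pieces described above could fit together. The file name SALESHIST, its record format name SALESR, and the fields SalesIn and GrandTotal are all assumptions made for the example, and replacing the bad value with zero is just one possible corrective action.

```rpgle
      * SALESHIST and its record format SALESR are hypothetical names
     FSALESHIST IF   E             DISK
     D SalesIn         DS                  LikeRec(SALESR:*Input)
     D GrandTotal      S             11P 2
      /free
        // The record arrives in SalesIn as a stream of bytes, so bad
        // numeric data cannot cause an error on the READ itself
        Read SalesR SalesIn;
        DoW Not %Eof(SalesHist);
          Monitor;
            GrandTotal += SalesIn.Q1Sales;
          On-Error 907;
            // Status 907 = decimal data error in the field just used
            SalesIn.Q1Sales = 0;
          EndMon;
          Read SalesR SalesIn;
        EndDo;
        *InLR = *On;
      /end-free
```

Only the fields the program actually touches need to be guarded this way; any bad data in the others is never referenced and so never causes a failure.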
That's it for our first "New Basics" lesson. If there are particular features of the language that you feel may have passed you by, please let me know and I'll look at covering them in future tips.
Jon Paris is one of the world's most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM's Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner--also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.
Externally Described Database IO through Data Structures