Guru Classic: Where Does Data Live? The New Basics
June 19, 2019 Jon Paris
Every time I teach some of the more recent additions to RPG, such as XML parsing or Open Access, I find that I need to include some “remedial” education on the data definition enhancements that have been made to the language over recent releases.
Most of these enhancements came into the language many years ago back in the days when D-specs were de rigueur, but if you had no immediate need for them, they may have passed you by. After all, even the most avid reader of this newsletter has probably forgotten most of what they read here 10 years ago if they didn’t use it. And yes, it really was that long ago that some of these features were added!
In deciding which tips to update as “Guru Classics,” I realized that with the change to free-form data declarations and related enhancements, now would be a good time to take a step back and revisit these “new basics” tips. In this first article, I’ll concentrate on some basics about how data is handled in RPG.
Where Does Data Live?
Most of us have been using RPG for so many years that we sometimes forget (or perhaps never learned) the fundamentals of how RPG organizes field storage. So let’s start there with a few basic points.
1. Within your program there is no such thing as a record. If you think about it, this makes sense. After all, you can have the same field name in multiple files. If a field were part of a “record” (think “data structure”), then this could not work. In practice, when we read a record, the compiler generates code to move the content of each field from the record buffer into its individual storage location. Similarly, when a record is written, the compiler gathers the data from the individual fields and places it into the buffer. You have almost certainly noticed this behavior if you’ve ever stepped through a program in debug. When you hit, for example, a READ operation, you have to press F10 multiple times to move to the next statement. If you count, you’ll find that there is one step needed for each field in the record. (Just as a brief aside, you can avoid this behavior by coding the keyword OPTION(*NODEBUGIO) on your program’s Ctl-Opt or H-spec.)
2. Fields that are consecutive in the record may not be consecutive in memory. I have seen people use a pointer and a based structure so that they could treat consecutive fields, such as monthly totals, as an array. This is a really, really bad idea. Sometimes it will work . . . for a while. Then it may suddenly stop working just as a result of a re-compile after a completely unrelated change. Yes, I have seen this happen. See point 4 below if you want to know how to do this the safe way.
3. Fields that are signed/zoned numeric in the database will normally be packed in your program. This occurs because the compiler defaults to packed decimal for all internally defined numeric fields for historical reasons. So when it moves a field’s data from the buffer to its internal memory location, it also converts it to packed. If your reaction to this statement is “that doesn’t happen in my programs”, then you’ll probably find the reason in item 4 below.
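The zoned-versus-packed distinction is easy to see at the byte level. Here is a minimal Python sketch of the two encodings — an illustration only, not IBM’s actual runtime code, though the layouts follow the standard zoned and packed decimal formats (EBCDIC 0xF zones, sign nibble 0xC/0xD):

```python
def zoned(digits: str, negative: bool = False) -> bytes:
    """Zoned (signed) decimal: one byte per digit, zone nibble 0xF,
    with the sign carried in the zone nibble of the last byte."""
    out = bytearray(0xF0 | int(d) for d in digits)
    out[-1] = (0xD0 if negative else 0xC0) | int(digits[-1])
    return bytes(out)

def packed(digits: str, negative: bool = False) -> bytes:
    """Packed decimal: two digits per byte, sign in the low nibble
    of the last byte."""
    if len(digits) % 2 == 0:          # pad to an odd digit count
        digits = "0" + digits
    nibbles = [int(d) for d in digits] + [0xD if negative else 0xC]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

# A 7-digit field holding 12345.67 occupies 7 bytes zoned, 4 bytes packed.
print(len(zoned("1234567")), zoned("1234567").hex())   # 7 f1f2f3f4f5f6c7
print(len(packed("1234567")), packed("1234567").hex()) # 4 1234567c
```

The size difference (7 bytes versus 4) is also why a buffer-to-storage move of a zoned database field into a packed program field is a genuine conversion, not just a copy.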
A similar thing happens to date fields. They are always stored internally in the default format for the program. This will be *ISO if nothing else is specified on the Ctl-Opt or H-spec. Note: It is the Ctl-Opt or H-spec that controls this, NOT a system value. It is important to understand this if you make extensive use of “real” dates in your programs.
For instance, suppose I have defined all the date fields on my database as having a *USA format. Date fields are actually stored on disk as a binary day count and are converted to the specified format (*USA in this case) before the data is handed over to the program. If no default date format is specified for the program, then as each date field is moved into its memory location it will end up being converted again — this time from *USA to *ISO. Two conversions for every date! And that is just in a read operation — the process is reversed on an update. So it is hardly surprising that some people get the idea that real date fields perform poorly. If you understand that this is going to happen, you can avoid it either by using the default of *ISO for all date fields, or through judicious use of a Ctl-Opt or H-spec DATFMT entry to ensure that the internal and external formats match.
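The double conversion described above can be mimicked in Python, using datetime formatting as a stand-in for the runtime’s date-format conversions. This is purely illustrative — the format names mirror the article, but none of this is actual IBM i code:

```python
from datetime import date, datetime

def from_usa(text: str) -> date:          # *USA format: mm/dd/yyyy
    return datetime.strptime(text, "%m/%d/%Y").date()

def to_iso(d: date) -> str:               # *ISO format: yyyy-mm-dd
    return d.isoformat()

def to_usa(d: date) -> str:
    return d.strftime("%m/%d/%Y")

# File format *USA, program default *ISO: the database converts the
# stored day count to *USA, then the program converts *USA to *ISO.
buffer_value = "06/19/2019"                # what the buffer hands over
internal = to_iso(from_usa(buffer_value))  # the second conversion on read
print(internal)                            # 2019-06-19

# On update the process reverses: *ISO back to *USA for the buffer.
print(to_usa(date.fromisoformat(internal)))  # 06/19/2019
```

Matching the program’s DATFMT to the file’s format eliminates the second conversion in each direction.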
4. You can guarantee that a field maintains its data type and occupies a known location in memory by specifying it in a data structure. Any data structure. It doesn’t have to be externally defined. It is also not necessary to specify the field’s length or data type. Just specify the field name and the compiler will sort it all out.
If you need to redefine a series of fields in a table row as an array, this is the way to do it, and it will work even if they are not consecutive in the record! The small code sample below shows how this works.
Here’s the DDS for the record layout:
R SALESREC1
  CUSTOMER     4
  Q1SALES      7S 2
  Q1QTY        5S 0
  Q2SALES      7S 2
  Q2QTY        5S 0
  Q3SALES      7S 2
  Q3QTY        5S 0
  Q4SALES      7S 2
  Q4QTY        5S 0
And here are the data declarations:
Dcl-Ds SalesData;
  Q1SALES;
  Q2SALES;
  Q3SALES;
  Q4SALES;
  SalesForQtr Pos(1) Like(Q1SALES) Dim(4);
End-Ds;
Note: There’s a new definition keyword, introduced in 7.4 (and PTF’d into 7.3), that avoids the hard-coded position required by the POS keyword. It is SAMEPOS, and it gives you a soft-coded method of defining the start position of the array. So the definition for SalesForQtr can now be coded like this:
SalesForQtr SamePos(Q1SALES) Like(Q1SALES) Dim(4);
For more details on the SamePos keyword, see the tip “Guru: 7.4 Brings New RPG Goodies.”
As you can see from this extract from the compiler’s Xref listing, the QnSALES fields have retained their data type (S) whereas the QnQTY fields have been redefined internally as packed.
CUSTOMER         A(4)
Q1QTY            P(5,0)
Q1SALES          S(7,2)
Q2QTY            P(5,0)
Q2SALES          S(7,2)
Q3QTY            P(5,0)
Q3SALES          S(7,2)
Q4QTY            P(5,0)
Q4SALES          S(7,2)
SALESDATA        DS(28)
SALESFORQTR(4)   S(7,2)
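A rough analogue of what the data structure achieves can be sketched in Python with a ctypes union — the same 28 bytes addressed both as four named fields and as a four-element array overlaying position 1. This is an illustration of the overlay idea only, not RPG code; the field names follow the article:

```python
import ctypes

class ByName(ctypes.Structure):
    """The four quarterly sales fields, 7 bytes each, back to back."""
    _pack_ = 1
    _fields_ = [("q1sales", ctypes.c_char * 7),
                ("q2sales", ctypes.c_char * 7),
                ("q3sales", ctypes.c_char * 7),
                ("q4sales", ctypes.c_char * 7)]

class SalesData(ctypes.Union):
    """Union = both views share the same 28 bytes of storage."""
    _fields_ = [("fields", ByName),
                ("sales_for_qtr", (ctypes.c_char * 7) * 4)]

ds = SalesData()
ds.fields.q2sales = b"0012345"      # set the field by name...
print(ds.sales_for_qtr[1].raw)      # ...and read it through the array
```

Because the overlay is declared against the structure rather than against raw memory addresses, it keeps working no matter how the fields are arranged in the record buffer — which is exactly the safety the RPG data structure provides.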
This tendency for signed/zoned fields to change to packed often causes problems for folks when they first start prototyping program and procedure calls. Which brings us to point number 5.
5. Always be explicit; define the data types and sizes of parameters in prototypes. Generally speaking, if you use the LIKE keyword to define individual fields, you are asking for problems. Suppose that when I originally wrote the prototype I specified LIKE(Q1QTY). That would result in the parameter being defined as packed. But what if a change was made to the program so that Q1QTY was placed in a data structure? Because the prototype used the LIKE keyword it now dictates that the parameter be signed. Think the programmer would anticipate this change? My experience is that they would not. In fact, such changes often end up with questions being posed on Internet lists. Luckily this is exactly the kind of thing that prototypes defend us against so at least we know about the problem. In the “bad old days” of CALL/PARM, we wouldn’t find out about the problem until run time. And sometimes not even then. I should add at this point that I do make frequent use of the LIKEDS and LIKEREC keywords in prototypes since they do not suffer from these problems.
6. Be even more explicit and specify initialization for numeric and varying length character fields in data structures. The default initialization for a data structure is blanks, and this includes any numeric or varying length subfields within that DS. The problem with numeric fields is fairly obvious: blanks are not the same as zeros and can cause errors.
The issue with varying length fields is less obvious, because blanks are perfectly valid in a varying length field. The problem arises because the character portion of the field is preceded by a 2-byte (or 4-byte) binary length, and blanks in that prefix result in an invalid field length.
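You can see why a blanked-out varying length field is corrupt with a small Python sketch. Two EBCDIC blanks (x'40' x'40') sitting where the binary length prefix belongs decode as a length of 16,448 — far beyond any plausible declared capacity. Pure illustration, not IBM i storage code:

```python
import struct

EBCDIC_BLANK = 0x40

# A 10-character VARLEN subfield after blank initialization:
# 2-byte length prefix + 10 data bytes, every byte x'40'.
storage = bytes([EBCDIC_BLANK] * 12)

# Interpret the first two bytes as the big-endian length prefix.
(length,) = struct.unpack(">H", storage[:2])
print(length)   # 16448 -- invalid for a field declared with length 10
```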
The easiest way to deal with this is to specify the INZ keyword on the data structure definition line. This will cause the compiler to initialize all fields in the data structure to their appropriate default values. If for some reason you don’t want to do this, then code an explicit INZ on each numeric and varying length field.
7. Avoid decimal data errors by using data structure I/O. Back in V5R2, IBM introduced the ability to perform I/O operations on externally described files into data structures. It has always been possible to perform such I/O operations with program-described files, but V5R2 was the first time it could be done for externally described files. I won’t go into the basic mechanics here, as Ted Holt covered them here in 2007, but Ted did not mention this aspect of the feature.
As I noted earlier, when a record is read, the individual fields are copied from the buffer to their storage location. But that’s not the case when we use data structure I/O. When we use data structure I/O, then the entire record is copied as a stream of bytes from the buffer to the DS. Since the data is being processed at the byte level, no decimal data error occurs during the read. Now that is nice!
Of course, should you subsequently reference the field in question, all bets are off. But at least you will know exactly which field is in error, and you can use MONITOR to trap the error and take corrective action. That way, the user never sees the old green screen of death, and that’s the way it should be. In my experience, when debugging a decimal data error you often discover that the field was never logically going to be used anyway, which makes the error even more frustrating.
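The difference between the two read styles can be sketched in Python: copying the record as raw bytes never inspects the decimal data, so nothing fails until a specific field is decoded — at which point a try/except (playing the role of RPG’s MONITOR) can trap it. The decoder and the sample buffer are hypothetical, for illustration only:

```python
def unpack_packed(raw: bytes) -> int:
    """Decode a packed decimal field; raise on invalid digit or sign."""
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    digits, sign = nibbles[:-1], nibbles[-1]
    if any(d > 9 for d in digits) or sign not in (0x0C, 0x0D, 0x0F):
        raise ValueError("decimal data error")
    value = int("".join(map(str, digits)))
    return -value if sign == 0x0D else value

# A record buffer: one valid packed field (123) followed by garbage.
record_buffer = b"\x12\x3C" + b"\x40\x40"

ds = bytes(record_buffer)     # "data structure I/O": byte copy, no error
print(unpack_packed(ds[:2]))  # 123 -- the good field decodes fine

try:
    unpack_packed(ds[2:])     # referencing the bad field...
except ValueError as e:
    print("trapped:", e)      # ...is trapped, MONITOR-style
```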
Jon Paris is one of the world’s foremost experts on programming on the IBM i platform. A frequent author, forum contributor, and speaker at User Groups and technical conferences around the world, he is also an IBM Champion and a partner at Partner400 and System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Paul Tuohy.