Guru Classic: The Efficiency of Varying Length Character Variables
August 14, 2019 Jon Paris
Remember the bad old days when dinosaurs still roamed the earth and the only way to build strings in RPG involved playing silly games with arrays? Or worse still, obscure combinations of MOVE operations? Thankfully those days are far behind us — although sadly there are still a few RPG/400 dinosaurs coding away!
RPG IV introduced many powerful new string handling options, such as the %TRIMx family of BIFs, but even now there are capabilities in the language that few programmers fully exploit. One of my favorites is variable length fields. This lack of familiarity made this tip an obvious choice to update in an age where we are frequently tasked with building CSV, JSON, XML, and HTML strings. There are many good reasons to use these varying length fields in such cases, but in this tip we’re going to focus mainly on performance.
For those of you unfamiliar with varying length fields, the following definition (A) shows how they are defined.
(A) dcl-s varyField varChar(256) Inz;
Under the covers, varying length fields have two components: the current length, held as a 2-byte integer in the first two positions, followed by the actual data. In today’s RPG they are defined by the VARCHAR keyword. Back in the days of D specs they were identified by adding the keyword VARYING to a regular character definition. Strictly speaking, the length is not always 2 bytes. Version 6 raised the maximum field lengths, with the result that varying length fields of up to 65,535 characters have a 2-byte length, while longer fields need a 4-byte length. The compiler chooses the appropriate size, so the programmer rarely needs to be concerned with this.
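As a small illustration of the length prefix, the declarations below show the two prefix sizes. This is only a sketch: the field names are made up for the example, and the optional second parameter on VARCHAR is used to request a prefix size explicitly rather than letting the compiler pick one.

```rpgle
**free
// Field names here are hypothetical, chosen only for this illustration.
dcl-s shortField  varchar(256)    inz;  // up to 65,535 characters: 2-byte length prefix
dcl-s bigField    varchar(100000) inz;  // longer than 65,535: 4-byte length prefix
dcl-s forcedField varchar(256: 4) inz;  // prefix size requested explicitly

*inlr = *on;
return;
```

In most programs the first two forms are all you need. The explicit form is mainly of interest when a field has to match a layout defined elsewhere.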
You should train yourself to always code the INZ keyword to ensure that the length field is set correctly. This is critical when varying length fields are incorporated in data structures. Why? Because by default, data structures are initialized to spaces (hex 40), and two spaces interpreted as a 2-byte field length (hex 4040) give a current length of 16,448. That causes havoc!
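Here is a minimal sketch of a varying length subfield with INZ coded. The data structure and subfield names are invented for the example.

```rpgle
**free
// csvRow, rowId and rowText are hypothetical names used only for illustration.
dcl-ds csvRow qualified;
  rowId   int(10)      inz;
  rowText varchar(500) inz;   // INZ sets the current length to zero
end-ds;

csvRow.rowText += 'first value';
dsply (%char(%len(csvRow.rowText)));   // shows 11, the length of 'first value'

*inlr = *on;
return;
```

Without INZ on rowText, the blanks that fill the data structure would be read as the current length, and the append would start from garbage.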
Whenever the content of a varying length field is changed, the compiler automatically adjusts the associated length to reflect the new content. Note that you should always use %TRIMx when loading data from a fixed length field into a varying length field, otherwise any trailing and/or leading blanks will be counted in the field length. Any time you want to know how long the field is, use the %LEN() built-in function to obtain the current value.
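The effect of %TRIMx on the recorded length is easy to see in a short sketch. The field names and the sample value are assumptions made for the example.

```rpgle
**free
// fixedName and varyName are hypothetical names used only for illustration.
dcl-s fixedName char(20)    inz('Paris');
dcl-s varyName  varchar(20) inz;

varyName = fixedName;            // trailing blanks come along: %len(varyName) is 20
dsply (%char(%len(varyName)));

varyName = %trimr(fixedName);    // blanks removed first: %len(varyName) is 5
dsply (%char(%len(varyName)));

*inlr = *on;
return;
```

If the source data might also have leading blanks, %TRIM does the same job on both ends.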
Now that we’ve reviewed the basics of variable length fields, let’s see how they can be used to boost the performance of some types of string operation. Take a look at the following two pieces of code. Both of them build a string of 100 comma separated values. At first glance there is very little difference in the logic, but would you believe that the second one can run hundreds or even thousands of times faster?
    dcl-c baseString 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    dcl-s fixedField  char(10);
    dcl-s longFixed   char(2000);
    dcl-s longVarying varchar(2000);
    dcl-s i int(10);   // loop counters
    dcl-s j int(10);

    // First version: fixed length target
    For i = 1 to 10;
      For j = 1 to 10;
(B)     fixedField = %Subst(baseString: i: j);
(C)     longFixed = %TrimR(longFixed) + ',' + fixedField;
      EndFor;
    EndFor;

    // Second version: varying length target
    For i = 1 to 10;
      For j = 1 to 10;
        fixedField = %Subst(baseString: i: j);
(D)     longVarying += ',' + %TrimR(fixedField);
      EndFor;
    EndFor;
The reason is simple. The second one (D) makes use of a varying length field to build up the result string! This difference in speed is easy to understand if you think about what is going on under the hood. The first version (C) uses a fixed length target string so these are the steps that take place:
1. Work out where the last non-space character is.
2. Add the comma in the next position.
3. Add the content of fixedField in the next and subsequent positions.
4. If longFixed is not yet full, add blanks to fill it.
This process is repeated for each new value added to the string. Notice that having carefully padded the string with blanks (step 4), the very next thing we do (step 1) is to work out how many there are so that we can ignore them!
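To make the four steps above concrete, here is a hand-written imitation of them for a single append. It is not what the compiler actually generates, just a sketch of the same work spelled out with %CHECKR and %SUBST. The variable lastPos and the sample value 'ABC' are assumptions made for the example.

```rpgle
**free
dcl-s longFixed  char(2000);
dcl-s fixedField char(10) inz('ABC');
dcl-s lastPos    int(10);              // hypothetical work field

// Step 1: find the last non-space character (0 if the field is all blanks)
lastPos = %checkr(' ': longFixed);

// Step 2: add the comma in the next position
%subst(longFixed: lastPos + 1: 1) = ',';

// Step 3: add the content of fixedField, all 10 bytes including its blanks
%subst(longFixed: lastPos + 2: %len(fixedField)) = fixedField;

// Step 4: blank out whatever remains of the target field
%subst(longFixed: lastPos + 2 + %len(fixedField)) = *blanks;

*inlr = *on;
return;
```

Steps 1 and 4 both touch the unused part of the field, so their cost grows with the declared length of the target rather than with the amount of data actually in it.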
Contrast this with the mechanics of the second version using the variable length field (D):
1. Increment the field length by 1, and place the comma in that position.
2. Determine the length of the data to add (i.e., ignoring trailing spaces).
3. Copy the new data in, starting at the position after the current field length, and increase the field length by the number of characters copied.
Much simpler! And the resulting speed differences can be staggering. In tests I ran while preparing this tip, even with a target field length as small as 256 characters, the varying length field version took only half the time of the fixed length version. When I raised the field length to 25,600, which is a much more realistic size when building a CSV, HTML or XML string, the speed difference rose to 1,300 to 1!
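If you want to try the comparison on your own system, a timing harness along the following lines is one way to do it. This is a sketch, not the program used for the figures above: the repeat count, the work fields, and the use of %TIMESTAMP with %DIFF for the measurement are all assumptions made for the example, and the numbers you see will depend on your release and hardware.

```rpgle
**free
dcl-c baseString 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
dcl-c REPEATS 1000;                      // hypothetical repeat count

dcl-s fixedField  char(10);
dcl-s longFixed   char(25600);
dcl-s longVarying varchar(25600) inz;
dcl-s i   int(10);
dcl-s j   int(10);
dcl-s rep int(10);
dcl-s startTime   timestamp;
dcl-s fixedMicros int(20);
dcl-s varyMicros  int(20);

// Fixed length target, as in (C)
startTime = %timestamp();
for rep = 1 to REPEATS;
  longFixed = *blanks;
  for i = 1 to 10;
    for j = 1 to 10;
      fixedField = %subst(baseString: i: j);
      longFixed = %trimr(longFixed) + ',' + fixedField;
    endfor;
  endfor;
endfor;
fixedMicros = %diff(%timestamp(): startTime: *mseconds);

// Varying length target, as in (D)
startTime = %timestamp();
for rep = 1 to REPEATS;
  longVarying = '';
  for i = 1 to 10;
    for j = 1 to 10;
      fixedField = %subst(baseString: i: j);
      longVarying += ',' + %trimr(fixedField);
    endfor;
  endfor;
endfor;
varyMicros = %diff(%timestamp(): startTime: *mseconds);

dsply ('Fixed:   ' + %char(fixedMicros) + ' microseconds');
dsply ('Varying: ' + %char(varyMicros) + ' microseconds');

*inlr = *on;
return;
```

Changing the declared length of the two target fields and rerunning is the quickest way to see how the gap widens as the field grows.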
Another point to consider is that the code shown above (C, i.e., the “slow” version) is already much more efficient than much of the code I have seen in customers’ programs. The two variants below are both very common and both even less efficient. In the first case (E) the field being added is being trimmed of blanks, which are immediately added back if it does not fill the target field! In the second case (F) the separation of the two functions means that the calculations for the effective length of the target field and the subsequent blank filling occur twice for each loop. You can imagine what that does to the speed. And yes, I have seen cases where people combine both E and F!
(E) longFixed = %TrimR(longFixed) + ',' + %TrimR(fixedField);

(F) longFixed = %TrimR(longFixed) + ',';
    longFixed = %TrimR(longFixed) + fixedField;
That’s all for this first look at variable length fields. In a follow-on tip, I describe their uses and abuses in the database.
P.S. For those of you wondering what the purpose of the code at (B) is, it is simply used to generate fields of different effective lengths (one to 10 characters) to act as the test data to be added to the target string.
Jon Paris is one of the world’s foremost experts on programming on the IBM i platform. A frequent author, forum contributor, and speaker at User Groups and technical conferences around the world, he is also an IBM Champion and a partner at Partner400 and System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Paul Tuohy.