Parameter Passing and Performance
Published: June 20, 2007
by Ted Holt
There are three ways to pass a parameter to a procedure: by reference, by value, and by read-only reference. These methods are not interchangeable, and passing parameters by value can have unfavorable effects on performance. In the following paragraphs, I explain why I make such a statement, I show you how to define parameters for performance, I list the performance figures from my testing, and I provide some recommendations for parameter passing.
Before I go any further, let me get a couple of things out of the way. First, keep in mind that I am talking about you and the routines you write. If you call someone else's routine, you have to follow whatever convention they used. If they decided that all the parameters were to be passed by value, you'll have to pass them by value as well, no matter how you feel about it.
Second, I thank Barbara Morris of the RPG compiler team for answering questions I put to her regarding the passing of parameters. Any erroneous conclusions I may have reached are my own fault, of course.
Let's say you've decided to write a subprocedure to be used in ILE programs. Language doesn't matter, so I'm going to use RPG for the examples. You name your subprocedure DoIt, and it requires one parameter to do its job. Which method are you going to use to pass the parameter to the subprocedure?
By reference?
     P DoIt            b
     D                 pi
     D  Type                          2a
By value?
     P DoIt            b
     D                 pi
     D  Type                          2a   value
By read-only reference?
     P DoIt            b
     D                 pi
     D  Type                          2a   const
First, let's consider passing by value. The VALUE keyword on the parameter definition means that a caller routine makes a copy of the data that it wants DoIt to accept as a parameter. DoIt operates on the copy, rather than on the data itself. Look at the following example.
     H dftactgrp(*no) actgrp(*new)
     D ItemTypeCode    s              2a
     D DoIt            pr
     D  Type                          2a   value
      /free
         *inlr = *on;
         DoIt (ItemTypeCode);
         return;
      /end-free
      * =========================================
     P DoIt            b
     D                 pi
     D  Type                          2a   value
      /free
         // do some stuff
         return;
      /end-free
     P                 e
The main routine invokes DoIt, passing a copy of ItemTypeCode in the first (and only) parameter. DoIt refers to the copy of ItemTypeCode as Type. If DoIt changes Type, the change occurs to the copy of ItemTypeCode, not to ItemTypeCode itself. (I do not like this ability to change parameters that are passed by value. I see assignments to such parameters as misleading, since it appears that a data value in the caller is changed, when in fact it is not changed. But this preference of mine is not applicable to the question of performance that I am attempting to address.)
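To see the copy semantics in action, here is a minimal sketch based on the example above. The initial value 'AB', the assignment of 'ZZ' inside DoIt, and the DSPLY of the result are assumptions added for illustration only.

```rpgle
     H dftactgrp(*no) actgrp(*new)
     D ItemTypeCode    s              2a   inz('AB')
     D DoIt            pr
     D  Type                          2a   value
      /free
         *inlr = *on;
         DoIt (ItemTypeCode);
         // ItemTypeCode is still 'AB' here; only DoIt's copy changed.
         dsply ItemTypeCode;
         return;
      /end-free
      * =========================================
     P DoIt            b
     D                 pi
     D  Type                          2a   value
      /free
         // Illustrative assignment: it changes the local copy only.
         Type = 'ZZ';
         return;
      /end-free
     P                 e
```

The assignment inside DoIt compiles and runs, but the caller never sees it, which is exactly why such assignments can mislead a reader.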
Next, let's consider read-only reference. Here's the same example.
     H dftactgrp(*no) actgrp(*new)
     D ItemTypeCode    s              2a
     D DoIt            pr
     D  Type                          2a   const
      /free
         *inlr = *on;
         DoIt (ItemTypeCode);
         return;
      /end-free
      * =========================================
     P DoIt            b
     D                 pi
     D  Type                          2a   const
      /free
         // do some stuff
         return;
      /end-free
     P                 e
The CONST keyword tells the caller to pass the address of ItemTypeCode to DoIt. However, DoIt is not allowed to modify the data it refers to as Type.
Obviously this method doesn't work if the subprocedure must change the parameter. In that case, omit both the VALUE and CONST keywords, and the parameter is passed by reference.
     H dftactgrp(*no) actgrp(*new)
     D ItemTypeCode    s              2a
     D DoIt            pr
     D  Type                          2a
      /free
         *inlr = *on;
         DoIt (ItemTypeCode);
         return;
      /end-free
      * =========================================
     P DoIt            b
     D                 pi
     D  Type                          2a
      /free
         // do some stuff
         return;
      /end-free
     P                 e
The main routine provides DoIt with a pointer to (i.e., the memory address of) ItemTypeCode. Anything that DoIt does to Type is really done to ItemTypeCode.
The first thing to consider, then, is whether or not the parameter is to be modified. If a subprocedure is to be able to modify the parameter, you must pass the parameter by reference.
But if a subprocedure does not modify a parameter, it's better to pass the parameter by value or by read-only reference. These last two parameter-passing mechanisms have two advantages over passing by reference.
- When you pass a parameter by reference, that parameter must be defined identically in the calling routine and subprocedure. When you pass a parameter by value or read-only reference, parameters in the calling routine and called routine do not have to be defined identically, although they must be defined in compatible ways. For instance, it's OK for a caller to pass a three-digit packed-decimal number to a subprocedure that expects a seven-digit packed-decimal number.
- When you pass a parameter by reference, the caller must pass a variable. Passing a parameter by value or read-only reference allows the caller to pass literals and expressions to the subprocedure.
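Both advantages can be seen in a short sketch. The names UseQty, ShortQty, and Qty are hypothetical and exist only for this illustration. The subprocedure expects a seven-digit packed-decimal number by read-only reference, and the caller passes a three-digit variable, a literal, and an expression.

```rpgle
     H dftactgrp(*no) actgrp(*new)
     D ShortQty        s              3p 0 inz(125)
     D UseQty          pr
     D  Qty                           7p 0 const
      /free
         *inlr = *on;
         UseQty (ShortQty);        // 3-digit variable, 7-digit parameter
         UseQty (42);              // literal
         UseQty (ShortQty * 2);    // expression
         return;
      /end-free
      * =========================================
     P UseQty          b
     D                 pi
     D  Qty                           7p 0 const
      /free
         // do some stuff
         return;
      /end-free
     P                 e
```

If you remove the CONST keyword from the prototype and the procedure interface, none of the three calls should compile, because a by-reference parameter requires a variable defined exactly like the parameter.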
Does it matter, then, whether I pass parameters by value or pass by read-only reference? From a performance standpoint, I knew it must. After all, the system can allocate memory for a pointer much more quickly than it can allocate memory for a large character variable. I ran a few tests to get an idea of how the passing of parameters affects performance. The times given in the following tables are CPU seconds, as reported in the job log. This is hardly scientific, but plenty good enough for our purposes.
In the first version of my test program, I passed a 64-byte variable-length character to a subprocedure. I repeatedly invoked the subprocedure within a loop.
     H dftactgrp(*no) actgrp(*new)
     D BigString       s             24a   varying
     D                                     inz('I like cheese.')
     D Index           s             10u 0
     D Limit           s             10u 0
     D                                     inz(500000)
     D DoIt            pr
     D  inString                     64a   varying const
      /free
         *inlr = *on;
         for Index = 1 to Limit;
            DoIt (BigString);
         endfor;
         return;
      /end-free
      * =============================================================
     P DoIt            b
     D                 pi
     D  inString                     64a   varying const
      /free
         return;
      /end-free
     P                 e
The following table shows execution times for various numbers of iterations.
| Number of Iterations | Run time, CONST (CPU seconds) | Run time, VALUE (CPU seconds) |
| 500,000 | 1 | 1 |
| 1,000,000 | 1 | 1 |
| 2,500,000 | 1 | 1 |
| 5,000,000 | 1 | 1 |
| 7,500,000 | 1 | 1 |
Table 1: Invoking a subprocedure that accepts a 64-byte variable-length character parameter.
Does it seem to make any difference which method you use?
I changed the parameter length from 64 bytes to 65535 bytes (one byte short of 64 kilobytes) and reran the tests. The next table looks a bit different from the previous one.
| Number of Iterations | Run time, CONST (CPU seconds) | Run time, VALUE (CPU seconds) |
| 500,000 | 1 | 22 |
| 7,500,000 | 1 | 282 |
Table 2: Invoking a subprocedure that accepts a 64 kilobyte variable-length character parameter.
Hmmm, passing by value doesn't look so good anymore, does it? Think about what's happening. Each time the caller invokes the subprocedure, it must first allocate memory for the parameter. When passing by read-only reference, it must allocate enough memory for a pointer, which is a matter of bytes. But when passing by value, it must allocate 64 kilobytes. Upon return to the caller, the system deallocates the memory. Allocating and deallocating 64 bytes is no big deal, but allocating and deallocating 64 kilobytes is.
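A rough per-call figure makes the difference easier to picture. The arithmetic below simply divides the VALUE times in Table 2 by the number of iterations. It assumes that nearly all of the CPU time went to passing the parameter, which is reasonable here because the subprocedure does nothing but return.

```latex
\[
\frac{22\ \text{s}}{500{,}000\ \text{calls}} = 44\ \mu\text{s per call},
\qquad
\frac{282\ \text{s}}{7{,}500{,}000\ \text{calls}} \approx 37.6\ \mu\text{s per call}
\]
```

By the same arithmetic, the CONST runs cost at most about 0.13 microseconds per call (1 second divided by 7,500,000 calls). Since the job log reports whole seconds, treat these numbers as estimates only.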
I tried another test. This time, I passed an expression, rather than a scalar variable, to the subprocedure. In this case, the system has to do a bit more work before calling the subprocedure. Instead of passing a pointer to a variable, the system must evaluate the expression.
     H dftactgrp(*no) actgrp(*new)
     D BigString       s          65535a   varying
     D                                     inz('I like cheese.')
     D Index           s             10u 0
     D Limit           s             10u 0
     D                                     inz(500000)
     D DoIt            pr
     D  inString                  65535a   varying const
      /free
         *inlr = *on;
         %len(BigString) = 65535;
         for Index = 1 to Limit;
            DoIt (%subst(BigString:32768:32768)+%subst(BigString:1:32767));
         endfor;
         return;
      /end-free
      * =============================================================
     P DoIt            b
     D                 pi
     D  inString                  65535a   varying const
      /free
         return;
      /end-free
     P                 e
The next table shows how passing by read-only reference and passing by value handled the extra overhead.
| Number of Iterations | Run time, CONST (CPU seconds) | Run time, VALUE (CPU seconds) |
| 500,000 | 10 | 28 |
| 7,500,000 | 147 | 427 |
Table 3: Invoking a subprocedure, passing a 64-kilobyte expression as the first parameter.
Runtime went up--way up. However, it went up much more when the parameter was passed by value.
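It helps to separate the cost of evaluating the expression, which both methods pay, from the extra cost of passing by value. The following arithmetic uses only the figures in Table 3 and is an estimate, since the job log reports whole seconds.

```latex
\[
\frac{28\ \text{s}}{10\ \text{s}} = 2.8,
\qquad
\frac{427\ \text{s}}{147\ \text{s}} \approx 2.9
\]
\[
\frac{(28 - 10)\ \text{s}}{500{,}000\ \text{calls}} = 36\ \mu\text{s per call},
\qquad
\frac{(427 - 147)\ \text{s}}{7{,}500{,}000\ \text{calls}} \approx 37.3\ \mu\text{s per call}
\]
```

Passing by value took nearly three times as long as passing by read-only reference. The extra 36 to 37 microseconds per call is close to what you get by dividing the VALUE times in Table 2 by their iteration counts (roughly 38 to 44 microseconds), which suggests that the by-value penalty is about the same whether the caller passes a variable or an expression.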
What did I learn from my tests?
- The size of the parameter is important. It doesn't matter whether you pass small values, such as integers, by value or by address. I'm not sure what data types we should consider to be small values. Integer, unsigned integer, floating point, packed-decimal, zoned-decimal, date, time, timestamp, and indicator variables are all small, so either method should work for them. The same should apply to short character variables. I don't have any scientific basis for saying this, but I don't think I'd pass a character variable longer than 1096 bytes by value. I also don't think I will henceforth automatically make all variable-length character parameters 64 kilobytes long, just in case I ever need to operate on a really big string. I'm reminded of what my friend Cletus the Codeslinger tells his kids, "The world is full of morons. Try not to be one of them."
- Consider the frequency of the call. If a subprocedure is called only once in a program, performance is probably not critical. However, if a subprocedure is called once for each record of a 350,000-record file, passing parameters by value may very well slow the job noticeably.
- I see a strong case for avoiding pass-by-value completely. Read-only reference provides the same benefits as pass-by-value with a smaller performance penalty.
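Here is one way these recommendations might look in practice. This is a sketch, not a prescription. ClipLine and all of its parameter and variable names are hypothetical. The input parameters are passed by read-only reference, and the one parameter the subprocedure must change is passed by reference. The sketch assumes Width is never negative.

```rpgle
     H dftactgrp(*no) actgrp(*new)
     D Line            s           1000a   varying
     D                                     inz('I like cheese.')
     D Clipped         s           1000a   varying
     D ClipLine        pr
     D  Text                       1000a   varying const
     D  Width                        10i 0 const
     D  Result                     1000a   varying
      /free
         *inlr = *on;
         ClipLine (Line: 6: Clipped);
         // Clipped now holds 'I like'.
         return;
      /end-free
      * =========================================
     P ClipLine        b
     D                 pi
     D  Text                       1000a   varying const
     D  Width                        10i 0 const
     D  Result                     1000a   varying
      /free
         // Return at most Width characters of Text in Result.
         if %len(Text) > Width;
            Result = %subst(Text: 1: Width);
         else;
            Result = Text;
         endif;
         return;
      /end-free
     P                 e
```

Because Text and Width are CONST, the caller is free to pass a literal for Width, and no copy of the 1000-byte string is made when the caller passes a variable defined the same way as the parameter.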
There is more to this performance issue than what I have covered here. I will get back to you with more information, maybe even by next week.