Subprocedure Return Values–Food for Thought
September 30, 2009 Jon Paris
Those of us who have adopted subprocedures as a way of life often use their ability to return a result without thinking too much about the possible implications. But no technology should be thrown blindly into production without some understanding of the underlying tradeoffs.
Ted Holt has previously written tips on the performance aspects of parameter passing (Parameter Passing and Performance) and on the basics of subprocedure performance (Performance of Function Subprocedures). In this article, I want to take that discussion one step further because I find that many people still fail to appreciate the potential performance implications of certain types of subprocedure return values.
While readability and maintainability should always be the primary determinants of our coding style, it is important to understand that our design choices have ramifications. Failure to understand this can result in significant performance issues as the application evolves over time. So in this tip we are going to review the performance implications of some of the choices that we might make. In order to do that, we need to consider some of the underlying mechanics of returning a value from a subprocedure.
When a value is returned by a subprocedure, just how does that value get back to the caller? The answer is that it is first copied into a temporary storage area normally referred to as the stack. Once control is returned to the caller, it will typically have to be copied again, this time from the stack into its final destination. For example, when our subprocedure codes “RETURN TheAnswer;” the contents of the variable TheAnswer are copied onto the stack. If the caller originally invoked the subprocedure by coding “TheAnswer = MySubProc(Parm1 : Parm2);” then that data must then be copied from the stack into the caller's TheAnswer. The overhead of doing this is negligible when dealing, for example, with an account number validation routine that returns an indicator, or even when using a routine that returns a single record. But it has the potential to cause problems when returning, say, 50 records at a time. It may not be a performance issue when the application is first deployed, but what if a subsequent programmer decides to increase the number to 100 or 500? An apparently trivial change, but what about the performance issues? Will they even think about it? I doubt it.
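The round trip described above can be sketched roughly as follows. This is only an illustration, written in the fully free-form syntax that became available in later releases of RPG IV. The names TheAnswer and MySubProc come from the text; the parameter types, the field lengths, and the body of the subprocedure are assumptions made up for the example.

```rpgle
**free
ctl-opt dftactgrp(*no);

// Caller's final destination for the result (length is an assumption)
dcl-s TheAnswer char(50);

// Copy 2: once control returns, the returned value is copied
// from temporary storage into the caller's TheAnswer
TheAnswer = MySubProc('ABC' : 'DEF');

*inlr = *on;
return;

// Hypothetical subprocedure; parameters and logic are illustrative only
dcl-proc MySubProc;
  dcl-pi *n char(50);
    Parm1 char(10) const;
    Parm2 char(10) const;
  end-pi;

  // Local work field (it hides the global field of the same name)
  dcl-s TheAnswer char(50);

  TheAnswer = %trim(Parm1) + %trim(Parm2);

  // Copy 1: the contents of the local TheAnswer are copied
  // into temporary storage for the trip back to the caller
  return TheAnswer;
end-proc;
```

With a 50-byte result the two copies cost next to nothing. The point of the sketch is simply that both copies grow with the size of the returned value.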
I tend to only use return values for small amounts of data. Of course the term “small” is somewhat vague, so what do I mean by it? The rule of thumb I use is:
To check the speed differences, I tested using three small subprocedures. The first simply returns a field of 32,000 characters. The second has no return value, and returns 32,000 characters as a parameter. The third simulates the return of a count of the number of items in the result set as well as returning the actual data as a parameter. You can see the prototypes below:
D TestProc1       Pr         32000a
D TestProc2       Pr
D   result                   32000a
D TestProc3       Pr            10i 0
D   result                   32000a
Within the subprocedures, no attempt is made to actually calculate any values. Rather, the intent is simply to measure the overhead of the different call mechanisms.
So is there a significant difference in the performance of the two approaches? My initial tests with 1,000 iterations showed that it took over 650 times as long to return the 32,000 characters as a return value, compared with returning them as a parameter. I must confess that while I was expecting the difference to be significant, I was surprised by its magnitude. Testing with larger sample sizes further emphasized the difference.
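For anyone who wants to try a comparison of this kind, a timing harness might look something like the sketch below. It is not the code used for the tests reported here. It is a minimal approximation in fully free-form RPG, and the field names, the use of %TIMESTAMP and %DIFF for timing, and the DSPLY output are all assumptions. The three subprocedures mirror the prototypes shown earlier and deliberately do no real work.

```rpgle
**free
ctl-opt dftactgrp(*no);

dcl-c ITERATIONS 1000;   // raise this if the timings come back as zero

dcl-s buffer    char(32000);
dcl-s itemCount int(10);
dcl-s i         int(10);
dcl-s startTs   timestamp;
dcl-s usReturn  int(20);   // elapsed microseconds, return value
dcl-s usParm    int(20);   // elapsed microseconds, parameter only
dcl-s usBoth    int(20);   // elapsed microseconds, count + parameter

// Case 1: 32,000 characters come back as the return value
startTs = %timestamp();
for i = 1 to ITERATIONS;
  buffer = TestProc1();
endfor;
usReturn = %diff(%timestamp() : startTs : *mseconds);

// Case 2: no return value, data comes back through the parameter
startTs = %timestamp();
for i = 1 to ITERATIONS;
  TestProc2(buffer);
endfor;
usParm = %diff(%timestamp() : startTs : *mseconds);

// Case 3: small return value (a count) plus data through the parameter
startTs = %timestamp();
for i = 1 to ITERATIONS;
  itemCount = TestProc3(buffer);
endfor;
usBoth = %diff(%timestamp() : startTs : *mseconds);

dsply ('Return value (us): ' + %char(usReturn));
dsply ('Parameter (us): ' + %char(usParm));
dsply ('Count + parm (us): ' + %char(usBoth));

*inlr = *on;
return;

dcl-proc TestProc1;
  dcl-pi *n char(32000) end-pi;
  // STATIC so that per-call initialization is not part of the measurement
  dcl-s result char(32000) static;
  return result;
end-proc;

dcl-proc TestProc2;
  dcl-pi *n;
    result char(32000);
  end-pi;
  return;
end-proc;

dcl-proc TestProc3;
  dcl-pi *n int(10);
    result char(32000);
  end-pi;
  return 0;
end-proc;
```

The resolution of the system timestamp may be too coarse to register the faster cases at 1,000 iterations, so treat any numbers from a sketch like this as rough indications rather than precise measurements.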
I should also mention that the size factors that affect the performance of return values also apply to passing parameters to subprocedures by value. VALUE is a tempting keyword to use in parameter definitions because passing by value guarantees that the called procedure receives a copy of the data. As a result, there is no possible way that code in the subprocedure can unexpectedly change the original value. This is not true when using CONST, as in that case no copy of the data is made when the data type and size of the parameter match the requirements defined in the prototype. So the same criteria (i.e., OK for small parameters, but think before using big ones) should be applied when choosing a parameter passing method.
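The difference between the two keywords can be illustrated with a small sketch. The procedure names ByValue and ByConst and the field names are hypothetical, and the code uses fully free-form RPG rather than fixed-format definitions.

```rpgle
**free
ctl-opt dftactgrp(*no);

dcl-s bigData char(32000) inz('original');

ByValue(bigData);   // all 32,000 bytes are copied for the call
ByConst(bigData);   // type and size match, so no copy is made

*inlr = *on;
return;

dcl-proc ByValue;
  dcl-pi *n;
    data char(32000) value;
  end-pi;

  // Changes only the procedure's private copy;
  // the caller's bigData still contains 'original'
  data = 'changed';
end-proc;

dcl-proc ByConst;
  dcl-pi *n;
    data char(32000) const;
  end-pi;

  dcl-s firstByte char(1);

  // Reading the parameter is fine ...
  firstByte = %subst(data : 1 : 1);

  // ... but an assignment such as the one below
  // would be rejected by the compiler:
  // data = 'changed';
end-proc;
```

For a one-byte flag the copy made by VALUE is irrelevant. For a 32,000-byte field on a frequently called routine, it is the same kind of overhead as a large return value.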
So there you have it. While the power of the hardware we use today means that we rarely need to consider performance to the extent that we used to, ignoring it completely when designing procedure interfaces is not a good idea. A poor design for a routine that is called thousands of times a second can leave you in a world of pain.
If you would like to experiment yourself with the code used to perform these tests, contact me via my Web site (http://www.Partner400.com) and I will be more than happy to send you a copy to play with.
Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.