Designing DB2 for i Stored Procedures for Simulated Array Handling
August 5, 2009

Hey, Mike:
I enjoyed your article on using .NET to access DB2 on the iSeries. I am trying to extend the functionality of our ERP software with VB.NET 2005/2008. The data is on the iSeries. I plan to use DB2 stored procedures for data retrieval. One of my main stumbling blocks is how to pass a variable number of parameters to a stored procedure. For instance, let’s say a project manager is responsible for 10 projects that have specific project numbers. He can query on one project, all 10 projects, or any variable number of projects. How does one design a stored procedure so that it accepts a variable number of parameters? Also, if you have any tips on how to enhance performance, I would find that information very helpful.
“How does one design a stored procedure to accept a variable number of parameters?” That is a fantastic question! Passing all the required project numbers at once is better than calling a stored procedure for each project number. This is especially true in a client/server environment where it is imperative that the number of trips to the database server be minimized.
After monkeying with various options over the years, the solution I like best is to use a delimited list. The idea is to take a variable number of related data items (such as project IDs, order numbers, item numbers, etc.) and concatenate them into one long delimited character variable. The delimiter can be a pipe, tilde, comma, tab, or space; the only catch is that the chosen delimiter character must never appear within the data values themselves. This one long character variable (containing a variable number of data elements) can then be passed as a single parameter to a stored procedure.
When the stored procedure receives the delimited list of items as a single parameter, I use a table function to convert the delimited list string into a row set, which can subsequently be joined to application tables and views.
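To make the splitting step concrete for readers who don't work in RPG, here is a minimal Python sketch of what a list-splitting table function conceptually does. The function name and the (ItemNo, Item) row shape mirror the ListUDTF definition shown later in this tip; the detail of skipping empty entries is an assumption of this sketch, not a documented behavior of the RPG implementation.

```python
def split_list(delimited, delimiter, max_items):
    """Emulate a delimited-list table function: return (ItemNo, Item) rows.

    Skips empty entries (an assumption of this sketch) and returns at
    most max_items rows, mirroring the third parameter of ListUDTF.
    """
    rows = []
    for item in delimited.split(delimiter):
        if item == "":
            continue                        # ignore empty entries
        rows.append((len(rows) + 1, item))  # ItemNo is 1-based
        if len(rows) == max_items:
            break
    return rows

# A pipe-delimited list of project IDs becomes a small "result set":
print(split_list("PROJ1|PROJ2|PROJ3", "|", 9999))
# → [(1, 'PROJ1'), (2, 'PROJ2'), (3, 'PROJ3')]
```

On the server, of course, this work is done once inside the table function so the row set can be joined directly to tables and views in the same query.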
It just so happens that Four Hundred Guru once illustrated this technique in a tip called Delimited List Processing in SQL. Review it and the LISTUDTFR RPG table function for more information.
Assuming you have downloaded and created the ListUDTF table function outlined in this tip, here’s an example of how a list of project numbers delimited by a pipe (|) can be converted to a row set.
Select *
  From Table(ListUDTF('PROJ1|PROJ2|PROJ3','|',9999)) As ProjectList
This query will return the following result set:

ItemNo  Item
------  -----
     1  PROJ1
     2  PROJ2
     3  PROJ3
The first parameter in the table function is a delimited list. The second parameter is the delimiter character(s), and the third parameter is the maximum number of rows the table function should return.
This temporary result set can be joined to a project master table as follows:
Select ProjectMaster.*
  From Table(ListUDTF('PROJ1|PROJ2|PROJ3','|',9999)) As ProjectList
  Join ProjectMaster
    On ProjectMaster.ProjectId=ProjectList.Item
Putting this statement in the context of a DB2 SQL stored procedure that returns a result set to the caller, we get code that looks something like this:
Create Procedure ProjectList (
    IN @Projects VarChar(1000))
Language SQL
Result Sets 1
Set Option Commit=*None, UsrPrf=*Owner, DatFmt=*ISO
Begin
    Declare Project_Cursor Cursor With Return To Caller For
        Select ProjectMaster.*
          From Table(ListUDTF(@Projects,'|',9999)) As ProjectList
          Join ProjectMaster
            On ProjectMaster.ProjectId=ProjectList.Item;

    Open Project_Cursor;
End;
Of course, you may want to enhance this procedure and add an ORDER BY clause or additional parameters to restrict the query’s results even further.
A .NET program (or any other language) can easily accommodate this scenario by creating a delimited list of project IDs in a string variable and passing this variable as a parameter to the stored procedure.
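As an illustration of the client side, here is a hedged Python sketch using the pyodbc package. The DSN name ("MYISERIES") is a placeholder, and the choice of pyodbc is an assumption of this sketch; an ADO.NET or JDBC client follows exactly the same pattern of joining the IDs into one string and passing it as a single parameter.

```python
def build_delimited_list(items, delimiter="|"):
    """Join project IDs into one parameter string, refusing any ID that
    contains the delimiter (the one rule this technique depends on)."""
    for item in items:
        if delimiter in item:
            raise ValueError(f"item {item!r} contains the delimiter")
    return delimiter.join(items)

def fetch_projects(project_ids, dsn="MYISERIES"):
    """Call the ProjectList procedure with one delimited parameter.

    Assumes an IBM i Access ODBC data source named by dsn is configured.
    """
    import pyodbc  # third-party ODBC package; an assumption of this sketch

    projects = build_delimited_list(project_ids)   # e.g. "PROJ1|PROJ2"
    conn = pyodbc.connect(f"DSN={dsn}")
    cursor = conn.cursor()
    cursor.execute("{CALL ProjectList(?)}", projects)  # one trip to the server
    return cursor.fetchall()
```

Whatever the client language, the point is the same: one procedure call, one round trip, regardless of how many project IDs the user selected.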
In a nutshell, that’s how I like to tackle the issue of passing a variable number of data items.
A second dimension to your issue involves performance. SQL statements usually perform better when they’re pre-compiled. But allowing for a variable number of parameters often requires using a dynamic query rather than a pre-compiled query. Nevertheless, there are a few things you can do to try to tweak performance.
Option 1: Add a Cardinality Value to the Table Function Definition
The CREATE FUNCTION statement used to define the LISTUDTF table function (as presented in the Delimited List Processing in SQL tip) omits the CARDINALITY option. CARDINALITY is used to give DB2 a hint as to how many rows it can expect the user-defined table function to return. Normally DB2 can examine row count statistics on physical files, indexes, etc., to formulate a plan on how it should retrieve the data. But how will DB2 know how many rows to expect from an RPG program without an explicit hint?
Create Function xxxxx/ListUDTF (
    parmList     VarChar(24576),
    parmDelim    VarChar(10),
    parmMaxItems Integer)
Returns Table (
    ItemNo Integer,
    Item   VarChar(128))
External Name 'xxxxx/LISTUDTFR(PROCESSLIST)'
Language RPGLE
Disallow Parallel
No SQL
Parameter Style DB2SQL
Deterministic
Cardinality 12
In this illustration, when ListUDTF is used in a query, DB2 makes its plan based on an estimate of 12 rows being returned, on average, from the table function. CARDINALITY is not meant to be a hard and fast number, but specifying the average number of rows to be returned is meant to help DB2.
Be careful, though, because ListUDTF is an all-purpose function and could return drastically different row counts depending on the application. If this is the case, you can create the function multiple times (using slightly different names) for different cardinalities you may need.
In the case of a small number of rows, specifying a cardinality value may not help much. But it is one more piece of information that DB2 can use to formulate an execution plan for the query.
Option 2: Build a Procedure With a Pre-Defined Number of Parameters
Another way to tackle the performance problem is to build a stored procedure with the largest anticipated parameter count needed. For example, if 10 project numbers is the most you’ll ever need, then code the procedure as follows:
Create Procedure ProjectList (
    IN @Project1  VarChar(10),
    IN @Project2  VarChar(10),
    IN @Project3  VarChar(10),
    IN @Project4  VarChar(10),
    IN @Project5  VarChar(10),
    IN @Project6  VarChar(10),
    IN @Project7  VarChar(10),
    IN @Project8  VarChar(10),
    IN @Project9  VarChar(10),
    IN @Project10 VarChar(10))
Language SQL
Result Sets 1
Set Option Commit=*None, UsrPrf=*Owner, DatFmt=*ISO
Begin
    Declare Project_Cursor Cursor With Return To Caller For
        Select ProjectMaster.*
          From ProjectMaster
         Where ProjectId In (@Project1,@Project2,@Project3,
                             @Project4,@Project5,@Project6,
                             @Project7,@Project8,@Project9,
                             @Project10);

    Open Project_Cursor;
End;
If ProjectMaster has a unique index or primary key on the ProjectId column, this technique will get the best performance, with the caveat that needing additional parameters will require changes in both the client and server environments. In other words, getting the best performance means telling the pre-compiler what it can expect beforehand and all programs involved need to conform to these expectations. In this case, though, I’d favor the versatility of the dynamic list approach rather than striving for maximum performance with pre-determined parameters.
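One practical wrinkle with the fixed-parameter approach is that the client must always supply all 10 values, even when the user picked only two projects. A workable convention (my suggestion here, not something from the original tip) is to pad the unused slots by repeating one of the real IDs, since a repeated value in an IN list is harmless. A short Python sketch:

```python
def pad_parameters(project_ids, slots=10):
    """Pad a short list of project IDs out to the procedure's fixed
    parameter count by repeating the first ID; a duplicate value in an
    IN list does not change the query's result."""
    if not project_ids:
        raise ValueError("at least one project ID is required")
    if len(project_ids) > slots:
        raise ValueError(f"procedure accepts at most {slots} IDs")
    return project_ids + [project_ids[0]] * (slots - len(project_ids))

print(pad_parameters(["PROJ1", "PROJ2"], slots=5))
# → ['PROJ1', 'PROJ2', 'PROJ1', 'PROJ1', 'PROJ1']
```

The padded list then maps one-to-one onto the procedure's @Project1 through @Project10 parameters.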
In summary, the need to pass a variable number of parameters (or better, a variable number of related data elements) is really equivalent to the need to pass a dynamically sized array parameter. SQL doesn’t have array processing capabilities, but array processing can be emulated using a contrived list of values in a delimited string along with a table function like the ListUDTF function presented here.
Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. Send your questions or comments to Mike via the IT Jungle Contact page.