Pipelined User-Defined Table Functions In DB2 For i
May 5, 2015 Michael Sansoterra
Pipes are a fantastic invention as they are useful for conveying a substance in an efficient and well directed manner. Whether used for transporting sewage, carrying oil, or even bottling a duck (if you’re old enough to remember “The High and the Flighty” Foghorn Leghorn cartoon featuring the Ace Novelty company’s “Pipe Full of Fun Kit #7”); our world wouldn’t be the same without them.
Not to be left out, starting in IBM i 7.1 TR9 and IBM i 7.2 TR1, DB2 for i now offers a “PIPE” statement that is used to quickly populate rows in a user-defined table function’s (UDTF) result set.
Creating a pipelined function is simple and very similar to creating a non-pipelined SQL table function. For a simple example, say you had a suite of related global variables that are referenced often throughout an application’s code base:
CREATE VARIABLE PRDLIB.ENVIRONMENT VARCHAR(20) DEFAULT 'Production'; CREATE VARIABLE PRDLIB.PRIMARY_OUTQ VARCHAR(10) DEFAULT 'PRT01'; CREATE VARIABLE PRDLIB.PDF_FOLDER VARCHAR(128) DEFAULT '/Production/PDFs'; CREATE VARIABLE PRDLIB.DEFAULT_COMPANY SMALLINT DEFAULT 1;
A typical approach to making these variables easily accessible in an application is to create a table function that will grab them all at once and return them in a single row:
CREATE OR REPLACE FUNCTION PRDLIB.ENVIRONMENT_VARIABLES() RETURNS TABLE ( ENVIRONMENT VARCHAR(20), PRIMARY_OUTQ VARCHAR(10), PDF_FOLDER VARCHAR(128), DEFAULT_COMPANY SMALLINT) CARDINALITY 1 DETERMINISTIC LANGUAGE SQL BEGIN RETURN SELECT * FROM (VALUES( ENVIRONMENT,PRIMARY_OUTQ,PDF_FOLDER,DEFAULT_COMPANY) ) ENV; END
The new PIPE statement provides an alternative way to code this:
CREATE OR REPLACE FUNCTION PRDLIB.ENVIRONMENT_VARIABLES_PIPE() RETURNS TABLE ( ENVIRONMENT VARCHAR(20), PRIMARY_OUTQ VARCHAR(10), PDF_FOLDER VARCHAR(128), DEFAULT_COMPANY SMALLINT) CARDINALITY 1 DETERMINISTIC LANGUAGE SQL BEGIN PIPE (ENVIRONMENT,PRIMARY_OUTQ,PDF_FOLDER,DEFAULT_COMPANY); RETURN; END
As you can see, PIPE simply inserts a row into the table function’s result set. It’s obvious that the number of values and data types passed in the PIPE statement must be compatible with the UDTF’s return table definition.
So what advantage does the PIPE statement offer DB2 coders? First, the syntax is easy to understand. Second, PIPE is advantageous in situations when variables (not table data), are the driving force behind the UDTF’s data. Without the benefit of PIPE, variable data needs to be preserved in a temp table or included in a SELECT statement. (If a UDTF will make use of a standard query from existing table(s), there is probably no need to use a pipelined function.)
The following COUNTER table function illustrates the benefit of PIPE by returning an arbitrary number of integer values:
CREATE OR REPLACE FUNCTION PRDLIB.COUNTER(@ROWS INT) RETURNS TABLE ( ROW_ID INT) LANGUAGE SQL CARDINALITY 50 DETERMINISTIC BEGIN DECLARE @I INT DEFAULT 0; WHILE @I>@ROWS DO SET @I=@I+1; PIPE (@I); END WHILE; RETURN; END;
Invoking the pipelined UDTF:
SELECT * FROM TABLE(PRDLIB.COUNTER(5)) COUNTER;
Which returns the following result set:
This COUNTER UDTF can also be implemented with a recursive common table expression, but in my opinion the pipelined function is easier to follow.
As always, specify the cardinality value when creating a UDTF to give DB2 an idea of the average number of rows it can expect to receive.
If you’re thinking that writing a pipelined SQL UDTF is similar to writing an external UDTF you are correct. They both insert one row at a time into the UDTF result set. Finally, like PIPE, external UDTFs can only return variables as row data.
For a final example, the following ParseData table function is a pipelined function that will split text data based on a given delimiter and return the result as rows. As you can see each new row is sent directly through the PIPE in lieu of dumping them into a temporary table first.
CREATE OR REPLACE FUNCTION QGPL.ParseData( @Data CLOB(1M), @Delimiter VARCHAR(12)) RETURNS TABLE (Row_Id INTEGER,FieldData varchar(2048)) LANGUAGE SQL BEGIN DECLARE @START INTEGER DEFAULT 1 NOT NULL; DECLARE @END1 INTEGER DEFAULT 0 NOT NULL; DECLARE @ROWID INTEGER DEFAULT 0 NOT NULL; DECLARE @FIELD VARCHAR(2048); IF @Data IS NULL THEN RETURN; END IF; SET @END1=LOCATE(@Delimiter,@Data); WHILE @END1>0 DO SET @FIELD=SUBSTRING(@Data,@START,@END1-@START); SET @ROWID=@ROWID+1; PIPE (@ROWID,@FIELD); SET @START=@END1+LENGTH(@Delimiter); SET @END1=LOCATE(@Delimiter,@Data,@END1+LENGTH(@Delimiter)); END WHILE; SET @FIELD=SUBSTRING(@Data, @START,LENGTH(@Data)-@START+1); SET @ROWID=@ROWID+1; PIPE (@ROWID,@FIELD); RETURN; END
Here is an example of the ParseData function’s usage:
SELECT * FROM TABLE(QGPL.ParseData('Pipelined|Functions|Are|Great','|')) x;
Since there is a similarity between pipelined SQL table functions and external table functions, I might as well address a misgiving I had about external UDTFs that IBM‘s Scott Forstie cleared up for me. Until recently, based on early external UDTF experience in V5R2, I thought that all external UDTFs would create a temporary (QTEMP) table under the covers and were therefore relatively slow. Scott responded that the temporary table issue is only true for external UDTFs processed through the CQE but not the SQE. Most SQL statements are now processed by the SQE so this is unlikely to be an issue.
As for performance, pipelined functions can be beneficial compared to a similar UDTF written using a temporary table. As Forstie says, “Even if a small temporary table is fully contained within memory, there is performance advantage for the PIPE statement since the overhead of library management, file management, and catalog management can be avoided.” The performance benefit will largely depend on how often the function is invoked.
Pipelined UDTFs are an asset to DB2 for i developers because they can potentially perform quicker, they are easier to code and they bring IBM i devs another feature that is available in other DB2 versions.