fhg
Volume 7, Number 43 -- December 12, 2007

Use SQL User-Defined Functions to Avoid Data Repetition Problems

Published: December 12, 2007

by Michael Sansoterra

By now, most System i and AS/400 developers know they can enhance the SQL language in DB2 for i5/OS by creating their own user-defined functions (UDFs). In this tip I'd like to focus on one common problem encountered when writing queries and how a simple UDF template can often solve this problem.

By way of review, in case you're not familiar with them, UDFs are pieces of code that can bolt onto SQL. UDFs usually contain business logic for doing complex tasks that would be difficult or cumbersome to do using the SQL built-in functions. There are a variety of UDFs, but for now we will only consider a scalar UDF, which is a UDF that accepts zero or more input parameters and returns a single output parameter.

Now let's review this typical problem encountered when writing queries. To begin, a user requests a simple report that is formatted for Excel.

Take the following sales order example:

Select OrderId,OrderDate, 
       CustomerId,ItemId,ItemDesc,Qty,ExtAmount
  From Order O
  Join OrderDetail D On D.OrderId=O.OrderId
Where D.ItemType='ELECTRONIC'
   And D.Qty>10

You finish the query, dump the data to Excel and you're done in under half an hour. Now the user comes back and wants to see all the serial numbers associated with each item number. You rewrite the query:

   Select OrderId,OrderDate, 
          CustomerId,ItemId,ItemDesc,Qty,ExtAmount,
          SerialNo
     From Order O
     Join OrderDetail D On D.OrderId=O.OrderId
Left Join Serial S On S.OrderId=D.OrderId
                  And S.ItemId=D.ItemId
    Where D.ItemType='ELECTRONIC'
      And D.Qty>10

Now the user is getting the desired serial numbers. However, the unintended side effect is that the detail line is getting repeated for each serial number. If a quantity of 11 is sold for an item, then theoretically there should be 11 serial numbers that give you the same detail line repeated 11 times. The order detail data is the primary focus of the query so repeating detail data to show serial numbers just creates a mess.

What can we do about this? Well, one solution is to let the user eliminate the duplicates in Excel, which can be time consuming. If you have a fancy Excel creation utility you may have the luxury of removing the duplicates, but usually that requires extra programming.

Another possible solution is to combine this ancillary repetitive data all into one column. In this scenario the serial number data would be grouped into one large column, each serial number being concatenated together by a delimiter of some kind. This solution will keep the order detail data unique so that data isn't repeated due to a join with the serial number file. A UDF is well suited to perform this task.

Shown below is a sample UDF written with the SQL procedural language that will receive the "OrderId" and "ItemId" as parameters. The serial numbers for the given order and item combination will be joined together (delimited by a space) and returned in one large variable character (VARCHAR) column.

Create Function GetSerialList
(@OrderId Int,
 @ItemId  Int)
Returns VarChar(1024)
Language SQL
Reads SQL Data
Set Option Commit=*None, UsrPrf=*Owner, DatFmt=*ISO 
Begin
    Declare @List VarChar(1024) Not Null Default '';
    Declare @i    Int           Not Null Default 0;

    For SerialList AS SerialNumbers Cursor For
    Select RTrim(SerialNo) As Serial
      From Serial
     Where OrderId=@OrderId
       And ItemId=@ItemId
    Do 
        Set @i=i+1;
        If @i>1 Then 
            Set @List=@List||'' ';
        End If;
        Set @List=@List||Serial; 
    End For; 
    Return List;
End

This function uses the SQL FOR statement as a control statement that provides a template for loop processing. Every row returned from the FOR statement's cursor is processed by any statements a developer wants to sandwich between the DO and END FOR statements. These statements can make use of declared variables or columns defined in the cursor's SELECT clause. If you have column expressions in your SELECT, make sure you assign alias names so that they can be used within the row processing code.

In our sample code, the only necessary functions to perform are to increment a counter variable (@i) and to append the serial number (column Serial) from the cursor to the @List variable. When all rows are processed, the FOR statement is exited automatically and the @List variable is returned as the function's result. In fact, this template can be summarized as follows in four easy steps:

  1. Declare counter and list variable
  2. Declare FOR statement with appropriate cursor
  3. Within the FOR statement build a delimited list
  4. Return the list

Now back to our original query. We can use the GETSERIALLIST UDF to give us all serial numbers in a single column:

Select OrderId,OrderDate, 
       CustomerId,ItemId,ItemDesc,Qty,ExtAmount,
       GetSerialList(OrderId,ItemId) As SerialNos
  From Order O
  Join OrderDetail D On D.OrderId=O.OrderId
Where D.ItemType='ELECTRONIC'
   And D.Qty>10

Now instead of duplicating rows we simply create a large column for holding repetitive data. Of course, depending on the type and length of repetitive data, jamming it all in a single column may not always work. However, users I have worked with most often welcome this solution. A few more examples are:

  • Combining multi-line order comments into a single column
  • Combining item numbers into a delimited list for an order summary query
  • Combining release dates for a blanket purchase order line item
  • Creating a list of possible vendors that can supply a given item

Using this simple UDF template to combine repetitive data into a single column often provides an easy and effective way to deliver data to the user without having to muck up a query with additional joins.

Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. You can contact him through our contact page.




Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034

Privacy Statement