• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • DB2 For i Scalar Function Performance Considerations

    March 5, 2014 Michael Sansoterra

    I love using user-defined functions (UDFs) in DB2 for i. They encapsulate business logic, provide for code modularity and improve code reusability. But they can also flush performance down the drain. So in this tip, I’d like to use a sample UDF to analyze how it can affect the performance of an SQL statement.

    For this illustration, I’ve exported the Microsoft SQL Server Adventure Works sample data to DB2 for i. There are three main tables to consider in this query; SalesOrderHeader (31465 rows), SalesOrderDetail (121317 rows) and ProductCostHistory (22680 rows; I dumped in some extra randomized data into this table). Figure 1 below shows the relationships between the tables including the Product table, which is not required.

    Figure 1–Sample Table Relationships.

    The goal of the query is simple: read the entire sales order history and show the order date, product ID, net product sales, product cost and gross profit. Simply joining the order header and detail tables returns most of the required info:

    SELECT soh.OrderDate,sod.SalesOrderId,
           sod.ProductId,OrderQty*UnitPrice As Sales
      FROM SalesOrderDetail sod
      JOIN SalesOrderHeader soh
       	ON soh.SalesOrderId=sod.SalesOrderId
    

    All that is left is to get the product cost and then calculate the net profit.

    The query would be trivial except for a snafu. Since the product cost varies with time, a lookup of the product cost in the product cost history table (based on order date) is required. Further, it would be nice if I could count on all of the date ranges in the product cost history table to be contiguous (i.e., no overlapping start dates and end dates for a given product). But alas, due to the sad state of the data in the database, the query has to be careful to fetch only one cost per product ID and date, lest an erroneous one-to-many situation cause too many rows be returned.

    To get the product cost, my first instinct is to use a UDF to do the cost lookup in the cost history table. The UDF is defined as follows:

    CREATE OR REPLACE FUNCTION GetProductCost(
    @ProductID INT,
    @EffectiveDate TIMESTAMP)
    RETURNS DECIMAL(19,4)
    LANGUAGE SQL
    RETURNS NULL ON NULL INPUT
    DETERMINISTIC
    DISALLOW PARALLEL
    NOT FENCED
    SET OPTION DATFMT=*ISO,USRPRF=*OWNER,COMMIT=*NONE
    RETURN (
    	SELECT StandardCost
      	FROM ProductCostHistory
     	WHERE ProductId=@ProductId
       	AND @EffectiveDate BETWEEN STARTDATE AND ENDDATE
     	FETCH FIRST 1 ROW ONLY
    );
    

    The savvy SQL reader will note that when using a FETCH FIRST clause there should be an ORDER BY clause to guarantee consistent results in the event that an effective date matches multiple date ranges in the table. I skipped that step for simplicity.

    Plugging the GetProductCost UDF into the query above to calculate the product cost and the net profit yields the following revised query:

    SELECT soh.OrderDate,sod.SalesOrderId,
           sod.ProductId,OrderQty*UnitPrice As Sales,
       	 OrderQty*GetProductCost(sod.ProductId,soh.OrderDate) AS Cost,
           OrderQty*(UnitPrice - GetProductCost(sod.ProductId,soh.OrderDate)) 
           AS NetProfit
      FROM SalesOrderDetail sod
      JOIN SalesOrderHeader soh
       	ON soh.SalesOrderId=sod.SalesOrderId
    

    I now admire the UDF, knowing that I can reuse it throughout the application and even hand it to a junior developer to use, being confident that the results will be consistent. But I wonder what the performance overhead is.

    I know that performance should be optimal as the query currently returns 120K rows and will return even more well into the future. Running the query with and without the function in place reveals that the UDF adds considerable time to the query with the large result set.

    Looking to tweak the performance, I know that RPG is very good at keyed lookups and wonder if an RPG function would perform better than SQL. So I wrote an RPG UDF called GetProductCostRPG that has this logic to approximate the SQL UDF’s logic:

    SetGt (ReqProduct:ReqDate) ProductCostHistory;
    ReadPE (ReqProduct) ProductCostHistory;
    If ReqDate>=StartDate
    And ReqDate<=EndDate
    And NOT %Eof(ProductCostHistory);
    	StandardCost=STAND00001;
    	Return;
    Else;
    	StdCost_NI=-1;
    EndIf;
    

    ReqProduct and ReqDate are the function’s input parameter values. STAND00001 is the StandardCost column from the ProductCostHistory table.

    The RPG code is compiled as a program object and the performance is better than the SQL UDF, which is an embedded SQL C program under the covers. But the improvement is not enough. Hoping for more performance gains, I reworked the RPG code as a service program and created function GetProductCostRPGSRV. The result was the performance went up a tad. Hopefully, IBM will one day give developers the option to specify a PROGRAM TYPE of MAIN (*PGM) or SUB (*SRVPGM) when creating SQL functions.

    While I love UDFs, I’m still not happy with how much they’re costing the query. So I looked for a way to write the query without a UDF. The following query will do the trick:

    SELECT soh.OrderDate,sod.SalesOrderId,sod.ProductId,
           StandardCost,OrderQty*UnitPrice As Sales,
       	 OrderQty*StandardCost AS Cost,
       	 OrderQty*(UnitPrice-cost.StandardCost) AS NetProfit
      FROM SalesOrderDetail sod
      JOIN SalesOrderHeader soh ON soh.SalesOrderId=sod.SalesOrderId
      LEFT JOIN LATERAL (
    	SELECT ProductId,StandardCost
      	FROM ProductCostHistory pch
     	WHERE pch.ProductId=sod.ProductId
       	AND soh.OrderDate BETWEEN STARTDATE AND ENDDATE
     	FETCH FIRST 1 ROW ONLY
    ) cost ON cost.ProductId=sod.ProductId
    

    Notice that the function’s logic is embedded in the LEFT JOIN as a nested table expression. The lateral keyword instructs the LEFT JOIN to run the nested table expression in a row by row fashion; this allows DB2 to correlate the nested table expression’s tables with the tables in the outer query. LATERAL is synonymous with TABLE.

    Now, performance of the statement without using UDFs is something I’m happy with! Table 1 below shows the results of the different methods used.

    Trial

    SQL
    Function

    RPG
    Function (Pgm)

    RPG
    Function (SrvPgm)

    No
    Function

     

    GetProductCost

    GetProductCostRPG

    GetProductCostRPGSRV

    SQL w LATERAL

    Trial 1

    10.894

    8.423

    7.750

    0.298

    Trial
    2

    10.860

    8.391

    7.751

    0.297

    Trial 3

    10.860

    8.421

    7.710

    0.296

    Average

    10.871

    8.412

    7.737

    0.297

    To measure the query run times without worrying about network latency, I measured the query run times in an aggregate query that returned one row. So basically all the processing is done in DB2 for i with minimal results given back to the client. The queries looked like this, with the various UDF names substituted as appropriate:

    SELECT SUM(Sales),SUM(Cost),SUM(NetProfit) FROM (
    SELECT soh.OrderDate,sod.SalesOrderId,
           sod.ProductId,OrderQty*UnitPrice As Sales,
       	 OrderQty*GetProductCost(sod.ProductId,soh.OrderDate) AS Cost,
           OrderQty*(UnitPrice - GetProductCost(sod.ProductId,soh.OrderDate)) 
           AS NetProfit
      FROM SalesOrderDetail sod
      JOIN SalesOrderHeader soh
       	ON soh.SalesOrderId=sod.SalesOrderId
    ) QueryTest
    

    Of course your results will vary. This test was done on a P50 with minimal other activity, so the performance differences could be starker on a P5 or P10 box with other processes running.

    It’s possible to conclude in this case that the UDFs are more trouble than they’re worth, but that would be a premature conclusion. There was something obvious I forgot when defining my functions; I didn’t include “NO EXTERNAL ACTION.” Here is the CREATE FUNCTION statement wrapper for the RPG service program version. NO EXTERNAL ACTION wasn’t originally specified.

    CREATE OR REPLACE FUNCTION AdvWorks.GetProductCostRPGSRV
    (@ProductID INT,
     @EffectiveDate TIMESTAMP)
    RETURNS DECIMAL(19,4)
    LANGUAGE RPGLE
    PARAMETER STYLE SQL
    DETERMINISTIC
    NO SQL
    RETURNS NULL ON NULL INPUT
    NOT FENCED
    DISALLOW PARALLEL
    NO EXTERNAL ACTION
    EXTERNAL NAME 'DEV/GETCSTR2(GETCOST)'
    

    DETERMINISTIC means that when the same input parameters are given to the function, the same output will always be returned from the function within the duration of current query execution. A UDF calculating the volume of a sphere, based on the input of the radius, is deterministic. A random number or current time UDF would not be deterministic because the UDF output can’t be determined from the input.

    Specifying NO EXTERNAL ACTION means that there is nothing happening in the function that is not under the database manager’s control. Examples of an “external action” would be sending a message, accessing the IFS, writing to a data queue, etc. If DB2 thinks an “external action” is possible within the function, then it will call the function every single time, even if its output is deterministic. NO EXTERNAL ACTION tells DB2 it may not need to run the function for every row in the query.

    When DETERMINISTIC and NO EXTERNAL ACTION are specified, DB2 has the option to cache the data from the UDF in order to improve performance, if it can. This is easy to show with Visual Explain.

    Figure 2 below is Visual Explain’s query plan representation for the query when the default of EXTERNAL ACTION is present. The highlighted gray box titled “complicated” is how Visual Explain represents the UDF in the query plan:

    Figure 2–The node representing the GetProductCost UDF execution is titled “complicated”.

    Figure 3 below shows Visual Explain’s execution plan representation for the query when the function has NO EXTERNAL ACTION specified. Notice that there are two new steps, cache and cache probe:

    Figure 3–The query plan has two new operators when NO EXTERNAL ACTION is enabled on the UDF.

    The results from the small change to the function definition are impressive. Table 2 shows the mammoth improvement in execution times:

    Trial

    SQL Function

    RPG Function (Pgm)

    RPG Function (SrvPgm)

    No Function

     

    GetProductCost

    GetProductCostRPG

    GetProductCostRPGSRV

    SQL
    w LATERAL

    Trial 1

    2.771

    2.182

    2.031

    0.297

    Trial
    2

    2.779

    2.180

    2.021

    0.322

    Trial 3

    0.261

    0.261

    0.261

    0.363

    Trial
    4

    0.261

    0.262

    0.261

    0.297

    Average

    1.518

    1.221

    1.144

    0.320

    With the UDF caching enabled, there is no discernible difference in performance for the various function versions: SQL vs. RPG vs. RPG *SRVPGM. Notice during the first two trials, the times are improved but comparatively high to the query without a UDF. But by the third execution it appears that the SQE has somehow figured out the best way to cache the UDF results and thereby dramatically improve the query run time (even better than the statement without a function).

    Keep in mind that this type of performance improvement may not always be present; it just depends on how much data DB2 has to cache and whether it can do it effectively. The ProductCostHistory table is relatively static; there aren’t many changes to it within a single week’s time. But if a UDF is based on a table that is continually changing, then DB2 may not be able to show these kinds of performance gains. That means each situation requires some experimentation.

    One other interesting thing to note is when I tried the FENCED/ALLOW PARALLEL options, performance degraded. So be careful and test when using various function options. To summarize, UDFs can end up making your queries expensive; especially if the wrong keywords are specified.

    So there is no right or wrong answer about when it’s appropriate to use UDFs. I’ll finish by leaving some general guidelines I use:

    1. I average the execution time of several invocations of the UDF and approximate how long it would take to run it over 100,000 rows. A fast UDF that executes in about 1ms will still add 100 seconds (1:40) to a query that processes the function 100K times. This is too much overhead for my liking. Of course, there are occasions when DB2 can cache UDF results, which may alleviate much of the performance penalty.

    2. SQL development often pits performance against reusability. When coding for performance, I will type out simple to intermediate column expressions in various queries instead of using a UDF. This makes the code harder to maintain but on systems with large volumes of data, I’d rather have the performance benefit.

    3. If an expression’s logic is complicated or doing something beyond what SQL can natively do, then I’ll use a UDF.

    4. I generally try to compare execution times with and without the UDF to get an idea of how much time the UDF is consuming. Often times LATERAL can help eliminate some UDFs that lookup data.

    5. When I decide a UDF may be appropriate, I spend some time considering which language to use. SQL is getting hard to beat, so if the UDF has to operate on a set of data with many rows, I usually choose SQL. For logic involving many single I/O operations requiring the use of positioned lookups, an RPG service program is a good choice. When creating a UDF based on code libraries downloaded from the internet, such as a library capable of dynamically evaluating math expressions, I will look for a C/C++ solution before a Java solution because it’s likely to perform better.

    Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

    RELATED STORIES

    DB2 For i, Java, And Regular Expressions

    Away With The SQL Slash Separator

    Dynamically Invoke SQL UDFs From RPG Code

    Include C Utilities in Your Developer Library: Evaluating a Mathematical String Expression



                         Post this story to del.icio.us
                   Post this story to Digg
        Post this story to Slashdot

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    Rocket Software

    Software built on TRUST. Delivered with LOVE.

    For over 35 years, Rocket Software’s solutions have empowered businesses to modernize their infrastructure, unlock data value, and drive transformation – all while ensuring modernization without disruption.

    Learn More

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    Maxava:  Don't wait for a disaster. Start planning today. DR Strategy Guide for IBM i FREE eBook.
    System i Developer:  Upgrade your skills at the RPG & DB2 Summit in Dallas, March 18-20.
    Northeast User Groups Conference:  24th Annual Conference, April 7 - 9, Framingham, MA

    More IT Jungle Resources:

    System i PTF Guide: Weekly PTF Updates
    IBM i Events Calendar: National Conferences, Local Events, and Webinars
    Breaking News: News Hot Off The Press
    TPM @ EnterpriseTech: High Performance Computing Industry News From ITJ EIC Timothy Prickett Morgan

    Infor Adds Industry-Specific Functionality to M3 Rumors Say Power8 Systems Debut Sooner Rather Than Later

    Leave a Reply Cancel reply

Volume 14, Number 5 -- March 5, 2014
THIS ISSUE SPONSORED BY:

HelpSystems
WorksRight Software
Northeast User Groups Conference

Table of Contents

  • DB2 For i Scalar Function Performance Considerations
  • Use Wireshark To Diagnose IBM i Communications Problems
  • Admin Alert: Setting Up IBM i TCP/IP Host Routes

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • FAX/400 And CICS For i Are Dead. What Will IBM Kill Next?
  • Fresche Overhauls X-Analysis With Web UI, AI Smarts
  • Is It Time To Add The Rust Programming Language To IBM i?
  • Is IBM Going To Raise Prices On Power10 Expert Care?
  • IBM i PTF Guide, Volume 27, Number 20
  • POWERUp 2025 –Your Source For IBM i 7.6 Information
  • Maxava Consulting Services Does More Than HA/DR Project Management – A Lot More
  • Guru: Creating An SQL Stored Procedure That Returns A Result Set
  • As I See It: At Any Cost
  • IBM i PTF Guide, Volume 27, Number 19

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle