|
|||||||
|
|
![]() |
|
|
Scribble on SQL's Scratchpad by Michael Sansoterra [The code for this article is available for download.] User-defined functions, or UDFs, allow developers to extend SQL for specific needs, so there is virtually no database task for which SQL isn't suitable. I can hear the skeptic shouting, "I can think of plenty of things SQL doesn't do very well, or at all." Well, you wouldn't be alone. There are many of you out there sailing along in the boat of skeptics, but take a look at what I have to offer before you sail by the port of SQL opportunity. During my many days of programming SQL on a V3R2 system, I hit many roadblocks with SQL. At the time, I started (and nearly completed!) many projects, only to ultimately have SQL show itself insufficient for the task. Fortunately, IBM has made so many enhancements to SQL that those roadblocks are just about gone. Starting with V4R4, IBM has even allowed the programmer to make his own enhancements, with UDFs. This article will review a common problem that, in prior days, SQL/400 wasn't able to handle well: a running total function. I will show how creating a simple UDF with the scratchpad option can be used to enhance SQL's functionality to solve this problem. This article assumes the reader has a handle on creating basic scalar UDFs and is proficient in a high-level language, such as RPG or COBOL. A Quick Review As a brief review, recall that a scalar function is an entity that can accept zero or more input parameters, and perform an operation on these values and return a single output value. Some examples of SQL's built-in scalar functions are the substring (SUBSTR) function, the convert-to-character (CHAR) function, and the round (ROUND) function. In each case, one or more values are passed to the function and operated on, and a single result value is returned. Scalar UDFs allow the SQL language to be tailored to specific needs. Here are a few benefits of UDFs:
For example, if a company makes use of long formulas to make certain calculations, the formula could be encapsulated in a UDF. Rather than typing the long formula in multiple SELECT statements, the UDF would be substituted. The code below shows the classic Celsius to Fahrenheit UDF, followed by how it might be used in a SELECT statement:
Create Function Mike/CelToFahr (DegCel Dec(5,1))
Returns Dec(5,1)
Language SQL
Begin
Return (DegCel * 9/5 + 32);
End
Select "Date", Temp As Celsius, CelToFahr(Temp) As Fahrenheit
From WeatherHistory
Where CalToFahr(Temp)>=68 /* All Temps over 68F */
Scalar UDFs come in two varieties: SQL and external. SQL UDFs, like the one shown above, are written in the SQL Stored Procedure Language. External UDFs are written in high-level-language programs. Now that the review is done, let's focus on a special use for external UDFs, called a scratchpad, to solve our running-total column. The Once Insurmountable Enigma Imagine that you have created a sales report with the columns year, month, and sales amount. Because it's so simple, you built the report with a QM Query and are done. After reviewing the report, however, the user decides he wants a running-total column added to the report (Figure 1).
Figure 1: A sample sales report that requires a running total by year in the YTD Amount column In the old days of SQL, before V4R2, one would most likely rewrite the report in a high-level-language program. Starting with V4R2, the CASE statement allowed for conditional summations, but the resulting SQL statement wasn't necessarily pretty. However, an external UDF can be easily created to handle just this sort of problem. External UDFs can make use of a scratchpad, which is a static, user-defined variable area that doesn't change between calls to the UDF. In RPG IV terms, think of the UDF as a procedure and think of the scratchpad as a user-defined data structure with the keyword static (the structure's contents are preserved between procedure calls). With this in mind, we can write a function that will receive the sales amount value and remember this value in the scratchpad. When the function is called again, the scratchpad remembers the value from before and adds the current row's value to it, and so on. Also, as shown in Figure 1, the user wants the running total to reset to zero at the start of the new year. Therefore we will need another input variable in our function that we can use to detect a change in the year and reset our scratchpad to zero when the year changes. We could call the resulting UDF SUMTOTAL, and its format would look like this: SUMTOTAL(Year Integer, SalesAmt Decimal(9,2)) However, so that the function can be used in a wider variety of scenarios, we could create a similar, but more generic, UDF called SUMVAL (sum value), and its syntax would look like this: SUMVAL(BreakValue Varchar(128), AccValue Dec(13,4)) In this expanded function, BreakValue would represent the data for any value that we wanted to break on, whether it be year, order number or customer name. Building a UDF Now that the purpose of the running total UDF has been established, let's build it. Recall that in order to register an external UDF for SQL to use, the CREATE FUNCTION statement needs to be issued. This statement will register the function, along with its parameters and the function's resulting data type. The SQL code below shows the CREATE FUNCTION statement needed to define our SUMVAL UDF:
Create Function MIKE/SUMVAL (BreakData Varchar(128),
AccValue Dec(13,4))
Returns Dec(17,4)
External
Called On Null Input
Not Deterministic
Language RPGLE
No Sql
Parameter Style DB2SQL
ScratchPad 142
Final Call
Disallow Parallel /* Only 1 scratchpad per function */
Here is a quick rundown of each of the keywords specified on the above CREATE FUNCTION statement. External--This is a call to an external program. When the program name is not specified (as is the case here), SQL will assume the external program name is the same as the function name. The program need not be present when the CREATE FUNCTION statement is issued. Called On Null Input--This keyword tells SQL that the function should be called even if one or more of the parameters passed to it are NULL. Not Deterministic--A deterministic function is one that will always return the same value for a given set of inputs (that is, the output can be determined based on the input.) A non-deterministic function will not always return the same result for the same input. This keyword serves as an optimization technique for SQL. If SQL knows that a function will not change its output value for identical inputs, it will not have to call the UDF for consecutive rows with identical inputs. Language RPGLE--Specifies that the external function will be created in ILE RPG (or RPG IV). No SQL--This keyword notes that there are no embedded SQL statements used in the UDF. This keyword is also for SQL's optimization. Knowing that no SQL statements are involved relieves SQL from worrying about issues such as commitment control. Parameter Style DB2SQL--This keyword informs SQL how to construct the parameter list for the calling program. Figure 2 shows the required parameter list for a UDF written to conform to the DB2SQL parameter style. See the IBM SQL reference guide (click here to view PDF file) for information on the parameter list for other available parameter styles.
Figure 2: The external UDF program is required to conform to this parameter list when using the DB2SQL parameter style Scratchpad n--This keyword tells SQL to set aside n bytes for use as a static variable area that can be reused by the high-level-language program. When scratchpad is defined, another parameter defined as n bytes should be added to the end of the calling program's parameter list. Final Call--When this keyword is specified, another parameter is sent to the external program, flagging whether this is the first, normal, or last call of the SQL statement. When using this keyword, yet another parameter, with data type INTEGER, will need to be appended to the host program's parameter list. Disallow Parallel--This keyword lets SQL know that it can't multithread the call to this function. This is particularly important with the scratchpad, since this ensures that only one scratchpad will be used with the function. The code snippet below contains a portion of the code for the SUMVAL RPG program. It shows the scratchpad data structure and the program's PLIST. The scratchpad will store the following information:
In total, these variables will require 142 bytes of memory, which is why the CREATE FUNCTION statement example included Scratchpad 142. Of course, the size of the program's scratchpad should match the size specified on the CREATE FUNCTION statement. * * Scratchpad * D dsScratchPad DS D AccumValue Like(Result) D NonNullData 1 D PriorData Like(BreakData) D PriorNull 5i 0 C *Entry Plist C Parm BreakData C Parm AccValue C Parm Result C Parm BreakData_Null C Parm AccValue_Null C Parm Result_Null * DB2SQL Style Parms C Parm SQL_State C Parm Function_Name C Parm Specific_Name C Parm Msg_Text * Scratchpad Parm and Final Call flag parm C Parm dsScratchPad C Parm FinalCall The rest of the RPG is mostly self-explanatory, but here are the highlights of how it works. If this is the first time the program is called from an SQL statement, as indicated by the FinalCall parameter, set the NonNullFlag to N. This flag will be used to indicate if an actual numeric value to accumulate was passed to the program. The function will return a NULL until a numeric value is encountered. Once a numeric value has been encountered, any additional NULLs will be treated as zeros. The program next checks for a "first call" flag or a change in the BreakData parm. When true, the scratchpad's accumulator will be zeroed. After this, the numeric value passed to the function is added to the scratchpad accumulator. The BreakData value is saved in the scratchpad as well. The program ends with a RETURN. For performance reasons, the *INLR is not set on, so that the program, and its variables are retained in memory. The following SQL statement demonstrates how the SUMVAL function could be used in a SELECT:
SELECT Order, Item, ExtAmount,
SumVal(Order, ExtAmount) As Running
FROM OrderDetail
ORDER BY Order
No doubt some of you are wondering, "But what if the Order column was numeric instead of alpha?" In that case, you would have to CAST the numeric column to VARCHAR, using either the CAST or the VARCHAR built-in functions: SumVal(Varchar(Order), ExtAmount). Remember, when invoking UDFs in an SQL statement, the parameter data types you pass must be compatible with the parameter data types specified on the CREATE FUNCTION statement. For example, the AccValue numeric parameter is defined as packed decimal (13,4). An integer data type can be passed as a parameter, since an integer can be promoted to DEC(13,4), provided the integer doesn't exceed nine places. However, a FLOAT column cannot be passed here, because a FLOAT cannot be promoted to DEC(13,4), since DEC(13,4) is not nearly big enough to hold the entire range of FLOAT values. In this case, you would have to change your function to accept a FLOAT parameter or specifically CAST your float to DEC(13,4) if you're certain that DEC(13,4) will always be large enough to hold your floating point number. Incidentally, if you create the SUMVAL but then get a SUMVAL in *LIBL type *N not found error message, there is probably a data type compatibility issue. In this case, attempt to specifically CAST your parameters to VARCHAR(128) and DEC(13,4) respectively. If, after doing this, the function runs, this indicates that you're trying to pass the function incompatible data types. Remember, a function is identified not only by name but also by its signature--that is, its name and parameter list. Finally, let's close with one other common SQL shortcoming where SUMVAL can help: data conversions. When converting data from one system to another, it's common to use SQL to populate the table using the combination of INSERT INTO and SELECT statements. This is a powerful technique, and in most cases is easier to work with than a high-level-language program. However, there is often one drawback for these types of data conversions: What if the receiving table requires an incremental number? For example, say you're moving to a new software package and you're trying to populate a sales line item history table. Typically, you have a sales order number, followed by an incremental line item sequence field. Further, the sequence field is reset every time a new order number comes around. I've ran into this several times, only to revert back to an RPG program or come up with some trick to populate this column uniquely. Alas, here is where the SUMVAL UDF comes in handy. The following SQL contains a sample insert statement that inserts an incremental number in the line item sequence field:
Insert Into OrderDetail (OrderNo, LineSeq, Item, Qty, ExtPrice)
Select OrderID, Int(SumVal(VarChar(OrderID),100)), ItNbr,
Quantity, Round(Quantity * Price,2)
From SalesDetail
Order By OrderID
The LineSeq field in the OrderDetail table will be incremented by 100 for every line item for a given order. The SUMVAL function will break on ORDERID, so, when the OrderID changes, the counting will restart. Notice that an ORDER BY OrderID clause is shown. This is required to make sure that false breaks aren't encountered. Remember SUMVAL uses control-break logic to reset the counter, so the data must be sorted correctly. One thing I've noticed is that you should always specify your base field names in the ORDER BY -- do not use expressions. Specifying ORDER BY VARCHAR(ORDERID) may yield incorrect results. Additionally, keep in mind that when using the scratchpad in a SELECT or in a cursor for this type of running total, the data should only be read FORWARD. Reading backward or skipping records (for example, FETCH LAST) through a cursor will cause invalid results. In interactive SQL (STRSQL), make sure that your data refresh option is set to forward only (use F13 to see this option). The Running "Sum"mary The scratchpad feature of UDFs has become indispensable for me. Reporting or data-conversion enigmas, which used to require high-level-language programs to solve, can now be solved quickly and entirely with SQL. Michael Sansoterra is a programmer/analyst for SilverLake Resources, an IT services firm based in Grand Rapids, Michigan. E-mail: msansoterra@silver-lake.com.
|
Editors
Contact the Editors |
| Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved. |