|
||||||||
|
|
![]() |
|
|
Empower SQL with Java User-Defined Functions by Michael Sansoterra [The code for this article is available for download.] Many programming requests routed my way have to do with data transformations of some kind (like moving, copying, importing, exporting, or validating data), so I'm forever using SQL. When I recently started to learn Java, my instinct led me to find out how I could use it to make my life with SQL easier. It didn't take long to figure out that Java has a boatload of excellent string-handling routines. Two useful capabilities that Java has that I wanted to bring to SQL are locale-specific currency format and regular expressions. (I'll cover regular expressions in detail in a future article.) The question is, how can SQL be endowed with these capabilities? Combining Java and SQL The answer is user-defined functions. UDFs let programmers write programs that extend SQL's capabilities. There are several kinds of UDFs, but for now, we'll consider only external scalar UDFs. Scalar functions accept zero or more arguments and return a single result. External means they are written in something other than the SQL Procedure Language, such as CL, COBOL, or Java. (For an example of a UDF written in RPG, see "Scribble on SQL's Scratchpad.") (Note: To follow these examples, you must have the 5722JV1 Java Developer Kit and 5722JC1 Toolbox for Java installed on your system. To use the regular expression examples, you must be using Version 1.4 of the JDK.) Building a Java UDF Creating a Java function for use by SQL involves mapping a static Java method to an SQL function name. Consider, for example, the following method, which returns a string representation of a number in the local currency:
public static String formatCurrency(BigDecimal number) {
NumberFormat moneyFormat =
NumberFormat.getCurrencyInstance();
String result=moneyFormat.format(number);
return result;
}
A UDF can be created to put an SQL "face" on the above method to convert a number to a formatted string. Used in a statement, the SQL face might look something like this:
SELECT PO, Item,
FmtCur(UnitCost * Qty) AS ExtCost
FROM PurchaseLine
WHERE PO=12345
When the statement is executed, everytime the FmtCur() function is referenced, the Java method formatCurrency will be invoked, with a numeric parameter being passed to the method. The method's return value will be the result of the FmtCur() function, which will be a string formatted with the proper currency symbol, separators, and decimal point. Registering the Function with SQL A UDF is registered to SQL with the CREATE FUNCTION statement. CREATE FUNCTION does a number of things, like giving the function a name, defining its input parameters, specifying the data type for the return parameter, and specifying the programming language. Here's a sample CREATE FUNCTION statement to register the FMTCUR function for use by SQL: CREATE FUNCTION mylib.FmtCur(Amount Dec(21,9)) RETURNS VARCHAR(26) EXTERNAL NAME 'UDFs.formatCurrency' SPECIFIC mylib.formatCurrency_Default LANGUAGE Java PARAMETER STYLE Java FENCED NO SQL RETURNS NULL ON NULL INPUT SCRATCHPAD DETERMINISTIC The function FMTCUR accepts one input parameter: Amount DEC(21,9). And it returns a string: RETURNS VARCHAR(26). The UDF signature must match the Java method's signature. In this example, the DECIMAL SQL data type is mapped to Java's BigDecimal class, and the function's string result is mapped to SQL's VARCHAR data type. How did I know how to do this? Easy, I checked the manual! For information on determining compatible data types between SQL and Java, see the section entitled "Parameter passing conventions for Java stored procedures and UDFs," in the IBM Developer Kit for Java (in PDF format). Here are descriptions of the other keywords: EXTERNAL NAME--For a Java UDF, this specifies the class and method to be used as the code behind the function. In the above example, the class is UDFs and the method is formatCurrency. When the function is invoked, method formatCurrency will be passed an amount and will expect the method to return a string. EXTERNAL NAME's usage varies depending on the programming language. SPECIFIC--This keyword is used to uniquely identify a function. Normally, the function name itself is good enough to identify the function, but, as with Java methods, SQL functions can be overloaded. This means a single function can have the same name but multiple parameter lists (or signatures). I'll mention more on overloading later. LANGUAGE--This keyword specifies the language the function is written in. PARAMETER STYLE--The parameter style defines how parameters are to be passed to the external program. Choosing a style depends largely on the language being used to author the function. For Java routines, there are two choices: Java and DB2General. Of the two, Java is the more portable, as it specifies that the function conforms to the SQLJ Part 1 standard for coding SQL routines in Java. When specifying DB2General, the function must extend class com.ibm.db2.app.UDF. The advantage of using DB2General is the functionality afforded by the IBM-supplied UDF class, including a greater ability to work with NULLs and a method that returns a handle to the current database connection. The disadvantage is that it's only portable across the DB2 platform. The other major difference between the two styles is that UDFs coded for parameter-style Java retrieve their result from the method's return value. When using DB2General, the method is declared with a return type of void and the function's result is retrieved from an output parameter. For more information about the DB2General and Java styles, see "Java user-defined scalar functions," in the IBM Developer Kit for Java. FENCED--This option specifies that the function will run in a different thread than the SQL statement. NO SQL--This option specifies that the function contains no SQL statements. Alternative options are READS SQL DATA, MODIFIES SQL DATA, or CONTAINS SQL (read or modify). RETURNS NULL ON NULL INPUT--When this option is specified, the database manager will automatically set the function's result to NULL if any of the function's input parameters are NULL. This optimization step allows SQL to skip a call to the function's external program if the result will be NULL anyway. The alternative phrase to force a call to the function is CALLED ON NULL INPUT. SCRATCHPAD--This option tells SQL to set aside a specific number of bytes of memory for the function's use. The contents of this scratchpad area will be maintained between calls to the function program (that is, SQL will not alter its contents, thereby making it a "static" area). If a number is not specified, the default number of bytes is 100. This keyword has special meaning for Java programs, as specifying scratchpad will cause only one instance of a given class to be instantiated. If scratchpad is not specified for a Java program, every time the function is called, a new instance of the class will be created. DETERMINISTIC--This keyword informs the database manager that, for an identical set of inputs, the function will return the same output. This is another optimization feature that allows the database manager to skip a call to the function's program if the function's value can be determined ahead of time. The alternative keyword NOT DETERMINISTIC means that the function's result cannot be determined based on the input parameters and that the program should be called even when the input parameter values are repeated. For more help on this topic related to Java, see the section called "SQL Routines," in the IBM Developer Kit for Java. For more help on CREATE FUNCTION in general, go to the SQL reference manual (under "Chapter 5. Statements"), then see the entry "CREATE FUNCTION (External Scalar)". The Java UDFs Class Click here to view the UDFs.java program. This class contains several static methods designed for use as SQL UDFs. The corresponding CREATE FUNCTION statement for each method is provided in the comments. The following table contains a list of the methods provided.
When writing functions in Java on the iSeries, there are a few guidelines to follow:
As a side note about the sample code, I specified a default schema (library) of mylib in the CREATE FUNCTION statements. When CREATE FUNCTION is issued, it will create a service program in the specified schema. Also note that I specified return lengths for columns that will usually be longer than needed. This is done to be able to handle larger than usual results but makes viewing the results painful in a tool such as interactive SQL. Other options are reducing the column size using the LEFT or SUBSTR functions or reviewing the results in a GUI editor such as that provided by the iSeries Navigator. Overloading UDFs.java contains eight methods. However, three of these methods are merely "overloaded" signatures. (When a class contains more than one method with the same name but different parameter lists, the method is said to be overloaded.) For example, the format currency function has two methods: public static String formatCurrency(BigDecimal number) And:
public static String formatCurrency(BigDecimal number,
int scale)
These methods have corresponding overloads in SQL:
/* The single parameter FMTCUR
assumes two decimal places */
SELECT SalesRep, RepName,
formatCurrency(Sales) AS Sales
FROM SalesSummary
And:
/* The double parameter FMTCUR
accepts amount and scale (decimals) */
SELECT Item,
formatCurrency(Price*Qty/ConvFactor,4) AS Price
FROM SalesLine
Overloading has many uses, including allowing defaults and allowing for alternative data types (float versus decimal, for example). In the case above (as well as in the other two overloaded functions), the purpose of overloading is to supply defaults. If the second parameter (scale) is omitted from FMTCUR, the single parameter version of the formatCurrency method is invoked. This single parameter version subsequently calls the double parameter version and supplies a default value of two decimal places. When writing an overloaded function for supplying defaults, start with the function that provides the most functionality (that is, the most parameters.) Once the base method is coded, the overloaded methods may call the base method and pass in the default parameters. When registering the overloaded functions, SQL will know which version of the method to use by matching the parameter list. Also, when registering overloaded functions, be sure to register a unique name for the function, using the SPECIFIC keyword, so that you can refer to that particular version of the function easily when dropping it. When dropping an overloaded function, use the function's specific name or exact signature, because the function name alone will not be unique. Here are examples of dropping a function by SPECIFC name and by signature: DROP SPECIFIC ROUTINE mylib.formatCurrency_Default /* SPECIFC routine name */ Or: DROP FUNCTION mylib.FMTCUR(DEC(21,9)) /* Signature must match exactly */ Compiling the Program When building Java programs to use as UDFs and stored procedures, the compiled class file must be placed in a special folder on the iSeries: /QIBM/USERDATA/OS400/SQLLIB/FUNCTION If your class is not placed here, an SQL statement using the routine will fail. To compile the .java source into a class file, enter QShell (using the QSH command) and issue the following compile command: javac –d /qibm/userdata/os400/sqllib/function /myfolder/UDFs.java Additionally, once the class is created, it may be optimized specifically for the iSeries by issuing the CRTJVAPGM command:
CRTJVAPGM CLSF('/qibm/userdata/os400/sqllib/function/UDFs.class')
OPTIMIZE(40)
Optimizing the .class file helps performance. For those on a slower box, like myself, another issue encountered when using Java with SQL is that the statement always failed on the first try. As it turns out, this was due to the length of time it takes to start the Java Virtual Machine. To keep the first statement from failing, I had to increase the UDF_TIME_OUT limit specified in the QAQQINI file, in order to give the JVM enough time to start. The default limit is 30 seconds. After the initial slow response time on the first statement, I found performance on subsequent statements was acceptable. After the class file is compiled, register the functions with SQL. The CREATE FUNCTION statements listed below can be copied and pasted into iSeries Navigator or STRSQL. You'll need these functions defined and ready to go for my next article, which will demonstrate how to use the regular expression functions. To understand the regular expression code, see the "Regular Expressions" Java tutorial.
CREATE FUNCTION mylib.FmtCur(Amount DEC(21,9))
RETURNS VARCHAR(26)
EXTERNAL NAME 'UDFs.formatCurrency'
SPECIFIC mylib.formatCurrency_Default
LANGUAGE Java
PARAMETER STYLE Java
FENCED
NO SQL
RETURNS NULL ON NULL INPUT
SCRATCHPAD
DETERMINISTIC
CREATE FUNCTION mylib.FmtCur(Amount DEC(21,9), Scale Integer)
RETURNS VARCHAR(26)
EXTERNAL NAME 'UDFs.formatCurrency'
SPECIFIC mylib.formatCurrency_Scale
LANGUAGE Java
PARAMETER STYLE Java
FENCED
NO SQL
RETURNS NULL ON NULL INPUT
SCRATCHPAD
DETERMINISTIC
CREATE FUNCTION mylib.re_Test(input VARCHAR(500),
regex VARCHAR(500))
RETURNS INTEGER
EXTERNAL NAME 'UDFs.re_Test'
LANGUAGE Java
PARAMETER STYLE Java
FENCED
NO SQL
RETURNS NULL ON NULL INPUT
SCRATCHPAD
DETERMINISTIC
CREATE FUNCTION mylib.re_Find(input VARCHAR(500),
regex VARCHAR(500))
RETURNS VARCHAR(500)
EXTERNAL NAME 'UDFs.re_Find'
SPECIFIC mylib.re_Find_Default
LANGUAGE Java
PARAMETER STYLE Java
FENCED
NO SQL
RETURNS NULL ON NULL INPUT
SCRATCHPAD
DETERMINISTIC
CREATE FUNCTION mylib.re_Find(input VARCHAR(500),
regex VARCHAR(500),
occurrence INTEGER)
RETURNS VARCHAR(500)
EXTERNAL NAME 'UDFs.re_Find'
SPECIFIC mylib.re_Find_occurrence
LANGUAGE Java
PARAMETER STYLE Java
FENCED
NO SQL
RETURNS NULL ON NULL INPUT
SCRATCHPAD
DETERMINISTIC
CREATE FUNCTION mylib.re_Split(input VARCHAR(500),
Delimiter VARCHAR(500))
RETURNS VARCHAR(500)
EXTERNAL NAME 'UDFs.re_Split'
SPECIFIC mylib.re_Split_Default
LANGUAGE Java
PARAMETER STYLE Java
FENCED
NO SQL
RETURNS NULL ON NULL INPUT
SCRATCHPAD
DETERMINISTIC
CREATE FUNCTION mylib.re_Split(input VARCHAR(500),
Delimiter VARCHAR(500),
Occurrence INTEGER)
RETURNS VARCHAR(500)
EXTERNAL NAME 'UDFs.re_Split'
SPECIFIC mylib.re_Split_occurrence
LANGUAGE Java
PARAMETER STYLE Java
FENCED
NO SQL
RETURNS NULL ON NULL INPUT
SCRATCHPAD
DETERMINISTIC
CREATE FUNCTION mylib.re_ReplaceAll(input VARCHAR(500),
regex VARCHAR(500),
Replace VARCHAR(500))
RETURNS VARCHAR(500)
EXTERNAL NAME 'UDFs.re_ReplaceAll'
LANGUAGE Java
PARAMETER STYLE Java
FENCED
NO SQL
RETURNS NULL ON NULL INPUT
SCRATCHPAD
DETERMINISTIC
Selecting the Java Version to Use on the iSeries As mentioned earlier, regular expressions did not become available in Java until Version 1.4. If you have multiple Java Development Kit (JDK) versions loaded, Version 1.4 may not be the default. Changing the default Java version to 1.4 can be done at the system or user level. (See page 16 of the IBM Developer Kit for Java for more information.) To override IBM's defaults, create a text file called SystemDefault.properties. This file will contain a JVM property, followed by its value (property=value). A sample SystemDefault.properties file will look like this: # # Use version 1.4 # java.version=1.4 # Custom property company.name=My Company Name Comments are denoted by a pound (#) sign. For a list of properties that can be set, see "Java system properties for Java 2 Software Development Kit (J2SDK), Standard Edition." Additionally, custom properties (as shown above) can be added. When your file is created, it can be dropped into one of two locations: /QIBM/UserData/Java400 (for all users) or /home/dir (where /home/dir represents the user's home path as specified on the user profile; this override will apply to individual users). When the JVM is started, it will check the user's home folder for a SystemsDefault.properties file. Next, it will look in /QIBM/UserData/Java400 for SystemsDefault.properties. If both exist, the user level overrides the system level. The SQL and Java Tag Team Java's capabilities offer innumerable ways to augment SQL's power. This article demonstrated how to write Java functions and to register them with SQL. In a future article, I'll give examples of how the regular expression functions defined here can be used to make SQL string handling more powerful than ever. Michael Sansoterra is a programmer/analyst for SilverLake Resources, an IT services firm based in Grand Rapids, Michigan. E-mail: msansoterra@silver-lake.com
|
Editors
Contact the Editors |
| Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved. |