Treasury of new DB2 6.1 (V6R1) Features, Part 4: Index and Data Type Enhancements
March 25, 2009 Michael Sansoterra
So far in this series, we’ve covered how IBM has made life easier for DB2 for i (formerly called DB2/400 or DB2 UDB for iSeries) developers by:
This tip will cover yet another valuable subset of features: new indexing and data type features.
Building an Index Using Expressions
The CREATE INDEX statement has been enhanced to allow column expressions. Previously, indexes containing limited expressions could be built using DDS based logical files, but now SQL has been granted this power. To do this in SQL, simply substitute a column expression where a column name would normally go in the CREATE INDEX statement:
CREATE INDEX APPDATA/ORDERLINE_I1 ON APPDATA/ORDERLINE (SALESREP, LINEAMT * QTY DESC)
In the above example, an index is built on the ORDERLINE table using the SALESREP column followed by the calculation of the extended dollar amount in descending order. The following “Top Sales” query will allow the query optimizer to choose the newly created ORDERLINE_I1 index:
Select SalesRep, Order, Item, LineAmt*Qty As ExtAmount From OrderLine Order By SalesRep, ExtAmount Desc
As expected, the expressions and the sort sequence in the ORDER BY need to match the column names and expressions in the index. The first time I tried this I noticed that the optimizer was performing a table scan instead of using the intended index, but my original test table only had about 500 rows. Doing a similar test on a table with over 100K rows yielded the expected result with the query optimizer picking the expression index.
Here’s an example of how an index expression can benefit a legacy application. Sometimes legacy packages store dates in a less than optimal format. Say you have a PO header table that contains a purchase order number column (PO), a two-digit century column (POCC, holding 19 or 20), and a six-digit date column (POYMD, stored as YYMMDD).
It’s always a pain to do a “cross century” search on dates stored like this. However, we can build an index that will combine these two date piece fields into one large numeric field:
Create Index DataLib/PurchaseHeader_I1 ON DataLib/PurchaseHeader (POCC*1000000+POYMD,PO)
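With that index in place, a cross-century range search can use the same combined expression as the key. In this hypothetical example, the WHERE clause expression matches the index key exactly, so the optimizer can implement the range predicate with PurchaseHeader_I1 instead of scanning the table (depending, of course, on the data):

Select PO, POCC, POYMD From DataLib/PurchaseHeader Where POCC*1000000+POYMD Between 19991215 And 20000115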
Or, we can even go a step further (assuming there’s valid date data in every row) and create an expression that will cast the numeric values to a DATE data type:
Create Index DataLib/PurchaseHeader_I2 On DataLib/PurchaseHeader (Date(Digits(POCC)||Left(Digits(POYMD),2)||'-'|| Substr(Digits(POYMD),3,2)||'-'|| Substr(Digits(POYMD),5,2)),PO)
Before building an index like the one in this example, you’ll want to ensure that the values in all of the rows are valid. Remember, just because the expression is in your index doesn’t mean you can use it in your Select’s column list. For instance, if you want to see the result of the above date expression in a query result, you’ll need to re-code the expression in your Select statement.
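For example, to display that derived date in a result set, the full expression from the index must appear again in the query. A hypothetical sketch against the same PurchaseHeader table:

Select PO, Date(Digits(POCC)||Left(Digits(POYMD),2)||'-'|| Substr(Digits(POYMD),3,2)||'-'|| Substr(Digits(POYMD),5,2)) As PODate From DataLib/PurchaseHeader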
Unfortunately, at this time even deterministic user-defined functions are not supported in index expressions. Understandably, special registers, subselects, aggregate expressions, and non-deterministic functions are not supported either.
Indexes that Contain Column Names
SQL indexes can now have a list of column names and a record format specified. Previously, SQL indexes did not contain column information and hence could not be directly read by Query/400, COBOL, or RPG programs. Ted Holt has already written up this topic, so I won’t cover it in detail again.
For quick reference, here is a sample index that uses a key expression and a column list:
Create Index AdvWorks/TranHistII On AdvWorks/TransactionHistory (ProductId, Case When Quantity=0 Then 0 Else ActualCost/Quantity End Desc) RcdFmt TransHistR Add ProductId, ActualCost, Quantity, ReferenceOrderId, TransactionId, ReferenceOrderLineId, TransactionDate, TransactionType, ModifiedDate
This SQL index will be built with a column expression in the index, a record format of TransHistR, and will contain all of the column names listed in the Add clause. Instead of specifying individual column names on the ADD clause, you can also specify the “ALL COLUMNS” option as “ADD ALL COLUMNS.”
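For reference, here is the same index re-coded with the ADD ALL COLUMNS shortcut in place of the explicit column list (a sketch based on the statement above):

Create Index AdvWorks/TranHistII On AdvWorks/TransactionHistory (ProductId, Case When Quantity=0 Then 0 Else ActualCost/Quantity End Desc) RcdFmt TransHistR Add All Columns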
Having column information in an index should save the query optimizer time by allowing SQL to pull data out of the index without having to reference the data in the underlying base table (a.k.a. physical file). This will also allow RPG and COBOL programs to use an SQL index like a traditional logical file without using embedded SQL. These index enhancements have virtually eradicated the remaining benefits of using DDS. That said, since IBM is moving us away from DDS, I would still like an “i”-specific extension to DB2 SQL to allow edit codes or edit words to be specified on a numeric column.
The big remaining limitations are that column expressions are not allowed in the record format’s column list (that would be very nice) and that an index still cannot reference multiple tables. To remove these limitations, IBM’s next logical step is to give developers the ability to create indexed views.
Binary Constants and Comparisons with Character Data
Now moving on to our data type related discussions, a new binary string constant has been added to SQL that is very similar to the hex form of the character string constant. A binary string constant starts with the letters BX:
BX'0D0A' /* ASCII Carriage Return & Line Feed Characters */
A binary string constant is automatically interpreted by DB2 as the BINARY data type and hence will not be translated to character data under any circumstances (except an explicit cast). If you’re wondering “why not just use the existing hex constant?” the answer is that the hex constant can, in certain circumstances, be interpreted as character data and hence can be translated to a different character set. For instance, a result set containing a hex constant could be translated by, say, the ODBC driver whereas a binary constant will not undergo such changes.
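For instance, a binary constant can be used to search binary data without worrying about CCSID conversion. Assuming a hypothetical Documents table with a BLOB column named DocData, this query finds rows containing an ASCII carriage return/line feed pair:

Select DocId From DataLib/Documents Where PosStr(DocData, BX'0D0A') > 0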
A related change is that character data with the FOR BIT DATA option specified is now compatible with the binary data type. This means that these two data types can be compared, concatenated, etc., without first explicitly CASTing the data. Prior to V6R1, a character column with FOR BIT DATA specified was still treated as character data (even though the intent of this encoding is to store binary data). For example, the following concatenation will fail in V5R4 but succeed in V6R1:
Select Binary(X'F1')||Cast(X'F1' As Char(1) For Bit Data) As Test_Concat From SysIBM/SysDummy1
Even so, there is still a significant difference between the two types, as binary columns are padded with binary zeros (BX'00') and character-based columns (including those with FOR BIT DATA specified) are padded with spaces. While the two are now somewhat compatible, all new applications should specify the BINARY data type for storing binary data unless there is an overriding legacy requirement to keep the column as character.
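The padding difference is easy to demonstrate. Casting each kind of constant to a longer length shows the different pad bytes (a blank is X'40' in EBCDIC):

Select Hex(Cast(BX'F1' As Binary(3))), Hex(Cast(X'F1' As Char(3) For Bit Data)) From SysIBM/SysDummy1

The first column should return F10000 (padded with binary zeros) while the second should return F14040 (padded with blanks).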
New DECFLOAT Data Type
A new data type called DECFLOAT has been introduced in this release. Based on the IEEE 754 standard (actually, the IEEE 754R revision), DECFLOAT combines the compact storage and large value range of floating-point numbers with the exact decimal precision that is critical to business applications (something traditional floating-point numbers are not guaranteed to provide).
DECFLOAT can be defined as having either 16 or 34 digits of precision. DECFLOAT(16) requires eight bytes of storage and can hold an exponent range of 10^-383 to 10^384. DECFLOAT(34) requires 16 bytes of storage and can hold 10^-6143 through 10^6144. If you need a small storage footprint, big numbers, and accuracy, then DECFLOAT is a great option.
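Defining a DECFLOAT column looks like any other numeric column definition. A hypothetical exchange-rate table, for example, where rates can be very large or very small yet must stay exact:

Create Table DataLib/ExchangeRate (CurrencyCode Char(3) Not Null, Rate DecFloat(34) Not Null)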
Further, in addition to numbers, this data type is capable of holding any of the following special values: Infinity, -Infinity, NaN (a quiet “not a number”), and sNaN (a signaling “not a number”).
When inserting these values into a DECFLOAT column, use one of the following special constants: INFINITY, NAN, and SNAN (quotes are not required).
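For example, given a hypothetical table DecTest with a DECFLOAT(16) column named DecValue, the special values can be inserted directly as constants:

Insert Into DataLib/DecTest (DecValue) Values (Infinity), (-Infinity), (NaN), (SNaN)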
New companion built-in functions for use with the DECFLOAT type include COMPARE_DECFLOAT, NORMALIZE_DECFLOAT, QUANTIZE, and TOTALORDER.
For the record, ILE C is the only high-level language that currently supports DECFLOAT. For other high-level languages, DECFLOAT data will need to be cast to another data type (DEC or DOUBLE, for example) before it is stored in host variables.
New Unicode Data Types
Three new data types have been introduced: NCHAR, NVARCHAR, and NCLOB. These data types are double-byte characters tagged with the Unicode (UTF-16) CCSID of 1200. In other words, defining the following column:

CustomerName NVARCHAR(35)
is equivalent to defining a column as:
CustomerName VARGRAPHIC(35) CCSID 1200
Similarly, NCHAR is the equivalent of GRAPHIC and NCLOB is the equivalent of DBCLOB (with CCSID 1200 specified).
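To illustrate, these two hypothetical table definitions produce columns with identical attributes:

Create Table DataLib/Customer1 (CustomerName NVarChar(35), Notes NClob(1M))

Create Table DataLib/Customer2 (CustomerName VarGraphic(35) CCSID 1200, Notes DBClob(1M) CCSID 1200)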
While DB2 has been able to work with UTF-16 since V5R3, this is a nice enhancement because:
The index enhancements that enable indexes to have expressions, record format names, and specified column names eliminate many of the few remaining advantages of DDS. Additionally, the new binary constants and the compatibility between binary and character FOR BIT DATA have eliminated most of the remaining issues related to binary data handling within DB2 for i. Finally, the new DECFLOAT and Unicode data types give developers more flexibility in numeric data type offerings and simplify defining columns and parameters requiring double-byte Unicode data.
Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. Send your questions or comments for Mike to Ted Holt via the IT Jungle Contact page.