fhg
Volume 9, Number 11 -- March 25, 2009

Treasury of new DB2 6.1 (V6R1) Features, Part 4: Index and Data Type Enhancements

Published: March 25, 2009

by Michael Sansoterra

So far in this series, we've covered how IBM has made life easier for DB2 for i (formerly called DB2/400 or DB2 UDB for iSeries) developers by:

  • Part 1: Cramming greater query functionality and operations into a single statement
  • Part 2: Providing an easy way to summarize data with super groups and grouping sets
  • Part 3: Giving developers options to implement audit tracking and other features useful in a client/server environment by implementing several new client special registers

This tip will cover yet another valuable subset of features: new indexing and data type features.

Building an Index Using Expressions

The CREATE INDEX statement has been enhanced to allow column expressions. Previously, indexes containing limited expressions could be built using DDS based logical files, but now SQL has been granted this power. To do this in SQL, simply substitute a column expression where a column name would normally go in the CREATE INDEX statement:

CREATE INDEX APPDATA/ORDERLINE_I1 ON
APPDATE/ORDERLINE (SALESREP, LINEAMT * QTY DESC)

In the above example, an index is built on the ORDERLINE table using the SALESREP column followed by the calculation of the extended dollar amount in descending order. The following "Top Sales" query will allow the query optimizer to choose the newly created ORDERLINE_I1 index:

  Select SalesRep,Order,Item,LineAmt*Qty As ExtAmount
    From OrderLine
Order By SalesRep, ExtAmount Desc

As expected, the expressions and the sort sequence in the ORDER BY need to match the column names and expressions in the index. The first time I tried this I noticed that the optimizer was performing a table scan instead of using the intended index, but my original test table only had about 500 rows. Doing a similar test on a table with over 100K rows yielded the expected result with the query optimizer picking the expression index.

Here's an example of how an index expression can benefit a legacy application. Sometimes legacy packages store dates in a less than optimal format. Say you have a PO header table that contains the following columns:


POCC

PO Date Century

Numeric(2,0)

POYMD

PO Date (YYMMDD)

Numeric(6,0)


It's always a pain to do a "cross century" search on dates stored like this. However, we can build an index that will combine these two date piece fields into one large numeric field:

Create Index DataLib/PurchaseHeader_I1 ON     
DataLib/PurchaseHeader (POCC*1000000+POYMD,PO)

Or, we can even go a step further (assuming there's valid date data in every row) and create an expression that will cast the numeric values to a DATE data type:

Create Index DataLib/PurchaseHeader_I2
On DataLib/PurchaseHeader 
(Date(Digits(POCC)||Left(Digits(POYMD),2)||'-'||
 Substr(Digits(POYMD),3,2)||'-'|| 
 Substr(Digits(POYMD),5,2)),PO)

Before building an index like the one in this example, you'll want to ensure that the values in all of the rows are valid. Remember, just because the expression is in your index doesn't mean you can use it in your Select's column list. For instance, if you want to see the result of the above date expression in a query result, you'll need to re-code the expression in your Select statement.

Unfortunately at this time, deterministic User-Defined Functions are not supported in index expressions. Understandably, special registers, sub-selects, aggregate expressions, and other non-deterministic functions are not supported either.

Indexes that Contain Column Names

SQL indexes can now have a list of column names and a record format specified. Previously, SQL indexes did not contain column information and hence could not be directly read by Query/400, COBOL or RPG programs. Ted Holt already did a write up on this topic, so I won't cover it in detail again.

For quick reference, here is a sample index that uses a key expression and a column list:

Create Index AdvWorks/TranHistII On AdvWorks/TransactionHistory 
(ProductId, Case When Quantity=0 Then 0 
                 Else ActualCost/Quantity End Desc) 
RcdFmt TransHistR 
Add ProductId, ActualCost, Quantity, 
ReferenceOrderId, TransactionId, ReferenceOrderLineId, 
TransactionDate, TransactionType, ModifiedDate 

This SQL index will be built with a column expression in the index, a record format of TransHistR, and will contain all of the column names listed in the Add clause. Instead of specifying individual column names on the ADD clause, you can also specify the "ALL COLUMNS" option as "ADD ALL COLUMNS."

Having column information in a specific index should save the query optimizer time by allowing SQL to pull data out of the index without having to reference the data in the underlying base table (a.k.a. physical file). This will also allow RPG and COBOL programs to use an SQL index like a traditional logical file without using embedded SQL. These index enhancements have virtually eradicated the remaining benefits of using DDS. Although, since IBM is moving us away from DDS, I would still like an "i" specific extension to DB2 SQL to allow edit codes or edit words to be specified on a numeric column.

The big remaining limitations here are that column expressions are not allowed in the record format's column list (that would be very nice), and they suffer the inability to reference multiple tables. To get rid of these limitations IBM's next logical step is to give developers the ability to create Indexed Views.

Binary Constants and Comparisons with Character Data

Now moving on to our data type related discussions, a new binary string constant has been added to SQL that is very similar to the hex form of the character string constant. A binary string constant starts with the letters BX:

BX'0D0A'   /* ASCII Carriage Return & Line Feed Characters */

A binary string constant is automatically interpreted by DB2 as the BINARY data type and hence will not be translated to character data under any circumstances (except an explicit cast). If you're wondering "why not just use the existing hex constant?" the answer is that the hex constant can, in certain circumstances, be interpreted as character data and hence can be translated to a different character set. For instance, a result set containing a hex constant could be translated by, say, the ODBC driver whereas a binary constant will not undergo such changes.

A related change is that character data tagged with the FOR BIT DATA option specified is now compatible with the binary data type. This means that these two data types can be compared, concatenated, etc., without first explicitly CASTing the data. Prior to V6R1 specifying FOR BIT DATA on a character column would still be treated as character data (even though the intention of this encoding is to store binary data). For example, the following concatenation will fail in V5R4 but succeed in V6R1:

Select Binary(X'F1')||Cast(X'F1' As Char(1) For Bit Data)
       As Test_Concat 
  From SysIBM/SysDummy1

Even so, there is a still a significant difference between the two types as binary columns are padded with binary zeros (BX'00') and character based columns (including those with FOR BIT DATA specified) are padded with spaces. While the two are now somewhat compatible, all new applications should specify the BINARY data type for storing binary data unless there is an overriding legacy requirement to keep the column as character.

New DECFLOAT Data Type

A new data type called DECFLOAT has been introduced this release. This data type based on the IEEE 754 standard (actually it's IEEE 754R) combines the benefits of the compact yet large value range capacity of floating point numbers, but also includes the precision of the decimal data type that is critical to business applications. (Something traditional floating point numbers are not guaranteed to have!)

DECFLOAT can be defined as having either 16 or 34 digits of precision. DECFLOAT(16) requires eight bytes of storage and can hold an exponent range of 10383 to 10384. DECFLOAT(34) requires 16 bytes of storage and can hold 106143 through 106144. If you need a small storage footprint, big numbers and accuracy then DECFLOAT is a great option.

Further, in addition to numbers, this data type is capable of holding any of the following special values:

  • Infinity--A value that represents an infinitely large (and can be positive or negative).
  • Quiet NaN--This value represents an undefined result (i.e., "not a number") that does not generate a warning when processed.
  • Signaling NaN--A value that represents an undefined result that causes a warning message to be generated.

When inserting these values into a DECFLOAT column, use one of the following special constants: INFINITY, NAN, and SNAN (quotes are not required).

New companion built-in functions for use with the DECFLOAT type are:

  • QUANTIZE--This function can be used to control the precision of a DECFLOAT number, since floating point numbers can contain slight inaccuracies QUANTIZE can help by controlling the number of digits to be reported.
  • TOTALORDER--This function reports the result of a comparison operation between two values and takes into consideration the special values when doing a comparison. (For instance, infinity is considered less than a NaN value!)
  • DECFLOAT_SORTKEY--This function returns a special binary value that can be used to sort DECFLOAT values, which will correctly take into account the special values and rules when sorting according to the IEEE 754 standard. (For instance, in this standard the value 2.10000 is less than 2.1. In fact, 2.1000 < 2.100 < 2.10 < 2.1. DECFLOAT_SORTKEY can be used to sort values consistent with the standard.)
  • DECFLOAT--This function is used to CAST a value to the DECFLOAT data type.

For the record, the ILE/C compiler is the only high level language that currently supports DECFLOAT. For other high level languages, the DECFLOAT data will need to be cast to another data type (DEC or DOUBLE, for example) before it is stored in host variables.

New Unicode Data Types

Three new data types have been introduced: NCHAR, NVARCHAR, and NCLOB. These data types are double-byte characters tagged with the Unicode (UTF16) CCSID of 1200. In other words, defining the following column:

CustomerName NVARCHAR(35) 

is equivalent to defining a column as:

CustomerName VARGRAPHIC(35) CCSID 1200

Similarly, NCHAR is the equivalent of GRAPHIC and NCLOB is the equivalent of DBCLOB (with CCSID 1200 specified).

While DB2 has been able to work with UTF16 since V5R3 this is a nice enhancement because:

  • The syntax is understandable (non DB2 developers don't usually know about CCSIDs or the GRAPHIC data type)
  • You don't have to remember the CCSID number for UTF-16
  • These data types provide nice compatibility with SQL Server and to a lesser extent with Oracle (which has NCHAR and NVARCHAR2 data types)

Summary

The index enhancements that enable indexes to have expressions, record format names, and specified column names eliminate many of the few advantages DDS had. Additionally, the new implementation of binary constants and binary comparisons with character "FOR BIT DATA" have eliminated most of the remaining issues related to binary data handling within DB2 for i. Finally, the new DECFLOAT and Unicode data types give developers more flexibility in numeric data type offerings as well as simplify defining columns and parameters requiring double byte Unicode data.


Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. Send your questions or comments for Mike to Ted Holt via the IT Jungle Contact page.


RELATED STORIES

Treasury of New DB2 6.1 Features, Part 1: Query Enhancements

Treasury of New DB2 6.1 Features, Part 2: Grouping Sets and Super Groups

Treasury of New DB2 6.1 (V6R1) Features, Part 3: Client Special Registers



                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot


Sponsored By
WORKSRIGHT SOFTWARE

Do you need area code information?
Do you need ZIP Code information?
Do you need ZIP+4 information?
Do you need city name information?
Do you need county information?
Do you need a nearest dealer locator system?

We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

Just call us and we'll arrange for 30 days FREE use of either
ZIP/CITY or PER/ZIP4.

WorksRight Software, Inc.
Phone: 601-856-8337
Fax: 601-856-9432
E-mail: software@worksright.com
Web site: www.worksright.com


Senior Technical Editor: Ted Holt
Technical Editor: Joe Hertvik
Contributing Technical Editors: Edwin Earley, Brian Kelly, Michael Sansoterra
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.

Sponsored Links

Halcyon Software:  Register now for our Multi-Platform Virtualization Webinar, March 31, 10 a.m.
System i Developer:  RPG & DB2 Summit in Orlando, April 15-17 for 3 days of serious training
COMMON:  Join us at the 2009 annual meeting and expo, April 26-30, Reno, Nevada


 

IT Jungle Store Top Book Picks

Easy Steps to Internet Programming for AS/400, iSeries, and System i: List Price, $49.95
The iSeries Express Web Implementer's Guide: List Price, $49.95
Getting Started with PHP for i5/OS: List Price, $59.95
The System i RPG & RPG IV Tutorial and Lab Exercises: List Price, $59.95
The System i Pocket RPG & RPG IV Guide: List Price, $69.95
The iSeries Pocket Database Guide: List Price, $59.00
The iSeries Pocket SQL Guide: List Price, $59.00
The iSeries Pocket Query Guide: List Price, $49.00
The iSeries Pocket WebFacing Primer: List Price, $39.00
Migrating to WebSphere Express for iSeries: List Price, $49.00
Getting Started With WebSphere Development Studio Client for iSeries: List Price, $89.00
Getting Started with WebSphere Express for iSeries: List Price, $49.00
Can the AS/400 Survive IBM?: List Price, $49.00
Chip Wars: List Price, $29.95


 
The Four Hundred
Wall Street Makes IBM, Sun Strange Bedfellows?

Measure Twice, Cut Once Applied to ERP Implementations

UCG Partners with MaxAva, Expands DR and HA Capabilities

As I See It: Generation Gap

BCD Cranks Up Services, Training for PHP Deployments

Four Hundred Stuff
Capitalware Clamps Down on WebSphere MQ's 'Big Dirty' Security Secret

FalconStor Casts a Wider De-Duplication Net

Help/Systems Extends i OS Job Scheduler to Linux and Unix

Shield Adds More Smarts to Budget HA Software

Appian Expands Web-Based Reporting GPS Tracking for Fleets

Four Hundred Monitor
Four Hundred Monitor's
Full iSeries Events Calendar

System i PTF Guide
March 21, 2009: Volume 11, Number 12

March 14, 2009: Volume 11, Number 11

March 7, 2009: Volume 11, Number 10

February 28, 2009: Volume 11, Number 9

February 21, 2009: Volume 11, Number 8

February 14, 2009: Volume 11, Number 7

TPM at The Register
Oracle raises software prices on IBM's Power6 iron

IBM sics lawyers on Bigger Indigo deal

Platform lands OCS cluster deal with HP

SAS schemes $70m biz analytics cloud

Sun's Niagara gets Linux (again)

Penguin floats hybrid Linux supers

Sun and IBM - What price Bigger Indigo?

Sun pitches new cloud as 'Open Platform'

Rackable shrinks CloudRack cookie sheets

Sun breaks through the clouds

California: Cisco gives out some details, finally

Sun lands Sparc-Xeon super on Cape Town

Cisco throws California virt-server gauntlet

Storage software bucks hardware sales trend

THIS ISSUE SPONSORED BY:

WorksRight Software
ProData Computer Services
Halcyon Software


Printer Friendly Version


TABLE OF CONTENTS
Treasury of new DB2 6.1 (V6R1) Features, Part 4: Index and Data Type Enhancements

Looking for Commitment, Part 3

Catching Robot/SCHEDULE Job Failures As They Happen

Four Hundred Guru

BACK ISSUES

From the IT Jungle Forums
How to return value from CL program?

ADO.NET/IBM.Data.DB2.iSeries/ iDB2Connection

Order by alias names

SQL procedure

Insert via Java

iSeries Access for Web

Mimix installation and configuration docs




 
Subscription Information:
You can unsubscribe, change your email address, or sign up for any of IT Jungle's free e-newsletters through our Web site at http://www.itjungle.com/sub/subscribe.html.

Copyright © 1996-2009 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034

Privacy Statement