Newsletters Subscriptions Media Kit About Us Contact Search Home

Stuff
OS/400 Edition
Volume 2, Number 17 -- August 28, 2003

The NULL Nemesis


by Michael Sansoterra

After writing "The Valuable NULL," which enumerated the benefits and snares of using NULLs in an SQL environment, I received an interesting response from a reader. He stated that NULLs were more trouble than they're worth because they usually require extra logic to process correctly.

Though it's true that SQL programmers have more to think about when dealing with NULL-capable fields, the benefits of NULLs are worth the extra effort, provided one understands how NULLs behave in expression and conditional operations. (Before I understood these features, NULLs were a great source of grief!) But the reader's statement about NULLs requiring special logic and being an all around pain is certainly true when using NULL-capable fields with high-level languages like C, RPG, or COBOL.

The reason for this difficulty in dealing with NULLs is that high-level languages do not recognize NULL values (most older HLLs were developed before databases allowed for NULLs). Here's an illustration of the problem. Say a database table contains a field called CONTACTNAM, defined as CHAR(30), which allows NULL values. A HLL program reads this database table and encounters a record where the field CONTACTNAM is NULL. Since variables in an HLL program don't know what a NULL is, what value is placed in the variable? In this case, the answer is a blank. In general, when a database field contains a NULL, the HLL program will contain the field's "low value." For example, NULL character fields will contain a blank, NULL numerics will contain a zero, and NULL dates will contain the lowest possible date value for the specified date format (for example, 0001-01-01 for a date defined with the *ISO date format).

Technically, these low-value assignments are incorrect, because, as described my last article, a NULL represents a missing or unknown value. If a numeric field is NULL, which is different from a zero, how can the HLL program know the difference? To allow HLLs to recognize NULL values, a special indicator is used to designate that the variable is NULL. These special "indicators" take the form of variables or specialized built-in functions. It is the responsibility of the HLL programmer to go to the extra effort of checking these NULL indicators to distinguish if the value in the variable is actually the default low value or a NULL. This article focuses on using these indicators in a high-level language to correctly work with NULLs.

Defining NULL-Capable Fields

Here's 'a quick review of how fields in database files are defined to hold a NULL. Shown below are the equivalent SQL and DDS statements needed to create a simple "orders" file. The only field allowed to contain a NULL in this file is ShipDate:

SQL
CREATE TABLE Orders (
OrderID    INTEGER NOT NULL,
CustomerID CHAR(5) NOT NULL, 
OrderDate  DATE    NOT NULL,
ShipDate   DATE,
Unique (OrderID))

DDS
A                                      UNIQUE
A          R ORDERREC
A            ORDERID        9B 0
A            CUSTOMERID     5A
A            ORDERDATE       L         DATFMT(*ISO)
A            SHIPDATE        L         DATFMT(*ISO)
A                                      ALWNULL
A          K ORDERID

In SQL, fields are assumed to be NULL-capable (meaning they are allowed to contain a NULL) unless the NOT NULL keywords are specified. However, fields defined in DDS are assumed not to be NULL-capable unless the ALWNULL keyword is specified.

When defining files, only allow fields to contain NULLs where it makes sense to have NULLs. Don't unnecessarily create NULL-capable fields, because each time a field is NULL-capable extra logic may be required. In general, if the program has the ability to determine the value for a particular field, then the field should not be NULL-capable. For example, a field such as order date should not be NULL-capable, because the program will always know the date the order was taken. However, a ship date field can be "NULL-capable", because the program may not know the actual ship date at the time the order is taken and the program shouldn't shove an arbitrary date value in the field to represent this "unknown."

Now that you know how NULLs are defined, take a look at how they're accessed, with languages such as RPG.

RPG/400 and NULLs

RPG/400 doesn't support processing NULLs in a database file. If a file contains NULLs, specifying the ALWNULL(*YES) compiler option on the Create RPG Program (CRTRPGPGM) command allows the program to access the file as input only, with the caveat that all NULL-capable fields contain the "default" value when a NULL is encountered. This means the RPG/400 program will have no way of distinguishing a NULL from a blank, for example. Only use the ALWNULL(*YES) compiler option if the program will process a field's default value the same as it would if it were a NULL. To be able to work with "nullable" database fields in an RPG/400 program, you must use embedded SQL with indicator variables (more on this later).

RPG IV and NULLs

Fortunately, RPG IV has NULL support for database fields that are NULL-capable. The %NULLIND built-in function is used to test if a field is NULL, or to set a field to NULL or not NULL. The %NULLIND BIF only works with NULL-capable database fields; you can't define a stand-alone field on the D-spec and use %NULLIND to set it to NULL. Program-described files can not contain NULL-capable fields. As with RPG/400, to process files with NULL-capable fields, the ALWNULL keyword must be specified at compile time or as an H-spec keyword. RPG IV's version of ALWNULL accepts one of the following values:

  • NO--Processing files containing NULL-capable fields is not allowed (default).

  • INPUTONLY--Files containing NULL-capable fields can be used in a read-only capacity.

  • USRCTL--Files with NULL-capable fields can participate in read and modification operations (like write, update, or delete), as well as in keyed operations.

To illustrate, assume we'll be working with the "orders" table defined above, and it contains a NULL-capable field called ShipDate. We need to write an order-inquiry program that does something special if ShipDate contains a NULL (that is, if the order hasn't been shipped). Consider the following code excerpt that demonstrates how %NullInd is used to test whether the field ShipDate is NULL:

H ALWNULL(*UsrCtl)
FOrders    IF   E           K Disk
.
.
.
C/Free
    ExFmt GetOrderScreen;
    Chain OrderID Orders;
    If %Found;
        If %NullInd(ShipDate)=*On;
        // Order hasn't been shipped
        EndIf; 
    Else; 
        // Order Not Found 
    EndIf; 
    ... 

As seen in the above example, %NULLIND returns an indicator (Boolean) value of *ON or *OFF to indicate the presence of a NULL in a field. Additionally, when used on the left side of an expression, %NULLIND can be used to manipulate whether a field contains a NULL:

// Set ShipDate to NULL
Eval %NullInd(ShipDate)=*On;

// Set ShipDate to a value
Eval %NullInd(ShipDate)=*Off;
Eval ShipDate=d'2003-08-01';

For clarification, don't attempt to use RPG's *NULL special value to set database fields to NULL, because *NULL is reserved for working with pointers. Also note that placing a value in a field with %NULLIND set to *ON is pointless:

// Set ShipDate to NULL
Eval %NullInd(ShipDate)=*On;
// If this file were 
// opened for update, 
// this statement has
// no effect.
Eval ShipDate=d'2003-08-01';
Update OrdersRec;

In the last example, the second EVAL statement (ShipDate=d'2003-08-01') is useless, as far as the database manager is concerned. If the file is updated when %NULLIND is on, any date value placed in ShipDate is ignored. However, as far as normal RPG processing is concerned, if ShipDate is moved to another field, the date value has meaning, because %NULLIND is only applicable, in the final outcome, to the database manager. Because of this behavior, it is important to implement special logic to check the %NULLIND before assigning or reading values from NULL-capable fields, because RPG itself doesn't pay any attention to %NullInd when doing its normal MOVE, MOVEL, and EVAL operations. The ILE RPG Reference guide has a section entitled "Database Null Value Support," which details a plethora of implications related to this behavior.

Keyed Operations

Keyed operations (like SETLL, READE, and CHAIN) involving one or more NULL-capable key fields are simple, as long as your key list is defined correctly. In this next example, the orders file has an index called OrdersShip built over it by ShipDate and CustomerID:

CREATE INDEX OrdersShip ON Orders (ShipDate, CustomerID)

An RPG program needs to be written to report all orders that have not shipped (that is, the ShipDate contains a NULL). This next section of code demonstrates how to search for NULL values in a key field:

H ALWNULL(*USRCTL)

FOrdersShipIF   E           K DISK
DCount            S              7  0 
DShipDateNull     S              1N

C     Key_ShipDate  KList
C                   KFld      ShipDateNull  ShipDate

C/Free
    // Read all records where
    // the shipdate is NULL
    // (When the NULL indicator is on
    //  the actual content of the ShipDate
    //  field is ignored.) 
    ShipDateNull=*On;
    SetLL Key_ShipDate OrdersShip; 
    ReadE Key_ShipDate OrdersShip;

    Dow Not %EOF; 
        Count=Count+1;
        // Process Records with a shipdate of NULL

        ReadE Key_ShipDate OrdersShip;
    EndDo;

    *InLR=*On;                                     
    Return;                                        
 /End-Free 

The only thing special to note here is the key field (KFLD) definition. Notice that Factor 2, which is normally left blank for the KFLD op code, has an indicator type variable called ShipDateNull specified. This indicator variable controls whether the RPG program should look for NULL values. In this case, since we're searching for records with NULL ship dates, the ShipDateNull indicator variable is turned *ON. To position to a non-NULL value, turn the indicator *Off and set the ShipDate field appropriately. If Factor 2 is omitted from the KFLD definition on a NULL-capable field, the program will have no way to look specifically for a NULL value. OS/400 V5R2 users will be disappointed to learn that the new %KDS built-in function doesn't support NULL positions using an indicator, so for now the traditional KFLD op code must still be used.

When doing keyed positions and reads, remember that, on the iSeries, NULLs are at the bottom of the sort order. If the NULL indicator in Factor 2 is *On, the database manager will ignore any values specified in the ShipDate field and skip right to the NULLs in the file, if any are present. If there are no NULLs to read, the End Of File (EOF) indicator will be turned on, because NULLs come last.

High-Level Languages, Embedded SQL, and NULLs

iSeries shops that have the "DB2 Query Manager and SQL Development Kit" installed (product 5722ST1) can use embedded SQL to access NULL-capable fields from inside an HLL program. Embedded SQL is too large a topic to be covered here,. but if you need help getting up to speed with embedded SQL review "Empower Users with Embedded SQL" or "Dynamic Selection with Embedded SQL."

Here is an RPG-embedded SQL program snippet that retrieves the ShipDate field from the orders file for a given OrderID:

D RPGShipDate     S               D
D RPGOrderID      S             10I 0

C                   Eval      RPGOrderID=11074
C/EXEC SQL
C+  SELECT ShipDate
C+    INTO :RPGShipDate
C+    FROM Orders
C+   WHERE OrderID=:RPGOrderID
C/END-EXEC 
C                   Select
C                   When      SQLCOD=*Zero
    OK
C                   When      SQLCOD=100
    Data not found
C                   Other
    Error/Warning given
C                   EndSl 

In case you're unfamiliar with embedded SQL, here's a quick review.

  • SQL statements are placed between compiler directives /EXEC SQL and /END-EXEC. The plus sign (+) in position 7 is a continuation marker, indicating that the SQL statement spans more than one line.
  • The SQLCOD (SQL code) variable is used by the RPG program to test the success of an SQL statement. SQLCOD and many other SQL related "status" fields are added automatically to the program by the SQL precompiler.
  • An SQL code of zero means the statement ran okay. An SQL code of 100 means the statement didn't return or process data. An SQL code with a negative value indicates an error condition, while an SQL code with a positive value indicates a warning.
  • The identifiers prefixed with a colon (:) are not field names from the database file but are actually RPG variable names. I've prefixed the RPG variable names with the letters RPG to minimize confusion between database fields and RPG variables. When the above SQL statement runs, SQL will look in the file for the OrderID that matches the value contained in variable RPGOrderID. When it finds the record, it will return the ShipDate field value into the RPG variable RPGShipDate. In SQL terminology, these variables are called "host variables", because they are from the "hosting" high-level-language program.
  • The SELECT INTO statement is similar to a CHAIN, and is usually used when there is only one record to be retrieved.

The concept is simple enough. The ShipDate for the given OrderID is placed in program variable RPGShipDate. But what if ShipDate contains a NULL in the file? The way the program is coded now, when the statement retrieves a NULL the SQLCOD will contain error code -305, which has the primary error description of "indicator variable required."

This message indicates that SQL can't return the NULL value into a HLL program variable without the assistance of an "indicator" variable. An indicator variable is simply an extra variable, defined with an SQL data type of SMALLINT, containing either a zero or a negative 1, to indicate if the given variable is NOT NULL or NULL, respectively. Here's the revised code using an indicator variable that will behave correctly if a NULL is read:

D ShipDateNull    S              5I 0
D NULL            S              5I 0 Inz(-1)
C                   Eval      OrderID=11074
C/EXEC SQL 
C+  SELECT ShipDate 
C+    INTO :RPGShipDate:ShipDateNull
C+    FROM Orders 
C+   WHERE OrderID=:OrderID
C/END-EXEC 
C                   Select
C                   When      SQLCOD=*Zero
C                   If        ShipDateNull=NULL
    ShipDate is NULL  Process accordingly
C                   Else 
    ShipDate is NOT NULL  Process accordingly
C                   EndIf 

When using indicator variables in embedded SQL, the ShipDateNull RPG variable is appended immediately after the variable name. As with the %NULLIND built-in function, NULL indicator variables demand that an extra step be taken to check if a variable is NULL, rather than a blank or a zero. When using embedded SQL, every potential field that can have a NULL should have an accompanying NULL indicator variable. If knowing the state of the NULL is unimportant to the HLL program, the COALESCE or IFNULL SQL functions can be used, to negate the need for an indicator variable by guaranteeing that the column will never be NULL.

It has already been noted that NULL indicator variables are supposed to be defined as the equivalent of the SQL SMALLINT data type (a two-byte binary field). In RPG IV, this can be coded by defining a field as "5I 0" on the D-specs. To determine how to define the SMALLINT data type in another language, see the DB2 Universal Database for iSeries SQL Programming with Host Languages guide (PDF format). This guide covers many facets of embedded SQL for each particular high-level language and includes charts on how to define a given HLL data type to match a corresponding SQL data type.

The Nagging Nulls

Using NULLs with high-level languages can be a bit of a chore. But as more applications use NULL-capable fields, remembering some simple rules about NULL indicator variables will allow high-level-language programs and NULLs to coexist peacefully.


Michael Sansoterra is a programmer/analyst for SilverLake Resources, an IT services firm based in Grand Rapids, Michigan. E-mail: msansoterra@silver-lake.com


Sponsored By
ASNA

Why we chose ASNA Visual RPG for .NET:

"WebSphere isn't the easiest software to install and work with. The big difference is that applications can be developed in AVR for .NET so much faster than with WebSphere and Java. [AVR for .NET] allows for the use of familiar legacy RPG language constructs and the use of the powerful .NET Framework at the same time. This allows our RPG/400 developers to be effective in the .NET environment quickly. Whether a customer wants a new application to be developed or an existing AS/400 application to be re-written, we firmly believe that AVR for .NET is the development platform of choice."
--Brian Bunney, PBSI

Learn more about AVR for .NET today!

www.asna.com/infoavrdotnet.aspx


THIS ISSUE
SPONSORED BY:

T.L. Ashford
ASNA
Lakeview Technology
Profound Logic Software


BACK ISSUES

TABLE OF
CONTENTS
Database Normalization, Part 2

The NULL Nemesis

Linux on iSeries Network Alternatives

OS/400 Alert: A New Feature for Midrange Programmer


Editors
Shannon O'Donnell
Kevin Vandever

Managing Editor
Shannon Pastore

Contributing Editors:
Howard Arner
Raymond Everhart
Joe Hertvik
Ted Holt
Marc Logemann
David Morris

Publisher and
Advertising Director:

Jenny Thomas

Advertising Sales Representative
Kim Reed

Contact the Editors
Do you have a gripe, inside dope or an opinion?
Email the editors:
editors@itjungle.com


Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.