Newsletters Subscriptions Media Kit About Us Contact Search Home

Stuff
OS/400 Edition
Volume 2, Number 12 -- June 5, 2003

The Valuable NULL


by Michael Sansoterra

Nil, Nada, and NULL: One of these terms is of great benefit to the database programmer; the others are worthless. Coming from the "old school" world of flat files, COBOL and RPG, I had never heard of a NULL until I worked with Microsoft Access, in 1994. Later I came across NULLs on the iSeries. While I found NULLs problematic at first, I eventually found them to be a valuable aid in database development. If NULLs exhibit behavior that amuses or confuses you, read on to find out what NULLs are good for and how they operate.

So what is a NULL? The Oracle Glossary of Terms gives as good a definition as any: "A null value represents missing, unknown, or inapplicable data."

So a NULL is a "non-value"; that is, a NULL indicates that there is no data in a table column or variable. A common error is to assume that a numeric column containing a zero is a NULL. A similar error assumes blanks or an empty string is a NULL. Both of these views are incorrect because zeros, blanks, and an empty string are indeed values. Also, don't confuse the NULL string terminator (ASCII 0) in the C programming language with the NULL discussed here. The NULL in C is a value.

I'll be using the SQL Procedure Language (SPL) to demonstrate a few concepts. For easy reading, all SPL variable names will be prefixed with the @ sign. You can easily substitute embedded SQL here, remembering to replace the @ variables with host variables, prefixed by a colon (:), and also remember to use NULL indicator variables so that the host language can act appropriately on a NULL. (For more on embedded SQL and indicator variables, see the DB2 Universal Database for iSeries SQL Programming with Host Languages guide, in PDF format.)

Of What Value Is a Non-Value?

Since NULLs aren't values, what purpose do they serve? Why would you want to store a NULL in, say, a numeric column rather than just letting it contain a zero? Below are a few examples that demonstrate the benefit of NULLs.

Consider an application for a teacher that tracks students' test scores. Each student is allowed to miss a single test without hurting his grade. Say a student has the following 10 test scores recorded:


Score 1 Score 2 Score 3 Score 4 Score 5 Score 6 Score 7 Score 8 Score 9 Score 10
93.0 86.0 89.0 92.0 94.0 98.0 NULL 84.0 90.0 95.0

From the list, you can see that no score was entered for test 7 , the one "freebee." Assuming these test scores are stored as rows in a table, SQL's average function can be used to determine the average of these 10 scores, with the result of 821 / 9 = 91.2. But wait! There are 10 scores, so how come only nine were used in the average calculation? That's because the SQL aggregate functions ignore NULLs. NULL is better than a zero here, because if a zero were entered the average would be incorrectly calculated as 821 / 10 = 82.1. Since NULL is not a number, it's not included in the average, which saves the programmer from programmatically weeding out the lowest score before calculating the average.

In addition to AVG, other SQL column functions, such as COUNT and SUM, do not regard NULL values in their aggregations.

Below is a sample table called TestScores, which has a column, called Score, that is capable of holding the NULL value (which is a default; specifying NOT NULL prevents a column from containing a NULL).

CREATE TABLE TestScores (
School_Year SMALLINT NOT NULL,
Student_No  SMALLINT NOT NULL,
Test_No     SMALLINT NOT NULL,
Score       DEC(4,1),
PRIMARY KEY (School_Year, Student_No, Test_No))

Here is a sample SELECT statement that would average a student's scores, ignoring NULL values. It also COUNTs the number of scores to ensure that one has been left NULL:

  SELECT Student_No, AVG(Score) AS Average,
      /* Verify we have 10 scores entered */
         CASE WHEN COUNT(Student_No) <>10 
              THEN 'ERROR'
              ELSE 'OK' END AS TestCount,
      /* Verify only one score is NULL    */
         CASE WHEN COUNT(Score) <>9
              THEN 'ERROR'
              ELSE 'OK' END AS ScoreCount
    FROM TestScores
   WHERE "SCHOOL_YEAR"=2002
GROUP BY Student_No
ORDER BY Student_No

For the next example on the benefits of NULL, consider an application for tracking church contributions. Each individual who gives on a regular basis is assigned a box of envelopes identifying an envelope with a contributor. Of course, each box of envelopes must have a unique number. A table for storing individual church membership information may be defined as follows:

CREATE TABLE DataLib/Indiv ( 
Family_No   INTEGER  NOT NULL, 
Indiv_No    INTEGER  NOT NULL, 
FirstName   CHAR(30) NOT NULL, 
LastName    CHAR(30),
Envelope_No SMALLINT,
PRIMARY KEY (Family_No,Indiv_No))

To guarantee column Envelope_No is unique, you build the following index:

CREATE UNIQUE WHERE NOT NULL INDEX DataLib/INDIVENV
                                ON DataLib/INDIV (Envelope_No)

Not every individual member is a regular contributor. Some don't give at all and others give cash. These people won't have an envelope number, so the Envelope_No column will be NULL for these folks. Will this scenario still work? Yes, because when the unique index is built, the WHERE NOT NULL phrase is used to instruct the index to ignore NULLs. The benefit that NULL provides in this scenario is that the index on the Envelope_No column is still guaranteed to be unique, with the exception that NULLs are ignored.

This situation would be invalid if zeros were entered in the envelope number column, because zeros are values and would be included in the duplicate check. Further, programming time is again saved by having to avoid checks for duplicates or from building specialized select/omit logicals to enforce uniqueness for non-zero entries.

The envelope number and test score examples are cases in which NULLs are intended to represent "missing" data. An example of a NULL being "unknown" would be the actual ship date on an order header record. In this case, the actual ship date is unknown and is therefore left NULL until the goods are shipped and a date value is assigned to the column. For the last sample, let's examine a circumstance in which a NULL would represent "inapplicable" data.

Consider the following query to calculate the growth percentage of an item from one year to the next:

/* If there were no sales last year, the growth % will be NULL */
Select Item, CurSales, LYSales,
       Case When LYSales<>0
            Then (CurSales-LYSales)/LYSales*100
       End As GrowthPct
  From SalesSummary

What should we do for new items in which the prior year's sales are zero? In this circumstance, many programmers arbitrarily assign the percentage a value of either 0 or 100. In my opinion, returning NULL is appropriate here, representing "inapplicable" data, because the calculation doesn't apply when there are no sales for the prior year.

In case you're wondering, when omitting the ELSE from the CASE statement, a NULL is returned when none of the preceding conditions are met. In other words, if you don't specify an ELSE, SQL will implicitly treat the CASE statement as if an ELSE NULL were coded.

NULL Pitfalls

Now that you've seen how NULLs can be put to good use, let's consider some potential problems. Not understanding how NULLs operate in expressions and comparisons was one of the biggest pitfalls I encountered when first using NULLs.

In expressions, any operand that contains a NULL will cause the result of the expression to be NULL. For example, 5 + NULL will evaluate to NULL. Combining a string and a NULL such as 'CONSTANT' || NULL will yield a NULL. This behavior can cause unexpected results. Assume you have the following calculation to get an order total:

--
--  Get Customer, Taxable Flag, Order Value (sum of line items)
--
   SELECT Order.CustomerID,
          COUNT(OrderLine.LineSeq) AS NoLines,
          SUM(OrderLine.ExtPrice) AS OrderValue
     INTO @CustomerID,
          @NoLines, @OrderValue
     FROM Order
     JOIN Customer  ON Customer.CustomerID=Order.CustomerID
LEFT JOIN OrderLine ON OrderLine.OrderID=Order.OrderID
    WHERE Order.OrderID=@OrderID
 GROUP BY Order.CustomerID;

--
-- Get taxes, freight and other misc charges
-- for the order
--
SET @OrderCharge=SELECT SUM(ChgAmt)
                   FROM OrderCharges
                  WHERE OrderID=@OrderID;

SET @OrderTotal=@OrderValue+@OrderCharge;

Normally, the @OrderValue will contain the sum of the line items and @OrderCharge will contain charges other than the merchandise, such as taxes and shipping. When there are no special charges, @OrderCharge will have a zero. The sum of both will be @OrderTotal.

But what happens if you have an order that has no order lines but does have special charges (say a single line item for labor). Because of the LEFT JOIN, @OrderValue will contain a NULL; @OrderCharge will have the value of the special charges. The @OrderTotal will contain a NULL because adding a NULL to the @OrderCharge will yield a NULL, but we expected @OrderTotal to equal the @OrderCharge!

Preventing NULL Mayhem

How can you guarantee that NULLs will not throw a monkey wrench in the works whenever you encounter them? The answer lies in the IFNULL built-in function. Its syntax is:

IFNULL ( expression , expression )

The function will return the first non-null result in the list. The only requirement of the function is that all values in the list have compatible data types (that is, you can't mix alpha and numeric). We could revise the @OrderTotal calculation using IFNULL as follows:

SET @OrderTotal=IFNULL(@OrderValue,0)+@OrderCharge;

Using IFNULL in this way will always substitute a zero in the event that @OrderValue contains a NULL, thereby preventing the result from being NULL. The same should be done for @OrderCharge, although the only time it would be NULL on the iSeries is when the OrderCharges table is empty.

It is important to bulletproof your code for potential NULLs using the IFNULL function because an unexpected NULL can cause your mathematical or comparison operations to fail. Two functions similar to IFNULL are COALESCE and VALUE, which also pick the first non-null value from a list of values. These three functions really operate in the same way; portability is the biggest consideration when deciding which one to use.

NULLs, Comparison Operators, and Connectors

Further, NULLs affect the results of comparison operations in, what was for me, a surprising way. Consider the following user-defined function:

Create Function SQLData.CompTest(@Dummy Integer)
Returns VarChar(20)
Language SQL 
Deterministic 
Begin 

Declare @Result VarChar(20); 
 
If @Dummy=5 Then 
    Set @Result = 'Dummy Is 5'; 
ElseIf @Dummy<>5 Then 
    Set @Result = 'Dummy Is Not 5'; 
Else  
    Set @Result = 'NULL'; 
End If; 
 
Return @Result; 
 
End

At first glance it may appear that the ElseIf statement will catch all other possibilities if parameter @Dummy is not equal to five, thereby making the last ELSE statement unnecessary. This is true, except when @Dummy is NULL, in which case the stand-alone ELSE is evaluated and @Result is set to the constant NULL.

The reason for this behavior is revealed when you consider that a comparison operator normally evaluates to TRUE or FALSE. However, when one of the values is NULL, the comparison no longer evaluates to TRUE or FALSE; it evaluates to NULL.

In the IF statement above, when @DUMMY is NULL, the condition @DUMMY=5 is evaluated as a NULL (not FALSE) and so the ELSEIF is invoked. The ELSEIF @DUMMY<>5 will also be evaluated as a NULL (it is not TRUE, even though indeed @DUMMY is not 5). Finally, because the ELSEIF condition wasn't evaluated as TRUE, the final ELSE condition is evaluated.

The same factors need to be kept in mind when dealing with the ANDs and ORs (also called connectors). The tables below demonstrate how NULLs will affect comparisons involving ANDs and ORs.


Condition: Expr1 Or Expr2

Expr 1 Expr 2 Result
True True True
True False True
False True True
False False False
NULL True True
True NULL True
NULL False NULL
False NULL NULL

Condition: Expr 1 And Expr 2

Expr 1 Expr 2 Result
True True True
True False False
False True False
False False False
NULL True NULL
True NULL NULL
NULL False NULL
False NULL NULL

Once again, you can avoid having to worry about a special case for NULL by using IFNULL to substitute a zero if @Dummy contains a NULL:

If IFNULL(@Dummy,0)=5 Then 
    Set @Result = 'Dummy Is 5'; 
Else
    Set @Result = 'Dummy Is Not 5'; 
End If; 

Since you can't use equal (=) or not equal (<>) to test for a NULL, how do you do a comparison? The answer is to use IS NULL or IS NOT NULL respectively. Constructing a statement like the following is syntactically not allowed, and furthermore is illogical, because the result would always be NULL:

If @Dummy=NULL Then 

Here is the proper way to test for a NULL:

If @Dummy IS NULL Then
    Set @Result = 'NULL';
ElseIf @Dummy=5 Then
    Set @Result = 'Dummy Is 5'; 
Else
    Set @Result = 'Dummy Is Not 5'; 
End If; 

The NULL Sort

For a final consideration of NULL, let's ask "where does the NULL fall in the sort order on the iSeries?" The answer is that when an ORDER BY is issued against a column containing NULLs, the NULLs come last. Unfortunately, this behavior isn't consistent among SQL platforms, which can make portability problematic. For instance, Microsoft SQL Server and Microsoft Access both place NULLs at the top of the sort order.

NULL Sense

Understanding the advantages of using NULLs where appropriate will contribute to better database design. Further, making sense of the expression and comparison issues involved with NULLs, and correctly designing applications, will save many hours of troubleshooting heartache.

Finally, beware that NULLs don't necessarily behave in the same way across all platforms; the sort order of the NULL is one example. Platforms may vary in the way that they handle such things as string concatenation or comparison tests. When writing portable applications, make sure to pay close attention to these "NULL" issues.


Michael Sansoterra is a programmer/analyst for SilverLake Resources, an IT services firm based in Grand Rapids, Michigan. E-mail: msans@netpenny.net


Sponsored By
T.L. ASHFORD

BARCODE400 by T.L. Ashford is the easiest
and fastest way to create and print Compliance
Labels directly from the AS/400 and iSeries.

Ashford's comprehensive library of Compliance formats is available to Barcode400 users. AIAG labels for Ford and Motorcraft, GM, and many more are available. BARCODE400 is backed by the best Technical Support Team in the industry.

FREE Guide to Bar Code Labeling

www.tlashford.com or call 800.541.4893


THIS ISSUE
SPONSORED BY:

T.L. Ashford
ASNA
WorksRight Software
Profound Logic Software


BACK ISSUES

TABLE OF
CONTENTS
The Valuable NULL

Measuring Performance from the Java Side

Reading an IFS File from RPG

Become the Master of Your Subfile Domain


Editors
Shannon O'Donnell
Kevin Vandever

Managing Editor
Shannon Pastore

Contributing Editors:
Howard Arner
Raymond Everhart
Joe Hertvik
Ted Holt
Marc Logemann
David Morris

Publisher and
Advertising Director:

Jenny Thomas

Advertising Sales Representative
Kim Reed

Contact the Editors
Do you have a gripe, inside dope or an opinion?
Email the editors:
editors@itjungle.com


Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.