Existence Tests and Left Exception Joins
March 23, 2005 Dear Colleagues
SQL has changed a lot since I first used it in 1984, and I can’t think of a single change that hasn’t been for the better. One of the best improvements, to my way of thinking, is the addition of the JOIN clause in SELECT queries.
I have never liked the EXISTS predicate. I find it hard to read. Sometimes it is difficult to make EXISTS yield the results I want. Fortunately, I have found that the LEFT EXCEPTION JOIN makes a good substitute for EXISTS in many cases. The left exception join returns rows from the primary table that have no match in the secondary table. There is also a right exception join, which returns rows from the secondary table that have no match in the primary table.
Suppose I wish to find sales orders that do not belong to a valid customer number. One way to make my request is with the negated IN predicate.
select * from orderhdr where custnbr not in (select custnbr from customer)
IN breaks down if there are two or more fields that I have to check. I talked about this problem in “Using SQL to Find Duplicate Records”. The solution I proposed to that challenge was to use the EXISTS predicate.
Suppose, then, that orders refer to customers by company number and customer number. IN is out, but I can use EXISTS.
select * from orderhdr as o
where not exists
(select * from customer as c
where c.company = o.company
and c.custnbr = o.custnbr)
The LEFT EXCEPTION JOIN equivalent looks more natural to me.
select o.* from orderhdr as o
left exception join customer as c
on o.company = c.company
and o.custnbr = c.custnbr
It’s almost like the usual syntax I use to join two files. The word left is optional, by the way.
Let’s complicate the example even more. Let’s say there’s a record code in the customer file that indicates whether a customer is active or inactive. Let’s retrieve orders for non-existent customers and inactive customers. Here’s the query using EXISTS.
select * from orderhdr as o
where not exists
(select * from customer as c
where c.company = o.company
and c.custnbr = o.custnbr
and reccd = 'A')
The LEFT EXCEPTION JOIN requires a bit of surgery. We must replace CUSTOMER with a nested query.
select o.* from orderhdr as o
left exception join
(select * from customer
where reccd = 'A') as c
on o.company = c.company
and o.custnbr = c.custnbr
If you prefer the exception join, don’t give up EXISTS just yet. I have had less than stellar success in joining three or more tables with the left exception join.
Let’s add an order details file and an item file to the query. The order details file may have many records for each order header file record.
Let’s look for orders that reference a non-existent customer, an inactive customer, or a non-existent item.
My first attempt at a left exception join looks like this:
select o.ordernbr, o.company, o.custnbr, d.itemnbr
from orderhdr as o
join orderdtl as d
on o.ordernbr = d.ordernbr
left exception join (select * from customer
where reccd = 'A') as c
on o.company = c.company
and o.custnbr = c.custnbr
left exception join item as i
on d.itemnbr = i.itemnbr
Here is the equivalent EXISTS query.
select o.ordernbr, o.company, o.custnbr, d.itemnbr
from orderhdr as o
join orderdtl as d
on o.ordernbr = d.ordernbr
where not exists
(select * from customer as c
where reccd = 'A'
and o.company = c.company
and o.custnbr = c.custnbr)
and not exists
(select * from item as i
where d.itemnbr = i.itemnbr)
The problem with this query is that it only returns orders that fail to find an active customer and a valid part number. That does not satisfy the search criteria. The EXISTS version is easily corrected by replacing the AND before the last EXISTS clause with OR.
select o.ordernbr, o.company, o.custnbr, d.itemnbr
from orderhdr as o
join orderdtl as d
on o.ordernbr = d.ordernbr
where not exists
(select * from customer as c
where reccd = 'A'
and o.company = c.company
and o.custnbr = c.custnbr)
or not exists
(select * from item as i
where d.itemnbr = i.itemnbr)
If there’s an easy way to code this query with an exception join, I hope one of you knows it and will write to me. The following works, but I think it’s nasty.
select o.ordernbr, d.orderline, o.company, o.custnbr, d.itemnbr
from orderhdr as o
join orderdtl as d
on o.ordernbr = d.ordernbr
exception join
(select * from customer
where reccd = 'A') as c
on o.company = c.company
and o.custnbr = c.custnbr
union
select o.ordernbr, d.orderline, o.company, o.custnbr, d.itemnbr
from orderhdr as o
join orderdtl as d
on o.ordernbr = d.ordernbr
exception join item as i
on d.itemnbr = i.itemnbr
The SELECT preceding the UNION looks for orders for invalid or inactive customer numbers. The SELECT following the UNION looks for orders with invalid item numbers. The UNION operator combines the two result sets, eliminating the duplicates in the process.
with OrderData as (
select o.ordernbr, d.orderline, o.company, o.custnbr, d.itemnbr
from orderhdr as o
join orderdtl as d
on o.ordernbr = d.ordernbr)
select o.*
from OrderData as o
exception join
(select * from customer
where reccd = 'A') as c
on o.company = c.company
and o.custnbr = c.custnbr
union
select o.*
from OrderData as o
exception join item as i
on o.itemnbr = i.itemnbr
The following might be slightly better, but I’m not thrilled with it either.
If you’re running i5/OS V5R3, you can also use the EXCEPT operation to find rows in one table that have no match in another table. See Michael Sansoterra’s excellent article “More V5R3 SQL Enhancements” for more information on this.
Looking for data that does not correlate to other tables is an everyday necessity in information processing installations. It’s good that IBM continues to give us newer ways to find it.
–Ted



