Let One Row Represent a Group
April 28, 2010 Ted Holt
Dear Esteemed and Highly Competent Professional,
From time to time, I need to retrieve one row (record) to represent a group of rows that share similar characteristics. It’s a good technique, one that I’d like to share with the readers of this unpretentious publication.
Assume a table (physical file) of sales order details. Each row represents a line of a customer order. Some customers submit an entire order at the same time, but others prefer to keep one order open with us, adding new lines to the order as needed. Here are the columns (fields) of the sales order detail file.
Column Type Size Description ========== ========== ==== ========================= ORDERNO Packed 5,0 Order number LINENO Packed 3,0 Line number ITEMNO Character 6 Item number QTYORDERED Packed 7,0 Quantity ordered QTYSHIPPED Packed 7,0 Quantity shipped PRICE Packed 7,2 Unit price CRTTIME Timestamp Date/time created CRTUSER Character 10 Created by user CHGTIME Timestamp Date/time of last change CHGUSER Character 10 Changed by user
We’ve been asked to list all open orders (those that have not been fully shipped), but we are only to show the earliest unshipped line. That is, an order may have a dozen unshipped lines, but only the first unshipped one is to be retrieved from the database.
The unshipped lines are the ones where quantity shipped is less than quantity ordered. As for the earliest order line, that would be the one with the earliest creation timestamp.
This first step is to find the key information for the earliest open line. If no two order lines within an order can have the same timestamp, then this select statement finds the first unshipped line of each order.
select OrderNo, min(CrtTime) as CrtTime from salesordd where QtyShipped < QtyOrdered group by OrderNo order by OrderNo
The result set looks like this:
ORDERNO CRTTIME ======= ========================== 101 2010-04-21-11.15.49.000000 102 2010-04-21-10.05.00.000000 103 2010-04-21-14.37.31.000000 105 2010-04-21-10.30.03.030000
That’s great, but where are all the other fields on the line? They’re missing, because we can’t include non-grouping fields in a grouping query.
The technique we need is to join this result set back to the dataset from which it came, using an inner join, like this:
with FirstOpenOrderLine as (select OrderNo, min(CrtTime) as CrtTime from salesordd where QtyShipped < QtyOrdered group by OrderNo) select d.OrderNo, d.LineNo, d.QtyOrdered, d.QtyShipped, (d.QtyOrdered - d.QtyShipped) as Balance, d.Price, (d.QtyOrdered - d.QtyShipped) * d.Price as Extended, substr(char(d.crttime), 1, 16) as TimeEntered from salesordd as d join FirstOpenOrder as fool on d.OrderNo = fool.OrderNo and d.CrtTime = fool.CrtTime order by d.OrderNo, d.crttime
The common table expression creates the result set we just saw, giving it the name FirstOpenOrderLine. The main select joins FirstOpenOrderLine to the table (or view) from which it was built, SALESORDD. The result set looks like this:
ORDERNO LINENO ORDERED SHIPPED BALANCE PRICE EXTENDED TIMEENTERED ======= ====== ======= ======= ======= ===== ======== =============== 101 2 12 0 12 1.00 12.00 2010-04-21-11.15 102 1 10 5 5 2.00 10.00 2010-04-21-10.05 103 3 5 0 5 1.10 5.50 2010-04-21-14.37 105 2 6 0 6 2.00 12.00 2010-04-21-10.30
Let’s take it a step further. Suppose two lines of an order can have the same timestamp field, down to the microsecond. Then what? Let’s make an executive decision that we’ll retrieve the row with the lowest line number. Our query changes slightly.
with FirstTimestamp as (select OrderNo, min(CrtTime) as CrtTime from salesordd where QtyShipped < QtyOrdered group by OrderNo), FirstOpenOrderLine as (select d.OrderNo, d.CrtTime, min(LineNo) as LineNo from salesordd as d join FirstTimeStamp as fts on d.OrderNo = fts.OrderNo and d.CrtTime = fts.CrtTime where d.QtyShipped < d.QtyOrdered group by d.OrderNo, d.CrtTime) select d.OrderNo, d.LineNo, d.QtyOrdered, d.QtyShipped, (d.QtyOrdered - d.QtyShipped) as Balance, d.Price, (d.QtyOrdered - d.QtyShipped) * d.Price as Extended, substr(char(d.crttime), 1, 16) as TimeEntered from salesordd as d join FirstOpenOrderLine as fool on d.OrderNo = fool.OrderNo and d.CrtTime = fool.CrtTime and d.LineNo = fool.LineNo order by d.OrderNo, d.crttime
The first common table expression gets the earliest order entry date of unshipped orders. The second common table expression gets the first line for each order and earliest date. The main select now includes another join field: line number.
I used this technique recently, when I had to retrieve a field (column) from the first operation of each item’s routing. In my shop, the first operation is normally given a sequence of 10, and subsequent operations are numbered in increments of 10 to allow for insertions. Thus, the first operation in most routings is number 10, but the first routing for some items is operation five.
I thought I had published this tip already, but apparently I have not, according to Google. If I did, the review won’t hurt.