|
Let One Row Represent a Group
Published: April 28, 2010
by Ted Holt
Dear Esteemed and Highly Competent Professional,
From time to time, I need to retrieve one row (record) to represent a group of rows that share similar characteristics. It's a good technique, one that I'd like to share with the readers of this unpretentious publication.
Assume a table (physical file) of sales order details. Each row represents a line of a customer order. Some customers submit an entire order at the same time, but others prefer to keep one order open with us, adding new lines to the order as needed. Here are the columns (fields) of the sales order detail file.
Column Type Size Description
========== ========== ==== =========================
ORDERNO Packed 5,0 Order number
LINENO Packed 3,0 Line number
ITEMNO Character 6 Item number
QTYORDERED Packed 7,0 Quantity ordered
QTYSHIPPED Packed 7,0 Quantity shipped
PRICE Packed 7,2 Unit price
CRTTIME Timestamp Date/time created
CRTUSER Character 10 Created by user
CHGTIME Timestamp Date/time of last change
CHGUSER Character 10 Changed by user
We've been asked to list all open orders (those that have not been fully shipped), but we are only to show the earliest unshipped line. That is, an order may have a dozen unshipped lines, but only the first unshipped one is to be retrieved from the database.
The unshipped lines are the ones where quantity shipped is less than quantity ordered. As for the earliest order line, that would be the one with the earliest creation timestamp.
This first step is to find the key information for the earliest open line. If no two order lines within an order can have the same timestamp, then this select statement finds the first unshipped line of each order.
select OrderNo, min(CrtTime) as CrtTime
from salesordd
where QtyShipped < QtyOrdered
group by OrderNo
order by OrderNo
The result set looks like this:
ORDERNO CRTTIME
======= ==========================
101 2010-04-21-11.15.49.000000
102 2010-04-21-10.05.00.000000
103 2010-04-21-14.37.31.000000
105 2010-04-21-10.30.03.030000
That's great, but where are all the other fields on the line? They're missing, because we can't include non-grouping fields in a grouping query.
The technique we need is to join this result set back to the dataset from which it came, using an inner join, like this:
with FirstOpenOrderLine as
(select OrderNo, min(CrtTime) as CrtTime
from salesordd
where QtyShipped < QtyOrdered
group by OrderNo)
select d.OrderNo, d.LineNo, d.QtyOrdered, d.QtyShipped,
(d.QtyOrdered - d.QtyShipped) as Balance,
d.Price,
(d.QtyOrdered - d.QtyShipped) * d.Price as Extended,
substr(char(d.crttime), 1, 16) as TimeEntered
from salesordd as d
join FirstOpenOrder as fool
on d.OrderNo = fool.OrderNo
and d.CrtTime = fool.CrtTime
order by d.OrderNo, d.crttime
The common table expression creates the result set we just saw, giving it the name FirstOpenOrderLine. The main select joins FirstOpenOrderLine to the table (or view) from which it was built, SALESORDD. The result set looks like this:
ORDERNO LINENO ORDERED SHIPPED BALANCE PRICE EXTENDED TIMEENTERED
======= ====== ======= ======= ======= ===== ======== ===============
101 2 12 0 12 1.00 12.00 2010-04-21-11.15
102 1 10 5 5 2.00 10.00 2010-04-21-10.05
103 3 5 0 5 1.10 5.50 2010-04-21-14.37
105 2 6 0 6 2.00 12.00 2010-04-21-10.30
Let's take it a step further. Suppose two lines of an order can have the same timestamp field, down to the microsecond. Then what? Let's make an executive decision that we'll retrieve the row with the lowest line number. Our query changes slightly.
with FirstTimestamp as
(select OrderNo, min(CrtTime) as CrtTime
from salesordd
where QtyShipped < QtyOrdered
group by OrderNo),
FirstOpenOrderLine as
(select d.OrderNo, d.CrtTime, min(LineNo) as LineNo
from salesordd as d
join FirstTimeStamp as fts
on d.OrderNo = fts.OrderNo
and d.CrtTime = fts.CrtTime
where d.QtyShipped < d.QtyOrdered
group by d.OrderNo, d.CrtTime)
select d.OrderNo, d.LineNo, d.QtyOrdered, d.QtyShipped,
(d.QtyOrdered - d.QtyShipped) as Balance,
d.Price,
(d.QtyOrdered - d.QtyShipped) * d.Price as Extended,
substr(char(d.crttime), 1, 16) as TimeEntered
from salesordd as d
join FirstOpenOrderLine as fool
on d.OrderNo = fool.OrderNo
and d.CrtTime = fool.CrtTime
and d.LineNo = fool.LineNo
order by d.OrderNo, d.crttime
The first common table expression gets the earliest order entry date of unshipped orders. The second common table expression gets the first line for each order and earliest date. The main select now includes another join field: line number.
I used this technique recently, when I had to retrieve a field (column) from the first operation of each item's routing. In my shop, the first operation is normally given a sequence of 10, and subsequent operations are numbered in increments of 10 to allow for insertions. Thus, the first operation in most routings is number 10, but the first routing for some items is operation five.
I thought I had published this tip already, but apparently I have not, according to Google. If I did, the review won't hurt.
Post this story to del.icio.us
Post this story to Digg
Post this story to Slashdot
|