Guru: Combine Related Rows Using SQL
February 12, 2018 Ted Holt
A reader writes: “Hey, Ted. In our ERP system, certain business objects, such as sales orders and purchase orders, can have multiple comment records. Is it possible, using SQL, to combine all the comment records for an order into one long comment and retrieve it as a column in a result set?”
I can relate to this. I can remember supporting an ERP system where not only the orders, but the order detail lines, could have such comments. End users depend heavily on such unstructured data to do their jobs. To answer your question, yes, it is possible and it isn’t difficult.
Let’s begin with two sales order tables — headers and comments:
create table slsordhd (OrderID dec(5), OrderDate dec(7), CustID dec(7), primary key (OrderID)); insert into slsordhd values (1, 1180130, 26424), (2, 1180131, 75179), (3, 1180201, 38493); create table slsordcmt (OrderID dec(5), Sequence dec(3), comment char(20), primary key (OrderID, Sequence)); insert into slsordcmt values (1, 1, 'Don''t ship without c'), (1, 2, 'hecking with Butch a'), (1, 3, 't x255.'), (2, 1, 'Partial shipment is '), (2, 2, 'NOT acceptable.'), (3, 1, 'Low priority');
Now we have three orders with comments. In the systems I’ve worked with, the comment fields are long, at least 80 bytes, but I made them only 20 bytes to keep the amount of data manageable.
The function that combines the rows is LISTAGG. It combines a set of string values and it can separate them with a separator string of your choice. One common use I’ve seen for this function is to build a string of comma-separated values.
Here’s the LISTAGG function to combine the comments for each order.
select OrderID, listagg(trim(comment)) within group(order by OrderID, Sequence) as Cmt from slsordcmt group by OrderID
|1||Don’t ship without checking with Butch at x255.|
|2||Partial shipment is NOT acceptable.|
The argument to LISTAGG is the comment field without leading or trailing blanks. Only you can decide if you should retain blanks or not.
The last line tells the query to group rows on a common order ID. GROUP BY has been around for decades, so chances are you’re very familiar with it.
The WITHIN GROUP may be new to you. This clause is part of LISTAGG, and its purpose is to tell the query engine how to sequence the rows within each group. In this case, I said that I want the comment rows sorted by order ID and sequence number.
The order comment is probably not useful by itself. You probably want to combine it with other data. In the following query, I use a lateral subquery to combine it with fields from the order header file. LATERAL lets me run the second SELECT, which has the LISTAGG function, for each row of the first SELECT. The good thing about LATERAL is that the second SELECT is permitted to reference columns from the first SELECT. In this case, that’s h.OrderID. If you’re still not comfortable with the lateral subquery, sometimes called lateral correlation or lateral join, see the related stories at the end of this article.
select h.OrderID, h.OrderDate, h.CustID, x.Cmt from slsordhd as h, lateral (select OrderID, listagg(trim(comment)) within group(order by OrderID, Sequence) as Cmt from slsordcmt as c where c.OrderID = h.OrderID group by OrderID) as x where h.OrderDate <= 1180131
|1||1180130||26424||Don’t ship without checking with Butch at x255.|
|2||1180131||75179||Partial shipment is NOT acceptable.|
Of course, you may want even more tables and/or views in the join. Here’s the same query with the customer master table added in order to get the customer name. I was able to join the tables first and then tack the LATERAL on afterward.
select h.OrderID, h.OrderDate, h.CustID, cu.CustName, x.Cmt from slsordhd as h left join cust as cu on h.CustID = cu.CustID, lateral (select OrderID, listagg(trim(comment)) within group(order by OrderID, Sequence) as Cmt from slsordcmt as c where c.OrderID = h.OrderID group by OrderID) as x where h.OrderDate <= 1180131
|1||1180130||26424||ACME||Don’t ship without checking with Butch at x255.|
|2||1180131||75179||SUPERIOR||Partial shipment is NOT acceptable.|
That’s all there is to it. Play with it a bit and you’ll be a LISTAGG wizard in no time.
Using Lateral Correlation To Define Expressions In DB2 For i
This is fantastic. Thank you Ted!!
The following statement does not work on my V7R1 system.
within group(order by OrderID, Sequence) as Cmt
group by OrderID
It says “Keyword GROUP not expected. Valid tokens: , FROM INTO.”
Any idea why ?
Thanks & Regards
SQL BiF LISTAGG added with v7. 3 R2 and v7. 2 R6…
I would add a good standard practice would be to include ON OVERFLOW TRUNCATE so the aggregated result string is truncated if the actual length of the result string exceeds the result length. See https://www.itjungle.com/2018/02/12/guru-combine-related-rows-using-sql/