LAG And LEAD Functions In DB2 for i 7.3
May 31, 2016 Michael Sansoterra
Database devs, you can kiss many of your cursors goodbye, because DB2 for i has two sweet new functions named LAG and LEAD that will allow you to extract a column value from a different row within the result set (relative to the current row) without programmatically looping through each row.
From the current row, LAG instructs DB2 to look backward in the result set a specified number of rows and retrieve a value. LEAD allows DB2 to look ahead to a specified row beyond the current row and extract a value. In DB2 parlance, these new functions belong to the ordered OLAP specification.
Say your boss wants a simple sales report that summarizes sales by quarter. You come up with the following query, boom, done!
SELECT YEAR(OrderDate)||'-Q'||QUARTER(OrderDate) AS SalesQtr, SUM(SubTotal) AS SalesAmount FROM SalesOrderHeader WHERE OrderDate>='2006-01-01' GROUP BY YEAR(OrderDate)||'-Q'||QUARTER(OrderDate) ORDER BY SalesQtr
But then, for ease of comparison, the boss wants to see the prior quarter’s value in every row, in its own column next to the current quarter’s value. One SQL technique is to create a temp table and then use a cursor to save off the prior row’s data. Or, you could use a common table expression (CTE) with the ROW_NUMBER function and do a self join to the prior row (yuk!) to get both values in the same row. But then you remember you’re on IBM i 7.3, the greatest DB2 for i release yet, and that the LAG function can look back at a prior row for a value and include it in the current row.
After wrapping the above query in a CTE (because you can’t nest a SUM aggregate function within the LAG function), simply add the LAG function to get the SalesAmount value from the prior row:
WITH QuarterlySales AS ( SELECT YEAR(OrderDate)||'-Q'||QUARTER(OrderDate) AS SalesQtr, SUM(SubTotal) AS SalesAmount FROM SalesOrderHeader WHERE OrderDate>='2006-01-01' GROUP BY YEAR(OrderDate)||'-Q'||QUARTER(OrderDate) ) SELECT q.*, LAG(SalesAmount,1) OVER(ORDER BY SalesQtr) AS PriorQtr FROM QuarterlySales q ORDER BY SalesQtr
The LAG function pulls the SalesAmount value from one row prior to the current row. In its simplest form LAG is used like this:
LAG(column expression,number of rows prior,[default value]) OVER(ORDER BY . . .)
The query’s result set looks like this:
I’ve bolded some of the values in the results so you can see how the PriorQtr column pulls the data from the prior row’s SalesAmt in the result set. The PriorQtr column in the first row is NULL because there are no prior rows to extract from. Though not shown here, you can specify a default value such as a zero if the requested row does not exist in the result set.
To make sure the results are consistent, an ORDER BY is required in the function’s “window order clause”. In this example, the OVER. . . ORDER BY is the same as the final SELECT’s ORDER BY. However, it need not be this way, as allowing a sort independent of the result set offers more versatility. (More on this in the upcoming LEAD example.)
At this point it would be easy to add more expression columns to the query, such as the variance or percent change between the prior quarter and the current quarter.
Based on how quickly you fulfilled the last programming request, the boss now requests the sales amount from the same quarter of the prior year be included in the report. Once again LAG comes to the rescue, as it can look back four rows (because there are four quarters per year) to get the sales data from the prior year:
/* This example uses LAG to look backward in the result set to lookup values in prior rows. Whenever you use a fixed offset with lead or lag, you should be confident your data will support it. For example, if this example's result set is missing a quarter, the prior qtr last year column will contain invalid data. */ WITH QuarterlySales AS ( SELECT YEAR(OrderDate)||'-Q'||QUARTER(OrderDate) AS SalesQtr, SUM(SubTotal) AS SalesAmount FROM SalesOrderHeader WHERE OrderDate>='2006-01-01' GROUP BY YEAR(OrderDate)||'-Q'||QUARTER(OrderDate) ) SELECT q.*, LAG(SalesAmount,1) OVER(ORDER BY SalesQtr) AS PriorQtr, LAG(SalesAmount,4) OVER(ORDER BY SalesQtr) AS LastYearQtr FROM QuarterlySales q ORDER BY SalesQtr
The results looks like this:
When using LEAD or LAG with an aggregate expression, you will need to use a CTE or nested select to give a column name to the aggregate expression. For example, this next query is invalid because it references SalesAmount, which itself is an aggregate expression at the same level as LAG:
SELECT YEAR(OrderDate)||'-Q'||QUARTER(OrderDate) AS SalesQtr, SUM(SubTotal) AS SalesAmount, LAG(SalesAmt,1) OVER(ORDER BY YEAR(OrderDate)||'-Q'||QUARTER(OrderDate)) AS SalesAmtPrior FROM SalesOrderHeader WHERE OrderDate>='2006-01-01' GROUP BY YEAR(OrderDate)||'-Q'||QUARTER(OrderDate)
For the record, LAG can contain a column expression, but not an aggregate column expression, so this next example is also invalid:
SELECT YEAR(OrderDate)||'-Q'||QUARTER(OrderDate) AS SalesQtr, SUM(SubTotal) AS SalesAmount, LAG(SUM(SubTotal),1) OVER(YEAR(OrderDate)||'-Q'||QUARTER(OrderDate)) AS SalesAmtPrior - Invalid FROM SalesOrderHeader WHERE OrderDate>='2006-01-01' GROUP BY YEAR(OrderDate)||'-Q'||QUARTER(OrderDate)
Now let’s consider LEAD, LAG’s companion function. These functions are similar except that LEAD looks at rows beyond the current row position in the result set.
LEAD(column expression,number of rows after,[default value]) OVER(ORDER BY . . .)
To illustrate, say you have a TransactionHistory table that, among other things, contains transaction entries for completed manufacturing work orders. Each transaction includes the quantity produced for a given product ID. A requirement for a new “Manufacturing Production Report” specifies for every completed work order transaction, the date and quantity should be shown along with the next future quantity and transaction date for the same product.
Reporting the product ID, transaction date and quantity produced, along with the next trans date and next quantity (for the same product ID) in the same row, will give management insight into how frequently products are being made and indicate the diversity in the quantities produced over time.
Using LEAD makes this task simple:
SELECT TransactionDate,ProductId,Quantity, LEAD(TransactionDate,1) OVER( PARTITION BY ProductId ORDER BY TransactionDate) AS Next_Date, LEAD(Quantity,1,-1) OVER( PARTITION BY ProductId ORDER BY TransactionDate) AS Next_Quantity FROM TransactionHistory WHERE TransactionType='W' -- Completed Work Order ORDER BY TransactionDate,ProductId,TransactionId
A few thing to note about the LEAD function usage:
An abridged result set is shown below with attention called to product ID 899, which was produced on 2007-09-04, 2007-09-08 and 2007-09-17. As you can see, LEAD peeks ahead to the rows beyond the current row, snags the requested value and places it in the current row.
NULLs And Rodney Dangerfield Get “No Respect”
One other thing to note about LAG and LEAD is that they can be instructed to respect or ignore NULLs. (The default is RESPECT.) Say you have an order shipment report that uses the LAG function to find the prior order’s ship date within customer. However, if the prior order (relative to the current row) hasn’t been shipped (i.e., the prior row’s ShipDate is null), LAG can ignore the NULL and continue its ordered search until it finds a row with a non-NULL ShipDate. For example:
LAG (ShipDate,1) OVER(PARTITION BY CustomerId ORDER BY OrderDate,OrderId) IGNORE NULLS AS PriorShipDate -- Get first non-null ShipDate
Better Analytics = Better Reporting
And now for the finishing pun you’ve all been dreading. Don’t let your dev skills LAG by ignoring the new DB2 for i analytic functions, rather use them to LEAD the charge to better reporting. LAG and LEAD allow developers to easily mine valuable information from result sets without having to resort to cursors or temp tables.