OLAP Aggregation Specification In DB2 For i 7.3
May 10, 2016 Michael Sansoterra
If I could use one word to describe the new DB2 for i OLAP features in IBM i 7.3, it would be “Booyah”! (According to the Urban dictionary, “booyah” is “used in order to abruptly express great joy, usually brought on by victory or some other sort of accomplishment.”) Though there are many new features in i 7.3 under the umbrella of OLAP, for now I’m going to demonstrate the new OLAP aggregation specification.
It was way back in 2006 in V5R4 that DB2 for i developers were first treated to OLAP (Online Analytical Processing) functions. If you’re unfamiliar with these early OLAP features, please review my article New in V5R4: OLAP Ranking Specifications. These early OLAP features involved the “ordered OLAP specification” (RANK and DENSE_RANK functions) and the “numbering specification” (ROW_NUMBER function).
OLAP Aggregation Specification
The new OLAP “aggregation specification” is familiar as it involves the use of well-known aggregate functions such as COUNT, MAX, SUM and AVG. Whereas in the past use of these aggregate functions was limited to single row or GROUP BY “summary” queries, they can now be used in detail queries through the use of an OLAP window partition clause.
To begin, say you have a simple annual sales report query that returns sales order detail information ordered by order date:
SELECT soh.SalesOrderId, soh.OrderDate, soh.CustomerId, sod.ProductId, sod.OrderQty, sod.UnitPrice, DEC(sod.OrderQty*UnitPrice,11,2) AS SubTotal FROM SalesOrderDetail sod JOIN SalesOrderHeader soh ON soh.SalesOrderId=sod.SalesOrderId WHERE OrderDate BETWEEN '2006-01-01' AND '2006-12-31' ORDER BY soh.OrderDate
While looking over the report, the boss decides it would be nice to see how much each customer purchased for the entire year. In the past, a developer would typically code a scalar subquery to return this info using the SUM aggregate function and add it to the above query:
(SELECT SUM(OrderQty*UnitPrice) FROM SalesOrderDetail sod1 JOIN SalesOrderHeader soh2 ON soh2.SalesOrderId=sod1.SalesOrderId WHERE soh2.CustomerId=soh.CustomerId AND soh2.OrderDate BETWEEN '2006-01-01' AND '2006-12-31') AS Customer_SubTotal
The problem with this type of subquery is that it is increases I/O, reduces readability, requires duplication of predicates, and is just clumsy to maintain.
In i 7.3, the above Customer_Subtotal calculation can be accomplished by using the SUM aggregate function with a window partition clause:
SELECT soh.SalesOrderId, soh.OrderDate, soh.CustomerId, sod.ProductId, sod.OrderQty, sod.UnitPrice, DEC(sod.OrderQty*UnitPrice,11,2) AS SubTotal, SUM(OrderQty*UnitPrice) OVER(PARTITION BY CustomerId) AS Customer_SubTotal FROM SalesOrderDetail sod JOIN SalesOrderHeader soh ON soh.SalesOrderId=sod.SalesOrderId WHERE OrderDate BETWEEN '2006-01-01' AND '2006-12-31' ORDER BY soh.OrderDate
Notice there is no GROUP BY clause! The abridged results look like the following:
The OVER portion of the aggregation-specification includes a “window partition clause” and it allows the aggregate to be included in the detail results without requiring a GROUP BY. In fact, you can think of the PARTITION BY as analogous to the GROUP BY. The “partition by” in this example instructs DB2 to calculate the SUM of all the rows in the result set by CustomerId and store the temporary result “under the covers.” Then, as the DB2 builds the result set, it shows the SUM for the specific CustomerId on the current row.
If the boss also wanted to see the total annual product sales subtotal for the current product under review, you could give him another column that specifies the SUM aggregate partitioned by the product ID:
SUM(OrderQty*UnitPrice) OVER(PARTITION BY ProductId) AS Product_SubTotal
Since each OLAP aggregation has its own PARTITION BY, a single query can now have many aggregations based on different aggregation groups, in contrast to the old-style GROUP BY queries that can only have one aggregation grouping per query.
Now, for a few more examples. To show the product sales by product ID within customer, simply specify both columns in the PARTITION BY:
SUM(OrderQty*UnitPrice) OVER(PARTITION BY CustomerId,ProductId) AS Customer_Product_SubTotal
To include the annual sales for the entire year as a column specify:
SUM(OrderQty*UnitPrice) OVER(PARTITION BY YEAR(OrderDate)) AS Annual_SubTotal
Alternatively, specify a constant in the expression to aggregate an entire result set:
SUM(OrderQty*UnitPrice) OVER(PARTITION BY 1) AS Annual_SubTotal
I should mention that in the order of processing a query, DB2 will process the OLAP aggregate functions prior to subsetting the rows based on a FETCH FIRST clause.
Other aggregate functions such as average work equally well, such as this expression that calculates the average customer sales:
AVG(OrderQty*UnitPrice) OVER(PARTITION BY CustomerId) AS Avg_Customer_SubTotal
Or COUNT can be used to show the number of annual order detail lines for each customer:
COUNT(*) OVER(PARTITION BY CustomerId) AS Annual_Customer_Order_Lines
The Window Aggregation Group Clause
And now, I’ll show one more trick that DB2 has up its sleeve called the “window aggregation group clause.” This feature will put the analytical value of your queries into overdrive. Consider the query (revised from the first example) that shows a rolling total by customer:
SELECT soh.SalesOrderId, soh.OrderDate, soh.CustomerId, sod.ProductId, sod.OrderQty, sod.UnitPrice, DEC(sod.OrderQty*UnitPrice,11,2) AS SubTotal, SUM(OrderQty*UnitPrice) OVER(PARTITION BY CustomerId ORDER BY ProductId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Rolling_Customer_Total FROM SalesOrderDetail sod JOIN SalesOrderHeader soh ON soh.SalesOrderId=sod.SalesOrderId WHERE OrderDate BETWEEN '2006-01-01' AND '2006-03-31' ORDER BY soh.CustomerId,sod.ProductId
The Rolling_Customer_Total column creates a rolling total of a customer’s purchases. The “window aggregation group clause” shown consists of three components:
The abridged results of the query are shown below with the rolling total column bolded along with the first row of each CustomerId partition:
For one more example, consider the case of a simple annual sales report where there is one subtotal shown per day. But now the boss wants to see a 30-day rolling average of the daily sales subtotals. This feat is now easy enough to accomplish:
WITH AnnualSales AS ( SELECT soh.OrderDate, DAYOFYEAR(soh.OrderDate) AS DayNo, SUM(soh.SubTotal) DailyTotal FROM SalesOrderHeader soh WHERE OrderDate BETWEEN '2006-01-01' AND '2006-12-31' GROUP BY soh.OrderDate ) SELECT s.*, AVG(DailyTotal) OVER(ORDER BY OrderDate ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS "30DayRollingAvg"s FROM AnnualSales s ORDER BY OrderDate
In this case, the average aggregate function is used to take the average of the daily subtotal based on the values in the current and prior 29 rows within the ordered result set. The partial output is shown below:
As you can see, if there are not 29 rows prior to the current row, DB2 takes as many prior rows as it can. So in this example, the first true 30-day average is not calculated until January 30th.
The Final Analysis
The ability to easily write queries with complex aggregations based on a window of data within the entire result set is a stellar feature of IBM i 7.3. The window can be based on a partition of data within the result set (such as a customer or product) or it can be based on rows in the result set relative to the position of the current row. Stay tuned to Four Hundred Guru. The best news is that there is plenty more to discuss in terms of new OLAP features in DB2 for i.