• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • OLAP Aggregation Specification In DB2 For i 7.3

    May 10, 2016 Michael Sansoterra

    If I could use one word to describe the new DB2 for i OLAP features in IBM i 7.3, it would be “Booyah”! (According to the Urban dictionary, “booyah” is “used in order to abruptly express great joy, usually brought on by victory or some other sort of accomplishment.”) Though there are many new features in i 7.3 under the umbrella of OLAP, for now I’m going to demonstrate the new OLAP aggregation specification.

    It was way back in 2006 in V5R4 that DB2 for i developers were first treated to OLAP (Online Analytical Processing) functions. If you’re unfamiliar with these early OLAP features, please review my article New in V5R4: OLAP Ranking Specifications. These early OLAP features involved the “ordered OLAP specification” (RANK and DENSE_RANK functions) and the “numbering specification” (ROW_NUMBER function).

    OLAP Aggregation Specification

    The new OLAP “aggregation specification” is familiar as it involves the use of well-known aggregate functions such as COUNT, MAX, SUM and AVG. Whereas in the past use of these aggregate functions was limited to single row or GROUP BY “summary” queries, they can now be used in detail queries through the use of an OLAP window partition clause.

    To begin, say you have a simple annual sales report query that returns sales order detail information ordered by order date:

    SELECT soh.SalesOrderId,
           soh.OrderDate,
           soh.CustomerId,
           sod.ProductId,
           sod.OrderQty,
           sod.UnitPrice,
           DEC(sod.OrderQty*UnitPrice,11,2) AS SubTotal
      FROM SalesOrderDetail sod
      JOIN SalesOrderHeader soh ON soh.SalesOrderId=sod.SalesOrderId
     WHERE OrderDate BETWEEN '2006-01-01' AND '2006-12-31'
    ORDER BY soh.OrderDate
    

    While looking over the report, the boss decides it would be nice to see how much each customer purchased for the entire year. In the past, a developer would typically code a scalar subquery to return this info using the SUM aggregate function and add it to the above query:

     (SELECT SUM(OrderQty*UnitPrice)
          FROM SalesOrderDetail sod1
          JOIN SalesOrderHeader soh2 ON soh2.SalesOrderId=sod1.SalesOrderId
         WHERE soh2.CustomerId=soh.CustomerId
           AND soh2.OrderDate BETWEEN '2006-01-01' AND '2006-12-31') 
        AS Customer_SubTotal
    

    The problem with this type of subquery is that it is increases I/O, reduces readability, requires duplication of predicates, and is just clumsy to maintain.

    In i 7.3, the above Customer_Subtotal calculation can be accomplished by using the SUM aggregate function with a window partition clause:

    SELECT soh.SalesOrderId,
           soh.OrderDate,
           soh.CustomerId,
           sod.ProductId,
           sod.OrderQty,
           sod.UnitPrice,
           DEC(sod.OrderQty*UnitPrice,11,2) AS SubTotal,
           SUM(OrderQty*UnitPrice) 
           OVER(PARTITION BY CustomerId) AS Customer_SubTotal
      FROM SalesOrderDetail sod
      JOIN SalesOrderHeader soh ON soh.SalesOrderId=sod.SalesOrderId
     WHERE OrderDate BETWEEN '2006-01-01' AND '2006-12-31'
    ORDER BY soh.OrderDate
    

    Notice there is no GROUP BY clause! The abridged results look like the following:

    OrderId

    OrdDate

    Customer

    Product

    Qty

    Price

    Subtotal

    CustSubTotal

    45047

    1/1/2006

    29614

    707

    3

    20.1865

    60.55

    227.80

    45062

    1/1/2006

    29661

    707

    1

    20.1865

    20.18

    148.78

    45047

    1/1/2006

    29614

    708

    1

    20.1865

    20.18

    227.80

    45062

    1/1/2006

    29661

    708

    2

    20.1865

    40.37

    148.78

    45047

    1/1/2006

    29614

    711

    3

    20.1865

    60.55

    227.80

    45062

    1/1/2006

    29661

    711

    1

    20.1865

    20.18

    148.78

    45062

    1/1/2006

    29661

    712

    2

    5.1865

    10.37

    148.78

    45047

    1/1/2006

    29614

    714

    2

    28.8404

    57.68

    227.80

    45062

    1/1/2006

    29661

    714

    2

    28.8404

    57.68

    148.78

    45047

    1/1/2006

    29614

    715

    1

    28.8404

    28.84

    227.80

    The OVER portion of the aggregation-specification includes a “window partition clause” and it allows the aggregate to be included in the detail results without requiring a GROUP BY. In fact, you can think of the PARTITION BY as analogous to the GROUP BY. The “partition by” in this example instructs DB2 to calculate the SUM of all the rows in the result set by CustomerId and store the temporary result “under the covers.” Then, as the DB2 builds the result set, it shows the SUM for the specific CustomerId on the current row.

    If the boss also wanted to see the total annual product sales subtotal for the current product under review, you could give him another column that specifies the SUM aggregate partitioned by the product ID:

    SUM(OrderQty*UnitPrice) OVER(PARTITION BY ProductId) 
    AS Product_SubTotal
    

    Since each OLAP aggregation has its own PARTITION BY, a single query can now have many aggregations based on different aggregation groups, in contrast to the old-style GROUP BY queries that can only have one aggregation grouping per query.

    Now, for a few more examples. To show the product sales by product ID within customer, simply specify both columns in the PARTITION BY:

    SUM(OrderQty*UnitPrice) OVER(PARTITION BY CustomerId,ProductId)
    AS Customer_Product_SubTotal
    

    To include the annual sales for the entire year as a column specify:

    SUM(OrderQty*UnitPrice) OVER(PARTITION BY YEAR(OrderDate)) 
    AS Annual_SubTotal
    

    Alternatively, specify a constant in the expression to aggregate an entire result set:

    SUM(OrderQty*UnitPrice) OVER(PARTITION BY 1) AS Annual_SubTotal
    

    I should mention that in the order of processing a query, DB2 will process the OLAP aggregate functions prior to subsetting the rows based on a FETCH FIRST clause.

    Other aggregate functions such as average work equally well, such as this expression that calculates the average customer sales:

    AVG(OrderQty*UnitPrice) OVER(PARTITION BY CustomerId) 
    AS Avg_Customer_SubTotal
    

    Or COUNT can be used to show the number of annual order detail lines for each customer:

    COUNT(*) OVER(PARTITION BY CustomerId) 
    AS Annual_Customer_Order_Lines
    

    The Window Aggregation Group Clause

    And now, I’ll show one more trick that DB2 has up its sleeve called the “window aggregation group clause.” This feature will put the analytical value of your queries into overdrive. Consider the query (revised from the first example) that shows a rolling total by customer:

    SELECT soh.SalesOrderId,
           soh.OrderDate,
           soh.CustomerId,
           sod.ProductId,
           sod.OrderQty,
           sod.UnitPrice,
           DEC(sod.OrderQty*UnitPrice,11,2) AS SubTotal,
           SUM(OrderQty*UnitPrice) 
           OVER(PARTITION BY CustomerId 
                ORDER BY ProductId 
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
                AS Rolling_Customer_Total
      FROM SalesOrderDetail sod
      JOIN SalesOrderHeader soh ON soh.SalesOrderId=sod.SalesOrderId
     WHERE OrderDate BETWEEN '2006-01-01' AND '2006-03-31'
    ORDER BY soh.CustomerId,sod.ProductId
    

    The Rolling_Customer_Total column creates a rolling total of a customer’s purchases. The “window aggregation group clause” shown consists of three components:

    1. PARTITION BY CustomerId–The SUM aggregation is reset on the change in the CustomerId within the result set.
    2. ORDER BY ProductId–Specifies the order of the aggregation across the rows. In a rolling total scenario this ORDER BY should typically match the ORDER BY clause of the entire result set so that the rolling SUM is easily understood.
    3. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW–Instructs DB2 that the aggregate is to be calculated based on all rows starting from the beginning of the partition (CustomerId) to the current row. This is how the rolling total is formed.

    The abridged results of the query are shown below with the rolling total column bolded along with the first row of each CustomerId partition:

    OrderId

    OrdDate

    Customer

    Product

    Qty

    Price

    Subtotal

    Rolling Total

    45047

    1/1/2006

    29614

    707

    3

    20.19

    60.55

    60.56

    45047

    1/1/2006

    29614

    708

    1

    20.19

    20.18

    80.75

    45047

    1/1/2006

    29614

    711

    3

    20.19

    60.55

    141.31

    45047

    1/1/2006

    29614

    714

    2

    28.84

    57.68

    198.99

    45062

    1/1/2006

    29661

    707

    1

    20.19

    20.18

    20.19

    45062

    1/1/2006

    29661

    708

    2

    20.19

    40.37

    60.56

    45062

    1/1/2006

    29661

    711

    1

    20.19

    20.18

    80.75

    45062

    1/1/2006

    29661

    712

    2

    5.19

    10.37

    91.12

    45062

    1/1/2006

    29661

    714

    2

    28.84

    57.68

    148.80

    For one more example, consider the case of a simple annual sales report where there is one subtotal shown per day. But now the boss wants to see a 30-day rolling average of the daily sales subtotals. This feat is now easy enough to accomplish:

    WITH AnnualSales AS (
    SELECT soh.OrderDate,
           DAYOFYEAR(soh.OrderDate) AS DayNo,
           SUM(soh.SubTotal) DailyTotal
      FROM SalesOrderHeader soh
     WHERE OrderDate BETWEEN '2006-01-01' AND '2006-12-31'
    GROUP BY soh.OrderDate
    )
    SELECT s.*,
           AVG(DailyTotal) 
           OVER(ORDER BY OrderDate 
                ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) 
           AS "30DayRollingAvg"s
      FROM AnnualSales s
    ORDER BY OrderDate
    

    In this case, the average aggregate function is used to take the average of the daily subtotal based on the values in the current and prior 29 rows within the ordered result set. The partial output is shown below:

    OrderDate

    DayNo

    DailyTotal

    30DayAvg

    1/1/2006

    1

    725046.42

    725046.42

    1/2/2006

    2

    14313.08

    369679.75

    1/3/2006

    3

    28041.32

    255800.27

    1/4/2006

    4

    17713.07

    196278.47

    1/5/2006

    5

    7855.64

    158593.91

    1/6/2006

    6

    21266.34

    135705.98

    1/7/2006

    7

    8554.74

    117541.52

    1/8/2006

    8

    25365.43

    106019.50

    1/9/2006

    9

    17891.35

    96227.49

    1/10/2006

    10

    14109.80

    88015.72

    1/11/2006

    11

    31619.59

    82888.80

    1/12/2006

    12

    25047.89

    78068.72

    1/13/2006

    13

    7855.64

    72667.72

    1/14/2006

    14

    31669.59

    69739.28

    1/15/2006

    15

    21380.60

    66515.37

    1/16/2006

    16

    24666.33

    63899.80

    1/17/2006

    17

    25365.43

    61633.07

    1/18/2006

    18

    15711.28

    59081.86

    1/19/2006

    19

    18590.45

    56950.74

    1/20/2006

    20

    21469.62

    55176.68

    1/21/2006

    21

    24691.33

    53725.00

    1/22/2006

    22

    31644.59

    52721.34

    1/23/2006

    23

    10734.81

    50895.84

    1/24/2006

    24

    25568.71

    49840.54

    1/25/2006

    25

    5675.56

    48073.94

    1/26/2006

    26

    17509.79

    46898.40

    1/27/2006

    27

    14833.90

    45710.83

    1/28/2006

    28

    15508.00

    44632.15

    1/29/2006

    29

    11255.63

    43481.24

    1/30/2006

    30

    27145.18

    42936.70

    1/31/2006

    31

    21762.16

    19493.89

    2/1/2006

    32

    1919379.38

    82996.10

    2/2/2006

    33

    18590.45

    82681.08

    As you can see, if there are not 29 rows prior to the current row, DB2 takes as many prior rows as it can. So in this example, the first true 30-day average is not calculated until January 30th.

    The Final Analysis

    The ability to easily write queries with complex aggregations based on a window of data within the entire result set is a stellar feature of IBM i 7.3. The window can be based on a partition of data within the result set (such as a customer or product) or it can be based on rows in the result set relative to the position of the current row. Stay tuned to Four Hundred Guru. The best news is that there is plenty more to discuss in terms of new OLAP features in DB2 for i.

    Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

    RELATED STORY

    New in V5R4: OLAP Ranking Specifications

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    Raz-Lee Security

    Start your Road to Zero Trust!

    Firewall Network security, controlling Exit Points, Open DB’s and SSH. Rule Wizards and graphical BI.

    Request Demo

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    COMMON:  2016 Annual Meeting & Expo, May 15 - 18, in New Orleans! Great Power Systems event!
    NGS :  Webinar: Realizing the Power of IBM i with NGS-IQ. May 11. RSVP now!
    Profound Logic Software:  'i on the Enterprise' Worldwide Virtual Event. June 8. Register Now!

    IBM To Sell Off Two-Thirds Of The Rochester Labs Jazzing Up IBM i In COMMON’s Quarter

    One thought on “OLAP Aggregation Specification In DB2 For i 7.3”

    • Glenn Gundermann says:
      September 6, 2022 at 12:30 pm

      I’m looking to do a rolling total but I’m not sure how to.. The first three columns are in an orders table and I want to calculate the fourth column as a rolling total of the item being ordered from all the orders. E.g.

      order_no, item_no, order_qty, rolling_item_order_qty
      order1, 11111, 1, 1
      order1, 22222, 2, 2
      order2, 11111, 3, 4
      order2, 22222, 4, 6

      Reply

    Leave a Reply Cancel reply

Volume 16, Number 11 -- May 10, 2016
THIS ISSUE SPONSORED BY:

ProData Computer Services
BCD Software
T.L. Ashford

Table of Contents

  • OLAP Aggregation Specification In DB2 For i 7.3
  • Sending Escape Messages From RPG, Take 2
  • When You Reach Your Break(ing) Point. . . Or Not

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Public Preview For Watson Code Assistant for i Available Soon
  • COMMON Youth Movement Continues at POWERUp 2025
  • IBM Preserves Memory Investments Across Power10 And Power11
  • Eradani Uses AI For New EDI And API Service
  • Picking Apart IBM’s $150 Billion In US Manufacturing And R&D
  • FAX/400 And CICS For i Are Dead. What Will IBM Kill Next?
  • Fresche Overhauls X-Analysis With Web UI, AI Smarts
  • Is It Time To Add The Rust Programming Language To IBM i?
  • Is IBM Going To Raise Prices On Power10 Expert Care?
  • IBM i PTF Guide, Volume 27, Number 20

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle