New OLAP Aggregate Functions In DB2 for i, Part 1

July 12, 2016 Michael Sansoterra

In my last tips about the new OLAP features in DB2 for i 7.3, I discussed the OLAP Aggregation Specification and the new LAG and LEAD OLAP functions. In this article and the next one, I continue the discussion of new OLAP features by highlighting four new OLAP aggregate functions: FIRST_VALUE, LAST_VALUE, nTH_VALUE, and RATIO_TO_REPORT. The value of these functions is that they can reference data from other rows in a query result set relative to the current row.

The new OLAP aggregate functions differ from normal aggregate functions (such as MIN, MAX, SUM, and AVG) in that they can only be used in conjunction with an OLAP aggregation specification (OAS); whereas the normal aggregates can be used with or without an OAS (e.g., they can be used in a GROUP BY summary query). Recall that the aggregation spec consists of one or more components, including a window partition clause, window order clause, and a window aggregation group clause. Since the DB2 for i terminology can be confusing, I’ll get right to an example. If you’re unfamiliar with OLAP aggregation spec, please review the Related Stories listed at the end of this tip.

FIRST_VALUE And LAST_VALUE

As their names imply, these functions will extract the first or last value of a column or expression within an OLAP window. Say you have a help desk application for tracking support calls. The two primary tables in the app are SupportIncident (header) and SupportCall (detail):

CREATE OR REPLACE TABLE SupportIncident (
Support_Id        INT          NOT NULL 
                  GENERATED BY DEFAULT AS IDENTITY,
Product_Group     VARCHAR(10)  NOT NULL,
Name              NVARCHAR(80) NOT NULL,
TimeZone          VARCHAR(10)  NOT NULL,
EMail             VARCHAR(96)  NOT NULL,
Phone             VARCHAR(20)  NOT NULL,
First_Contact     TIMESTAMP    NOT NULL 
                  DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT pk_SupportIncident PRIMARY KEY (Support_Id)
);

CREATE OR REPLACE TABLE SupportCall (
Support_Id        INT          NOT NULL,
Support_Call_Id   INT          NOT NULL AS IDENTITY,
Support_Timestamp TIMESTAMP(0) NOT NULL,
Support_By_Phone  CHAR(1)      NOT NULL DEFAULT 'Y',
Duration_Sec      INT          NOT NULL,
Remarks           CLOB(10K),
Rep_Id            INT          NOT NULL,
CONSTRAINT pk_SupportCall PRIMARY KEY (Support_Id,Support_Call_Id),
CONSTRAINT fk_Support_Id FOREIGN KEY (Support_Id) 
    REFERENCES SupportIncident (Support_Id)
);

Management has requested that support calls be listed as detail lines on a “support incident” report. The query to feed this report is shown here:

SELECT c.*
  FROM SupportIncident i
  JOIN SupportCall c ON i.Support_Id=c.Support_Id

Further, because some incidents may require several calls (possibly fielded by multiple support center representatives) from a customer before their issue is resolved, when reviewing the detail it’s often helpful to know the customer support representative(s) who opened and closed the incident.

Therefore, each detail line should also include the support representative who opened the incident and the representative who closed the incident (or who last talked to the customer if the incident is still open). The support representative ID is contained in column REP_ID. To fulfill this requirement, simply use the new FIRST_VALUE and LAST_VALUE OLAP aggregate functions:

SELECT c.*,
       First_Value(Rep_Id) OVER(
       PARTITION BY c.Support_Id 
       ORDER BY Support_Timestamp) AS First_Rep_Id,
       Last_Value(Rep_Id)  OVER(
       PARTITION BY c.Support_Id 
       ORDER BY Support_Timestamp 
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS Last_Rep_Id
  FROM SupportIncident i
  JOIN SupportCall c ON i.Support_Id=c.Support_Id

The abridged results look something like this (with some columns omitted):

Call_ID

Duration

Remarks

Rep_ID

First_Rep_ID

Last_Rep_ID

310

Problems
faxing – Told him to reboot

210

Thought his DVD tray was a
coffee tray. it broke, doh!

572

A mind
numbing call

188

Transferred him to
Bangladesh call center

442

This guy
won’t give up

642

Told him he can’t fax via
his monitor

924

Offered
his money back — problem solved!

The source values for the First_Rep_Id and Last_Rep_Id columns are highlighted in red.

FIRST_VALUE will get the first value in a windowed set of rows and likewise LAST_VALUE will retrieve the last value. The OLAP aggregation spec controls the scope (PARTITION BY) and order (ORDER BY) of the window of rows that are to be considered by the OLAP function. Recall that this windowing concept allows these OLAP functions to be used in “detail queries” that don’t require a GROUP BY clause.

Looking closely at the use of these new functions:

       First_Value(Rep_Id) OVER(
       PARTITION BY c.Support_Id 
       ORDER BY Support_Timestamp) AS First_Rep_Id,
       Last_Value(Rep_Id)  OVER(
       PARTITION BY c.Support_Id 
       ORDER BY Support_Timestamp 
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS Last_Rep_Id

It’s clear that the OLAP window consists of rows bounded by the Support_Id (PARTITION BY Support_Id) and the rows within the window are sorted by the support call’s timestamp. Further, because these functions are OLAP aggregation functions, the data they retrieve is from a row that is relative to the current row in relation to the current OLAP window.

By default, this class of functions will operate on the rows from the start of the window through the current row, as though the following “window aggregation group clause” was specified:

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

This default works fine for the FIRST_VALUE function because the first row of the window is always available for consideration for any given partition. However, the last row in the window is not available by this default. Therefore, a different “window aggregation group clause” is required to instruct LAST_VALUE that it needs to consider all rows from the current row through the end of the window as follows:

ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

Without this window aggregation group clause, the LAST_VALUE function returns the value from the current row. If this seems confusing, I don’t blame you and suggest you experiment with LAST_VALUE with and without this window aggregation group clause and watch how the behavior changes.

nTH_VALUE

The nTH_VALUE function is similar to FIRST_VALUE and LAST_VALUE with the exception that the requested value is extracted from row n, where n is the row number within the window relative to the start or the end of the window.

Going back to the support query, say management concludes that knowing the rep ID for the third call in a long series of calls is pivotal to understanding how to improve customer service and shorten the number of calls. So, management requests for the rep ID of the third call in an incident to be included on the report as its own column. In this case, the nTH_VALUE function can be used to retrieve the third customer service rep ID in the chain of calls:

SELECT c.*,
       First_Value(Rep_Id)
       OVER(PARTITION BY c.Support_Id 
       ORDER BY Support_Timestamp) 
       AS First_Rep_Id,

       Last_Value(Rep_Id)   
       OVER(PARTITION BY c.Support_Id 
       ORDER BY Support_Timestamp 
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
       AS Last_Rep_Id,

       nTH_Value(Rep_Id,3) FROM FIRST 
       OVER(PARTITION BY c.Support_Id 
       ORDER BY Support_Timestamp 
       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) 
       AS "3rd_Rep_Id"

  FROM SupportIncident i
  JOIN SupportCall c ON i.Support_Id=c.Support_Id

As shown in the example, nTH_VALUE requires a second parameter which is the row number within the OLAP window. A literal value of three is given, which means take the requested value (REP_ID) from the third row in the window. The default FROM FIRST indicates the function will get the third from the start of the window. The alternative option is FROM LAST, which would cause the function to get the third row from the end of the window. If there are less than three rows in the window, the function returns NULL.

Based on the result set shown above, for support incident 1, the “3rd_Rep_Id” column will contain the value “8”, which is the Rep_Id in row 3.

As with LAST_VALUE, nTH_VALUE also requires a non-default window aggregation group clause to make this function evaluate the third row over the entire window:

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Without this clause (keeping in mind that the default window is the first row through the current row), the nTH_VALUE function would not return a value until row three or later became the current row. In other words, the first two rows would contain a NULL value for the “3rd_Rep_Id” column.

FIRST_VALUE, LAST_VALUE and nTH_VALUE can have an additional option of RESPECT NULLS (default) or IGNORE NULLS specified as the last argument. When IGNORE NULLS is specified, the requested OLAP function will only consider non-null values within the window:

Last_Value(REMARKS) IGNORE NULLS
OVER(PARTITION BY c.Support_Id 
ORDER BY Support_Timestamp 
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS Last_Remark

In this case, LAST_VALUE will return the last non-null REMARKS column for the given support ID. For nTH_VALUE, using IGNORE NULLS will potentially cause the function’s behavior to change from retrieving data from an absolute position in the window to a relative row position that is based on whether NULLs are present.

There’s more to tell. Look for the RATIO_TO_REPORT function, coming in my next article.

Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

Table of Contents

Content archive

Recent Posts

Subscribe

Pages

Search