Treasury of new DB2 6.1 Features, Part 2: Grouping Sets and Super Groups
March 11, 2009 Michael Sansoterra
Part 1 of this series focused on some of the stellar query enhancements IBM gave its DB2 for i developer community. It’s hard to believe the 20-plus-year-old AS/400 system (now called IBM i) can continue to look so young given its great strengths in database and other areas. But for now, I want to focus on specialized query enhancements called grouping sets and super groups.
The ability to work with grouping sets and super groups in a query is an overdue feature, in my opinion (which incidentally has long been available with other versions of DB2 and other database servers). This nifty enhancement gives developers a simple way to include summary rows within their query’s result set.
To illustrate, let’s start with a simple example. Say we have a view called “vwSalesOrderInfo” that contains order summary information:
For clarification, the hierarchy of data is Customers are within a territory, and territories are within the jurisdiction of a sales person. And just for the record, I took this sample data from Microsoft‘s AdventureWorks SQL Server sample database and put it on DB2.
From a business intelligence (BI) standpoint, we could ask several questions about this data, such as:
Of course we could continue to ask many more questions from the data at hand.
We can easily answer the first question “What are the total sales for all orders?” using the following query:
Select Sum(SubTotal) As OrderValue From vwSalesOrderInfo
That query will return the value: 127337180.1126
We can easily answer the second question “what is the total order value (excluding tax, freight) for each customer?” using the following query:
Select Customer,Sum(SubTotal) As OrderValue From vwSalesOrderInfo Group By Customer Order By Customer
Partial formatted results for this query look like the following:
We can answer questions 3 through 5 in the same way by simply “changing out” the column to “summarize by” in the SELECT, GROUP BY, and ORDER BY clauses. For instance, instead of summarizing by Customer, we can summarize by territory.
Now, what if we wanted to combine all of this information and answer all these questions in a single query? Couldn’t we simply expand the query to summarize by all combinations as follows?
Select Year(OrderDate) As OrderYear,Sales_Person, Territory,Customer,Sum(SubTotal) As OrderValue From vwSalesOrderInfo Group By Year(OrderDate),Sales_Person,Territory,Customer Order By OrderYear,Sales_Person,Territory,Customer
This modified query answered all of our questions–or did it? Actually, it didn’t because the summaries given by this query offer summaries at the lowest “group” or “break” level. In this case it’s still the customer column. What we’re really looking for though is the ability to add additional “summary rows” in our result set that summarize the data by each of the grouping levels.
To accomplish this, we could add subtotal to the above query by using a Common Table Expression (CTE) and including some UNION ALL statements to combine the total rows sets into detail data. We’d also probably need to do a few ORDER BY tricks to ensure the subtotal rows are placed in the proper sequence within the detail rows of the result set.
This is where the super group concept comes into play. A super group will add summary rows to our query for each column or expression in the GROUP BY clause. For us old-time COBOL and RPG programmers, this is akin to “control break” totals. This new summary function is implemented by adding the WITH ROLLUP keywords to the GROUP BY clause:
Select Year(OrderDate) As OrderYear,Sales_Person, Territory,Customer,Sum(SubTotal) As OrderValue From vwSalesOrderInfo Group By Year(OrderDate),Sales_Person,Territory,Customer With Rollup /* Implement Super Groups */ Order By OrderYear,Sales_Person,Territory,Customer
Here is a subset of the query’s results:
The rows with the purple cells are summary rows added to the query results compliments of ROLLUP. Each purple cell actually contains a NULL, which is how ROLLUP behaves for the column or group of columns it is summarizing. The first summarized row tells us that the total order value for sales rep “Pamela Ansman-Wolfe” within the Northwest territory region is about $730K. The next row indicates that Ansman-Wolfe sold the same amount for all territories. (In this case, this rep has only one territory.)
If we look further down the result set we’ll see similar summary rows added after each change in territory, sales person, and year. Finally, ROLLUP courteously gives us a grand total row as well. Here are the last few rows of the result set: