What Can I Select When I Group?
April 25, 2007 Dear highly esteemed professional colleague
It’s not uncommon for someone to ask me for help with error SQL0122, which involves SQL commands that use the GROUP BY clause to produce summary figures. Since I see this error over and over and over, I thought it would be good to explain how summary queries work. The standard rule I hear is that the SELECT clause of summary queries can list aggregate fields (those in the GROUP BY clause), expressions that are in the GROUP BY clause, and column functions, such as COUNT, SUM, and AVG. That’s close, but slightly inaccurate.
A reader who contacted me recently was working with a field that had, among other things, a character date in MMDDYYYY format and a currency value. He needed to summarize the file by date. His first attempt looked something like this:
SELECT SomeDate, SUM(SomeNumber) FROM mydata GROUP by SomeDate ORDER BY SomeDate
The SELECT clause contains a grouping field (SomeDate) and a column function, SUM. If any non-grouping fields had been in the SELECT clause, SQL0122 (Column WAREHOUSE or expression in SELECT list not valid.) would have been the result.
SELECT SomeDate, Warehouse, SUM(SomeNumber) FROM mydata GROUP by SomeDate ORDER BY SomeDate
Let’s go back to the reader’s query. The query returned accurate data, but the result set was sequenced by month, then day, then year.
SomeDate Sum ======== === 01022007 25 07032005 50 09032002 95 11201998 35 12312006 125
This is not what he wanted. He needed to format the data so that it would sort in proper chronological sequence. I had him replace SomeDate with an expression that would convert the date to YYYYMMDD format.
SELECT substr(SomeDate,5,4)||substr(SomeDate,1,4), SUM(SomeNumber) FROM mydata GROUP by substr(SomeDate,5,4)||substr(SomeDate,1,4) ORDER BY substr(SomeDate,5,4)||substr(SomeDate,1,4)
And this is the result set.
SomeDate Sum ======== === 19981120 35 20020903 95 20050703 50 20061231 125 20070102 25
It’s OK to group and sort by expressions as long as they match the expressions in the SELECT clause. But that does not mean that the expressions in the SELECT clause have to exactly match the arguments in the GROUP BY clause. For example, here is another version of the previous query that returns the same results.
SELECT substr(SomeDate,5,4)||substr(SomeDate,1,4), SUM(SomeNumber) FROM mydata GROUP by SomeDate ORDER BY 1
The SELECT clause and GROUP BY clause do not match. They don’t have to match, because the date-conversion expression in the SELECT clause does not reference a non-grouping field. The system can group the data by the distinct date values, but present those values in a different format.
Here’s another allowable expression that does not match the GROUP BY clause. While the date is still used as a grouping field, the record set returns that date two weeks into the future.
SELECT DATE(substr(SomeDate,5,4)||'-'|| substr(SomeDate,1,2)||'-'|| substr(SomeDate,3,2)) + 14 days, SUM(SomeNumber) FROM mydata GROUP by SomeDate ORDER BY 1
An expression can even combine two or more grouping fields. To illustrate, assume a database file, three of whose fields are a terms code, a credit code, and an order value. If the concatenation of the credit code and terms code is meaningful in some way, you might use a query of this kind:
SELECT CreditCode||TermCode, SUM(OrderValue) FROM mydata GROUP by CreditCode, TermCode ORDER BY 1
The result set has two columns. The first is the credit code concatenated to the terms code. The second is the sum of the order values.
You might also do something of this nature:
SELECT CreditCode, TermCode, CreditCode||TermCode, SUM(OrderValue) FROM mydata GROUP by CreditCode, TermCode ORDER BY 1
Here the grouping fields are in the result set, along with their concatenation.
You may even build something like this, where the TermsCode is rearranged:
SELECT CreditCode||substr(TermCode,2,1)||substr(TermCode,1,1), SUM(OrderValue) FROM mydata GROUP by CreditCode, TermCode ORDER BY 1
The two-byte terms code is reversed and appended to the credit code to create the first column of the result set.
The point is that your result set layout, as defined in the SELECT clause, is not required to match the GROUP BY clause. You are free to build any expressions, provided you do not refer to non-grouping fields.