Throw Away Old Summary Query Techniques
January 12, 2016 Ted Holt
It’s a new year! Out with the old! In with the new! That exhortation applies not only to everyday life, but also to the ways we write SQL queries. Let me show you two ways we used to have to write summary queries and two new ways that are better.
First, we need a database to query. Here’s a table of customers.
ID Name === ===================== 101 J. Cheever Loophole 102 Otis B. Driftwood 103 Quincy Adams Wagstaff 104 Rufus T. Firefly
And here’s a table of shipments.
ID CUSTOMER ITEM QUANTITY PRICE === ======== ====== ======== ===== 1 102 AB-200 2 .25 2 104 BC-300 4 1.00 3 102 AB-200 1 .20 4 101 EF-303 2 1.00 5 102 BB-772 6 .10
Now, suppose you’ve been asked to sum up these sales by customer. For a professional like yourself, that’s easy.
select customer, sum(quantity * price) from sales group by customer order by customer CUSTOMER SUM ======== ==== 101 2.00 102 1.30 104 4.00
Oh, but now the requestor asks you to include the name of each customer. Since the customer name is not in the sales file, you must add the customer table to the query.
You could do this:
select s.customer, c.name, sum(s.quantity * s.price) from sales as s join cust as c on s.customer = c.id group by s.customer, c.name order by s.customer CUSTOMER NAME SUM ======== ===================== ==== 101 J. Cheever Loophole 2.00 102 Otis B. Driftwood 1.30 104 Rufus T. Firefly 4.00
It gets the proper results, but to do so it must include the customer name as a summary column. (That is, customer name had to be added to the GROUP BY clause.) Adding this unnecessary grouping column (field) is extra work that can increase the run time when the query runs over a large amount of data.
You could do this instead:
select s.customer, max(c.name), sum(s.quantity * s.price) from sales as s join cust as c on s.customer = c.id group by s.customer order by s.customer CUSTOMER MAX(NAME) SUM ======== ===================== ==== 101 J. Cheever Loophole 2.00 102 Otis B. Driftwood 1.30 104 Rufus T. Firefly 4.00
The customer name is no longer a summary column, but notice the MAX function in SELECT. Since the customer name is functionally dependent on customer ID, the system will waste time finding the maximum of many identical values. It’s extra work, and it’s also misleading, as it implies that the value of NAME may be different for sales to the same customer number.
I used to use such methods years ago, when SQL was less powerful, but I don’t use them now and I don’t recommend them to you. Instead, I would access the summary query as if it were a table or view:
select s.customer, c.name, s.sales from (select customer, sum(quantity * price) as sales from sales group by customer) as s join cust as c on s.customer = c.id order by s.customer CUSTOMER NAME SUM ======== ===================== ==== 101 J. Cheever Loophole 2.00 102 Otis B. Driftwood 1.30 104 Rufus T. Firefly 4.00
The customer name is not a summary field and there is no MAX function. The system runs the inner query first to build the summary figures, then joins that result set to the customer master table to get the customer name.
You could also use a common table expression.
with s as (select customer, sum(quantity * price) as sales from sales group by customer) select s.customer, c.name, s.sales from s join cust as c on s.customer = c.id order by s.customer
I still see queries like the first two. Maybe old habits die hard.
Not only do I encourage people to use the newer techniques, I encourage them to consider rewriting any old queries that use these techniques.
Replacing old techniques with new ones may seem like a small thing, but small things can make a big difference. As Jim Rohn said, “Success is doing ordinary things extraordinarily well.”