Guru: Debugging Common Table Expressions
October 21, 2019 Ted Holt
I cannot say enough good things about common table expressions. Words like wonderful and marvelous don’t begin to describe them. However, CTEs do add a bit of complexity to an SQL query, and when the result set doesn’t contain the correct results, any common table expression can be the culprit. Fortunately, debugging queries with common table expressions is not difficult.
To illustrate what I mean, let’s assume we have a query that retrieves shipment information for one day. It involves a few tables:
- a one-row table containing a shipment date
- a table of shipment header information
- a customer master table
Here it is:
with Temp01 as (select s.shipmentID, s.customerID, s.amount from shipments as s where s.shipdate = (select BillingDate from BillingDate)), Temp02 as (select sum(t1.amount) as TotalShipped from Temp01 as t1), Temp03 as (select t1.shipmentID, t1.customerID, c.name, t1.amount, dec(round(t1.amount / t2.TotalShipped * 100,2),5,2) as Percentage from Temp01 as t1 cross join Temp02 as t2 join customers as c on t1.customerID = c.account) select t3.* from Temp03 as t3 order by t3.shipmentID
I don’t want any smart-aleck emails telling me that this is a stupid way to solve the query. I know that. I just need something with a few common table expressions to illustrate the process of debugging. In this case, there are three of them, cleverly named TEMP01, TEMP02, and TEMP03.
Let’s assume further that one of the people whom we serve has informed us that the shipment figures coming out of this query are wrong. We find it hard to believe. It always worked properly during testing. Here are the results of the query for October 21, 2019:
SHIPMENTID | CUSTOMERID | NAME | AMOUNT | PERCENTAGE |
10 | 100 | Sarah Bellum | 500.0 | 33.33 |
10 | 100 | Bill Fold | 500.0 | 33.33 |
11 | 110 | Polly Fonnick | 400.0 | 26.67 |
12 | 100 | Sarah Bellum | 600.0 | 40.0 |
12 | 100 | Bill Fold | 600.0 | 40.0 |
That’s odd. There were three shipments that day, and two of them appear to have gone to two customers. That’s impossible.
Obviously one of the common table expressions has a problem, but which one? Common table expressions are not permanent database objects, so how do we see their contents? It’s easy. We query the common table expressions, one by one, instead of the final SELECT expression.
First, let’s see what the first common table expression is retrieving.
with Temp01 as
(select s.shipmentID, s.customerID, s.amount
from shipments as s
where s.shipdate =
(select BillingDate from BillingDate)),
Temp02 as
(select sum(t1.amount) as TotalShipped
from Temp01 as t1),
Temp03 as
(select t1.shipmentID, t1.customerID, c.name, t1.amount,
dec(round(t1.amount / t2.TotalShipped * 100,2),5,2)
as Percentage
from Temp01 as t1
cross join Temp02 as t2
join customers as c
on t1.customerID = c.account)
/* select t3.* from Temp03 as t3
order by t3.shipmentID; */
select * from Temp01
I have commented-out the final SELECT and temporarily added a SELECT that shows the contents of the first CTE. The modified final SELECT no longer references CTE’s Temp02 and Temp03.
SHIPMENTID | CUSTOMERID | AMOUNT |
10 | 100 | 500.0 |
12 | 100 | 600.0 |
11 | 110 | 400.0 |
That looks fine. There are three orders: two for customer 100 and one for customer 110. What does Temp02 look like?
with Temp01 as
(select s.shipmentID, s.customerID, s.amount
from shipments as s
where s.shipdate =
(select BillingDate from BillingDate)),
Temp02 as
(select sum(t1.amount) as TotalShipped
from Temp01 as t1),
Temp03 as
(select t1.shipmentID, t1.customerID, c.name, t1.amount,
dec(round(t1.amount / t2.TotalShipped * 100,2),5,2
) as Percentage
from Temp01 as t1
cross join Temp02 as t2
join customers as c
on t1.customerID = c.account)
/* select t3.* from Temp03 as t3
order by t3.shipmentID; */
select * from Temp02
TOTALSHIPPED |
1500.00 |
That’s good. We’re back to Temp03. It has to be the culprit. Here’s the result set again.
SHIPMENTID | CUSTOMERID | NAME | AMOUNT | PERCENTAGE |
10 | 100 | Sarah Bellum | 500.0 | 33.33 |
10 | 100 | Bill Fold | 500.0 | 33.33 |
11 | 110 | Polly Fonnick | 400.0 | 26.67 |
12 | 100 | Sarah Bellum | 600.0 | 40.0 |
12 | 100 | Bill Fold | 600.0 | 40.0 |
How can customer number 100 have two names? And then it hits us: a customer is identified by two values — a company number and an account number. Querying the customer master file for customer 100 shows us two customers:
select * from customers where account = 100
COMPANY | ACCOUNT | NAME |
1 | 100 | Sarah Bellum |
2 | 100 | Bill Fold |
It appears that testing was done with the data of one company only. Or perhaps the test data included more than one company’s shipments, but no two companies had common customer account numbers in the dataset. Whatever happened, the solution is to add the company ID to the query.
with Temp01 as (select s.shipmentID, s.companyID, s.customerID, s.amount from shipments as s where s.shipdate = (select BillingDate from BillingDate)), Temp02 as (select sum(t1.amount) as TotalShipped from Temp01 as t1), Temp03 as (select t1.shipmentID, t1.companyID, t1.customerID, c.name, t1.amount, dec(round(t1.amount / t2.TotalShipped * 100,2),5,2) as Percentage from Temp01 as t1 cross join Temp02 as t2 join customers as c on t1.companyID = c.company and t1.customerID = c.account) select t3.* from Temp03 as t3 order by t3.shipmentID
SHIPMENTID | COMPANYID | CUSTOMERID | NAME | AMOUNT | PERCENTAGE |
10 | 1 | 100 | Sarah Bellum | 500.0 | 33.33 |
11 | 1 | 110 | Polly Fonnick | 400.0 | 26.67 |
12 | 2 | 100 | Bill Fold | 600.0 | 40.0 |
Temp01 now contains the company number and the join to CUSTOMERS in Temp03 now includes the company ID. The query is accurate.
The moral of the story, then, is that you do not have to copy and paste pieces — or comment out pieces — of a query that uses common table expressions in order to view the intermediate results that they provide. Just query the CTE in the final query.
I leave you with one more interesting fact. I learned recently from Rob Bestgen of IBM that the query engine doesn’t even bother to run the CTEs that follow the one that you’re focusing on because it can determine that the results of those common table expressions won’t be used. The people who bring us DB2 for i are nothing short of amazing! Words like wonderful and marvelous don’t begin to describe them either.
Agree completely on all of your examples and accolades for both CTEs and IBM!
Great job Ted. that’s typically how I would’ve debugged it as well. I’m new to my SQL Server DBA (been IBM/RPG’er since system/38 days) and needing this skill more and more each day.