Subqueries vs. Joins

July 8, 2009 Hey, Ted

Please settle a disagreement. Another developer claims that it is bad to use an SQL subquery if the same results can be achieved with a join. He says that database management systems can optimize joins better than they can optimize subqueries, which means that joins run faster than subqueries. Is this true? I seem to have heard somewhere that the query engine converts a subquery to a join when possible.

–Dee

If the other developer is experienced with database management systems other than DB2, than he may be correct. However, when it comes to DB2 for i, you are correct.

For the benefit of other readers, let’s first define the term subquery. A subquery is a query inside another query. Probably the most common place to put a subquery is in the WHERE clause of a SELECT statement, like this:

select cusnum, lstnam, city, state
  from qcustcdt
 where state in
        (select abbr
           from states
          where region = 'W')
 order by 1

The STATES table has three columns (fields):

Two-character state abbreviation
State name
Region code

The subquery (the inner query) builds a list of state abbreviations from the rows (records) that have a region code of “W”. The outer query selects customers who live in the states in the list.

Sometimes a query with a join yields the same results.

select cus.cusnum, cus.lstnam, cus.city, cus.state
  from qcustcdt as cus
  join states as st
    on cus.state = st.abbr
 where st.region = 'W'
 order by 1

In this case, the system joins the two tables on common two-digit state abbreviation and selects the customers in region W. But both queries return the same data.

I sent Dee’s question to Dan Cruikshank of IBM, and this is his response:

Both query optimizers (CQE and SQE) attempt to implement SQL subqueries as join composites (convert a subquery to a join.) This allows the optimizer (especially SQE) to take advantage of DB2 for i’s extensive join technology (i.e., look-ahead predicate generation).

This does not mean that you should write joins instead of subqueries. That would not bode well for the many third-party SQL code generator products that use subqueries instead of joins. In fact, even though DB2 for i does not support updateable join syntax, the optimizer will attempt to convert a searched UPDATE containing scalar subselects, subqueries, or both to an updateable join under the covers.

Self-referencing UPDATEs (i.e., the table being updated is included as part of the subquery) may result in a copy being made of the table being updated. This would certainly cause an increase in memory and disk demand, although I would suspect that this is more of a one time only type scenario.

Because CQE has more limitations than SQE, IBM recommends avoiding the CQE optimizer if you can. Review the reason why the query was dispatched to the CQE optimizer and correct it. This information is captured by the database monitor and can be displayed using the Visual Explain tool or the System i Navigator Database Monitor Analysis dashboard.

The two most common reasons for CQE are:

Specifying a DDS Logical File on the FROM clause of a query
The presence of a select/omit logical file when the QAQQINI setting for IGNORE_DERIVED_INDEXES is *NO

The bottom line: if you like subqueries, use them.

–Ted

                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot