How Do I Join Tables? Let Me Count The Ways
September 1, 2015 Ted Holt
Normalization is the process of making sure that each datum is stored in the proper table. Storing data in the wrong place gives rise to anomalies, a fancy word for problems, and you have enough problems already. To make sense of normalized data requires that tables be joined. Do you know the methods to code a join with SQL and the advantages and disadvantages of each one?
Method 1: WHERE
When I first learned SQL, joining was done in the WHERE clause of the SELECT statement. Here’s an example.
select h.*, d.* from SalesOrderHeaders as h, SalesOrderLines as d where h.OrderNumber = d.OrderNumber
In this query, I request all columns of the sales order header table and all columns of the sales order lines. The WHERE clause tells the system to match the data on a common sales order number. That is, the header information on any row of the result set applies to one line of the same sales order.
This syntax has its shortcomings, the greatest of which I consider to be that unmatched orders are dropped. That is, if an order has a header but no lines, or one or more lines but no header, that order does not show up in the result set. DBMS providers had to come up with extensions to support outer joins.
I do not use this syntax because I see no advantage in it. I recommend that you know this syntax so you can recognize and understand it if you see it, but I do not recommend that you use it.
Method 2: JOIN . . . USING
This is an implementation of the natural join. The first thing to know about natural joins is that two tables/views are joined on common field names of the same data type.
select * from SalesOrderHeaders join SalesOrderLines using (OrderNumber)
OrderNumber is the common column. That is to say, the order number is named OrderNumber in both tables and is defined with the same data type in both tables.
The second thing to know about natural joins is how they behave when you use an asterisk to select all the columns. Here’s the result set from the previous query.
The result set consists of:
This makes sense to me. Why return two columns with the same data? Would you like to see something that does not make sense to me? Look at this:
select h.*, d.* from SalesOrderHeaders as h join SalesOrderLines as d using (OrderNumber)
Here’s the result set:
Look, Ma! No order number. It’s as if the common columns are not part of either table.
I rarely use the JOIN . . . USING syntax. I don’t need it, because there’s something better, namely . . .
Method 3: JOIN . . . ON
This is the method that does it all. You can join on whatever you want to, regardless of column name and data type.
Here’s an example of JOIN . . . ON.
select h.*, d.* from SalesOrderHeaders as h join SalesOrderLines as d on h.OrderNumber = d.OrderNumber
And here’s the result set.
Not only can I put column names in the join expression, I can even put expressions and literals. In this example, manufacturing job number 123456, for example, matches sales order S123456.
select m.MfgOrderNo, m.DueDate, c.SalesOrderNo, c.CustomerNo from MfgOrdHdr as m left join SalesOrdHdr as c on m.jobno = substr(c.SalesOrderNo,2)
This is the syntax I use and recommend.