fhg
Volume 10, Number 10 -- March 17, 2010

The State of the UNION

Published: March 17, 2010

by Skip Marchesani

DB2 for i supports the concept of union, which creates or derives a single result set table by combining two other result set tables, each of which was derived from a SELECT statement or another UNION. (See the discussion on cascading unions at the end of this article). A union is very useful when an SQL query must operate on two or more tables where JOIN cannot be used to produce the desired result set table.

UNION is actually a clause for what is referred to in SQL terminology as a FULLSELECT. The combination of the first SELECT statement, followed by the UNION clause, followed by the second SELECT statement comprises the FULLSELECT.

The simplified SQL syntax for a FULLSELECT is:

First SELECT statement
	UNION clause
Second SELECT statement
	ORDER BY clause (optional)	

DB2 for i supports four types of union clauses:

  • UNION (or UNION DISTINCT)
  • UNION ALL
  • EXCEPT DISTINCT
  • INTERSECT DISTINCT

UNION creates or derives a final single result set table by combining the result set table from the first SELECT statement with the result set table from the second SELECT statement, eliminating any duplicate rows in the final result set.

UNION ALL is the same as UNION except that all rows, including duplicates, from the first and second SELECT statements are included in the final result set table.

EXCEPT DISTINCT differs from UNION in that the final result set table consists of rows that only exist in the result set table for the first SELECT statement, eliminating any duplicate rows in the final result set.

INTERSECT DISTINCT differs from UNION in that the final result set table consists only of those rows that exist in the result set tables for both the first and second SELECT statements, eliminating any duplicate rows in the final result set.

To better understand how the four types of UNION work, let's look as some examples.

We will use two tables: the employee master file for 2010 called EMP, and the employee history file for 2009 called EMPHIST. The rows and columns for these two tables are shown below.

EMP--Employee Master Table for 2010
Nbr	Nam		Dpt	 Sal
10	Ed        	911	7000
20	Heikki    	901	6000
30	John      	977	3200
40	Mike      	977	6500
50	Marcela   	911	7500
60	Frank     	990	6500
EMPHIST--Employee History Table for 2009
Nbr	Nam		Dpt	 Sal
10	Ed        	911	6800
30	John      	977	3000
60	Frank     	990	6300
70	Sam       	901	3200

Note that there are six employees in the table called EMP, four employees in the table called EMPHIST, with employee numbers 10, 30, and 60 appearing in both tables.

UNION

With UNION, the final result set table consists of the rows from the result set table from the first SELECT statement combined with rows from the result set table from the second SELECT statement, with any duplicate rows eliminated in the final result set.

Following is the syntax for UNION.

SELECT nbr, nam, dpt FROM emp
     UNION
SELECT nbr, nam, dpt FROM emphist
     ORDER BY nbr

And, the final result set table that is created by UNION contains the following rows.

Nbr	Nam		Dpt
10	Ed        	911
20	Heikki    	901
30	John      	977
40	Mike      	977
50	Marcela   	911
60	Frank     	990
70	Sam       	901

Note that there are a total of 10 rows in both tables (six in EMP and four in EMPHIST), but only seven rows in the final result set table. Since this is a UNION, and employee numbers 10, 30, and 60, exist in both the result set table for the first SELECT statement (SELECT . . . FROM emp) and second SELECT statement (SELECT . . . FROM emphist), the duplicate rows are eliminated and the rows for employee numbers 10, 30, and 60, only appear once in the final result set table.

Suppose we change the syntax for the UNION to include the SAL column. What happens?

SELECT nbr, nam, dpt, sal FROM emp
     UNION
SELECT nbr, nam, dpt, sal FROM emphist
     ORDER BY nbr

The final result set table has all 10 rows as shown below. Why?

Nbr	Nam		Dpt	 Sal
10	Ed        	911	7000
10	Ed        	911	6800
20	Heikki    	901	6000
30	John      	977	3200
30	John      	977	3000
40	Mike      	977	6500
50	Marcela   	911	7500
60	Frank     	990	6500
60	Frank     	990	6300
70	Sam       	901	3200

From a DB2 for i perspective, one row in the final result set is considered a duplicate if the value for each column of that row matches the value of the corresponding column of another row in the final result set.

Even though NBR, NAM, and DPT, are the same for three pair of rows, SAL is different in each row and therefore there are no rows with duplicate data.

UNION ALL

With UNION ALL, all rows from the result set tables from the first and second SELECT statements, including duplicates, are included in the final result set table.

Below is the syntax for UNION ALL. Note that SAL was included in the syntax for UNION ALL to aid in an example later in this discussion.

SELECT nbr, nam, dpt, sal FROM emp
     UNION ALL
SELECT nbr, nam, dpt, sal FROM emphist
     ORDER BY nbr

And, the final result set table that is created by UNION ALL contains the following rows.

Nbr	Nam		Dpt	 Sal
10	Ed        	911	7000
10	Ed        	911	6800
20	Heikki    	901	6000
30	John      	977	3200
30	John      	977	3000
40	Mike      	977	6500
50	Marcela   	911	7500
60	Frank     	990	6500
60	Frank     	990	6300
70	Sam       	901	3200

The final result set contains 10 rows in the final result set table. The table called EMP has six rows, the table called EMPHIST has four rows, and since UNION ALL includes all rows from both the first and second SELECT statements, 10 rows in the final result set table is what we expect to see. And yes, this is the same result set from the previous UNION example where SAL was included in both the first and second SELECT statements.

Note that without going back and looking at the salary in the respective tables, it is not easy to tell where each row in the final result set table came from and what year the salary references.

However, a simple modification to the UNION ALL syntax will allow us to identify the source of each row. We will add a literal named YEAR to identify the year at the beginning of the select list for both the first and second SELECT statements as shown below (2009 for EMPHIST and 2010 for EMP). Note that YEAR was also added as the secondary ordering criteria.

SELECT '2010' AS year, nbr, nam, dpt, sal FROM emp
     UNION ALL
SELECT '2009' AS year, nbr, nam, dpt, sal FROM emphist
     ORDER BY nbr, year

The UNION ALL final result set table then looks like the following, and it is then no problem to determine the table where each row originated.

Year    Nbr	Nam		Dpt	 Sal
2009	10	Ed        	911	6800
2010	10	Ed        	911	7000
2010	20	Heikki    	901	6000
2009	30	John      	977	3000
2010	30	John      	977	3200
2010	40	Mike      	977	6500
2010	50	Marcela   	911	7500
2009	60	Frank     	990	6300
2010	60	Frank     	990	6500
2009	70	Sam       	901	3200

EXCEPT DISTINCT

With EXCEPT DISTINCT, the final result set table consists of rows that only exist in the result set table for the first SELECT statement, and any duplicate rows in the final result set are eliminated.

Following below is the syntax for EXCEPT DISTINCT. Note that SAL is not included this time.

SELECT nbr, nam, dpt FROM emp
     EXCEPT DISTINCT
SELECT nbr, nam, dpt FROM emphist
     ORDER BY nbr

And, the final result set table that is created by EXCEPT DISTINCT contains the following rows.

Nbr	Nam		Dpt
20	Heikki    	901
40	Mike      	977
50	Marcela   	911

The result set table for the first SELECT statement has one row for employee numbers 10, 20, 30, 40, 50 and 60, in the EMP table. The result set table for the second SELECT statement has one row for employee numbers 10, 30, 60, and 70, from the EMPHIST table. The result set table for EXCEPT DISTICNT will have one row for each row found in the result set table for the first SELECT statement that is not in the result set table for the second SELECT statement. Since employee numbers 10, 30, and 60, are in the result set tables for both the first and second SELECT statements, only employee numbers 20, 40, and 50, are in the final result set table for the EXCEPT DISTINCT.

INTERSECT DISTINCT

With INTERSECT DISTINCT, the final result set table consists of only those rows that exist in the result set tables for both the first and second SELECT statements, and any duplicate rows in the final result set are eliminated.

Following below is the syntax for INTERSECT DISTINCT. Note that SAL is not included this time

SELECT nbr, nam, dpt FROM emp
     INTERSECT DISTINCT
SELECT nbr, nam, dpt FROM emphist
     ORDER BY nbr

And, the final result set table that is created by INTERSECT DISTINCT contains the following rows.

Nbr	Nam		Dpt
10	Ed        	911
30	John      	977
60	Frank     	990

The result set table for the first SELECT statement has one row for employee numbers 10, 20, 30, 40, 50, and 60, in the EMP table. The result set table for the second SELECT statement has one row for employee numbers 10, 30, 60, and 70, from the EMPHIST table. The result set table for INTERSECT DISTICNT will have one row for each row found in the result set table for both the first and second SELECT statements. Since employee numbers 10, 30, and 60, are in the result set tables for both the first and second SELECT statements, they are in the final result set table for the INTERSECT DISTINCT. In effect, this is the opposite of EXCEPT DISTINCT.

Cascading Unions

If more than two tables need to be combined, you can use a cascading union. The simplified syntax for a cascading union follows below.

	SELECT statement #1
		UNION clause #1
	SELECT statement #2
                        UNION clause #2
	SELECT statement #3	
		UNION clause #3
	…
		UNION clause #n-1
SELECT statement #n
ORDER BY clause (optional)

As stated in the beginning of this article, a union creates or derives a final, single result set table by combining two other result set tables, each of which were derived from a SELECT statement or another UNION clause.

To clarify this statement, in the above syntax for a cascading union the final result set table for UNION clause #1 becomes the first result set table for UNION clause #2. The final result set table for UNION clause #2 becomes the first result set table for UNION clause #3, and so on.

Union Considerations

The result set table for the first SELECT statement must have the same number of columns as the result set table for the second SELECT statement. From a positional perspective, the column names do not have to be the same, nor must their data attributes or types be identical. The only requirement is that data types and attributes must be compatible. That is, one numeric type is compatible with any other numeric type, and one character type is compatible with any other character type, etc.

From a positional perspective, if the column name in the result set table for the first SELECT statement is the same as the column name in the result set table for the second SELECT statement, the corresponding column in the final result set table will have that same name. If the column names are different, then the corresponding column in the final result set table will be named with the ordinal number of the position of that column in the final result set table.

Where To Use Union

As stated in the beginning of this article, a union is very useful when an SQL statement or query must operate on two or more tables where JOIN cannot be used to produce the desired result set table.

For example, if you have multiple history tables--one for each specific time period (year, month, week, etc.), and their record formats are similar and compatible, a union is a good way to query and combine two or more of these tables to derive a single, final result set table using SQL.

Specifically I use union to combine multiple tables to do some detailed data analysis, which will be the subject of my next tip.


Skip Marchesani retired from IBM after 30 years and is now a consultant with Custom Systems Corporation. He is also a founding partner of System i Developer and the RPG & DB2 Summit. Skip spent much of his IBM career working with the Rochester Development Lab on projects for S/38 and AS/400 and was involved with the development of the AS/400. He was part of the team that taught early AS/400 education to customers and IBM lab sites worldwide. Skip is recognized as an industry expert on DB2 for i and the author of the book DB2/400: The New AS/400 Database. He specializes in providing customized education for any area of the System i, iSeries, and AS/400; does database design and design reviews; and performs general System i, iSeries, and AS/400 consulting for interested clients. He has been a speaker for user groups, technical conferences, and System i, iSeries, and AS/400 audiences around the world. He is an award-winning COMMON speaker and has received its Distinguished Service Award. Send your questions or comments for Skip to Ted Holt via the IT Jungle Contact page.


RELATED STORIES

Grouping a Union

A Database Union is Not a Join

Weird SQL Union Results



                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot


Sponsored By
WORKSRIGHT SOFTWARE

Do you need area code information?
Do you need ZIP Code information?
Do you need ZIP+4 information?
Do you need city name information?
Do you need county information?
Do you need a nearest dealer locator system?

We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

Just call us and we'll arrange for 30 days FREE use of either
ZIP/CITY or PER/ZIP4.

WorksRight Software, Inc.
Phone: 601-856-8337
Fax: 601-856-9432
E-mail: software@worksright.com
Web site: www.worksright.com


Senior Technical Editor: Ted Holt
Technical Editor: Joe Hertvik
Contributing Technical Editors: Erwin Earley, Brian Kelly, Michael Sansoterra
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.

Sponsored Links

PowerTech:  Strengthen your security. Get a FREE Compliance Assessment today!
New Generation Software:  NGS-IQ: Reporting & BI Software. Easy to Install. Easy to Use.
COMMON:  Join us at the annual 2010 conference, May 3 - 6, in Orlando, Florida


 

IT Jungle Store Top Book Picks

Easy Steps to Internet Programming for AS/400, iSeries, and System i: List Price, $49.95
The iSeries Express Web Implementer's Guide: List Price, $49.95
The System i RPG & RPG IV Tutorial and Lab Exercises: List Price, $59.95
The System i Pocket RPG & RPG IV Guide: List Price, $69.95
The iSeries Pocket Database Guide: List Price, $59.00
The iSeries Pocket SQL Guide: List Price, $59.00
The iSeries Pocket Query Guide: List Price, $49.00
The iSeries Pocket WebFacing Primer: List Price, $39.00
Migrating to WebSphere Express for iSeries: List Price, $49.00
Getting Started With WebSphere Development Studio Client for iSeries: List Price, $89.00
Getting Started with WebSphere Express for iSeries: List Price, $49.00
Can the AS/400 Survive IBM?: List Price, $49.00
Chip Wars: List Price, $29.95


 
The Four Hundred
Old Code Meets New Ideas in Latest App Modernization Projects

IBM Cuts Prices for Upgrades to Power 595s

Power 750 Servers Running i Get SAP Benchmarks

Mad Dog 21/21: Mad Dog 21/21: The Teahad Pilot, the Sycophant Senator, and IBM

Internal Disk Arrays Prop Up Storage Sales in Q4

Four Hundred Stuff
System i Automation from RevSoft is 'FAB'ulous

Profound Simplifies Screen Design in Genie Web-Enablement Tool

'Legacy' Integration Blossoms During Recession, Seagull Says

Jack Henry Shop Adopts Banking Framework from IBM

Spinnaker Adds JDE Consulting to 3rd Party Maintenance Business

Four Hundred Monitor
Four Hundred Monitor's
Full iSeries Events Calendar

System i PTF Guide
March 13, 2010: Volume 12, Number 11

March 6, 2010: Volume 12, Number 10

February 27, 2010: Volume 12, Number 09

February 20, 2010: Volume 12, Number 08

February 13, 2010: Volume 12, Number 07

February 6, 2010: Volume 12, Number 06

TPM at The Register
VMware opens up Workstation 7.1 beta

Shuttleworth heir opens up on Ubuntu biz

Football U goes to 3Leaf for HPC

Hedge fund suitor denies Novell asset sale rumors

Super Micro to launch AMD render cloud

CA eats Nimsoft cloud watcher

Swedes serve up flicks with KVM

Citrix tunes XenApp for Windows Server R2

Voltaire brings InfiniBand switch to the masses

Cisco 'forever changes internet' with... a router

Terracotta's Ehcache back-ends Hibernate

Tilera wins VC from Broadcom, Quanta, NTT

THIS ISSUE SPONSORED BY:

WorksRight Software
SEQUEL Software
Halcyon Software


Printer Friendly Version


TABLE OF CONTENTS
The State of the UNION

A Good Use for Global Variables

Does An IPL Really Help Improve System Performance?

Four Hundred Guru

BACK ISSUES




 
Subscription Information:
You can unsubscribe, change your email address, or sign up for any of IT Jungle's free e-newsletters through our Web site at http://www.itjungle.com/sub/subscribe.html.

Copyright © 1996-2010 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034

Privacy Statement