fhg
Volume 8, Number 3 -- January 23, 2008

A Database Union is Not a Join

Published: January 23, 2008

Hey, Brian:

We recently acquired a company that has had its own payroll system for many years. We are going to combine the two payrolls starting next year but in the meantime, we would like to be able to use some of our internal cost analysis programs on the combined data without combining the two files.

We have created an earning master physical file in which we are staging the data that we will use to merge the two systems next year. This file has the exact shape as our current payroll master file. Without creating another file and physically merging them, thus creating another burden for our staff, I would like to "join" all the records using DDS into one file and override the report programs so that they use this logical file for reporting. However, when I try to join them, I do not get all the records. In fact, depending on which file I specify first in the join, I get exactly that many records in my output view. Offhand, do you know what I am doing wrong?

--Aaron


Hi Aaron,

E. F. "Ted" Codd, the designer of relational database, defined several relational operations. The select operation limits the rows (records) in a view. The project operation limits the columns (fields) in a view. The join is also a powerful operation that combines data from matching records in multiple files. However, the join does not have the ability to take two separate record sets and merge them into one file, such that the number of records in the merged file equals the total number of records in both files. The merge operation as defined by Ted Codd is called union.

Though there is no UNION keyword in DDS, there is a way to code for UNION. In the next two examples, you have an old payroll master file and a new master file. In essence this is your situation. Your objective is to marry (logically merge or union) the two files, using fields that are common to both of them.

Unlike a join, the result of a union is a view of similar data in multiple files as if the data were all lumped together, and then sorted by the union key level field. In other words, if there were a three-file union, and each of the three physical files had 33 records, the union view would have 99 records--all sorted together like one big homogenous file view.

Now, contrast union with join. For example, let's say you joined the same three physical files. Assume, for this instance, that you know that the data is perfect and that the resulting join would produce a logical file that was really a natural, inner, and matching join (three names for the same thing). In other words, there would be a matching record in each of the secondaries (underlying physical files), for each primary record. In other words again, both secondary physical files would each have one data record to supply for each primary record presented by the join logical view.

If this were all true, then after the join, the resulting view would not show 99 records, as in the union example. Instead it would show just 33 records. As a friend of mine would say, "'combinatorically' speaking, the join combines the three records into one record." Therefore, each record would show a piece (one or several fields) from each of the physical files that were joined.

The following is an example of how to code a union for your shop. First we show the two physical files, NEWMSTR and OLDMSTR and then the test data in the files and then the join with some of the fields.

 A*  PAYROLL MASTER   NEWMSTR
 A          R EARNMSTR     ... TEXT('EARNINGS MASTER')
 A            EMPLNAME      25                        
 A            EMPINL         2                        
 A            ACCT#          6  0                     
 A            PAYCOD         1                        
 A            RATE           6S 2                     
 A            EMPNO          6  0                     
 A            LCHDAT          L ... TEXT('ADDED FIELD')
 A          K EMPNO                                    
 A*  PAYROLL MASTER   OLDMSTR
 A          R OLDEARNR  ...     TEXT('OLD EARN MASTER')
 A            EMPLNAME      25                         
 A            EMPINL         2                         
 A            ACCT#          6  0                      
 A            PAYCOD         1                         
 A            RATE           6S 2                      
 A            EMPNO          6  0                      
 A          K EMPNO                                    

Query Against NEWMSTR

 Line   ....+....1....+....2....+....3....+....4....+....5....+....6....+.
        EMPLNAME        EMPINL  ACCT# PAYCOD     RATE    EMPNO  LCHDAT    
 000001 Snidely           JJ   245765   D        3.45        0  2007-12-13
 000002 Norman            CJ   897345   X        5.63       20  2007-12-14
 000003 Binsnon           JX   567345   K        4.53       30  2007-12-15
 ****** ********  End of report  ********       

Query Against OLDMSTR

 Line   ....+....1....+....2....+....3....+....4....+....5....+ 
        EMPLNAME        EMPINL  ACCT# PAYCOD     RATE    EMPNO  
 000001 Waldono           GG   567567   H        5.53        5  
 000002 Finstermick       KC   094536   L        4.67       15  
 000003 Walloby           FM   345298   H        6.89       25  
 ****** ********  End of report  ********                       

DDS LF - UNION - Some Fields

A* LOGICAL FILE (PRUNSOME)
A*                        
A          R PRUNSOMR    ...    PFILE(NEWMSTR OLDMSTR) 
A            EMPLNAME                                         
A            EMPINL                                           
A            RATE                                             
A            EMPNO                                            
A            PAYCOD                                           
           K EMPNO                                            

Query Against PRUNSOME

 Line   ....+....1....+....2....+....3....+....4....+...
        EMPLNAME        EMPINL     RATE    EMPNO  PAYCOD
 000001 Snidely           JJ       3.45        0    D   
 000002 Waldono           GG       5.53        5    H   
 000003 Finstermick       KC       4.67       15    L   
 000004 Norman            CJ       5.63       20    X   
 000005 Walloby           FM       6.89       25    H   
 000006 Binsnon           JX       4.53       30    K   
 ****** ********  End of report  ********               

Union takes two almost identically structured files and combines them. A good DDS coding example is that shown above, in which there are new or current masters (NEWMSTR) and old masters (OLDMSTR). The new, combined, view of the data (PRUNSOME logical file) includes only the named (projected) fields from all records in both the old Payroll Earnings Master (OLDMSTR) and the new Payroll Earnings Master (NEWMSTR).

Overall, up to 32 files can be unioned. Though there are two files defined in the PFILE, in this example, without library qualifiers, this is not a join, since there is no Join Record Format (J in column 17). Each record contains data from just one physical file, not both--as in a JOIN. A key field (EMPNO in this case) must be specified in a union. The key field sorts the records in this case by employee number. Note in the PRUNSOME query output that the records from the NEWMSTR and OLDMSTR files are interleaved due to the sort on EMPNO.

The trick is to use non-join file DDS and to specify both files using the PFILE keyword as shown in the example above. It provides the same facility as physically merging the files and when you override your reporting analysis programs to this file, as long as the fields are exposed, both companies' data will be available for management analysis. Hope this does the trick for you.

Here are a few other examples of union using DDS.

DDS for PRUNALL Logical File

 A*  UNION A FILE WITH ITSELF -- UNION LOGICAL FILE PRUNALL   
 A*                                                           
 A          R EARNMSTR                  PFILE(NEWMSTR NEWMSTR)
 A          K EMPNO                                           

This union is with the fields from all records in both the "old payroll master" and the "new payroll master." Notice the format name is EARNMSTR. This says that the union will use the format (all the fields) of the new master file. There cannot be any extra fields in NEWMSTR, compared to the second file, whether it is OLDMSTR or another file or another reference to NEWMSTR as in the example above. Since all fields in NEWMSTR also exist in the second reference to NEWMSTR, this is a valid union. If there are any field differences in the two or more files involved in the join, the logical file will not compile and it will produce severe error messages, and thus will not be created. The second copy of NEWMSTR can be replaced in the above union by a different physical file with identical fields as NEWMSTR and this will work the same. However, OLDMSTR cannot be used in this example since it does not have the LCHDAT field as does the NEWMSTR.

Note that the data from NEWMSTR is repeated twice when the new logical file PRUNALL is queried as shown below:

Query of Logical Union FIle PRUNALL

       EMPLNAME        EMPINL  ACCT# PAYCOD     RATE    EMPNO  LCHDAT     
000001 Snidely           JJ   245765   D        3.45        0  2007-12-13 
000002 Norman            CJ   897345   X        5.63       20  2007-12-14
000003 Binsnon           JX   567345   K        4.53       30  2007-12-15
000004 Snidely           JJ   245765   D        3.45        0  2007-12-13
000005 Norman            CJ   897345   X        5.63       20  2007-12-14
000006 Binsnon           JX   567345   K        4.53       30  2007-12-15
****** ********  End of report  ********                                 

Now, let's try a new wrinkle on UNION. Let's use a format keyword to reformat the unioned data. In the next example, we assume that we have added a new field to the new earnings file, and we want a unioned view of the fields in the OLDMSTR file with the NEWMSTR file. The OLMMSTR file does not contain the new field, LCHDAT. This may be something you would do to accommodate consolidations, etc., so that you can get the known information from two similar groups of records. Take a look at the DDS below and ask yourself if it has a new format; if so, what format is that?

In this example, you see another union, but this time, assume that the files have different formats. Since we still want the records unioned, you are going to have to select a format that has fields that exist in both files to govern the new unioned view.

DDS LF PRUNF - UNION – Format

 A*  Logical FIle PRUNF        
 A          R OLDEARNR                  PFILE(NEWMSTR OLDMSTR)
 A                                      FORMAT(OLDMSTR)       
 A          K EMPNO                                           

This new logical view of data called PRUNF, uses the format of the old earnings file. The LCHDAT field is no tin the OLDMSTR file. Thus, we cannot use the EARNMSTR format for the union, since there is no LCHDAT field in the OLDEARN file.

This new format shares the OLDEARNR format from the old payroll master file. Fields from the new master, which are not in the old format, are not included in the new unioned view, called PRUNF.

Query of Logical Union File PRUNF

       EMPLNAME        EMPINL  ACCT# PAYCOD     RATE    EMPNO 
000001 Snidely           JJ   245765   D        3.45        0 
000002 Waldono           GG   567567   H        5.53        5 
000003 Finstermick       KC   094536   L        4.67       15 
000004 Norman            CJ   897345   X        5.63       20 
000005 Walloby           FM   345298   H        6.89       25 
000006 Binsnon           JX   567345   K        4.53       30 
****** ********  End of report  ********                      




                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot


Sponsored By
HELP/SYSTEMS

SEQUEL is the single solution for all your System i business intelligence needs.

                                                          · Executive Dashboards
                                                          · Graphical Query and Reporting
                                                          · Drill-Down Data Analysis
                                                          · Multi-Platform Database Access
                                                          · E-Mail Report and File Distribution
                                                          · Secure Web Access

Rely on SEQUEL to meet all your System i data access requirements.

Visit our Web site at http://www.helpsystems.com/400g


Senior Technical Editor: Ted Holt
Technical Editors: Howard Arner, Joe Hertvik, Shannon O'Donnell, Kevin Vandever
Contributing Technical Editors: Joel Cochran, Wayne O. Evans, Raymond Everhart,
Bruce Guetzkow, Brian Kelly, Marc Logemann, David Morris
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.

Sponsored Links

ProData Computer Services:  Easy access to remote databases from System i programs
Bug Busters Software Engineering:  High availability software that won't break the bank
COMMON:  Join us at the annual 2008 conference, March 30 - April 3, in Nashville, Tennessee


 

IT Jungle Store Top Book Picks

Getting Started with PHP for i5/OS: List Price, $59.95
The System i RPG & RPG IV Tutorial and Lab Exercises: List Price, $59.95
The System i Pocket RPG & RPG IV Guide: List Price, $69.95
The iSeries Pocket Database Guide: List Price, $59.00
The iSeries Pocket Developers' Guide: List Price, $59.00
The iSeries Pocket SQL Guide: List Price, $59.00
The iSeries Pocket Query Guide: List Price, $49.00
The iSeries Pocket WebFacing Primer: List Price, $39.00
Migrating to WebSphere Express for iSeries: List Price, $49.00
iSeries Express Web Implementer's Guide: List Price, $59.00
Getting Started with WebSphere Development Studio for iSeries: List Price, $79.95
Getting Started With WebSphere Development Studio Client for iSeries: List Price, $89.00
Getting Started with WebSphere Express for iSeries: List Price, $49.00
WebFacing Application Design and Development Guide: List Price, $55.00
Can the AS/400 Survive IBM?: List Price, $49.00
The All-Everything Machine: List Price, $29.95
Chip Wars: List Price, $29.95


 
The Four Hundred
Weak Dollar, Services, and Power6 Give IBM a Solid Fourth Quarter

IBM Aims for Server Expansion in 2008, Including System i Reincarnation

Readers Riff on the 2008 System i Wish List

Sun Casts a $1 Billion Net to Catch MySQL

The Rumor Mill on IBM's Impending Platform Announcements

The Linux Beacon
Dell Launches New, Power-Efficient Blade Servers

Sun Casts a $1 Billion Net to Catch MySQL

Weak Dollar, Services, and Power6 Give IBM a Solid Fourth Quarter

IBM Aims for Server Expansion in 2008, Including System i Reincarnation

The X64 Chip Makers Show Financial Improvement in Q4

Four Hundred Stuff
AURA Equipments Beats IBM to the Punch with DB2/400 Engine for MySQL

IBM Sets Sights on Microsoft and SMB with Linux/Domino Combos

Mirroring/400 Considers Move Into U.S. Market

VAI Debuts System i Consultation Services

RJS Updates Document Management Software

Big Iron
Microsoft to IBM: Tolerate PSI Mainframes or Quit Europe

Top Mainframe Stories From Around the Web

Chats, Webinars, Seminars, Shows, and Other Happenings

System i PTF Guide
January 19, 2008: Volume 10, Number 3

January 12, 2008: Volume 10, Number 2

January 5, 2008: Volume 10, Number 1

December 29, 2007: Volume 9, Number 52

December 22, 2007: Volume 9, Number 51

December 15, 2007: Volume 9, Number 50

The Windows Observer
EU Opens Fresh Antitrust Investigation of Microsoft

Microsoft Rises to Sixth on Patent List for 2007

HP Revamps ProLiant Rack and Tower Servers with New X64 Chips

Parallels Server Goes Beta as SWsoft Hires Microsoft Uber-Techie

Top Executive Raikes to Retire from Microsoft

The Unix Guardian
Sun Casts a $1 Billion Net to Catch MySQL

HP Revamps ProLiant Rack and Tower Servers with New X64 Chips

IBM Gets Clustered Storage and EMC Founder with XIV Buy

As I See It: Weighty Matters

Intel Launches White Box Blades as New York State Probes

Four Hundred Monitor
Four Hundred Monitor's
Full iSeries Events Calendar

THIS ISSUE SPONSORED BY:

Help/Systems
WorksRight Software
Guild Companies


Printer Friendly Version


TABLE OF CONTENTS
A Database Union is Not a Join

More About Blocking

Admin Alert: Before You Buy That New System i, Part 2

Four Hundred Guru

BACK ISSUES

From the IT Jungle Forums
FTP in arrival sequence

S36 environment problem

QSH won't write in batch!

SQL Trigger

usine switches in RPGIII???





 
Subscription Information:
You can unsubscribe, change your email address, or sign up for any of IT Jungle's free e-newsletters through our Web site at http://www.itjungle.com/sub/subscribe.html.

Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034

Privacy Statement