A Database Union is Not a Join
January 23, 2008 Hey, Brian
We recently acquired a company that has had its own payroll system for many years. We are going to combine the two payrolls starting next year but in the meantime, we would like to be able to use some of our internal cost analysis programs on the combined data without combining the two files.
We have created an earning master physical file in which we are staging the data that we will use to merge the two systems next year. This file has the exact shape as our current payroll master file. Without creating another file and physically merging them, thus creating another burden for our staff, I would like to “join” all the records using DDS into one file and override the report programs so that they use this logical file for reporting. However, when I try to join them, I do not get all the records. In fact, depending on which file I specify first in the join, I get exactly that many records in my output view. Offhand, do you know what I am doing wrong?
E. F. “Ted” Codd, the designer of relational database, defined several relational operations. The select operation limits the rows (records) in a view. The project operation limits the columns (fields) in a view. The join is also a powerful operation that combines data from matching records in multiple files. However, the join does not have the ability to take two separate record sets and merge them into one file, such that the number of records in the merged file equals the total number of records in both files. The merge operation as defined by Ted Codd is called union.
Though there is no UNION keyword in DDS, there is a way to code for UNION. In the next two examples, you have an old payroll master file and a new master file. In essence this is your situation. Your objective is to marry (logically merge or union) the two files, using fields that are common to both of them.
Unlike a join, the result of a union is a view of similar data in multiple files as if the data were all lumped together, and then sorted by the union key level field. In other words, if there were a three-file union, and each of the three physical files had 33 records, the union view would have 99 records–all sorted together like one big homogenous file view.
Now, contrast union with join. For example, let’s say you joined the same three physical files. Assume, for this instance, that you know that the data is perfect and that the resulting join would produce a logical file that was really a natural, inner, and matching join (three names for the same thing). In other words, there would be a matching record in each of the secondaries (underlying physical files), for each primary record. In other words again, both secondary physical files would each have one data record to supply for each primary record presented by the join logical view.
If this were all true, then after the join, the resulting view would not show 99 records, as in the union example. Instead it would show just 33 records. As a friend of mine would say, “‘combinatorically’ speaking, the join combines the three records into one record.” Therefore, each record would show a piece (one or several fields) from each of the physical files that were joined.
The following is an example of how to code a union for your shop. First we show the two physical files, NEWMSTR and OLDMSTR and then the test data in the files and then the join with some of the fields.
A* PAYROLL MASTER NEWMSTR A R EARNMSTR ... TEXT('EARNINGS MASTER') A EMPLNAME 25 A EMPINL 2 A ACCT# 6 0 A PAYCOD 1 A RATE 6S 2 A EMPNO 6 0 A LCHDAT L ... TEXT('ADDED FIELD') A K EMPNO
A* PAYROLL MASTER OLDMSTR A R OLDEARNR ... TEXT('OLD EARN MASTER') A EMPLNAME 25 A EMPINL 2 A ACCT# 6 0 A PAYCOD 1 A RATE 6S 2 A EMPNO 6 0 A K EMPNO
Query Against NEWMSTR
Line ....+....1....+....2....+....3....+....4....+....5....+....6....+. EMPLNAME EMPINL ACCT# PAYCOD RATE EMPNO LCHDAT 000001 Snidely JJ 245765 D 3.45 0 2007-12-13 000002 Norman CJ 897345 X 5.63 20 2007-12-14 000003 Binsnon JX 567345 K 4.53 30 2007-12-15 ****** ******** End of report ********
Query Against OLDMSTR
Line ....+....1....+....2....+....3....+....4....+....5....+ EMPLNAME EMPINL ACCT# PAYCOD RATE EMPNO 000001 Waldono GG 567567 H 5.53 5 000002 Finstermick KC 094536 L 4.67 15 000003 Walloby FM 345298 H 6.89 25 ****** ******** End of report ********
DDS LF – UNION – Some Fields
A* LOGICAL FILE (PRUNSOME) A* A R PRUNSOMR ... PFILE(NEWMSTR OLDMSTR) A EMPLNAME A EMPINL A RATE A EMPNO A PAYCOD K EMPNO
Query Against PRUNSOME
Line ....+....1....+....2....+....3....+....4....+... EMPLNAME EMPINL RATE EMPNO PAYCOD 000001 Snidely JJ 3.45 0 D 000002 Waldono GG 5.53 5 H 000003 Finstermick KC 4.67 15 L 000004 Norman CJ 5.63 20 X 000005 Walloby FM 6.89 25 H 000006 Binsnon JX 4.53 30 K ****** ******** End of report ********
Union takes two almost identically structured files and combines them. A good DDS coding example is that shown above, in which there are new or current masters (NEWMSTR) and old masters (OLDMSTR). The new, combined, view of the data (PRUNSOME logical file) includes only the named (projected) fields from all records in both the old Payroll Earnings Master (OLDMSTR) and the new Payroll Earnings Master (NEWMSTR).
Overall, up to 32 files can be unioned. Though there are two files defined in the PFILE, in this example, without library qualifiers, this is not a join, since there is no Join Record Format (J in column 17). Each record contains data from just one physical file, not both–as in a JOIN. A key field (EMPNO in this case) must be specified in a union. The key field sorts the records in this case by employee number. Note in the PRUNSOME query output that the records from the NEWMSTR and OLDMSTR files are interleaved due to the sort on EMPNO.
The trick is to use non-join file DDS and to specify both files using the PFILE keyword as shown in the example above. It provides the same facility as physically merging the files and when you override your reporting analysis programs to this file, as long as the fields are exposed, both companies’ data will be available for management analysis. Hope this does the trick for you.
Here are a few other examples of union using DDS.
DDS for PRUNALL Logical File
A* UNION A FILE WITH ITSELF -- UNION LOGICAL FILE PRUNALL A* A R EARNMSTR PFILE(NEWMSTR NEWMSTR) A K EMPNO
This union is with the fields from all records in both the “old payroll master” and the “new payroll master.” Notice the format name is EARNMSTR. This says that the union will use the format (all the fields) of the new master file. There cannot be any extra fields in NEWMSTR, compared to the second file, whether it is OLDMSTR or another file or another reference to NEWMSTR as in the example above. Since all fields in NEWMSTR also exist in the second reference to NEWMSTR, this is a valid union. If there are any field differences in the two or more files involved in the join, the logical file will not compile and it will produce severe error messages, and thus will not be created. The second copy of NEWMSTR can be replaced in the above union by a different physical file with identical fields as NEWMSTR and this will work the same. However, OLDMSTR cannot be used in this example since it does not have the LCHDAT field as does the NEWMSTR.
Note that the data from NEWMSTR is repeated twice when the new logical file PRUNALL is queried as shown below:
Query of Logical Union FIle PRUNALL
EMPLNAME EMPINL ACCT# PAYCOD RATE EMPNO LCHDAT 000001 Snidely JJ 245765 D 3.45 0 2007-12-13 000002 Norman CJ 897345 X 5.63 20 2007-12-14 000003 Binsnon JX 567345 K 4.53 30 2007-12-15 000004 Snidely JJ 245765 D 3.45 0 2007-12-13 000005 Norman CJ 897345 X 5.63 20 2007-12-14 000006 Binsnon JX 567345 K 4.53 30 2007-12-15 ****** ******** End of report ********
Now, let’s try a new wrinkle on UNION. Let’s use a format keyword to reformat the unioned data. In the next example, we assume that we have added a new field to the new earnings file, and we want a unioned view of the fields in the OLDMSTR file with the NEWMSTR file. The OLMMSTR file does not contain the new field, LCHDAT. This may be something you would do to accommodate consolidations, etc., so that you can get the known information from two similar groups of records. Take a look at the DDS below and ask yourself if it has a new format; if so, what format is that?
In this example, you see another union, but this time, assume that the files have different formats. Since we still want the records unioned, you are going to have to select a format that has fields that exist in both files to govern the new unioned view.
DDS LF PRUNF – UNION – Format
A* Logical FIle PRUNF A R OLDEARNR PFILE(NEWMSTR OLDMSTR) A FORMAT(OLDMSTR) A K EMPNO
This new logical view of data called PRUNF, uses the format of the old earnings file. The LCHDAT field is no tin the OLDMSTR file. Thus, we cannot use the EARNMSTR format for the union, since there is no LCHDAT field in the OLDEARN file.
This new format shares the OLDEARNR format from the old payroll master file. Fields from the new master, which are not in the old format, are not included in the new unioned view, called PRUNF.
Query of Logical Union File PRUNF
EMPLNAME EMPINL ACCT# PAYCOD RATE EMPNO 000001 Snidely JJ 245765 D 3.45 0 000002 Waldono GG 567567 H 5.53 5 000003 Finstermick KC 094536 L 4.67 15 000004 Norman CJ 897345 X 5.63 20 000005 Walloby FM 345298 H 6.89 25 000006 Binsnon JX 567345 K 4.53 30 ****** ******** End of report ********