• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Using UNION for Data Analysis

    March 24, 2010 Skip Marchesani

    To do a quick review of my previous article, UNION is very useful when an SQL statement or query must operate on two or more tables and JOIN cannot be used to produce the desired result set table. For example, if you have multiple history tables–one for each specific time period (year, month, week, etc.), and their record formats are similar and compatible, a union is a good way to query and combine two or more of these tables to derive a single, final result set table using SQL.

    Several months ago, a friend sent me the spreadsheet shown below. I’ve included only the beginning and end of spreadsheet. Creating the spreadsheet manually was taking a lot of time, and he wanted to know if he could use SQL to automate the creation.

    Country         State   2006    2007    2008   2009     Total   Percent
    USA               	CA    400     739      728    758      2625	7.7886
    USA               	IL    534      87     1466    239      2326	6.9014
    USA               	NY    858     838      228    169      2093	6.2101
    …
    Virgin Islands 	        0       0        0      1	      1	0.0029
    Yemen	                 0       0        0      1	      1	0.0029
    

    The purpose of this spreadsheet is to show an analysis of the total number of sales or order transactions (not dollar amounts) by state within country; and for each country/state combination show the four previous years (in this case 2006, 2007, 2008, and 2009), the total number of orders for the country/state combination, and the percentage of all orders.

    There were four separate history tables that had to be analyzed–one for each year (2006, 2007, 2008, and 2009), which my friend had included when he sent the spreadsheet. There was one row in each table that summarized each distinct customer order, and customer country and state were part of the contact information for each order.

    My first thought was: “No way!”

    But always enjoying a good challenge, I decided to give it a try, and was able to do it by using UNION in conjunction with the derived table capability in DB2 for i. I used Run SQL Scripts, my SQL interface of choice for this type of experimentation, and I am currently running V6R1 of Navigator in conjunction with an IBM i running V5R4 of i5/OS.

    After a couple of false starts, step one was to cobble together the following SQL query (i.e., SELECT statement) that had a final result set table in the format I had determined I needed. This query returns one row with a count for each country/state/year combination in the final result set table.

    I used UNION ALL to combine the four history tables because all rows in the intermediate result set tables for the four history tables, including duplicates, needed to be included in the final result set table. A subset of the final result set table is shown after the SQL query.

    SELECT contry, state, COUNT(*) AS S06, 0 AS S07, 0 AS S08, 0 AS S09
           	FROM orders2006
      GROUP BY contry, state
                 UNION ALL
        SELECT contry, state, 0 AS S06, COUNT(*) AS S07, 0 AS S08, 0 AS S09
               	FROM orders2007
      GROUP BY contry, state 
                 UNION ALL
        SELECT contry, state, 0 AS S06, 0 AS S07, COUNT(*) AS S08, 0 AS S09
               	FROM orders2008
      GROUP BY contry, state
                 UNION ALL
        SELECT contry, state, 0 AS S06, 0 AS S07, 0 AS S08, COUNT(*) AS S09
               	FROM orders2009
      GROUP BY contry, state
            ORDER BY contry, state
    

    Subset of the final result set table for above query:

    Country	State	S06      S07	S08	S09
    USA        CA     400        0        0        0
    USA        CA       0      739        0        0
    USA        CA       0        0      728        0
    USA        CA       0        0        0      758
    

    The above is a subset of the final result set for the SQL query preceding it, and shows only data for the state of California in the U.S.A. (there are many more rows for various country/state combinations). Even though each row has a column for the number of orders for each year, the orders for each year are in a separate row. There is one row each for history table year–2006, 2007, 2008, and 2009–and where there is a total for one specific year and the rest of the totals for the other three years in that row are zero.

    There are four SELECT statements in the SQL query–one for each year–and they are combined by the three UNION ALL clauses. Each SELECT statement has CONTRY, STATE, and a column for each year (S06 = 2001, S07 = 2007, S08 = 2008, and S09 = 2009) in the select list; but the number of orders for only one year is summarized (COUNT(*) and GROUP BY) in each of the four SELECT statements (one SELECT statement for 2006, one for 2007, one for 2008, and one for 2009).

    By formatting the four SELECT statements in this manner, I got the row format that I needed, and the result set table for each of the SELECT statements has the same format (number and type of columns). This satisfies the UNION requirement that the result set table for the first SELECT statement have the same number of columns as the result set table for the second SELECT statement, which must have the same number of columns as the result set for the third SELECT statement, and so on.

    The next step is to modify the SQL query so that the separate rows for the orders for each country/state/year combination are summarized into a single row for each country/state combination. This means that the total number of orders for each year will be in a separate column instead of a separate row, and final the result set table will look as follows for the state of California.

    Country       State     2006     2007     2008     2009     Total
    USA            CA        400      739      728      758      2625
    

    I took advantage of the derived table capability in DB2 for i to accomplish the above. I used the SQL query from the first step (four SELECT statements and three UNION clauses) without the ORDER BY clause for the table derivation.

    This SQL query with derived table is shown below, with the derived table portion of the SQL query annotated, and the first five rows of the final result set table following the SQL query. Note that the derived table statements directly follow the FROM clause for the SELECT statement and must include an AS at the end of the derived table statements to name the derived table–in this case it’s named ORDERS

    SELECT contry AS Country, state, SUM(S06) AS "2006",
               SUM(S07) AS "2007", SUM(S08) AS "2008", SUM(S09) AS "2009",
               (SUM(S06) + SUM(S07) + SUM(S08) + SUM(S09)) AS total
         FROM
    
    -- Begin Derived Table
    
      (SELECT contry, state, COUNT(*) AS S06, 0 AS S07, 0 AS S08, 0 AS S09
          FROM orders2006
        GROUP BY contry, state
            UNION ALL
       SELECT contry, state, 0 AS S06, COUNT(*) AS S07, 0 AS S08, 0 AS S09
          FROM orders2007
        GROUP BY contry, state
            UNION ALL
       SELECT contry, state, 0 AS S06, 0 AS S07, COUNT(*) AS S08, 0 AS S09
          FROM orders2008
        GROUP BY contry, state
           UNION ALL
       SELECT contry, state, 0 AS S06, 0 AS S07, 0 AS S08, COUNT(*) AS S09
          FROM orders2009
         GROUP BY contry, state)
    	      
             AS orders
    
    -- End Derived Table
    
         GROUP BY contry, state
         ORDER BY total DESC, contry, state;
    

    The first five rows of the final result set table produced by this SQL query are shown below.

    Country   State   2006   2007   2008   2009   Total
    USA          CA    400    739    728    758    2625
    USA          IL    534     87   1466    239    2326
    USA          NY    858    838    228    169    2093
    USA          FL    302    507    306    450    1565
    USA          MA    711    414    142    258    1525
    

    Conceptually the SQL query from the first step is used to derive or build a single use (temporary) table on the fly. The SELECT statement that begins prior to the derived table statements (shown below and reformatted to make it easier to read) queries and summarizes the rows in the derived table and then orders them in the desired sequence.

    SELECT  contry AS Country, 
        state, 
        SUM(S06) AS "2006",
                    SUM(S07) AS "2007", 
                    SUM(S08) AS "2008", 
                    SUM(S09) AS "2009", 
                    (SUM(S06) + SUM(S07) + SUM(S08) + SUM(S09)) AS total
         FROM
           + ------------------------------------------------ +
           | insert the derived table statements from the SQL |
           | query from step one                              |
           + ------------------------------------------------ +
         GROUP BY contry, state
         ORDER BY total DESC, contry, state
    

    This means that the final result set table contains one row for each country/state combination that contains the following seven columns: CONTRY a.k.a., COUNTRY, STATE, 2006, 2007, 2008, 2009 (the total number of orders for each year), and TOTAL (the total number of orders for all four years).

    Note that in the SELECT statement the double quotes (“) are required around each numeric year to tell SQL that this is a column name and not a numeric literal.

    The summarization for this SELECT statement by country and state, is done in the GROUP BY clause after the derived table statements. The ORDER BY clause following the GROUP BY clause provides the ordering criteria for the rows in the final result set table–total orders (in descending sequence), then country, then state (both in ascending sequence).

    The last step is to calculate the percentage for the total orders for a specific country and state in relation to the total orders for all countries and states.

    The mathematical formula to do this is:

    Country and state order percentage =

    (2006 orders + 2007 orders + 2008 orders + 2009 orders) for specific country and state

    Divided by

    (total 2006 orders + total 2007 orders + total 2008 orders + total 2009 orders) multiplied by 100

    The SQL syntax for this formula when used within the previous SELECT statement follows below:

     ((SUM(S06) + SUM(S07) + SUM(S08) + SUM(S09))
                   /
    ((SELECT COUNT(*) FROM orders2006) +
      (SELECT COUNT(*) FROM orders2007) +
      (SELECT COUNT(*) FROM orders2008) +
      (SELECT COUNT(*) FROM orders2009)) * 100) AS percent
      

    When inserted into the SELECT statement the revised syntax for the SELECT statement (still reformatted for readability) looks as follows:

    SELECT  contry AS Country, 
        state, 
        SUM(S06) AS "2006",
                    SUM(S07) AS "2007",
                    SUM(S08) AS "2008",
                    SUM(S09) AS "2009",
                    (SUM(S06) + SUM(S07) + SUM(S08) + SUM(S09)) AS total,
                    ((SUM(S06) + SUM(S07) + SUM(S08) + SUM(S09))
                           /
                    ((SELECT COUNT(*) FROM orders2006) +
                     (SELECT COUNT(*) FROM orders2007) +
                     (SELECT COUNT(*) FROM orders2008) +
                     (SELECT COUNT(*) FROM orders2009)) * 100) AS percent
         FROM
    	derived table statements - SQL query from step one
         GROUP BY contry, state
         ORDER BY total DESC, contry, state   
    

    And, the entire SQL query, including the derived table statements looks as shown below, with the first five rows in the final result set table following the SQL Query.

    SELECT  contry AS Country,
        state,
        SUM(S06) AS "2006",
                    SUM(S07) AS "2007",
                    SUM(S08) AS "2008",
                    SUM(S09) AS "2009",
                    (SUM(S06) + SUM(S07) + SUM(S08) + SUM(S09)) AS total,
                    ((SUM(S06) + SUM(S07) + SUM(S08) + SUM(S09))
                           /
                      ((SELECT COUNT(*) FROM orders2006) +
                       (SELECT COUNT(*) FROM orders2007) +
                       (SELECT COUNT(*) FROM orders2008) +
                       (SELECT COUNT(*) FROM orders2009)) * 100) AS percent
          FROM
    -- Begin Derived Table 
      (SELECT contry, state, COUNT(*) AS S06, 0 AS S07, 0 AS S08, 0 AS S09
                  FROM orders2006
               GROUP BY contry, state
                      UNION ALL
        SELECT contry, state, 0 AS S06, COUNT(*) AS S07, 0 AS S08, 0 AS S09
                     FROM orders2007
               GROUP BY contry, state 
                      UNION ALL
        SELECT contry, state, 0 AS S06, 0 AS S07, COUNT(*) AS S08, 0 AS S09
                     FROM orders2008
               GROUP BY contry, state
                      UNION ALL
        SELECT contry, state, 0 AS S06, 0 AS S07, 0 AS S08, COUNT(*) AS S09
                     FROM orders2009
               GROUP BY contry, state)
              AS orders
    -- End Derived Table
         GROUP BY contry, state
         ORDER BY total DESC, contry, state
    

    The first five rows of the final result set table produced by this SQL query are shown below.

    Country   State   2006   2007   2008   2009   Total   Percent
    USA         CA     400    739    728    758    2625    7.7886
    USA         IL     534     87   1466    239    2326    6.9014
    USA         NY     858    838    228    169    2093    6.2101
    USA         FL     302    507    306    450    1565    4.6435
    USA         MA     711    414    142    258    1525    4.5248
    

    This is exactly the result set my friend was looking for. Since I developed this SQL query using Run SQL Scripts in Navigator, the SQL query can be saved as an SQL script, sent to my friend, and he can execute it when needed.

    Better still, since he is also running V6R1 of Navigator, he also can use Run SQL Scripts to run the SQL query, and with a couple of clicks can save the final result set table as an Excel spreadsheet–exactly what my friend wanted to do.

    How long did it take me to develop the SQL Query? My false starts were spread over a couple of days, but once I got headed in the right direction it only took me between one and two hours of experimentation from start finish to develop the correctly working SQL query.

    Did UNION play a significant part in providing the solution to this SQL query requirement? The answer is YES. UNION provided the capability to combine the four order history tables into a single result set table, and in the process define the number and type of columns in the result set table to satisfy the requirements of the spreadsheet format. If UNION were not an option in DB2 for i, the solution could not have been provided using a single SQL query and would instead have required multiple SQL queries.

    Skip Marchesani retired from IBM after 30 years and is now a consultant with Custom Systems Corporation. He is also a founding partner of System i Developer and the RPG & DB2 Summit. Skip spent much of his IBM career working with the Rochester Development Lab on projects for S/38 and AS/400 and was involved with the development of the AS/400. He was part of the team that taught early AS/400 education to customers and IBM lab sites worldwide. Skip is recognized as an industry expert on DB2 for i and the author of the book DB2/400: The New AS/400 Database. He specializes in providing customized education for any area of the System i, iSeries, and AS/400; does database design and design reviews; and performs general System i, iSeries, and AS/400 consulting for interested clients. He has been a speaker for user groups, technical conferences, and System i, iSeries, and AS/400 audiences around the world. He is an award-winning COMMON speaker and has received its Distinguished Service Award. Send your questions or comments for Skip to Ted Holt via the IT Jungle Contact page.

    RELATED STORIES

    The State of the UNION

    Grouping a Union

    A Database Union is Not a Join

    Weird SQL Union Results



                         Post this story to del.icio.us
                   Post this story to Digg
        Post this story to Slashdot

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    Maxava

    Migrate IBM i with Confidence

    Tired of costly and risky migrations? Maxava Migrate Live minimizes disruption with seamless transitions. Upgrading to Power10 or cloud hosted system, Maxava has you covered!

    Learn More

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    Help/Systems:  Robot/CONSOLE monitors System i resources automatically
    Northeast User Groups Conference:  20th Annual Conference, April 12 - 14, Framingham, MA
    COMMON:  Join us at the annual 2010 conference, May 3 - 6, in Orlando, Florida

    IT Jungle Store Top Book Picks

    Easy Steps to Internet Programming for AS/400, iSeries, and System i: List Price, $49.95
    The iSeries Express Web Implementer's Guide: List Price, $49.95
    The System i RPG & RPG IV Tutorial and Lab Exercises: List Price, $59.95
    The System i Pocket RPG & RPG IV Guide: List Price, $69.95
    The iSeries Pocket Database Guide: List Price, $59.00
    The iSeries Pocket SQL Guide: List Price, $59.00
    The iSeries Pocket Query Guide: List Price, $49.00
    The iSeries Pocket WebFacing Primer: List Price, $39.00
    Migrating to WebSphere Express for iSeries: List Price, $49.00
    Getting Started With WebSphere Development Studio Client for iSeries: List Price, $89.00
    Getting Started with WebSphere Express for iSeries: List Price, $49.00
    Can the AS/400 Survive IBM?: List Price, $49.00
    Chip Wars: List Price, $29.95

    Construction App from CGC Gets a Web Portal Source Technologies Now Supports System i with MICR Printer

    Leave a Reply Cancel reply

Volume 10, Number 11 -- March 24, 2010
THIS ISSUE SPONSORED BY:

SEQUEL Software
Northeast User Groups Conference
Twin Data Corporation

Table of Contents

  • Using UNION for Data Analysis
  • Include C Utilities in Your Developer Library: Evaluating a Mathematical String Expression
  • Internet Explorer 8 Doesn’t Support HMC V7. . . or Does It?

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • POWERUp 2025 –Your Source For IBM i 7.6 Information
  • Maxava Consulting Services Does More Than HA/DR Project Management – A Lot More
  • Guru: Creating An SQL Stored Procedure That Returns A Result Set
  • As I See It: At Any Cost
  • IBM i PTF Guide, Volume 27, Number 19
  • IBM Unveils Manzan, A New Open Source Event Monitor For IBM i
  • Say Goodbye To Downtime: Update Your Database Without Taking Your Business Offline
  • i-Rays Brings Observability To IBM i Performance Problems
  • Another Non-TR “Technology Refresh” Happens With IBM i TR6
  • IBM i PTF Guide, Volume 27, Number 18

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle