• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: A Simple, Effective Way To Compare Data Using SQL

    April 20, 2020 Ted Holt

    One thing I have learned over the years is that no matter how much technology changes, people always ask the same questions like: “Why does the computer say we have 50 widgets when we only have 45?” and “Can you tell me who changed the due date of order number 12345?” and “How do sales of widgets this year compare to the same time last year?” It is the last of these that I wish to address today.

    Comparing data is a very common activity. Everybody compares data: sales people, production people, purchasing people, and of course management people. This sort of thing is the reason people set up data warehouses. Fortunately, there is a good formula for answering comparison questions using SQL.

    To compare data, you need five things:

    • two or more data sets
    • a full outer join
    • the COALESCE function
    • a selection of columns
    • an ORDER BY clause

    Let’s look at each one in turn. But first, some illustrative data:

    create table saleshist
      (key        dec(3)    primary key,
       custid     dec(3),
       datesold   date,
       item       char(5),
       quantity   dec(3),
       price      dec(5,2));
       
    insert into saleshist
    (key, custid, datesold, item, quantity, price)
     values
    (  1, 123, '2019-03-01', 'A15', 1, 3.00),
    (  2, 456, '2019-03-01', 'A15', 2, 1.50),
    (  3, 456, '2019-03-05', 'A15', 3, 1.50),
    (  4, 456, '2019-03-05', 'X22', 1, 1.00),
    (  5, 456, '2020-02-08', 'A15', 2, 2.00),
    (  6, 456, '2020-03-08', 'A15', 2, 2.00),
    (  7, 333, '2020-03-03', 'A15', 4, 1.50),
    (  8, 567, '2019-03-10', 'A15', 2, 1.50),
    (  9, 567, '2019-04-01', 'A15', 1, 1.50),
    ( 10, 567, '2020-03-01', 'D44', 2, 2.00);
    

    Suppose you’ve been asked to compare sales of item A15 by customer for March of this year and March of last year. Where do you begin? You begin with two data sets. They may come from two separate tables or sets of tables. They may come from two views. They may be two SELECTS over the same table. They come from wherever the data is stored.

    In my example, the data for both years is stored in the sales history table, SALESHIST, so I use two common table expressions, SALES2019 and SALES2020.

    with sales2019 as
       (select custid, sum(quantity) as qty,
               sum(quantity * price) as sales
          from saleshist
         where item = 'A15'
           and datesold between '2019-03-01' and '2019-03-31'
         group by custid),
    sales2020 as     
       (select custid, sum(quantity) as qty,
               sum(quantity * price) as sales
          from saleshist
         where item = 'A15'
           and datesold between '2020-03-01' and '2020-03-31'
         group by custid)
    . . . more to come ! ! ! . . .
    

    Comparing data means that the two data sets need to be joined. A customer might have bought item A15 in only one year or in both years. This means that we must use a full outer join to get all the data.

    select . . . 
      from sales2019 as s19
      full outer join sales2020 as s20
        on s19.custid = s20.custid     
    

    That’s fabulous, but it doesn’t give the user any information. Let’s begin by listing all the summary columns. In this example, the only summary column is customer ID. But which customer ID do we get—the one from SALES2019 or the one from SALES2020? The only logical answer, of course, is “Yes!” Some customers bought item A15 both years. Others bought in 2019 or 2020, but not both years. The COALESCE function gives us the customer number no matter what.

    select coalesce(s20.custid, s19.custid) as custid,
           . . . more columns . . .
      from sales2019 as s19
      full outer join sales2020 as s20
        on s19.custid = s20.custid
    

    COALESCE returns the first non-null value in the list. We get the customer number whether it’s found in one data set or both data sets. Be sure to use COALESCE over all of the summary columns.

    So far, so good, but the user needs sales figures — how many each customer bought and how much money they paid us.

    select coalesce(s20.custid, s19.custid) as custid,
           s20.qty as "Qty 2020", s19.qty as "Qty 2019",
           s20.sales as "Sales 2020", s19.sales as "Sales 2019"
      from sales2019 as s19
      full outer join sales2020 as s20
        on s19.custid = s20.custid
    

    I’ve grouped two the two quantity fields and the two sales figures, but how you group the columns depends on the task. Most likely you’ll give the output of such queries to people in Excel, and they’ll rearrange them as they want them.

    The last thing to do is to present the result set in a logical sequence.

      order by 1
    

    Use the ORDER BY clause to sort the data any way you need to. The sequence will often be the summary columns, but that’s not necessarily so. In this example, ORDER BY 1 means to sort the result set on the first column, which is the customer ID.

    Putting it all together, we have this query:

    with sales2019 as
       (select custid, sum(quantity) as qty,
               sum(quantity * price) as sales
          from saleshist
         where item = 'A15'
           and datesold between '2019-03-01' and '2019-03-31'
         group by custid),
    sales2020 as     
       (select custid, sum(quantity) as qty,
               sum(quantity * price) as sales
          from saleshist
         where item = 'A15'
           and datesold between '2020-03-01' and '2020-03-31'
         group by custid)
    select coalesce(s20.custid, s19.custid) as custid,
           s20.qty as "Qty 2020", s19.qty as "Qty 2019",
           s20.sales as "Sales 2020", s19.sales as "Sales 2019"
      from sales2019 as s19
      full outer join sales2020 as s20
        on s19.custid = s20.custid     
      order by 1
    

    And this result set:

    CUSTID Qty 2020 Qty 2019 Sales 2020 Sales 2019
    123 – 1 – 3.00
    333 4 – 6.00 –
    456 2 5 4.00 7.50
    567 – 2 – 3.00

    It’s a very simple formula, as simple as a pencil, which also has five components. Don’t let the simplicity fool you — it’s a powerful tool to have in your toolbox.

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, FHG, Four Hundred Guru, IBM i, SQL

    Sponsored by
    Chordia Consulting

    Most IBM i organizations “get the job done” supporting their business users. Systems are available, data is stored and analyzed, applications are created and enhanced, networks run, servers hum, users seem to be happy, and all is right with their IBM i world.

    But is it?

    Very often, behind-the-scenes the reality is quite different. You may have aging, hard-to-maintain servers; applications may be strung together in ways that no one really understands (especially if something goes wrong); there may be security and compliance risks; perhaps, too much money is being wasted on the wrong things rather than being invested in the right things. If you are a new to IT leadership, you may be unfamiliar with your IT environment and its history and be vulnerable to – but still accountable for – hidden problems.

    In a recent IT Jungle article, the author proposes “Sending Your IT Department To The Teledoctor For A Checkup” as a way to find those hidden problems. Of course there is no substitute for an in-person visit to a doctor. But, when a visit may be too difficult or otherwise not be possible – as is the case today – access to an on-line diagnostic “doctor” can be the next best thing. Chordia Consulting’s Rapid Algorithmic IT Healthcheck (RAITH™) is an on-line “doctor” that provides an objective, high-level diagnostic checkup of an IT organization.

    Many of the interview questions a doctor would ask have specific answers. RAITH’s healthcheck also asks simple questions with specific answers. The analysis based on this on-line “interview” can provide you with a preliminary diagnosis, a first look at where your prioritized opportunities may be. Then you can follow-up with more in-depth review with your own team or with Chordia Consulting or other trusted advisors.

    Why is Chordia’s on-line RAITH service the best choice for many IBM i organizations around the world? It’s simple to conduct, it’s inexpensive, it can be delivered totally remotely, it’s unobtrusive, and it produces results fast, allowing you to tackle problem areas more quickly.

    You can learn more about RAITH for IBM i at https://www.chordiaconsulting.com/raith-for-ibm-i.

    Or you can contact Chordia Consulting using our on-line chat service on our website, via email at info@chordiaconsulting.com, or by phone at 585-210-8002.

    RAITH is also available for purchase by credit card or PayPal at www.chordiaconsulting.com/buy.

    Don’t forget, Chordia is offering a special discount for IT Jungle readers. Just go to www.chordiaconsulting.com/buy and use the discount code RAITH1/3ITJungleGHC at checkout.

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    COVID-19 Response: Manta Cuts Training Prices, Offers Freebies Database Enhancements Galore In Technology Refresh

    One thought on “Guru: A Simple, Effective Way To Compare Data Using SQL”

    • slashsplat says:
      April 28, 2020 at 2:35 pm

      Great, simple idea, well presented. Thanks.

      Reply

    Leave a Reply Cancel reply

TFH Volume: 30 Issue: 26

This Issue Sponsored By

  • Fresche Solutions
  • ProData Computer Services
  • Chordia Consulting
  • WorksRight Software
  • Raz-Lee Security

Table of Contents

  • IBM Grants Amnesty On Software Maintenance After License Charges
  • Database Enhancements Galore In Technology Refresh
  • Guru: A Simple, Effective Way To Compare Data Using SQL
  • COVID-19 Response: Manta Cuts Training Prices, Offers Freebies
  • IBM i PTF Guide, Volume 22, Number 16

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • IBM Extends Dynamic Capacity Pricing Scheme To Its Cloud
  • Here’s What You Should Do About The IBM i Skills Shortage
  • Matillion Founder Recounts Midrange Roots
  • Four Hundred Monitor, February 24
  • IBM i PTF Guide, Volume 23, Number 8
  • iTech Solutions Keeps You In The Know With VERIFi
  • Tech Data’s Take On Certified Pre-Owned IT Gear
  • Guru: Prompting Stored Procedures
  • As I See It: Sunshine Or Oxygen?
  • Looking For Some Insight On IBM i Security

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2021 IT Jungle

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.