• The Four Hundred
  • Digging Out Data Duplication

    November 6, 2013

    Hey, Ted:

    Sometimes an SQL query should return one row, yet it returns more than one. The problem turns out to be multiple matching rows in a secondary table. Is there a way to easily isolate the secondary table that causes more than one match?

    –J

    Yes, there is an easy way. But first, let me set up the problem for the edification of other readers.

    Sometimes we execute an SQL query with the expectation that the result set will contain only one row (record), and we are surprised to get back two or more rows instead. At least one table (physical file), usually one of the secondary tables, has more than one row that matches the join criteria. (I use the word “table” here to mean either table or view.)

    I have noticed two main causes for this behavior.

    1. Someone keyed duplicate information into the database. For example, a table that was loaded from a spreadsheet contains two rows for the same customer, vendor, item, or what have you, when it should contain only one.
    2. The join criteria are insufficient. For instance, the fallible human being who wrote the query joined two tables on a common customer number, but should have joined them on common customer number and company number.
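
    As a sketch of the second cause, compare a join on customer number alone with a join on the full key. The table and column names here (orders, customers, company, custno, orderno, name) are hypothetical and chosen only for illustration.

    ```sql
    -- Hypothetical tables:
    --   orders    (company, custno, orderno)
    --   customers (company, custno, name)

    -- Insufficient: the same customer number can exist in more than one
    -- company, so one order may match several customer rows.
    select o.orderno, c.name
      from orders as o
      join customers as c
        on o.custno = c.custno;

    -- Sufficient: join on company number and customer number together.
    select o.orderno, c.name
      from orders as o
      join customers as c
        on o.company = c.company
       and o.custno  = c.custno;
    ```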

    Finding the problem is not usually trivial because production queries often join several tables, and it’s common for some of the join criteria in a query to involve only secondary tables.

    A simple way to find the culprit is to use the RRN (relative record number) function, which returns the relative record number of a row in the table you name. Here’s an illustration of the technique.

    Assume an item master table. It is uniquely keyed on item number, of course. Assume two other temporary tables that were loaded from spreadsheets. One temporary table has new prices for some items. The other has new descriptions for some items. We expect only one row per item in each table, but people, being the imperfect beings they are, may accidentally load more than one row for an item.
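
    To follow along, the scenario could be set up roughly like this. This is a minimal sketch: the column types and lengths, and the descr column on the item master, are assumptions, and the relative record numbers you get will depend on what else is already in your tables.

    ```sql
    -- Item master, uniquely keyed on item number (types are assumed).
    create table itemmast
      (item  char(10)    not null primary key,
       descr varchar(30));

    -- Temporary tables loaded from spreadsheets: no unique key,
    -- so nothing prevents a second row for the same item.
    create table newprices
      (item  char(10)     not null,
       price decimal(7,2));

    create table newdesc
      (item  char(10)    not null,
       descr varchar(30));

    insert into itemmast  values ('BR-549', 'Old widget');
    insert into newprices values ('BR-549', 5.00);
    insert into newprices values ('BR-549', 5.00);   -- the accidental duplicate
    insert into newdesc   values ('BR-549', 'Widget');
    ```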

    Here’s a query that should return one row for item BR-549.

    select m.item, np.price, nd.descr
      from itemmast as m
     left join newprices as np
        on m.item = np.item
     left join newdesc as nd
        on m.item = nd.item
     where m.item = 'BR-549'
    

    And here is the result set.

    ITEM     PRICE   DESCR 
    BR-549    5.00   Widget
    BR-549    5.00   Widget
    

    To locate the duplication, add RRN functions for the tables.

    select m.item, rrn(m), rrn(np), rrn(d)
      from itemmast as m
      left join newprices as np
        on m.item = np.item
      left join newdesc as d
        on m.item = d.item
     where m.item = 'BR-549'
    

    And now the verdict:

    ITEM    RRN ( M )  RRN ( NP )  RRN ( D )
    BR-549         2           2          2
    BR-549         2           3          2
    

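    Look at the RRN columns. RRN(M) and RRN(D) are the same in both rows, but RRN(NP), the third column, differs (2 and 3). Two different rows of the new prices table match item BR-549, so that table is the problem. Querying it directly confirms the duplication.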

    select * from newprices as np
    where np.item = 'BR-549'
    
    ITEM     PRICE
    BR-549    5.00
    BR-549    5.00
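
    When the result set is too large to scan by eye, the same idea can be pushed into aggregates. The following is a sketch rather than part of the technique above, and it assumes your release accepts RRN inside COUNT(DISTINCT ...). The first statement counts how many distinct rows each table contributes per item, so any count greater than 1 points at the offending table. The second lists every item that is duplicated in the suspect table. The column aliases are made up for illustration.

    ```sql
    -- Distinct rows contributed by each table, for items that
    -- come back more than once from the join.
    select m.item,
           count(distinct rrn(m))  as m_rows,
           count(distinct rrn(np)) as np_rows,
           count(distinct rrn(d))  as d_rows
      from itemmast as m
      left join newprices as np
        on m.item = np.item
      left join newdesc as d
        on m.item = d.item
     group by m.item
    having count(*) > 1;

    -- Every item that appears more than once in the suspect table.
    select np.item, count(*) as copies
      from newprices as np
     group by np.item
    having count(*) > 1;
    ```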
    

    –Ted




Volume 13, Number 21 -- November 6, 2013
Copyright © 2025 IT Jungle