• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Digging Out Data Duplication

    November 6, 2013 Hey, Ted

    Sometimes an SQL query should return one row, yet it returns more than one. The problem turns out to be multiple matching rows in a secondary table. Is there a way to easily isolate the secondary table that causes more than one match?

    –J

    Yes, there is an easy way. But first, let me set up the problem for the edification of other readers.

    Sometimes we execute an SQL query with the expectation that the result set will contain only one row (record), and we are surprised to get back two or more rows instead. At least one table (physical file)–usually one of the secondary tables–has more than one row that matches the join criteria. (I use the word “table” here to mean either table or view.)

    I have noticed two main causes for this behavior.

    1. Someone keyed duplicate information into the database. For example, a table that was loaded from a spreadsheet contains two rows for the same customer, vendor, item, or what have you, when it should contain only one.
    2. The join criteria are insufficient. For instance, the fallible human being who wrote the query joined two tables on a common customer number, but should have joined them on common customer number and company number.

    Finding the problem is not usually trivial because production queries often join several tables, and it’s common for some of the join criteria in a query to involve only secondary tables.

    A simple way to find the culprit is to use the RRN (relative record number) function. Here’s a simple illustration of the technique.

    Assume an item master table. It is uniquely keyed on item number, of course. Assume two other temporary tables that were loaded from spreadsheets. One temporary table has new prices for some items. The other has new descriptions for some items. We expect only one row per item in each table, but people, being the imperfect beings they are, may accidentally load more than one row for an item.

    Here’s a query that should return one row for item BR-549.

    select m.item, np.price, nd.descr
      from itemmast as m
     left join newprices as np
        on m.item = np.item
     left join newdesc as nd
        on m.item = nd.item
     where m.item = 'BR-549'
    

    And here is the result set.

    ITEM     PRICE   DESCR 
    BR-549    5.00   Widget
    BR-549    5.00   Widget
    

    To locate the duplication, add RRN functions for the tables.

    select m.item, rrn(m), rrn(np), rrn(d)
      from itemmast as m
      left join newprices as np
        on m.item = np.item
      left join newdesc as d
        on m.item = d.item
    where m.item = 'BR-549'
    

    And now the verdict:

    ITEM    RRN ( M )  RRN ( NP )  RRN ( D )
    BR-549         2           2          2
    BR-549         2           3          2
    

    Look at the RRN columns. The values are different in the third column, therefore the new prices table is the problem.

    select * from newprices as np
    where np.item = 'BR-549'
    
    ITEM     PRICE
    BR-549    5.00
    BR-549    5.00
    

    –Ted



                         Post this story to del.icio.us
                   Post this story to Digg
        Post this story to Slashdot

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    Maxava

    Migrate IBM i with Confidence

    Tired of costly and risky migrations? Maxava Migrate Live minimizes disruption with seamless transitions. Upgrading to Power10 or cloud hosted system, Maxava has you covered!

    Learn More

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    Essextec:  Quick Security Check to analyze the 500 most vulnerable data points on your IBM i
    Bug Busters Software Engineering:  RSF-HA keeps you going while it saves you a bundle
    Secure Infrastructure & Services:  FREE white paper: "9 Reasons IBM Sees a Shift to the Cloud"

    More IT Jungle Resources:

    System i PTF Guide: Weekly PTF Updates
    IBM i Events Calendar: National Conferences, Local Events, and Webinars
    Breaking News: News Hot Off The Press
    TPM @ EnterpriseTech: High Performance Computing Industry News From ITJ EIC Timothy Prickett Morgan

    Mobile Apps Get More Native-Like with Sencha Touch Update IBM Enhances Disk And Flash For Power Systems

    Leave a Reply Cancel reply

Volume 13, Number 21 -- November 6, 2013
THIS ISSUE SPONSORED BY:

Robot
WorksRight Software
ASNA

Table of Contents

  • Allow Repeated Change With Before Triggers
  • Digging Out Data Duplication
  • Admin Alert: The 4 GB Access Path Size Time Bomb

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Public Preview For Watson Code Assistant for i Available Soon
  • COMMON Youth Movement Continues at POWERUp 2025
  • IBM Preserves Memory Investments Across Power10 And Power11
  • Eradani Uses AI For New EDI And API Service
  • Picking Apart IBM’s $150 Billion In US Manufacturing And R&D
  • FAX/400 And CICS For i Are Dead. What Will IBM Kill Next?
  • Fresche Overhauls X-Analysis With Web UI, AI Smarts
  • Is It Time To Add The Rust Programming Language To IBM i?
  • Is IBM Going To Raise Prices On Power10 Expert Care?
  • IBM i PTF Guide, Volume 27, Number 20

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle