• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Digging Out Data Duplication

    November 6, 2013 Hey, Ted

    Sometimes an SQL query should return one row, yet it returns more than one. The problem turns out to be multiple matching rows in a secondary table. Is there a way to easily isolate the secondary table that causes more than one match?

    –J

    Yes, there is an easy way. But first, let me set up the problem for the edification of other readers.

    Sometimes we execute an SQL query with the expectation that the result set will contain only one row (record), and we are surprised to get back two or more rows instead. At least one table (physical file)–usually one of the secondary tables–has more than one row that matches the join criteria. (I use the word “table” here to mean either table or view.)

    I have noticed two main causes for this behavior.

    1. Someone keyed duplicate information into the database. For example, a table that was loaded from a spreadsheet contains two rows for the same customer, vendor, item, or what have you, when it should contain only one.
    2. The join criteria are insufficient. For instance, the fallible human being who wrote the query joined two tables on a common customer number, but should have joined them on common customer number and company number.

    Finding the problem is not usually trivial because production queries often join several tables, and it’s common for some of the join criteria in a query to involve only secondary tables.

    A simple way to find the culprit is to use the RRN (relative record number) function. Here’s a simple illustration of the technique.

    Assume an item master table. It is uniquely keyed on item number, of course. Assume two other temporary tables that were loaded from spreadsheets. One temporary table has new prices for some items. The other has new descriptions for some items. We expect only one row per item in each table, but people, being the imperfect beings they are, may accidentally load more than one row for an item.

    Here’s a query that should return one row for item BR-549.

    select m.item, np.price, nd.descr
      from itemmast as m
     left join newprices as np
        on m.item = np.item
     left join newdesc as nd
        on m.item = nd.item
     where m.item = 'BR-549'
    

    And here is the result set.

    ITEM     PRICE   DESCR 
    BR-549    5.00   Widget
    BR-549    5.00   Widget
    

    To locate the duplication, add RRN functions for the tables.

    select m.item, rrn(m), rrn(np), rrn(d)
      from itemmast as m
      left join newprices as np
        on m.item = np.item
      left join newdesc as d
        on m.item = d.item
    where m.item = 'BR-549'
    

    And now the verdict:

    ITEM    RRN ( M )  RRN ( NP )  RRN ( D )
    BR-549         2           2          2
    BR-549         2           3          2
    

    Look at the RRN columns. The values are different in the third column, therefore the new prices table is the problem.

    select * from newprices as np
    where np.item = 'BR-549'
    
    ITEM     PRICE
    BR-549    5.00
    BR-549    5.00
    

    –Ted



                         Post this story to del.icio.us
                   Post this story to Digg
        Post this story to Slashdot

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    Rocket Software

    Two Steps Forward, No Steps Back

    For over 35 years, Rocket Software’s solutions have empowered businesses to modernize their infrastructure, unlock data value, and drive transformation – all while ensuring modernization without disruption.

    LEARN MORE

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    Essextec:  Quick Security Check to analyze the 500 most vulnerable data points on your IBM i
    Bug Busters Software Engineering:  RSF-HA keeps you going while it saves you a bundle
    Secure Infrastructure & Services:  FREE white paper: "9 Reasons IBM Sees a Shift to the Cloud"

    More IT Jungle Resources:

    System i PTF Guide: Weekly PTF Updates
    IBM i Events Calendar: National Conferences, Local Events, and Webinars
    Breaking News: News Hot Off The Press
    TPM @ EnterpriseTech: High Performance Computing Industry News From ITJ EIC Timothy Prickett Morgan

    Mobile Apps Get More Native-Like with Sencha Touch Update IBM Enhances Disk And Flash For Power Systems

    Leave a Reply Cancel reply

Volume 13, Number 21 -- November 6, 2013
THIS ISSUE SPONSORED BY:

Robot
WorksRight Software
ASNA

Table of Contents

  • Allow Repeated Change With Before Triggers
  • Digging Out Data Duplication
  • Admin Alert: The 4 GB Access Path Size Time Bomb

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Liam Allan Shares What’s Coming Next With Code For IBM i
  • From Stable To Scalable: Visual LANSA 16 Powers IBM i Growth – Launching July 8
  • VS Code Will Be The Heart Of The Modern IBM i Platform
  • The AS/400: A 37-Year-Old Dog That Loves To Learn New Tricks
  • IBM i PTF Guide, Volume 27, Number 25
  • Meet The Next Gen Of IBMers Helping To Build IBM i
  • Looks Like IBM Is Building A Linux-Like PASE For IBM i After All
  • Will Independent IBM i Clouds Survive PowerVS?
  • Now, IBM Is Jacking Up Hardware Maintenance Prices
  • IBM i PTF Guide, Volume 27, Number 24

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle