• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Digging Out Data Duplication

    November 6, 2013 Hey, Ted

    Sometimes an SQL query should return one row, yet it returns more than one. The problem turns out to be multiple matching rows in a secondary table. Is there a way to easily isolate the secondary table that causes more than one match?

    –J

    Yes, there is an easy way. But first, let me set up the problem for the edification of other readers.

    Sometimes we execute an SQL query with the expectation that the result set will contain only one row (record), and we are surprised to get back two or more rows instead. At least one table (physical file)–usually one of the secondary tables–has more than one row that matches the join criteria. (I use the word “table” here to mean either table or view.)

    I have noticed two main causes for this behavior.

    1. Someone keyed duplicate information into the database. For example, a table that was loaded from a spreadsheet contains two rows for the same customer, vendor, item, or what have you, when it should contain only one.
    2. The join criteria are insufficient. For instance, the fallible human being who wrote the query joined two tables on a common customer number, but should have joined them on common customer number and company number.

    Finding the problem is not usually trivial because production queries often join several tables, and it’s common for some of the join criteria in a query to involve only secondary tables.

    A simple way to find the culprit is to use the RRN (relative record number) function. Here’s a simple illustration of the technique.

    Assume an item master table. It is uniquely keyed on item number, of course. Assume two other temporary tables that were loaded from spreadsheets. One temporary table has new prices for some items. The other has new descriptions for some items. We expect only one row per item in each table, but people, being the imperfect beings they are, may accidentally load more than one row for an item.

    Here’s a query that should return one row for item BR-549.

    select m.item, np.price, nd.descr
      from itemmast as m
     left join newprices as np
        on m.item = np.item
     left join newdesc as nd
        on m.item = nd.item
     where m.item = 'BR-549'
    

    And here is the result set.

    ITEM     PRICE   DESCR 
    BR-549    5.00   Widget
    BR-549    5.00   Widget
    

    To locate the duplication, add RRN functions for the tables.

    select m.item, rrn(m), rrn(np), rrn(d)
      from itemmast as m
      left join newprices as np
        on m.item = np.item
      left join newdesc as d
        on m.item = d.item
    where m.item = 'BR-549'
    

    And now the verdict:

    ITEM    RRN ( M )  RRN ( NP )  RRN ( D )
    BR-549         2           2          2
    BR-549         2           3          2
    

    Look at the RRN columns. The values are different in the third column, therefore the new prices table is the problem.

    select * from newprices as np
    where np.item = 'BR-549'
    
    ITEM     PRICE
    BR-549    5.00
    BR-549    5.00
    

    –Ted



                         Post this story to del.icio.us
                   Post this story to Digg
        Post this story to Slashdot

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    WorksRight Software

    Do you need area code information?
    Do you need ZIP Code information?
    Do you need ZIP+4 information?
    Do you need city name information?
    Do you need county information?
    Do you need a nearest dealer locator system?

    We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

    The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

    PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

    Just call us and we’ll arrange for 30 days FREE use of either ZIP/CITY or PER/ZIP4.

    WorksRight Software, Inc.
    Phone: 601-856-8337
    Fax: 601-856-9432
    Email: software@worksright.com
    Website: www.worksright.com

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    Essextec:  Quick Security Check to analyze the 500 most vulnerable data points on your IBM i
    Bug Busters Software Engineering:  RSF-HA keeps you going while it saves you a bundle
    Secure Infrastructure & Services:  FREE white paper: "9 Reasons IBM Sees a Shift to the Cloud"

    More IT Jungle Resources:

    System i PTF Guide: Weekly PTF Updates
    IBM i Events Calendar: National Conferences, Local Events, and Webinars
    Breaking News: News Hot Off The Press
    TPM @ EnterpriseTech: High Performance Computing Industry News From ITJ EIC Timothy Prickett Morgan

    Mobile Apps Get More Native-Like with Sencha Touch Update IBM Enhances Disk And Flash For Power Systems

    Leave a Reply Cancel reply

Volume 13, Number 21 -- November 6, 2013
THIS ISSUE SPONSORED BY:

Robot
WorksRight Software
ASNA

Table of Contents

  • Allow Repeated Change With Before Triggers
  • Digging Out Data Duplication
  • Admin Alert: The 4 GB Access Path Size Time Bomb

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Positive News From The Kyndryl Mainframe Modernization Report
  • NAViGATE, inPower 2025 On Tap for September 2025
  • Guru: WCA4i And Granite – Because You’ve Got Bigger Things To Build
  • As I See It: Digital Coup
  • IBM i PTF Guide, Volume 27, Number 37
  • AI Is Coming for ERP. How Will IBM i Respond?
  • The Power And Storage Price Wiggling Continues – Again
  • LaserVault Adds Multi-Path Support To ViTL
  • As I See It: Spacing Out
  • IBM i PTF Guide, Volume 27, Numbers 34, 35, And 36

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle