Newsletters Subscriptions Media Kit About Us Contact Search Home

mgo
OS/400 Edition
Volume 3, Number 18 -- March 14, 2003

Using SQL to Find Duplicate Records


Hey, Ted:

Thank you and Guild Companies for the free subscriptions. I enjoy them a lot.

I want to use SQL to find duplicate records in a customer ship-to file. The following statement tells me the customer-number/shipment-number combinations for which there is more than one record.

 
 
   
 
 
 
 
 

Select cusno, shpno, count(*)
from custfile
group by cusno,shpno
having count(*) > 1
 

But I would like to retrieve other fields besides the customer number and the shipment number. Is it possible?

--Li


Yes, Li, it's possible. But before I show you a couple of ways to solve your problem, let's back up just a moment to a simpler query.

Suppose you wanted to retrieve the records of the customers who have more than one shipment. Since you're grouping by only one field (the customer number), you can use a subquery that returns a set of customer numbers.

select * from shipments        
   where custnbr in        
   (select custnbr         
      from shipments       
      group by custnbr     
      having count(*) > 1)

SQL retrieves all the fields in the records of customers with more than one shipment.

But this format won't work if you decide to group on two or more fields, because the IN predicate can handle only a set of single values.

The way I usually have handled this sort of challenge is with the EXISTS predicate.

select * from shipments as a          
  where exists                    
    (select custnbr,shipnbr       
       from shipments as b            
       where a.custnbr = b.custnbr
         and a.shipnbr = b.shipnbr
       group by custnbr, shipnbr  
       having count(*) > 1) 

The EXISTS predicate proves true if the inner select returns at least one row, and returns false if the inner select returns no rows. For each row retrieved in the outer select, SQL executes the inner select and returns a row if at least two records were found with matching customer and shipment numbers.

Now that SQL/400 supports common table expressions, I am getting more accustomed to using a JOIN operation, rather than EXISTS.

select a.* from shipments as a       
  inner join                     
    (select custnbr,shipnbr      
       from ship2 as b           
       group by custnbr, shipnbr 
       having count(*) > 1) as b 
      on a.custnbr = b.custnbr   
     and a.shipnbr = b.shipnbr  

The shipments table is joined to a temporary table, known as b, which consists of duplicate customer-number/shipment-number combinations.

You'll need to be at V4R4 to use common table expressions. For more examples of common table expressions, see the following back issues of Midrange Guru:

"Common Table Expressions Make Summarizing Data Easy"

"Retrieve Subtotals and Detail Data at the Same Time"

--Ted


Sponsored By
ADVANCED SYSTEMS CONCEPTS

Monitor Critical Events
Before your operation goes down!
- iSeries message queues -
- iSeries subsystems -
- Windows Servers -
- TCP/IP devices/services -
- Runaway jobs -

OpCenter Message Alert is the easiest, most versatile way to save your system, and your job! Built on graphical console technology, Message Alert can be configured and monitored remotely, or devices can be paged/emailed in the event of pending disaster. Why continue living in the dark when OpCenter is ready to assist you?

Read More


THIS ISSUE
SPONSORED BY:

Advanced Systems Concepts
Kisco Information Systems


BACK ISSUES

TABLE OF
CONTENTS

Using SQL to Find Duplicate Records

Responsive Rooms with a View

Reader Feedback and Insights: Results of System/36 Tips Poll


Editors
Howard Arner
Joe Hertvik
Ted Holt
David Morris

Managing Editor
Shannon Pastore

Publisher and
Advertising Director:

Jenny Thomas

Advertising Sales Representative
Kim Reed

Contact the Editors
Do you have a gripe, inside dope or an opinion?
Email the editors:
editors@itjungle.com

Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.