|
|||||||
|
|
![]() |
|
|
|
|
||
|
Using SQL to Find Duplicate Records Hey, Ted: Thank you and Guild Companies for the free subscriptions. I enjoy them a lot. I want to use SQL to find duplicate records in a customer ship-to file. The following statement tells me the customer-number/shipment-number combinations for which there is more than one record. Select cusno, shpno, count(*) from custfile group by cusno,shpno having count(*) > 1 But I would like to retrieve other fields besides the customer number and the shipment number. Is it possible? --Li Yes, Li, it's possible. But before I show you a couple of ways to solve your problem, let's back up just a moment to a simpler query. Suppose you wanted to retrieve the records of the customers who have more than one shipment. Since you're grouping by only one field (the customer number), you can use a subquery that returns a set of customer numbers.
select * from shipments
where custnbr in
(select custnbr
from shipments
group by custnbr
having count(*) > 1)
SQL retrieves all the fields in the records of customers with more than one shipment. But this format won't work if you decide to group on two or more fields, because the IN predicate can handle only a set of single values. The way I usually have handled this sort of challenge is with the EXISTS predicate. select * from shipments as a where exists (select custnbr,shipnbr from shipments as b where a.custnbr = b.custnbr and a.shipnbr = b.shipnbr group by custnbr, shipnbr having count(*) > 1) The EXISTS predicate proves true if the inner select returns at least one row, and returns false if the inner select returns no rows. For each row retrieved in the outer select, SQL executes the inner select and returns a row if at least two records were found with matching customer and shipment numbers. Now that SQL/400 supports common table expressions, I am getting more accustomed to using a JOIN operation, rather than EXISTS. select a.* from shipments as a inner join (select custnbr,shipnbr from ship2 as b group by custnbr, shipnbr having count(*) > 1) as b on a.custnbr = b.custnbr and a.shipnbr = b.shipnbr The shipments table is joined to a temporary table, known as b, which consists of duplicate customer-number/shipment-number combinations. You'll need to be at V4R4 to use common table expressions. For more examples of common table expressions, see the following back issues of Midrange Guru: "Common Table Expressions Make Summarizing Data Easy" "Retrieve Subtotals and Detail Data at the Same Time" --Ted
|
Editors
Contact the Editors |
| Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved. |