Update Only One of Duplicate Records

December 5, 2007 Hey, Ted

We have a file that has no unique key. This means that we sometimes get duplicate records in the file. This does not cause us problems in RPG programs because we can read a record, make changes to it, and update it, whether we’re using native I/O or an SQL cursor. However, having duplicate rows is a problem when we use SQL outside of programs. If two or more records are identical, there is no way to code a WHERE clause that will select only one record. Or is there?

–David

It is ugly, but possible, David. Assume table MYDATA, containing the following information:

Name          Age  Location
============  ===   ========
Bill Fold      18   Moscow
Sue Doe-Nymm   25   Athens
Ben Dover      22   Helsinki
Bill Fold      18   Moscow
Polly Wogg     19   Helsinki

Notice that the first and fourth rows are identical. Here’s the UPDATE command to change the name in only one of them.

UPDATE qtemp.mydata AS a
   SET NAME = 'Sam O''Var'
 WHERE rrn(a) =
       (SELECT MAX(rrn(b))
          FROM qtemp.mydata AS b
         WHERE NAME = 'Bill Fold')

Let’s see what’s taking place. Notice first the subquery in the last three lines. It returns the relative record number of the last row with Bill Fold in the name column. (If you prefer to update the first of the duplicate rows, use the MIN function instead.) The UPDATE command changes the name in the row whose relative record number matches the relative record number returned in the subquery. Voilà! One, and only one, row updated.

Name          Age  Location
============  ===  ========
Bill Fold      18   Moscow
Sue Doe-Nymm   25   Athens
Ben Dover      22   Helsinki
Sam O'Var      18   Moscow
Polly Wogg     19   Helsinki

By the way, this discussion points out yet another difference between the positioned update and the searched update, which I examined in Updating Through an SQL Cursor.

                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot