How To Insert With A Common Table Expression
August 16, 2016 Hey, Ted
I have to merge some new item numbers into our item master file. The problem I’ve run into is that there is more than one record for some items. None of the examples I found on the Web work for me. How do I write an SQL statement to copy all records from one file to another avoiding duplicates on part number?
This is a new wrinkle on a challenge about which I have written before, namely the need to select only one row (record) to represent a group. (See the Related Stories below.) The wrinkle is that not all database management systems handle common table expressions within inserts in the same way. I’ll come back to that in a minute.
First, let’s set up the problem. Assume an item master file.
create table items ( Item_ID char ( 6 ) primary key, Description char ( 20 ), Cost dec ( 7,2 ), Price dec ( 7,2 ), Type char ( 1 ), Class char ( 1 ), UnitOfMeasure char ( 2 ) )
Here’s the load file:
create table dataload ( Item_number char ( 6 ), Description char ( 20 ), Cost dec ( 7,3 ), Price dec ( 7,2 ), DateSold dec ( 8,0 ) ); insert into dataload values ('AT-771', 'Widget' , 177.50 , 258.77 , 20160801), ('BR-549', 'Samples' , 55.04 , 135.80 , 20160801), ('AT-771', '#4 Widget', 738.22 , 1386.36 , 20160802)
The challenge is to load these items into the item master file. Where there are two or more rows for an item, we pick the one with the latest date.
Here’s one method.
insert into items (item_id, description, cost, price) select a.item_number, a.description, a.cost, a.price from dataload as a join (select item_number, max(DateSold) as DateSold from dataload group by item_number) as b on (a.item_number, a.DateSold) = (b.item_number, b.DateSold)
There are two SELECT statements within this INSERT. The system runs the second SELECT–the one in parentheses following JOIN–first to build a temporary table of each distinct item number and the latest date that that item number was sold. This temporary table is joined to the original data on item number and date sold, yielding the desired row for each item. The result set from the join is inserted into the table.
This will not work if there are two sales on the same date for one item. Here’s another method that does not suffer from that limitation.
insert into items (item_id, description, cost, price) with Temp as ( select d.*, row_number() over (partition by d.item_number order by d.datesold desc) as rn from dataload as d) select item_number, description, cost, price from temp where rn = 1
The common table expression, TEMP, uses the ROW_NUMBER function to assign a sequential number to each row within each item number. Think of PARTITION BY as a control break. ORDER BY is a sort, of course. The system sorts the data by item number and numbers each row in the result set. Numbering restarts at 1 for each item.
The second SELECT retrieves the number 1 row for each item, preventing duplicate items from being loaded into the ITEMS table.
What I wanted to point out to you is the syntax to use a common table expression within an insert.
INSERT . . . WITH . . . SELECT . . .
Be aware that the syntax for INSERT that uses a common table expression is different on other systems. I suppose that all database management systems will eventually support all the variations in their ongoing attempt to lure customers from other platforms.
Ted Holt welcomes your comments and questions. Email him through the IT Jungle Contacts page.