Generating An Insert Script From Existing Data
March 15, 2016 Paul Tuohy
I was recently presented with an interesting conundrum. Can you generate an INSERT statement for data that currently exists in a table?
For example, let’s say a table contained the following data:
EMPID NAME GRADE BIRTHDATE JOINEDDATE SALARY 00001 Test AA 1956-05-14 2001-10-10 0.00 00002 Test2 - Has Null AA null null null
Could I generate the corresponding insert statement similar to the following:
insert into TESTSTUFF/TESTEMP (EMPID, NAME, GRADE, BIRTHDATE, JOINEDDATE, SALARY) values ('00001', 'Test', 'AA', '1956-05-14', '2001-10-10', .00), ('00002', 'Test2 - Has Null', 'AA', NULL, NULL, NULL);
The original request was to generate an insert script for all rows and all columns in a specified table. The insert script was going to be part an installation process where a table was to be pre-loaded with data.
But, if I was going to go to the trouble of writing a program, why not expand the requirement so it would generate the insert script based on an SQL statement as opposed to all rows and columns in a table? By using a select statement, we can reduce the number of rows and columns in the insert statement and, if required, replace column values with constant values. Such a script would provide an excellent means of generating test data from live data and giving us the option of not copying sensitive information.
How Does It Work?
The solution makes use a table function (CREATEINSERTSCRIPTC) that calls an RPG program (DESC22) that makes use of SQL Descriptors. If table functions and descriptors are a bit daunting, the good news is that all you have to do is compile the provided program, create the table function and you are ready to go.
This is how the insert statement is generated by calling the table function. The result set can be copied and pasted into a source member of my choice.
The important points to note are:
The following example is, basically, the same as the previous example except that I have replaced everyone’s last name with a constant value.
Creating The Table Function
The following statement is used to create the table function:
CREATE OR REPLACE FUNCTION TESTSTUFF/CREATEINSERTSCRIPTC ( FORSTATEMENT VARCHAR(5000) ) RETURNS TABLE ( LINEBACK VARCHAR(20000) ) LANGUAGE RPGLE SPECIFIC TESTSTUFF/CREATEINSERTSCRIPTC DETERMINISTIC MODIFIES SQL DATA RETURNS NULL ON NULL INPUT NOT FENCED EXTERNAL NAME 'TESTSTUFF/DESC22' PARAMETER STYLE DB2SQL ;
The function is passed a single parameter (FORSTATEMENT), which is the required SQL statement, and returns a table with a single column (LINEBACK), which contains a line of the insert script. Both of these variables need to be large enough to cater for the size the SQL statement and the size of a row of values. Remember, the sizes of the variables are defined on both the function and in the RPG program.
The RPG Program
When a table function is used in an SQL statement, the underlying external program is called numerous times–an initialization call and once for every “row” returned until all rows have been processed. The type of call (initialization, fetch, etc.) is indicated by a Call Type parameter.
These are the global definitions and the mainline for the RPG program DESC22. Refer to the callouts in the source for details:
A: The definition of the parameters must correspond to the definition for the parameters in the function.
B: The program sets SQLState to 02000 to indicate to the caller that all rows have been processed.
C: The call type parameter indicates whether it is an initialization or fetch call.
D: gv_types is an array used to store the data type of each column in the select statement. The type is used to determine whether or not a value should be placed in quotes in the generated insert statement.
E: It is always a good idea to give meaningless values meaningful names.
F: On every call to the program, the relevant subprocedure is called, based on the value of the call type parameter.
h dftActGrp(*no) option(*srcStmt: *noDebugIO) d createInsertScript... d PR extPgm('DESC22') (A) d forStatement 5000a const varying d lineBack 20000a varying d forStatement_ni... d 5i 0 const d lineBack_ni 5i 0 (B) d SQLStateOut 5a d functionName 517a const varying options(*varSize) d specificName 128a const varying options(*varSize) d errorText 70a varying options(*varSize) (C) d callType 5i 0 const d createInsertScript... d PI d forStatement 5000a const varying d lineBack 20000a varying d forStatement_ni... d 5i 0 const d lineBack_ni 5i 0 d SQLStateOut 5a d functionName 517a const varying options(*varSize) d specificName 128a const varying options(*varSize) d errorText 70a varying options(*varSize) d callType 5i 0 const d doCallOpen PR d doCallNext PR d doCallClose PR d useDescriptor s 20a varying inz('TEST_DESC') d myStatement s 2000a varying d i s 10i 0 d gv_numColumns s 10i 0 (D) d gv_types s 10i 0 dim(10000) d gv_padding s 10a varying d gv_firstDone s n d gv_lastDone s n (E) d CALL_OPEN C -1 d CALL_NEXT C 0 d CALL_CLOSE C 1 d EOF C '02000' d QUOTE c '''' /free (F) if (callType = CALL_OPEN); doCallOpen(); elseIf (callType = CALL_NEXT); doCallNext(); elseIf (callType = CALL_CLOSE); doCallClose(); endIf; return; /end-Free
The doCallOpen() subprocedure is called when an initialization call is made to the program. doCallOpen() will:
This is the doCallOpen() subprocedure. Refer to the callouts in the source for details:
A: Allocate the descriptor that will be used in processing the requested SQL statement.
B: Copy the passed parameter to a work field. The passed parameter is defined with the CONST keyword, which means it cannot be used in an embedded SQL statement.
C: Prepare the requested statement. Processing will only continue if the statement is OK.
D: Describe the prepared statement with the descriptor and retrieve the number of columns in the select statement.
E: Construct the start of the first line returned.
F: Loop through each of the column definitions and retrieve the data type and name of each column.
G: Add the column name to the list.
H: Store the data type of the column.
I: Use the descriptor to specify that the column value is to be cast as VARCHAR. When using SQL descriptors to retrieve column values, the rules are very strict in that the definition of the host variable must match exactly the definition of the database column. By casting to VARCHAR we can use a single host variable to retrieve any column value.
J: Declare the cursor for the prepared statement and open the cursor.
K: If there was a problem with the requested statement, return an error to the caller.
P doCallOpen B d PI d colIs s 10a d d_type s 10i 0 /free exec SQL set option naming = *SYS, datFmt = *ISO, timFmt = *ISO; exec SQL (A) allocate descriptor local :useDescriptor with max 5000; (B) myStatement = forStatement; gv_firstDone = *off; gv_lastDone = *off; exec SQL (C) prepare D1 from :myStatement; if SQLCode = 0; exec SQL (D) describe D1 using SQL descriptor local :useDescriptor; exec SQL get descriptor :useDescriptor :gv_numColumns = COUNT; (E) lineback = 'insert into **TEMPTABLE** ('; gv_padding = ''; (F) for i = 1 to gv_numColumns; exec SQL get descriptor :useDescriptor value :i :d_type = TYPE, :colIs = DB2_SYSTEM_COLUMN_NAME; (G) lineBack += gv_padding + %trim(colIs); gv_padding = ', '; (H) gv_types(i) = d_type; (I) exec SQL set descriptor :useDescriptor value :i TYPE = 12, LENGTH = 2000; endFor; lineBack += ') '; gv_padding = 'values '; exec SQL (J) declare C1 scroll cursor for D1; exec SQL open C1; (K) else; SQLStateOut = '38X11'; errorText = 'Invalid SQL Statement!'; endIf; /end-Free p e
The doCallNext() subprocedure is called once for every row to be returned. The important points to note are:
This is the doCallNext() subprocedure. Refer to the callouts in the source for details:
A: If this is the first call, return the line that was constructed in doCallOpen().
B: If all rows have been returned, set the End of File condition and end.
C: Start constructing the returned line.
D: Fetch the next row from the prepared SQL statement.
E: If all rows have been processed, return the final ‘:’, close the cursor and de-allocate the descriptor.
F: When processing a row, loop through each of the requested columns.
G: For each column, retrieve the data for the column and the null indicator.
H: Use the previously stored data types to determine whether or not the value should be enclosed in quotes.
I: Determine whether or not the column is null.
J: Add the value to the line to be returned.
P doCallNext B d PI d colPadding s 3a varying d data s 2000a varying d setQuote s 1a varying d checkNull s 10i 0 /free (A) if not gv_firstDone; gv_firstDone = *on; return; endIf; (B) if gv_lastDone; SQLStateOut = EOF; *inLR = *on; return; endIf; (C) lineBack = gv_padding + '('; gv_padding = ' , '; exec SQL (D) fetch next from C1 into SQL descriptor :useDescriptor; // If this is EOF, return a "termination" line (E) if (SQLCode = 100); lineBack = ';'; gv_lastDone = *on; exec SQL close C1; exec SQL deallocate descriptor local :useDescriptor; // Otherwise - process the values in the columns else; (F) for i = 1 to gv_numColumns; (G) exec SQL get descriptor :useDescriptor value :i :checkNull = INDICATOR, :data = DATA; // Determine if value should be in quotes setQuote = ''; (H) if (gv_types(i) = 1 or gv_types(i) = 12 or gv_types(i) = 40 or gv_types(i) = -95 or gv_types(i) = -96 or gv_types(i) = -350 or gv_types(i) = 9 or gv_types(i) = -400); setQuote = QUOTE; endIf; // Check for Null value (I) if (checkNull < 0); setQuote = ''; data = 'NULL'; endIf; (J) lineBack += colPadding + setQuote + data + setQuote; colPadding = ', '; endFor; lineBack += ')'; endIf; /end-Free p e
Finally, the doCallClose() subprocedure is called when the client makes a close call to the program. The doCallClose() subprocedure closes the cursor, deallocates the descriptor and end the program.
P doCallClose B d PI /free exec SQL close C1; exec SQL deallocate descriptor local :useDescriptor; *inLR = *on; /end-Free p e
At first I thought of doing this using a stored procedure. The problem with a stored procedure is that the program would need to generate all of the result set before returning to the caller. The row at a time processing with the table function made it easier to write.
If copy and pass (of the result set) is too much of a chore, you could write a “wrapper” program that would put the result set in a source member or IFS file.
Hopefully, this is another tool you can add to your utility belt. Enjoy!
Paul Tuohy is CEO of ComCon, an iSeries consulting company, and is one of the co-founders of System i Developer, which hosts the RPG & DB2 Summit conferences. He is an award-winning speaker who also speaks regularly at COMMON conferences, and is the author of “Re-engineering RPG Legacy Applications,” “The Programmers Guide to iSeries Navigator,” and the self-study course called “iSeries Navigator for Programmers.” Send your questions or comments for Paul to Ted Holt via the IT Jungle Contact page.