Generating An Insert Script From Existing Data

March 15, 2016 Paul Tuohy

I was recently presented with an interesting conundrum. Can you generate an INSERT statement for data that currently exists in a table?

For example, let’s say a table contained the following data:

EMPID NAME            GRADE BIRTHDATE  JOINEDDATE SALARY
00001 Test             AA   1956-05-14 2001-10-10  0.00
00002 Test2 - Has Null AA   null       null        null

Could I generate the corresponding insert statement similar to the following:

insert into TESTSTUFF/TESTEMP (EMPID, NAME, GRADE, BIRTHDATE, JOINEDDATE, SALARY) 
values
('00001', 'Test', 'AA', '1956-05-14', '2001-10-10', .00),
('00002', 'Test2 - Has Null', 'AA', NULL, NULL, NULL);

The original request was to generate an insert script for all rows and all columns in a specified table. The insert script was going to be part an installation process where a table was to be pre-loaded with data.

But, if I was going to go to the trouble of writing a program, why not expand the requirement so it would generate the insert script based on an SQL statement as opposed to all rows and columns in a table? By using a select statement, we can reduce the number of rows and columns in the insert statement and, if required, replace column values with constant values. Such a script would provide an excellent means of generating test data from live data and giving us the option of not copying sensitive information.

How Does It Work?

The solution makes use a table function (CREATEINSERTSCRIPTC) that calls an RPG program (DESC22) that makes use of SQL Descriptors. If table functions and descriptors are a bit daunting, the good news is that all you have to do is compile the provided program, create the table function and you are ready to go.

This is how the insert statement is generated by calling the table function. The result set can be copied and pasted into a source member of my choice.

The important points to note are:

Not all columns are selected.
Not all rows are selected.
The literal value for the WHERE clause is enclosed in two single quotes (as opposed to one). This is how you include a single quote in a literal.
The SQL statement, passed as a parameter, can be as simple or as complex as required–join multiple tables, group by, order by etc.

The following example is, basically, the same as the previous example except that I have replaced everyone’s last name with a constant value.

Creating The Table Function

The following statement is used to create the table function:

CREATE OR REPLACE FUNCTION TESTSTUFF/CREATEINSERTSCRIPTC (
	FORSTATEMENT VARCHAR(5000)  )
	RETURNS TABLE (
	LINEBACK VARCHAR(20000) )
	LANGUAGE RPGLE
	SPECIFIC TESTSTUFF/CREATEINSERTSCRIPTC
	DETERMINISTIC
	MODIFIES SQL DATA
	RETURNS NULL ON NULL INPUT
	NOT FENCED
	EXTERNAL NAME 'TESTSTUFF/DESC22'
	PARAMETER STYLE DB2SQL ;

The function is passed a single parameter (FORSTATEMENT), which is the required SQL statement, and returns a table with a single column (LINEBACK), which contains a line of the insert script. Both of these variables need to be large enough to cater for the size the SQL statement and the size of a row of values. Remember, the sizes of the variables are defined on both the function and in the RPG program.

The RPG Program

When a table function is used in an SQL statement, the underlying external program is called numerous times–an initialization call and once for every “row” returned until all rows have been processed. The type of call (initialization, fetch, etc.) is indicated by a Call Type parameter.

These are the global definitions and the mainline for the RPG program DESC22. Refer to the callouts in the source for details:

A: The definition of the parameters must correspond to the definition for the parameters in the function.

B: The program sets SQLState to 02000 to indicate to the caller that all rows have been processed.

C: The call type parameter indicates whether it is an initialization or fetch call.

D: gv_types is an array used to store the data type of each column in the select statement. The type is used to determine whether or not a value should be placed in quotes in the generated insert statement.

E: It is always a good idea to give meaningless values meaningful names.

F: On every call to the program, the relevant subprocedure is called, based on the value of the call type parameter.

     h dftActGrp(*no) option(*srcStmt: *noDebugIO)

     d createInsertScript...
     d                 PR                  extPgm('DESC22')
(A)  d  forStatement               5000a   const varying
     d  lineBack                  20000a   varying
     d  forStatement_ni...
     d                                5i 0 const
     d  lineBack_ni                   5i 0
(B)  d  SQLStateOut                   5a
     d  functionName                517a   const varying options(*varSize)
     d  specificName                128a   const varying options(*varSize)
     d  errorText                    70a   varying options(*varSize)
(C)  d  callType                      5i 0 const

     d createInsertScript...
     d                 PI
     d  forStatement               5000a   const varying
     d  lineBack                  20000a   varying
     d  forStatement_ni...
     d                                5i 0 const
     d  lineBack_ni                   5i 0
     d  SQLStateOut                   5a
     d  functionName                517a   const varying options(*varSize)
     d  specificName                128a   const varying options(*varSize)
     d  errorText                    70a   varying options(*varSize)
     d  callType                      5i 0 const

     d doCallOpen      PR
     d doCallNext      PR
     d doCallClose     PR

     d useDescriptor   s             20a   varying inz('TEST_DESC')

     d myStatement     s           2000a   varying
     d i               s             10i 0
     d gv_numColumns   s             10i 0
(D)  d gv_types        s             10i 0 dim(10000)
     d gv_padding      s             10a   varying
     d gv_firstDone    s               n
     d gv_lastDone     s               n

(E)  d CALL_OPEN       C                   -1
     d CALL_NEXT       C                   0
     d CALL_CLOSE      C                   1
     d EOF             C                   '02000'

     d QUOTE           c                   ''''

      /free

(F)    if (callType = CALL_OPEN);
          doCallOpen();
       elseIf (callType = CALL_NEXT);
          doCallNext();
       elseIf (callType = CALL_CLOSE);
          doCallClose();
       endIf;
       return;
      /end-Free

The doCallOpen() subprocedure is called when an initialization call is made to the program. doCallOpen() will:

Validate and prepare the requested SQL statement.
Use a descriptor to retrieve descriptions of all the required columns.
Build an array of data types of all of the columns.
Use the descriptor to specify that all values are returned as VARCHAR (more in a moment).
Build the first line which is returned. This is the INSERT line which contains the names of the columns being inserted. This line is not returned on this call but on the first call to doCallNext().

This is the doCallOpen() subprocedure. Refer to the callouts in the source for details:

A: Allocate the descriptor that will be used in processing the requested SQL statement.

B: Copy the passed parameter to a work field. The passed parameter is defined with the CONST keyword, which means it cannot be used in an embedded SQL statement.

C: Prepare the requested statement. Processing will only continue if the statement is OK.

D: Describe the prepared statement with the descriptor and retrieve the number of columns in the select statement.

E: Construct the start of the first line returned.

F: Loop through each of the column definitions and retrieve the data type and name of each column.

G: Add the column name to the list.

H: Store the data type of the column.

I: Use the descriptor to specify that the column value is to be cast as VARCHAR. When using SQL descriptors to retrieve column values, the rules are very strict in that the definition of the host variable must match exactly the definition of the database column. By casting to VARCHAR we can use a single host variable to retrieve any column value.

J: Declare the cursor for the prepared statement and open the cursor.

K: If there was a problem with the requested statement, return an error to the caller.

     P doCallOpen      B
     d                 PI

     d colIs           s             10a
     d d_type          s             10i 0

      /free
       exec SQL
          set option naming = *SYS,
                     datFmt = *ISO,
                     timFmt = *ISO;

       exec SQL
(A)       allocate descriptor local :useDescriptor with max 5000;

(B)    myStatement = forStatement;
       gv_firstDone = *off;
       gv_lastDone = *off;

       exec SQL
(C)       prepare D1 from :myStatement;

       if SQLCode = 0;
          exec SQL
(D)          describe D1 using SQL descriptor 
             local :useDescriptor;

          exec SQL
             get descriptor :useDescriptor 
               :gv_numColumns = COUNT;

(E)       lineback = 'insert into **TEMPTABLE** (';
          gv_padding = '';

(F)       for i = 1 to gv_numColumns;
             exec SQL
                get descriptor :useDescriptor
                    value :i
                          :d_type  = TYPE,
                          :colIs = DB2_SYSTEM_COLUMN_NAME;

(G)          lineBack += gv_padding + %trim(colIs);
             gv_padding = ', ';
(H)          gv_types(i) = d_type;

(I)          exec SQL
                set descriptor :useDescriptor
                    value :i
                          TYPE = 12,
                          LENGTH = 2000;

          endFor;

          lineBack += ') ';
          gv_padding = 'values ';

          exec SQL
(J)          declare C1 scroll cursor for D1;

          exec SQL
             open C1;

(K)    else;
          SQLStateOut = '38X11';
          errorText  = 'Invalid SQL Statement!';
       endIf;


      /end-Free
     p                 e

The doCallNext() subprocedure is called once for every row to be returned. The important points to note are:

On the first call to the subprocedure, the first line (prepared in doCallOpen()) should be returned.
When all rows have been processed, a final row with a semicolon needs to be returned.
When adding column values to the returned line, we need to cater for values that must be enclosed in quotes and for null values.

This is the doCallNext() subprocedure. Refer to the callouts in the source for details:

A: If this is the first call, return the line that was constructed in doCallOpen().

B: If all rows have been returned, set the End of File condition and end.

C: Start constructing the returned line.

D: Fetch the next row from the prepared SQL statement.

E: If all rows have been processed, return the final ‘:’, close the cursor and de-allocate the descriptor.

F: When processing a row, loop through each of the requested columns.

G: For each column, retrieve the data for the column and the null indicator.

H: Use the previously stored data types to determine whether or not the value should be enclosed in quotes.

I: Determine whether or not the column is null.

J: Add the value to the line to be returned.

     P doCallNext      B
     d                 PI

     d colPadding      s              3a   varying

     d data            s           2000a   varying
     d setQuote        s              1a   varying
     d checkNull       s             10i 0

      /free

(A)    if not gv_firstDone;
          gv_firstDone = *on;
          return;
       endIf;

(B)    if gv_lastDone;
          SQLStateOut = EOF;
          *inLR = *on;
          return;
       endIf;

(C)    lineBack = gv_padding + '(';
       gv_padding = ' ,     ';

       exec SQL
(D)       fetch next from C1 into SQL descriptor :useDescriptor;

       // If this is EOF, return a "termination" line
(E)    if (SQLCode = 100);
          lineBack = ';';
          gv_lastDone = *on;
          exec SQL
             close C1;
          exec SQL
             deallocate descriptor local :useDescriptor;


       // Otherwise - process the values in the columns
       else;

(F)       for i = 1 to gv_numColumns;
(G)          exec SQL
                get descriptor :useDescriptor
                    value :i
                          :checkNull = INDICATOR,
                          :data = DATA;

             // Determine if value should be in quotes
             setQuote = '';
(H)          if (gv_types(i) = 1 or
                 gv_types(i) = 12 or
                 gv_types(i) = 40 or
                 gv_types(i) = -95 or
                 gv_types(i) = -96 or
                 gv_types(i) = -350 or
                 gv_types(i) = 9 or
                 gv_types(i) = -400);
                setQuote = QUOTE;
             endIf;

             // Check for Null value
(I)          if (checkNull < 0);
                setQuote = '';
                data = 'NULL';
             endIf;
(J)          lineBack += colPadding + setQuote + data + setQuote;

             colPadding = ', ';
          endFor;

          lineBack += ')';
       endIf;

      /end-Free
     p                 e

Finally, the doCallClose() subprocedure is called when the client makes a close call to the program. The doCallClose() subprocedure closes the cursor, deallocates the descriptor and end the program.

     P doCallClose     B
     d                 PI

      /free
       exec SQL
          close C1;
       exec SQL
          deallocate descriptor local :useDescriptor;

       *inLR = *on;

      /end-Free
     p                 e

Other Thoughts

At first I thought of doing this using a stored procedure. The problem with a stored procedure is that the program would need to generate all of the result set before returning to the caller. The row at a time processing with the table function made it easier to write.

If copy and pass (of the result set) is too much of a chore, you could write a “wrapper” program that would put the result set in a source member or IFS file.

Hopefully, this is another tool you can add to your utility belt. Enjoy!

Paul Tuohy is CEO of ComCon, an iSeries consulting company, and is one of the co-founders of System i Developer, which hosts the RPG & DB2 Summit conferences. He is an award-winning speaker who also speaks regularly at COMMON conferences, and is the author of “Re-engineering RPG Legacy Applications,” “The Programmers Guide to iSeries Navigator,” and the self-study course called “iSeries Navigator for Programmers.” Send your questions or comments for Paul to Ted Holt via the IT Jungle Contact page.

2 thoughts on “Generating An Insert Script From Existing Data”

Rainer aus dem Spring says:

February 2, 2019 at 4:54 am

Dear Paul,

this is a great tool. Thanks for sharing it.

Unfortunately, it has a serious bug:

If an alphanumeric field contains a singlequote, it is not converted to two singlequotes.

See your sample above (O’CONNELL).
OF course, this is easily fixed 🙂

Best wishes,

Rainer

- TYLER OSBORNE says:
  
  September 28, 2022 at 11:12 pm
  
  You know SQL Server has a feature to “Generate Script” that does this. Cannot find anything on IBM i 7.4 other than your laborious manner. Ugh.

Table of Contents

Content archive

Recent Posts

Subscribe

Pages

Search