Stuff
OS/400 Edition
Volume 2, Number 1 -- January 16, 2003

Supercharge Your Search and Replace Tasks


by Michael Sansoterra

[The code for this article is available for download.]

display

As a programmer, not many things make me cringe more than a customer requesting something like the following: "We want to change company three to company one. We need you to substitute a 1 for a 3 in every occurrence of the company field, in every relevant file." A few years back, I received just such a request from a MAPICS customer. After I picked up my jaw from the ground, I thought about how I would identify every company field in every one of the gargantuan number of tables in the entire MAPICS application. The fruit of that effort is the Global Data Change utility (GLBDTACHG).

GLBDTACHG will create multiple SQL statements to search for a value in a specific field type and replace it with a new value. (Download the code for this utility here.)

How It Works

In analyzing this problem, I remembered that file SYSCOLUMNS in QSYS2 would aid my cause. This file contains "meta data" (or attributes) about every field in every file on the system. Armed with the knowledge that the MAPICS company field is always zoned (2,0) and that the text description for this field is consistent enough that I could count on the word company being contained in the field text, I queried this file to identify all fields with these attributes.

The resulting query was similar to the one shown here.

I now had a list of every field that needed to be changed. The following information was now identified:

1. All table (file) names containing a "company" column
2. Column (field) Names
3. Before Value (3)   (Value to find)
4. After Value (1)    (New Value)

This was enough information to have a program construct SQL statements to do all of the changes automatically. With the above information, for example, the following SQL UPDATE statement is easily constructed:

UPDATE MBCDREP    /* Order Line Items */
   SET CDAENB=1   /* Company – After  Value */
 WHERE CDAENB=3   /* Company - Before Value */

The GLBDTACHG command will do this very thing, by accepting the following information:

  • Column attributes to identify a column
  • Old Value to search for
  • New Value to replace with

The utility performs the following functions:

  • Search SYSCOLUMNS for all columns meeting attribute criteria
  • For each column identified, construct an SQL update statement used to search for the SEARCH VALUE and set it to the REPLACE VALUE
  • The resulting SQL statements will be separated by a semicolon and placed in a source member for review and subsequent execution by RUNSQLSTM or Operations Navigator

In essence, GLBDTACHG is nothing more than a glorified search and replace utility for searching an entire application's data. Here is a list of the command's parameters and their function:

GLBDTACHG Command
Parameter Description
SYSCOL System column name contents. In this context, the system column name refers to the 10 character column name, as opposed to the 30 character alias name. If an application has a consistent name or partial name for a column, you can specify that here. For example, if a software package has a company name column that is consistently named xxCONO (where xx is a table identifier), CONO can be specified here.

The *NOCHECK value means that the program will not take the column name into account when searching for eligible columns.
COLDATATYP Choose the data type for your column. In cases where definitions are inconsistent (e.g. if some definitions are packed and others zoned), *ALLNUM can be specified to look for all numeric fields. *ALLCHAR can be specified to look for all character columns (CHAR and VARCHAR.)

*ZONED and *NUMERIC are equivalent. *DECIMAL and *PACKED are equivalent.
COLLEN For numeric fields, the column length is the precision (total number of places including those to the right of the decimal place.) For character and variable character the column length is the defined length of the field.

If your columns don't all have the same length, specify *NOCHECK. *NOCHECK can be used for data types with predefined lengths (such as INT and BIGINT) or if the lengths are inconsistently defined.
COLDEC For numeric columns, specify the scale (number of decimal places.) This value is ignored for non-numeric columns.
COLLABEL If a column has a consistent text definition, specify the text here. Only partial column text is required. For example, to only use columns whose text contains the word "City", enter city here.
OLDVALUE This is the value to search for. It can be a constant, or a valid SQL expression.

For dates, they should be formatted correctly either using an ISO date or a date format that matches your system's default date format.

The constant or expression is limited to 60 characters.
OLDTYPE If the OLDVALUE is an expression, enter *EXPR here. If it's a constant enter *CONST. Hex and graphic constants are expressions. If *CONST is specified, the program will embed the constant in single quotes for character columns. If *EXPR is specified, the program assumes that the user will put in the proper quotes
OPERATOR Enter the comparison operator for the search value. Generally, this will always be equal to do a one for one search/replace. However, other operators such as *GT, *LT,*IN, *LIKE, *NOTLIKE, *BETWEEN, *IS and *ISNOT are allowed.

When using *IS or *ISNOT, the OLDVALUE must be NULL. When using complex comparisons such as *IN or *BETWEEN, enter the criteria as an expression in the OLDVALUE. For example, to identify dates in 2002, then specify *BETWEEN as the operator and '''01/01/2002'' AND ''12/31/2002''' in the OLDVALUE.
NEWVALUE This is the value to replace the old value with. It can be a constant, or an expression. To refer to the column itself in an expression, use ~FLD~. When the SQL statements are generated, ~FLD~ will be replaced with the current column name.

The constant or expression is limited to 60 characters.
NEWTYPE If the NEWVALUE is an expression, enter *EXPR here. If it's a constant enter *CONST. As with OLDTYPE, this parm does nothing besides determine if single quotes should be added around the supplied value. This parm is ignored for non-character data types.
LIBRARY Specify the library to search. *ALL will search all libraries and *NONQLIBS will search all libraries that do not begin with Q.
EXCLDICT Exclude dictionary name. Most software applications contain a data dictionary (or field reference) table. Since these dictionaries contain a large number of fields and no records they should be excluded.
SRCFILE Specify source file name and library for the constructed SQL statements.
SRCMBR Specify the source member where the SQL statements will reside.
MBROPT Specify whether to *ADD to or *REPLACE an existing member.
RUNSQLSTM Specifies whether the SQL statements should be run automatically with RUNSQLSTM. This should be used with caution as all SQL statements created will immediately be run with no commitment control and severity level 40 which means it will execute all statements. Usually it's better to review the source file first and test the statements in a test environment.
SQLNAMING This option determines what naming type the SQL statements will be created with. With *SYS (default) the library and table are separated by a slash. With *SQL, the library and table names are separated by a dot.

Here are some samples uses for the utility.

Example 1: Change State Code for Quebec from QB to QC

This command will find every column in library TEST that contains the letters ST, with label text containing the word STATE and with attributes of CHAR(3). SQL statements will be generated so that whenever the identified field contains the value QB it will be changed to QC.

GLBDTACHG SYSCOL(ST)
          COLDATATYP(*CHARACTER)
          COLLEN(3)
          COLLABEL(STATE)
          OLDVALUE('QB')
          OLDTYPE(*CONST)  
          NEWVALUE('QC')
          NEWTYPE(*CONST)
          LIBRARY(TEST)
          EXCLDICT(DATADICT) 
          SRCFILE(SLR/QSQLSRC) 
          SRCMBR(UPDSTATE)

Example 2: Change NULLs to Zero

The statement will find every column in library DATALIB with a name that contains AMT and is numeric with four decimal positions. Any value that is NULL will be changed to zero.

GLBDTACHG SYSCOL(AMT)
          COLDATATYP(*ALLNUM)
          COLDEC(4)
          OLDVALUE(NULL)
          OLDTYPE(*CONST)
          OPEARTOR(*IS) 
          NEWVALUE(0) 
          NEWTYPE(*CONST)
          LIBRARY(DATALIB)
          SRCFILE(SLR/QSQLSRC)
          SRCMBR(RMVNULLS)

Example 3: Append an Additional Four Binary Zeros to End of All BLOB Columns That Contain Hex ID X'01010200'

GLBDTACHG COLDATATYP(*BLOB)                                
          OLDVALUE('BLOB(''%'' || X''01010200'' || ''%'')')
          OLDTYPE(*EXPR)                                   
          OPEARTOR(*LIKE)                                  
          NEWVALUE('~FLD~ || BLOB(X''00000000'')')         
          NEWTYPE(*EXPR)                                   
          LIBRARY(DATALIB)                                    
          EXCLDICT(*NONE)                                  
          SRCFILE(SLR/QSQLSRC)                            
          SRCMBR(UPDBLOB)                                  
          MBROPT(*REPLACE)                                 

This example uses the LIKE operator to search within the identified column. Also notice that the % wildcard character is embedded inside the BLOB function so that the comparison data types will be compatible (e.g., you can't concatenate BLOB and CHAR fields). Finally, the special value ~FLD~ is used in the NEWVALUE parameter so that any identified fields will refer to themselves. The SQL will look similar to the following:

-- 
--SLR/LABDATA
--Description:Laboratory Test Results
-- 
UPDATE SLR/LABDATA 
   SET EXPRESULT=EXPRESULT || BLOB(X'00000000')
 WHERE EXPRESULT LIKE BLOB('%' || X'01010200' || '%');

Comments on Functionality

This utility works on the assumption that a given field's attributes can be uniquely (or nearly uniquely) identified by name, data type, text, or a combination of these. If your application's database doesn't have consistent column definitions, this utility will not do much good.

Each time the command generates a source member, the statements should be reviewed before executing them. For example, if a few unwanted UPDATEs creep in, because of inconsistent or overlapping attributes, these should be removed. Also, with large software packages, it's common to have work table shells with no members that will end up with an UPDATE statement. These should also be removed. To aid in this identification process, a printout is generated detailing all table, column, and column attributes identified in the search.

Once the source member has been reviewed, the statements can be run with the RUNSQLSTM (Run SQL Statements) command or OpsNav. Your iSeries will have the RUNSQLSTM command if you have V5R1 or higher or the DB2 SQL Development Kit installed.

RUNSQLSTM has a few things to be aware of. By default, a failed SQL statement will cause RUNSQLSTM to stop processing the SQL statements. If you know you may have an SQL statement or two that will fail, and you want processing to continue anyway, you must raise the severity level (ERRLVL) parameter. By raising ERRLVL to 40, all errors are ignored. Also, the COMMIT parameter defaults to *CHG. For updates on large tables, this can result in excessive resource utilization and should be avoided by running without commitment control or inserting appropriate BEGIN TRANSACTION and COMMIT statements in the source (each statement terminated by a semicolon).

RUNSQLSTM also expects the source member to have a data width of 80 characters. This fact is only relevant in the rare circumstance that the text of a single UPDATE statement may extend past a single line. If you use a source member longer than 80 characters, anything past position 80 will be ignored. Likewise, if your member is less than 80 characters, spaces will be inserted to pad the record to 80 columns. If you request that GLBDTACHG execute the source member for you, an 80-column source member is required.

Finally, RUNSQLSTM allows your statements to be syntax checked (without executing) by specifying *SYN on the PROCESS parameter. It's a good idea to do this when searching or replacing using complex expressions. For example, if you accidentally code an extra quotation mark or an out of place parenthesis, GLBDTACHG will construct an invalid SQL statement. Using the syntax checker will help identify problems because GLBDTACHG doesn't syntax check the expressions entered.

The statements can also be run in the Client Access Operations Navigator SQL utility, but the following things should be kept in mind:

  • C/A default naming option is *SQL, so either the statements have to be created with *SQL naming or the C/A option has to be changed with *SYS.
  • Older versions of C/A may not accept the -- SQL comment, so they will have to be removed.
  • The SQL source member created needs to be transferred to a PC or IFS using a utility such as FTP, data transfer, or CPYTOSTMF.

Program Notes

If you're interested in how the program was written, this section is for you.

First, a command definition was built to accept all of the parameters from a command line. Command definitions are useful because they can provide a nice user interface, validate user input, and help manage long and complex parameter lists. For more background on creating commands and using them with RPG programs, see Kevin Vandever's article "Back to Basics: Homegrown OS/400 Commands."

The RPG program source is divided into three main sections: initialization, identifying columns to update from the system catalog, and building UPDATE statements.

The initialization section is simple: Some initial variables are checked to ensure, for example, that the requested source file/member combination is valid.

The second section of code builds an SQL statement to search the SYSCOLUMNS view. The base SQL statement for this search can be found in a compile time array at the end of the program. This compile time data includes the SELECT and FROM portions of the SELECT statement. Portions of the WHERE clause are dynamically built based on the user's selection. The end result will be similar to the SELECT statement in the example query, described above. For example, if the user opted to search library LIVEDATA, the WHERE clause would contain the following:

SYS_DNAME='LIVEDATA'  

As this SQL statement is being constructed, it is written, line by line, to a source member (mentioned above). Why are we doing this? Because once the SQL statement is in a source member, we can convert it to a Query Manager Query (QM Query) object, using the CRTQMQRY command.

If you're not familiar with QM Queries, you can think of them as a container for an SQL statement. A QM Query is invoked with the STRQMQRY command. Using STRQMQRY is powerful because it can dump the results of a SELECT statement into a file when using the OUTPUT(*OUTFILE) parameter, which is what we need to do. The search results will be placed in a file called COLUMNDATA. These results will be needed later to build individual SQL UPDATE statements to actually do the search/replace work against the database.

An easier alternative to going through the QM Query hodge podge would be to execute the SQL statement right inside the program using embedded SQL. However, that method requires the SQL Development Kit product to be installed. Another alternative would be to use the SQL Call Level Interface (CLI). However, this method is somewhat difficult because it uses low-level APIs to run SQL statements and return data. The final alternative is just to let the RPG program read through the SYSCOLUMNS file, but generally it's easier to let SQL do that type of work.

In summary, section two of the program constructs an SQL statement to search the system catalog for our criteria; saves the SQL statement to a source member; converts the source member into a QM Query; and executes the QM Query, which will retrieve all of the columns meeting our criteria and put them in a file called COLUMNDATA in QTEMP.

Section three is where the UPDATE statements are built. First, COLUMNDATA is opened manually (by default RPG opens its files when the program starts, but the USROPN keyword on the F-spec signals to the compiler that the programmer is responsible for opening it). One reason why we don't want to open the file automatically when the program starts is that the file might not yet exist.

Once the file is open, we read the data and build an individual SQL UPDATE statement for each table found to do the search and replace work. There is a slight problem here. Before building the SQL UPDATE statement, we need to know if it will be a simple UPDATE (one field per table) or a compound UPDATE (multiple fields). If it's a simple update, the syntax will be easy to build:

Update Customers
   Set Company=1
 Where Company=3;

However, if a single table has multiple fields that need to be updated, we have to build a compound statement, such as the following:

Update Transfer
   Set FrCono=CASE WHEN FrCono=3 Then 1 Else FrCono End,
       ToCono=CASE WHEN ToCono=3 Then 1 Else ToCono End
 Where FrCono=3
    Or ToCono=3;

Therefore, we need to know beforehand which type of UPDATE statement to build. This is where the NOFIELDS procedure comes in. This code will count the number of columns each table has in it. So every time a record is read and a new table name is encountered, NOFIELDS is invoked to find out how many columns are in the table.

NOFIELDS works by doing a "read ahead" against the COLUMNDATA file. It first saves the relative record number of the current record in the file. Next, it reads records sequentially, incrementing a counter for each additional field it encounters, until a new table name is found. When a new table (or end of file) is encountered, the loop is exited and the file is positioned back to the original record when the procedure was invoked.

When a new table name is encountered, comments are created containing the table information and the UPDATE section of the statement is built: UPDATE LIB/TABLE. Next, the SET section and the WHERE section of the SQL are built simultaneously. When building expressions for the SET portion, the identified column is set equal to the constant or expression specified in the NEWVALUE parameter of the command. Likewise, the WHERE portion is built by using the identified column and the OPERATOR and OLDVALUE parameters of the command. Additionally, if a compound statement is being built, expressions in the SET section are separated by a comma and expressions in the WHERE are separated by an OR.

Once a single table has been processed, its complete SQL UPDATE statement will exist in three different program variables: SQLUpdate, SQLSet, and SQLWhere. (Each section of the SQL statement has its own character variable so they can be constructed independently.) Concatenating these variables will yield the entire SQL statement. At this point these variables may contain large strings. However, we need these statements to be placed neatly in an 80-column format for placement in a source file. The source member we will be populating is called GLBSQLSRC.

This is where the WRITETEXT procedure comes in. WRITETEXT will accept a portion of SQL statement (such as the SQLSet) and write it to the source member. It acts as a simple word processor, in that it takes the unformatted text and causes it to wrap when a certain number of columns has been exceeded (usually 80 for a source member).

WRITETEXT looks for embedded spaces as legitimate points to break up the statement. Because of this, the BEFORE and AFTER values had their embedded spaces replaced with a special hex character in the EDITVALUE subroutine. This was done so that if the search or replace expression contained an embedded space (say, MILLENIUM FALCON), the WRITETEXT routine wouldn't split the expression, if, perchance, the expression became too long. If this type of split were allowed to happen, the SQL processor might erroneously interpret the expression with a large number of embedded spaces. Also notice, before the data is written, any occurrences of the special hex character are restored to a space using the XLATE op code.

In summary, section three of the program performs the following functions: It reads the COLUMNDATA file created in section two; for each table in the COLUMNDATA file, the number of fields that need updating for that table are ascertained by the NOFIELDS procedure; a simple or compound UPDATE SQL statement is built for each table encountered; an entry for the identified column is made in a spool file report; and the WRITETEXT procedure "wraps" the SQL statement and writes it to the source file and member specified in the GLBCHGDTA command.

A final step of the program is to execute the RUNSQLSTM command, if the user calls for it. A command string is built to execute the statement with no commitment control and a level 40 severity (which will allow processing to continue even if an error is encountered.)

GLBDTACHG is useful for managing application wide data changes. Leveraging the benefits of the system column table, the utility creates SQL statements to take the work out of the search and replace process.


Michael Sansoterra is a programmer/analyst for SilverLake Resources, an IT services firm based in Grand Rapids, Michigan. He can be reached at msansoterra@silver-lake.com.


Sponsored By
COMMON

Get the IT training you need by attending COMMON Users Group's Spring 2003 IT Education Conference and Expo, March 9 - 13, in Indianapolis.

Choose from hundreds of sessions and labs covering the widest range of industry topics, including RPG IV, LPAR, WebSphere, and High Availability.

COMMON is the best value in IT education, so don't miss out!

Click and visit www.common.org for details!


THIS ISSUE
SPONSORED BY:

ASNA
COMMON
WorksRight Software
Profound Logic Software


BACK ISSUES

TABLE OF
CONTENTS
Back To Basics: The Date Data Type

Supercharge Your Search and Replace Tasks

Tired of Resetting Terminals on the AS/400? Let Somebody Else Do It

Extending the Windows Shell with VBScript


Editors
Shannon O'Donnell
Kevin Vandever

Managing Editor
Shannon Pastore

Contributing Editors:
Howard Arner
Joe Hertvik
Ted Holt
David Morris
Richard Shaler

Publisher and
Advertising Director:

Jenny Thomas

Advertising Sales Representative
Kim Reed

Contact the Editors
Do you have a gripe, inside dope or an opinion?
Email the editors:
editors@itjungle.com


Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.