|
|
![]() |
|
|
Supercharge Your Search and Replace Tasks by Michael Sansoterra [The code for this article is available for download.]
As a programmer, not many things make me cringe more than a customer requesting something like the following: "We want to change company three to company one. We need you to substitute a 1 for a 3 in every occurrence of the company field, in every relevant file." A few years back, I received just such a request from a MAPICS customer. After I picked up my jaw from the ground, I thought about how I would identify every company field in every one of the gargantuan number of tables in the entire MAPICS application. The fruit of that effort is the Global Data Change utility (GLBDTACHG). GLBDTACHG will create multiple SQL statements to search for a value in a specific field type and replace it with a new value. (Download the code for this utility here.) How It Works In analyzing this problem, I remembered that file SYSCOLUMNS in QSYS2 would aid my cause. This file contains "meta data" (or attributes) about every field in every file on the system. Armed with the knowledge that the MAPICS company field is always zoned (2,0) and that the text description for this field is consistent enough that I could count on the word company being contained in the field text, I queried this file to identify all fields with these attributes. The resulting query was similar to the one shown here. I now had a list of every field that needed to be changed. The following information was now identified: 1. All table (file) names containing a "company" column 2. Column (field) Names 3. Before Value (3) (Value to find) 4. After Value (1) (New Value) This was enough information to have a program construct SQL statements to do all of the changes automatically. With the above information, for example, the following SQL UPDATE statement is easily constructed: UPDATE MBCDREP /* Order Line Items */ SET CDAENB=1 /* Company After Value */ WHERE CDAENB=3 /* Company - Before Value */ The GLBDTACHG command will do this very thing, by accepting the following information:
The utility performs the following functions:
In essence, GLBDTACHG is nothing more than a glorified search and replace utility for searching an entire application's data. Here is a list of the command's parameters and their function:
Here are some samples uses for the utility. Example 1: Change State Code for Quebec from QB to QC This command will find every column in library TEST that contains the letters ST, with label text containing the word STATE and with attributes of CHAR(3). SQL statements will be generated so that whenever the identified field contains the value QB it will be changed to QC.
GLBDTACHG SYSCOL(ST)
COLDATATYP(*CHARACTER)
COLLEN(3)
COLLABEL(STATE)
OLDVALUE('QB')
OLDTYPE(*CONST)
NEWVALUE('QC')
NEWTYPE(*CONST)
LIBRARY(TEST)
EXCLDICT(DATADICT)
SRCFILE(SLR/QSQLSRC)
SRCMBR(UPDSTATE)
Example 2: Change NULLs to Zero The statement will find every column in library DATALIB with a name that contains AMT and is numeric with four decimal positions. Any value that is NULL will be changed to zero.
GLBDTACHG SYSCOL(AMT)
COLDATATYP(*ALLNUM)
COLDEC(4)
OLDVALUE(NULL)
OLDTYPE(*CONST)
OPEARTOR(*IS)
NEWVALUE(0)
NEWTYPE(*CONST)
LIBRARY(DATALIB)
SRCFILE(SLR/QSQLSRC)
SRCMBR(RMVNULLS)
Example 3: Append an Additional Four Binary Zeros to End of All BLOB Columns That Contain Hex ID X'01010200'
GLBDTACHG COLDATATYP(*BLOB)
OLDVALUE('BLOB(''%'' || X''01010200'' || ''%'')')
OLDTYPE(*EXPR)
OPEARTOR(*LIKE)
NEWVALUE('~FLD~ || BLOB(X''00000000'')')
NEWTYPE(*EXPR)
LIBRARY(DATALIB)
EXCLDICT(*NONE)
SRCFILE(SLR/QSQLSRC)
SRCMBR(UPDBLOB)
MBROPT(*REPLACE)
This example uses the LIKE operator to search within the identified column. Also notice that the % wildcard character is embedded inside the BLOB function so that the comparison data types will be compatible (e.g., you can't concatenate BLOB and CHAR fields). Finally, the special value ~FLD~ is used in the NEWVALUE parameter so that any identified fields will refer to themselves. The SQL will look similar to the following:
--
--SLR/LABDATA
--Description:Laboratory Test Results
--
UPDATE SLR/LABDATA
SET EXPRESULT=EXPRESULT || BLOB(X'00000000')
WHERE EXPRESULT LIKE BLOB('%' || X'01010200' || '%');
Comments on Functionality This utility works on the assumption that a given field's attributes can be uniquely (or nearly uniquely) identified by name, data type, text, or a combination of these. If your application's database doesn't have consistent column definitions, this utility will not do much good. Each time the command generates a source member, the statements should be reviewed before executing them. For example, if a few unwanted UPDATEs creep in, because of inconsistent or overlapping attributes, these should be removed. Also, with large software packages, it's common to have work table shells with no members that will end up with an UPDATE statement. These should also be removed. To aid in this identification process, a printout is generated detailing all table, column, and column attributes identified in the search. Once the source member has been reviewed, the statements can be run with the RUNSQLSTM (Run SQL Statements) command or OpsNav. Your iSeries will have the RUNSQLSTM command if you have V5R1 or higher or the DB2 SQL Development Kit installed. RUNSQLSTM has a few things to be aware of. By default, a failed SQL statement will cause RUNSQLSTM to stop processing the SQL statements. If you know you may have an SQL statement or two that will fail, and you want processing to continue anyway, you must raise the severity level (ERRLVL) parameter. By raising ERRLVL to 40, all errors are ignored. Also, the COMMIT parameter defaults to *CHG. For updates on large tables, this can result in excessive resource utilization and should be avoided by running without commitment control or inserting appropriate BEGIN TRANSACTION and COMMIT statements in the source (each statement terminated by a semicolon). RUNSQLSTM also expects the source member to have a data width of 80 characters. This fact is only relevant in the rare circumstance that the text of a single UPDATE statement may extend past a single line. If you use a source member longer than 80 characters, anything past position 80 will be ignored. Likewise, if your member is less than 80 characters, spaces will be inserted to pad the record to 80 columns. If you request that GLBDTACHG execute the source member for you, an 80-column source member is required. Finally, RUNSQLSTM allows your statements to be syntax checked (without executing) by specifying *SYN on the PROCESS parameter. It's a good idea to do this when searching or replacing using complex expressions. For example, if you accidentally code an extra quotation mark or an out of place parenthesis, GLBDTACHG will construct an invalid SQL statement. Using the syntax checker will help identify problems because GLBDTACHG doesn't syntax check the expressions entered. The statements can also be run in the Client Access Operations Navigator SQL utility, but the following things should be kept in mind:
Program Notes If you're interested in how the program was written, this section is for you. First, a command definition was built to accept all of the parameters from a command line. Command definitions are useful because they can provide a nice user interface, validate user input, and help manage long and complex parameter lists. For more background on creating commands and using them with RPG programs, see Kevin Vandever's article "Back to Basics: Homegrown OS/400 Commands." The RPG program source is divided into three main sections: initialization, identifying columns to update from the system catalog, and building UPDATE statements. The initialization section is simple: Some initial variables are checked to ensure, for example, that the requested source file/member combination is valid. The second section of code builds an SQL statement to search the SYSCOLUMNS view. The base SQL statement for this search can be found in a compile time array at the end of the program. This compile time data includes the SELECT and FROM portions of the SELECT statement. Portions of the WHERE clause are dynamically built based on the user's selection. The end result will be similar to the SELECT statement in the example query, described above. For example, if the user opted to search library LIVEDATA, the WHERE clause would contain the following: SYS_DNAME='LIVEDATA' As this SQL statement is being constructed, it is written, line by line, to a source member (mentioned above). Why are we doing this? Because once the SQL statement is in a source member, we can convert it to a Query Manager Query (QM Query) object, using the CRTQMQRY command. If you're not familiar with QM Queries, you can think of them as a container for an SQL statement. A QM Query is invoked with the STRQMQRY command. Using STRQMQRY is powerful because it can dump the results of a SELECT statement into a file when using the OUTPUT(*OUTFILE) parameter, which is what we need to do. The search results will be placed in a file called COLUMNDATA. These results will be needed later to build individual SQL UPDATE statements to actually do the search/replace work against the database. An easier alternative to going through the QM Query hodge podge would be to execute the SQL statement right inside the program using embedded SQL. However, that method requires the SQL Development Kit product to be installed. Another alternative would be to use the SQL Call Level Interface (CLI). However, this method is somewhat difficult because it uses low-level APIs to run SQL statements and return data. The final alternative is just to let the RPG program read through the SYSCOLUMNS file, but generally it's easier to let SQL do that type of work. In summary, section two of the program constructs an SQL statement to search the system catalog for our criteria; saves the SQL statement to a source member; converts the source member into a QM Query; and executes the QM Query, which will retrieve all of the columns meeting our criteria and put them in a file called COLUMNDATA in QTEMP. Section three is where the UPDATE statements are built. First, COLUMNDATA is opened manually (by default RPG opens its files when the program starts, but the USROPN keyword on the F-spec signals to the compiler that the programmer is responsible for opening it). One reason why we don't want to open the file automatically when the program starts is that the file might not yet exist. Once the file is open, we read the data and build an individual SQL UPDATE statement for each table found to do the search and replace work. There is a slight problem here. Before building the SQL UPDATE statement, we need to know if it will be a simple UPDATE (one field per table) or a compound UPDATE (multiple fields). If it's a simple update, the syntax will be easy to build: Update Customers Set Company=1 Where Company=3; However, if a single table has multiple fields that need to be updated, we have to build a compound statement, such as the following:
Update Transfer
Set FrCono=CASE WHEN FrCono=3 Then 1 Else FrCono End,
ToCono=CASE WHEN ToCono=3 Then 1 Else ToCono End
Where FrCono=3
Or ToCono=3;
Therefore, we need to know beforehand which type of UPDATE statement to build. This is where the NOFIELDS procedure comes in. This code will count the number of columns each table has in it. So every time a record is read and a new table name is encountered, NOFIELDS is invoked to find out how many columns are in the table. NOFIELDS works by doing a "read ahead" against the COLUMNDATA file. It first saves the relative record number of the current record in the file. Next, it reads records sequentially, incrementing a counter for each additional field it encounters, until a new table name is found. When a new table (or end of file) is encountered, the loop is exited and the file is positioned back to the original record when the procedure was invoked. When a new table name is encountered, comments are created containing the table information and the UPDATE section of the statement is built: UPDATE LIB/TABLE. Next, the SET section and the WHERE section of the SQL are built simultaneously. When building expressions for the SET portion, the identified column is set equal to the constant or expression specified in the NEWVALUE parameter of the command. Likewise, the WHERE portion is built by using the identified column and the OPERATOR and OLDVALUE parameters of the command. Additionally, if a compound statement is being built, expressions in the SET section are separated by a comma and expressions in the WHERE are separated by an OR. Once a single table has been processed, its complete SQL UPDATE statement will exist in three different program variables: SQLUpdate, SQLSet, and SQLWhere. (Each section of the SQL statement has its own character variable so they can be constructed independently.) Concatenating these variables will yield the entire SQL statement. At this point these variables may contain large strings. However, we need these statements to be placed neatly in an 80-column format for placement in a source file. The source member we will be populating is called GLBSQLSRC. This is where the WRITETEXT procedure comes in. WRITETEXT will accept a portion of SQL statement (such as the SQLSet) and write it to the source member. It acts as a simple word processor, in that it takes the unformatted text and causes it to wrap when a certain number of columns has been exceeded (usually 80 for a source member). WRITETEXT looks for embedded spaces as legitimate points to break up the statement. Because of this, the BEFORE and AFTER values had their embedded spaces replaced with a special hex character in the EDITVALUE subroutine. This was done so that if the search or replace expression contained an embedded space (say, MILLENIUM FALCON), the WRITETEXT routine wouldn't split the expression, if, perchance, the expression became too long. If this type of split were allowed to happen, the SQL processor might erroneously interpret the expression with a large number of embedded spaces. Also notice, before the data is written, any occurrences of the special hex character are restored to a space using the XLATE op code. In summary, section three of the program performs the following functions: It reads the COLUMNDATA file created in section two; for each table in the COLUMNDATA file, the number of fields that need updating for that table are ascertained by the NOFIELDS procedure; a simple or compound UPDATE SQL statement is built for each table encountered; an entry for the identified column is made in a spool file report; and the WRITETEXT procedure "wraps" the SQL statement and writes it to the source file and member specified in the GLBCHGDTA command. A final step of the program is to execute the RUNSQLSTM command, if the user calls for it. A command string is built to execute the statement with no commitment control and a level 40 severity (which will allow processing to continue even if an error is encountered.) GLBDTACHG is useful for managing application wide data changes. Leveraging the benefits of the system column table, the utility creates SQL statements to take the work out of the search and replace process. Michael Sansoterra is a programmer/analyst for SilverLake Resources, an IT services firm based in Grand Rapids, Michigan. He can be reached at msansoterra@silver-lake.com.
|
Editors
Contact the Editors |
|||||||||||||||||||||||||||||||||||||||||||||
| Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved. |