Refacing Your Database, Part 2

April 19, 2016 Paul Tuohy

The whole purpose of refacing our database is to give proper names to tables and columns. So we have to spend some time ensuring that our names are right. In this article, the second in a series of three, we continue the refacing process by analyzing and correcting our new naming.

In my first article, we had reached the stage where we had extracted table and column definitions into our two conversion tables (TABLE_TRANSLATION and COLUMN_TRANSLATION), as shown below:

Analyzing the Names

Since the new column names were generated from the text descriptions of the original fields, there is a good chance that some of the generated names might not be quite right.

We can use the stored procedure ANALYZE_NAMES to help us check for some of the more common issues with name generation. The stored procedure does not have any parameters. It returns four result sets and is called as follows:

call analyze_names();

The first result set highlights any problems with the generated long name for the physical file, as shown the figure below. The problem with the name is that there is a double underscore between SALES and HISTORY.

We can correct the long table name using the SQL statement:

update table_translation 
   set long_table_name = 'SALES_HISTORY_FILE'
   where long_table_name = 'SALES__HISTORY_FILE';

The second result set highlights any problems with the generated long column names, as shown below. In this instance, there are two problems:

The contacts first and last name have ‘_S_’ in the middle of the name. This occurred because there was a single apostrophe (before the S) that got replaced by an underscore.
The name for the PMSTOH field is *MISSING. This is because the field did not have a text description, so one could not be generated.

We can correct these names using the SQL statements:

update column_translation
   set long_column = 'CONTACTS_FIRST_NAME' 
   where (library, table_name, current_column) = 
         ('SQLFACETST', 'SACUSMAS', 'CMCFIRST');

update column_translation
   set long_column = 'CONTACTS_LAST_NAME' 
   where (library, table_name, current_column) = 
         ('SQLFACETST', 'SACUSMAS', 'CMCLAST');

update column_translation
   set long_column = 'STOCK_ON_HAND' 
   where (library, table_name, current_column) = 
         ('SQLFACETST', 'SAPRODMAS', 'PMSTOH');

The third result set highlights any potential problems where a proper name (a field name without a prefix or a suffix) has different generated long column names, as shown in the next figure. In this instance, although two items are listed, it is not a problem. If anything, this was a mistake in the original DDS, so, there is nothing we need to do to correct the problem.

The fourth result set highlights any potential problems where generated long column names have different proper names (a field name without a prefix or a suffix), as shown below. This is caused by different entities being given the same description.

We can correct these names using the SQL statements:

update column_translation
   set long_column = 'TOTAL_SALES' 
   where (library, table_name, current_column) = 
         ('SQLFACETST', 'SACUSMAS', 'CMTOTSAL');

We can now call ANALYZE_NAMES again and keep going until all potential problems have been corrected.

The ANALYZE_NAMES Stored Procedure

The next piece of code shows the DDL to create the ANALYZE_NAMES stored procedure. If you come up with any extra result sets or tweaks, please let me know!

CREATE PROCEDURE ANALYZE_NAMES ( ) 
	DYNAMIC RESULT SETS 4 
	LANGUAGE SQL 
	SPECIFIC ANALYZE_NAMES 
	NOT DETERMINISTIC 
	READS SQL DATA 
	CALLED ON NULL INPUT 
	PROGRAM TYPE SUB 
	CONCURRENT ACCESS RESOLUTION DEFAULT 
	SET OPTION  ALWBLK = *ALLREAD , 
	ALWCPYDTA = *OPTIMIZE , 
	COMMIT = *NONE , 
	DECRESULT = (31, 31, 00) , 
	DFTRDBCOL = *NONE , 
	DYNDFTCOL = *NO , 
	DYNUSRPRF = *USER , 
	SRTSEQ = *HEX
	BEGIN 
  
DECLARE C1 CURSOR FOR 
SELECT 'Table Name' AS PROBLEM_TYPE , LIBRARY , LONG_TABLE_NAME 
FROM TABTRANS 
WHERE LONG_TABLE_NAME = '*MISSING' OR 
LONG_TABLE_NAME LIKE '%+_+_%' ESCAPE '+' OR 
LONG_TABLE_NAME LIKE '%+__+_%' ESCAPE '+' ; 
  
DECLARE C2 CURSOR FOR 
SELECT 'Column Name' AS PROBLEM_TYPE , LIBRARY , TABLE_NAME , 
       CURRENT_COLUMN , COLUMN_SEQUENCE , PROPER_COLUMN , 
       LONG_COLUMN 
FROM COLTRANS 
WHERE LONG_COLUMN = '*MISSING' OR 
LONG_COLUMN LIKE '%+_+_%' ESCAPE '+' OR 
LONG_COLUMN LIKE '%+__+_%' ESCAPE '+' 
ORDER BY LIBRARY , TABLE_NAME , CURRENT_COLUMN , COLUMN_SEQUENCE ; 
  
  
DECLARE C3 CURSOR FOR 
WITH T1 AS 
( SELECT DISTINCT PROPER_COLUMN , LONG_COLUMN FROM COLTRANS ) , 
T2 AS 
( SELECT PROPER_COLUMN AS FOR_ERROR , COUNT (* ) FROM T1 
GROUP BY PROPER_COLUMN HAVING COUNT ( * ) > 1 ) 
SELECT 'Duplicate Proper Name' AS PROBLEM_TYPE , LIBRARY , TABLE_NAME , 
       CURRENT_COLUMN , COLUMN_SEQUENCE , PROPER_COLUMN , LONG_COLUMN 
FROM COLTRANS A INNER JOIN T2 B ON A . PROPER_COLUMN = B . FOR_ERROR 
WHERE LONG_COLUMN >< '*NONE' 
ORDER BY LIBRARY , TABLE_NAME , CURRENT_COLUMN , COLUMN_SEQUENCE ; 
  
DECLARE C4 CURSOR FOR 
WITH T1 AS 
( SELECT DISTINCT PROPER_COLUMN , LONG_COLUMN FROM COLTRANS ) , 
T2 AS 
( SELECT LONG_COLUMN AS FOR_ERROR , COUNT ( * ) FROM T1 
GROUP BY LONG_COLUMN HAVING COUNT ( * ) > 1 ) 
SELECT 'Duplicate Long Name' AS PROBLEM_TYPE , LIBRARY , TABLE_NAME , 
       CURRENT_COLUMN , COLUMN_SEQUENCE , PROPER_COLUMN , LONG_COLUMN 
FROM COLTRANS A INNER JOIN T2 B ON A . LONG_COLUMN = B . FOR_ERROR 
WHERE LONG_COLUMN >< '*NONE' 
ORDER BY LIBRARY , TABLE_NAME , CURRENT_COLUMN , COLUMN_SEQUENCE ; 
  
OPEN C1 ; 
OPEN C2 ; 
OPEN C3 ; 
OPEN C4 ; 
  
SET RESULT SETS CURSOR C1 , CURSOR C2 , CURSOR C3 , CURSOR C4 ; 
  
  
END  ; 
  
LABEL ON SPECIFIC PROCEDURE ANALYZE_NAMES 
	IS 'ANALYZE Common Column Issues' ;

The Boring Bit

Now comes the tedious and boring bit. Someone has to plough through all the new long names and check whether or not they make sense. In the original text, there might have been typos, abbreviations, etc.

While ploughing through these sample lists (refer back the first figure), I spotted a couple of problems:

All of the table names have the word FILE in their description.
The customer ID is CUSTOMER_ID_NUMBER.

These can be fixed with the SQL statements:

update table_translation 
   set long_table_name = 
             replace(long_table_name, '_FILE', ''), 
       table_text = replace(table_text, 'File', 'Table');

update column_translation
       set long_column = 'CUSTOMER_ID' 
       where proper_column = 'CUSNO';

When finished, there is no harm in calling ANALYZE_NAMES again, in case any corrections may have caused problems in another area.

Re-representing Data

At this point, if all has gone to plan, we now have proper names in place for all our tables and columns. We could go ahead and generate our CREATE VIEW script(s) but there is one more thing we might want to consider. Do we want to represent all columns the way they are? An obvious example is the dates in the Sales History table (which we will look at in the next article), but what about some of the character columns like description, name, first name and last name? In a modern database, these would be defined as VARCHAR as opposed to CHAR.

There are two columns on the COLUMN_TRANSLATION table that can help us – USE_FUNCTION_NAME and EXTRA_PARAMETERS. These allow us to specify a scalar function to be used in the view when selecting data from the original table.

To re-represent the required columns as VARCHAR simply requires that we TRIM the columns when we select them. We update COLUMN_TRANSLATION with the following statement:

update column_translation
       set use_function_name = 'TRIM' 
       where proper_column in ('CFIRST', 'CLAST', 'NAME', 'DESC');

We will see how EXTRA_PARAMETERS is used in the next article and we will also finally get to generating our CREATE VIEW script.

Paul Tuohy is CEO of ComCon, an iSeries consulting company, and is one of the co-founders of System i Developer, which hosts the RPG & DB2 Summit conferences. He is an award-winning speaker who also speaks regularly at COMMON conferences, and is the author of “Re-engineering RPG Legacy Applications,” “The Programmers Guide to iSeries Navigator,” and the self-study course called “iSeries Navigator for Programmers.” Send your questions or comments for Paul to Ted Holt via the IT Jungle Contact page.

Refacing Your Database, Part 1

Table of Contents

Content archive

Recent Posts

Subscribe

Pages

Search