Guru: Data-Centrism, Step Zero

Guru: Data-Centrism, Step Zero

September 10, 2018 Ted Holt

I hear a lot these days about the need for data-centric information systems. That is as it should be. The proper way to support an organization is to remove logic from application programs and put it into the database through such devices as constraints and triggers. However, before many shops can take the first step in that direction, they need to take what I call step zero.

Step zero in data-centric computing is to remove hard-coded data values from programs and put them into the database. Just as the database manager should enforce business rules (e.g. we don’t ship to customers who owe us money), so should data drive special logic for specific entities. In my work, I see hard-coded warehouse IDs, customer numbers, device IDs, library names, output queue names, user profiles, etc. I wish that hard coding were so rare that I couldn’t justify writing an article on the topic, but unfortunately it is common.

In the following paragraphs, I tell how to identify hard coding, how to move hard-coded data into the database, and how to simplify the programming required to access the data.

Identifying Hard Coding

I suspect most people know hard coding when they see it, but you’re ever in doubt, ask yourself one or more of the following questions:
1. If this software were in use in other organizations, would this value make sense?
2. Is this value the primary key, or part of the primary key, of a database table (physical file)?
3. If I were to add another of this type of value, would I have to modify and recompile any programs?
Here’s an RPG example with hard-coded warehouse IDs.
```
C              IF        WHSID = 1
C              EVAL      WHSNAM = 'NEW YOLK'
C              EVAL      LEADTIME = 14
C              EVAL      MFGWHS = 'N'
C              ELSE
C              IF        WHSID = 2
C              EVAL      WHSNAM = 'LOST ANGELES'
C              EVAL      LEADTIME = 7
C              EVAL      MFGWHS = 'Y'
C              ENDIF
C              ENDIF
```
Question 1: If this software were in use in other organizations, would this value make sense?

Answer: No. Other organizations might not have a warehouse 1 or 2. If they do have a warehouse by either ID, it’s probably not located in New Yolk or Lost Angeles, the lead times are probably different, and whether or not those warehouses are manufacturing warehouses or not is likely to be different.

Question 2: Is this value the primary key, or part of the primary key, of a database table (physical file)?

Answer: Maybe. If we have a warehouse master file, then yes, the warehouse ID is the key.

Question 3: If I were to add another of this type of value, would I have to modify and recompile any programs?

Answer: If our organization were to open a new warehouse, we’d have to modify this program and others that test for warehouse IDs.

The logical conclusion is that we should not be hard coding warehouse IDs.

Now consider this:
```
C              IF        CUSCLASS = 'A'
 . . . do whatever
C              ENDIF
```
In this application, customers are categorized as A (our best customers), B (occasional customers whom we don’t wish to lose), or C (customers who we wish would take their business elsewhere). Does it make sense to hard code the A? Yes, it does. All sorts of business have these types of customers, so to make this type of classification a feature of our software makes sense. Consider also that customer type is not a primary key, but an attribute of a customer.

Identifying New Attributes

Sometimes attributes are not so obvious. That is, an attribute exists, but we don’t think of it as such. For example:
```
C                   IF        WHSID = 1 OR
C                             WHSID = 5 OR
C                             WHSID = 12  
 . . . calculate lead time one way
C                   ELSE                  
C                   IF        WHSID = 2 OR
C                             WHSID = 6   
 . . . calculate lead time a different way
C                   ELSE                  
 . . . calculate lead time using yet another way
C                   ENDIF                 
C                   ENDIF                 
```
This code differs from the first example in that different warehouse IDs drive different calculations. Rather than assign a lead time, we use different methods to calculate lead times. That is, we have three calculation methods. We need to assign codes, such as A, B, C or 1, 2, 3 — anything that makes sense — and add a lead-time calculation method column (field) to the appropriate warehouse table. The revised code looks like this:
```
C      WHSID        CHAIN     WHSMAST
 . . . handle not found
C                   IF        LTMETHOD = 'A' 
 . . . calculate lead time one way
C                   ELSE                  
C                   IF        LTMETHOD = 'B' 
 . . . calculate lead time a different way
C                   ELSE                  
 . . . calculate lead time using yet another way
C                   ENDIF                 
C                   ENDIF                 
```
Moving Into The Database

In relational databases, data is stored in tables (physical files). When moving hard-coded data into a table, you have three options:
- If there is no suitable table for the data, create one.
- If there is a suitable table for the data, add new columns (fields) to hold the attributes.
- If there is a suitable table, but adding new columns is impractical, create an auxiliary table with the same key.
Let’s consider each of these in turn.

You have created many tables, so there is no need for me to tell you how to create one, but for completeness, here’s the command to create a warehouse master table:
```
create table warehouses
 ( ID                           dec(3)
      primary key,
   Name                     varchar(20),
   LeadTime                     dec(3)
      not null with default 0,
   IsManufacturing for MfgWhs  char(1)
      not null with default 'N'
      check (IsManufacturing in ('Y', 'N'))
 )

insert into warehouses values
   ( 1, 'New Yolk', 14, 'Y'),
   ( 2, 'Lost Angeles', 7, 'N')
```
You can also use DDS to create a physical file, but I prefer and recommend SQL.

If you already have a warehouse master, consider adding new columns.
```
alter table warehouses
   add column Name                     varchar(20)
   add column LeadTime                     dec(3)
                not null with default 0
   add column IsManufacturing for MfgWhs  char(1)
                not null with default 'N'
                check (IsManufacturing in ('Y', 'N') )
```
Sometimes adding new columns is impractical. This is often the case in shops that run a software package. Modifying the software vendor’s table is not an option. In such a case, create an auxiliary table. Use the same SQL statement you would use to create a new master file, as in the example above. The two tables will have the same key fields. You may not always have a record in the auxiliary table for every record in the master table, so joins between the two will typically be outer joins with the master table in primary position.

With the data in the appropriate table, the program in the first example can be revised to do a random read (in RPG terms, a CHAIN) to the table to retrieve the appropriate data values.

Simplify The Programming

Here’s another common use of hard-coded data, this time in a CL program. Users Nosmo King, Malcolm Oron, and Donald Truck have requested certain enhancements to a report.
```
DCL        VAR(&USER) TYPE(*CHAR) LEN(10)
RTVJOBA    USER(&USER)
IF         COND(&USER = 'NKING') THEN(OVRPRTF +
             FILE(SUPERPRTF) PRTTXT('Nosmo King') +
             OUTQ(PRODCTL3))
ELSE       CMD(IF COND(&USER = 'MORON') THEN(OVRPRTF +
             FILE(SUPERPRTF) OUTQ(ACCTSPAY)))
ELSE       CMD(IF COND(&USER = 'DTRUCK') THEN(OVRPRTF +
             FILE(SUPERPRTF) PRTTXT('Donald Truck') +
             COPIES(2)))
CALL       PGM(SUPERPGM)
DLTOVR     FILE(SUPERPRTF)
```
I’ve seen a lot of this. Chances are good that you have too. How can we move this override information into the database? Here’s one way.

First, let’s create a table of override commands.
```
create table overrides
  ( UserID    char(10),
    ReportID  char(10),
    Override  varchar(350) not null with default,
    primary key (UserID, ReportID))

insert into overrides values                                     
('NKING','SUPERPRTF','PRTTXT(''Nosmo King'') OUTQ(PRODCTL3)'), 
('MORON','SUPERPRTF','OUTQ(ACCTSPAY)'), 
('DTRUCK','SUPERPRTF','PRTTXT(''Donald Truck'') COPIES(2)') 
```
The overrides are in the database. How do I access them?

I could add Declare File (DCLF) and Receive File (RCVF) commands to all such CL programs, but that’s for the birds. I’m not even going to show you the code to make that happen. A better way is to write a program — such as this GETOVR example — to retrieve the override.
```
ctl-opt  option(*srcstmt: *nodebugio);                  
                                                        
dcl-f  overrides  keyed  rename(overrides: OvrRec);     
                                                        
dcl-pr  GETOVR  extpgm('GETOVR');                 
   inUser     char(10)    const;                        
   inReport   char(10)    const;                        
   ouOverride char(256);                                
end-pr  GETOVR;                                      
                                                        
dcl-pi  GETOVR;                                      
   inUser     char(10)    const;                        
   inReport   char(10)    const;                        
   ouOverride char(256);                                
end-pi  GETOVR;                                      
                                                        
chain (inUser: inReport)  OvrRec;                       
if %found();                                            
   ouOverride = Override;                               
else;                                                   
   ouOverride = *blanks;                                
endif;                                                  
                                                        
return;     
```
Modify the CL programs to call this program.
```
DCL  VAR(&USER) TYPE(*CHAR) LEN(10)
DCL  VAR(&OVERPARMS) TYPE(*CHAR) LEN(350)
DCL  VAR(&OVERRIDE) TYPE(*CHAR) LEN(350)
DCL  VAR(&OLENGTH) TYPE(*DEC) LEN(15 5) VALUE(350)

RTVJOBA USER(&USER)

CALL GETOVR (&USER SUPERPRTF &OVERPARMS)
IF COND(&OVERPARMS *NE ' ') THEN(DO)
   CHGVAR   &OVERRIDE  ('OVRPRTF SUPERPRTF' *BCAT &OVERPARMS)
   CALL QCMDEXC (&OVERRIDE &OLENGTH)
ENDDO
CALL PGM(SUPERPGM)
IF COND(&OVERPARMS *NE ' ') +
   THEN(DLTOVR FILE(SUPERPRTF))
```
It is much easier to add a CALL than I/O commands.

I did not invent this idea. I’ve seen it used many times. In fact, some people get pretty fancy with it. For instance, the systems I have used defined separate fields for all the override parameters, instead of one long command string. It is also common to allow generic values or special values like *ALL in the key fields. But even the barebones code in my example gets the data out of the program and into the database, where it belongs.

If your shop is already data-centric, spectacular! If you’re still moving in that direction, magnificent! If your shop has yet to take step zero, get busy!

RELATED STORY

Generic Processing, Continued
Share this:
Reddit
Facebook
LinkedIn
Twitter
Email
Tags: Tags: 400guru, CL, FHG, Four Hundred Guru, IBM i, RPG, SQL
Sponsored by
WorksRight Software
Do you need area code information?
Do you need ZIP Code information?
Do you need ZIP+4 information?
Do you need city name information?
Do you need county information?
Do you need a nearest dealer locator system?

We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

Just call us and we’ll arrange for 30 days FREE use of either ZIP/CITY or PER/ZIP4.

WorksRight Software, Inc.
Phone: 601-856-8337
Fax: 601-856-9432
Email: software@worksright.com
Website: www.worksright.com
Share this:
Reddit
Facebook
LinkedIn
Twitter
Email
IBM Showcases New Crop of ‘Fresh Faces’ Zend Server 2018 Brings PHP 7.2 To IBM i
6 thoughts on “Guru: Data-Centrism, Step Zero”
- Tony says:
  
  September 10, 2018 at 10:01 am
  
  The first two pictures are the same, I guess a typo when inserting the image into the article.
  
  Reply
- Dan Devoe says:
  
  September 10, 2018 at 1:47 pm
  
  Excellent article!
  
  And I needed to chuckle, because one of the areas where the data is hard-coded, there appears to be a typo – which is yet another reason why you don’t want to hard-code… 🙂
  
  EVAL WHSNAM = ‘NEW YOLK’
  
  Reply
- Hassan Farooqi says:
  
  September 10, 2018 at 3:41 pm
  
  Nice! The fixed format RPG tells us the reason why the programmer committed the “Cardinal Sin” of hard-coding. This mentality has resulted in the downfall of RPG, and subsequently that of IBM i itself.
  
  The reason for this mentality is that many RPG developer lack higher education and therefore broader vision to foresee the hazards of their laziness.
  
  I have seen high level officials fired after the hard-coding resulted in stoppage of business (level 0 emergency).
  
  The ideal thing is to go beyond step 0 that Ted suggested. There are Q&A tables, and then there are logic tables with SQL statements on it. These SQL statements are picked up from the table and embedded in the program, depending on the condition from Q&A table.
  
  There is lot to be done, and not all shops do all of the things. Having worked in dozens of good shops can give you an idea of how things can be, should be, and IS done (in some shops).
  
  Reply
- Christopher Burns, Sr. says:
  
  September 13, 2018 at 8:26 am
  
  Excellent expose about different forms of technical debt that many programmers would just gloss over. Another example that I run into at legacy shops is the compile time tables. My STDXREF (Standard Cross Reference) tool is a simple little open-source vehicle for one-to-one lookups with either alpha or numeric arguments (and results). One point of maintenance and one point of lookup. Sort of a “table file” on steroids. It really helped me burn down compile time tables and hard coded IF or SELECT blocks when doing modernization projects. It actually reminds me of the table-driven programming topic you’ve done at COMMON in the past. I added it to my GitHub collection earlier this year (after freshening it up from its 2005 birthday suit).
  
  Reply
- Rick says:
  
  September 19, 2018 at 4:39 pm
  
  Great reminder of stuff that not everyone thinks about. Makes life much more flexible as business rules and users change.
  
  Reply
- Jim says:
  
  October 15, 2018 at 7:54 am
  
  I find cases where data is hardcoded (state names, product categories are a few example) hardcoded for tables or arrays in dozens of programs. While these programs were mostly written without using ILE, they still don’t need this hardcoding.
  
  Reply
Leave a Reply Cancel reply