Guru: Data-Centrism, Step Zero
September 10, 2018 Ted Holt
I hear a lot these days about the need for data-centric information systems. That is as it should be. The proper way to support an organization is to remove logic from application programs and put it into the database through such devices as constraints and triggers. However, before many shops can take the first step in that direction, they need to take what I call step zero.
Step zero in data-centric computing is to remove hard-coded data values from programs and put them into the database. Just as the database manager should enforce business rules (e.g. we don’t ship to customers who owe us money), so should data drive special logic for specific entities. In my work, I see hard-coded warehouse IDs, customer numbers, device IDs, library names, output queue names, user profiles, etc. I wish that hard coding were so rare that I couldn’t justify writing an article on the topic, but unfortunately it is common.
In the following paragraphs, I tell how to identify hard coding, how to move hard-coded data into the database, and how to simplify the programming required to access the data.
Identifying Hard Coding
I suspect most people know hard coding when they see it, but if you're ever in doubt, ask yourself one or more of the following questions:
- If this software were in use in other organizations, would this value make sense?
- Is this value the primary key, or part of the primary key, of a database table (physical file)?
- If I were to add another of this type of value, would I have to modify and recompile any programs?
Here’s an RPG example with hard-coded warehouse IDs.
C                   IF        WHSID = 1
C                   EVAL      WHSNAM = 'NEW YOLK'
C                   EVAL      LEADTIME = 14
C                   EVAL      MFGWHS = 'N'
C                   ELSE
C                   IF        WHSID = 2
C                   EVAL      WHSNAM = 'LOST ANGELES'
C                   EVAL      LEADTIME = 7
C                   EVAL      MFGWHS = 'Y'
C                   ENDIF
C                   ENDIF
Question 1: If this software were in use in other organizations, would this value make sense?
Answer: No. Other organizations might not have a warehouse 1 or 2. If they do have a warehouse by either ID, it's probably not located in New Yolk or Lost Angeles, the lead times are probably different, and whether those warehouses are manufacturing warehouses is likely to differ as well.
Question 2: Is this value the primary key, or part of the primary key, of a database table (physical file)?
Answer: Maybe. If we have a warehouse master file, then yes, the warehouse ID is the key.
Question 3: If I were to add another of this type of value, would I have to modify and recompile any programs?
Answer: Yes. If our organization were to open a new warehouse, we'd have to modify this program and any others that test for warehouse IDs.
The logical conclusion is that we should not be hard coding warehouse IDs.
Now consider this:
C                   IF        CUSCLASS = 'A'
                    . . . do whatever
C                   ENDIF
In this application, customers are categorized as A (our best customers), B (occasional customers whom we don't wish to lose), or C (customers who we wish would take their business elsewhere). Does it make sense to hard code the A? Yes, it does. All sorts of businesses have these types of customers, so making this type of classification a feature of our software makes sense. Consider also that customer type is not a primary key, but an attribute of a customer.
Identifying New Attributes
Sometimes attributes are not so obvious. That is, an attribute exists, but we don’t think of it as such. For example:
C                   IF        WHSID = 1 OR
C                             WHSID = 5 OR
C                             WHSID = 12
                    . . . calculate lead time one way
C                   ELSE
C                   IF        WHSID = 2 OR
C                             WHSID = 6
                    . . . calculate lead time a different way
C                   ELSE
                    . . . calculate lead time using yet another way
C                   ENDIF
C                   ENDIF
This code differs from the first example in that different warehouse IDs drive different calculations. Rather than assign a lead time, we use different methods to calculate lead times. That is, we have three calculation methods. We need to assign codes, such as A, B, C or 1, 2, 3 — anything that makes sense — and add a lead-time calculation method column (field) to the appropriate warehouse table. The revised code looks like this:
C     WHSID         CHAIN     WHSMAST
                    . . . handle not found
C                   IF        LTMETHOD = 'A'
                    . . . calculate lead time one way
C                   ELSE
C                   IF        LTMETHOD = 'B'
                    . . . calculate lead time a different way
C                   ELSE
                    . . . calculate lead time using yet another way
C                   ENDIF
C                   ENDIF
Moving Into The Database
In relational databases, data is stored in tables (physical files). When moving hard-coded data into a table, you have three options:
- If there is no suitable table for the data, create one.
- If there is a suitable table for the data, add new columns (fields) to hold the attributes.
- If there is a suitable table, but adding new columns is impractical, create an auxiliary table with the same key.
Let’s consider each of these in turn.
You have created many tables, so there is no need for me to tell you how to create one, but for completeness, here’s the command to create a warehouse master table:
create table warehouses
  ( ID              dec(3) primary key,
    Name            varchar(20),
    LeadTime        dec(3) not null with default 0,
    IsManufacturing for MfgWhs char(1) not null with default 'N'
                    check (IsManufacturing in ('Y', 'N')) )

insert into warehouses
 values ( 1, 'New Yolk', 14, 'N'),
        ( 2, 'Lost Angeles', 7, 'Y')
You can also use DDS to create a physical file, but I prefer and recommend SQL.
If you already have a warehouse master, consider adding new columns.
alter table warehouses
  add column Name            varchar(20)
  add column LeadTime        dec(3) not null with default 0
  add column IsManufacturing for MfgWhs char(1) not null with default 'N'
             check (IsManufacturing in ('Y', 'N'))
Sometimes adding new columns is impractical. This is often the case in shops that run a software package. Modifying the software vendor’s table is not an option. In such a case, create an auxiliary table. Use the same SQL statement you would use to create a new master file, as in the example above. The two tables will have the same key fields. You may not always have a record in the auxiliary table for every record in the master table, so joins between the two will typically be outer joins with the master table in primary position.
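To illustrate how such a join behaves, here is a minimal sketch in Python with SQLite (the table names WHSMAST and WHSATTR, the columns, and the sample data are my own invention; on IBM i you would write the equivalent SQL against DB2). The point is that the outer join returns every master row, and rows with no auxiliary match come back as nulls that can be defaulted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The vendor's warehouse master, which we cannot modify.
conn.execute("create table whsmast (ID integer primary key, Name varchar(20))")
conn.executemany("insert into whsmast values (?, ?)",
                 [(1, 'New Yolk'), (2, 'Lost Angeles'), (3, 'Chicago')])
# Our auxiliary table, keyed the same way, holding the added attributes.
conn.execute("""create table whsattr
                  ( ID       integer primary key,
                    LeadTime integer not null default 0 )""")
conn.executemany("insert into whsattr values (?, ?)", [(1, 14), (2, 7)])

# Outer join with the master table in primary position: every master row
# comes back; warehouses with no auxiliary row get a default lead time.
rows = conn.execute("""select m.ID, m.Name, coalesce(a.LeadTime, 0)
                         from whsmast m
                         left outer join whsattr a on a.ID = m.ID
                        order by m.ID""").fetchall()
```

Warehouse 3 has no auxiliary row, yet it still appears in the result with the default lead time of zero, which is exactly why the master table belongs in primary position.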
With the data in the appropriate table, the program in the first example can be revised to do a random read (in RPG terms, a CHAIN) to the table to retrieve the appropriate data values.
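In RPG that revision is a CHAIN to the warehouse file. For readers outside the RPG world, here is a rough Python/SQLite sketch of the same data-driven lookup (the helper function and error handling are my own; the data mirrors the earlier examples): the IF/ELSE ladder collapses into a single keyed read.

```python
import sqlite3

# A stand-in for the warehouse master table built earlier.
conn = sqlite3.connect(":memory:")
conn.execute("""create table warehouses
                  ( ID              integer primary key,
                    Name            varchar(20),
                    LeadTime        integer not null default 0,
                    IsManufacturing char(1) not null default 'N' )""")
conn.executemany("insert into warehouses values (?, ?, ?, ?)",
                 [(1, 'New Yolk', 14, 'N'), (2, 'Lost Angeles', 7, 'Y')])

def get_warehouse(whs_id):
    """Keyed read of one warehouse row -- the equivalent of an RPG CHAIN."""
    row = conn.execute("select Name, LeadTime, IsManufacturing"
                       " from warehouses where ID = ?", (whs_id,)).fetchone()
    if row is None:
        raise KeyError(f"warehouse {whs_id} not found")  # handle not-found
    return row

name, lead_time, is_mfg = get_warehouse(2)
```

Opening a new warehouse now means inserting a row, not modifying and recompiling programs.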
Simplify The Programming
Here’s another common use of hard-coded data, this time in a CL program. Users Nosmo King, Malcolm Oron, and Donald Truck have requested certain enhancements to a report.
DCL VAR(&USER) TYPE(*CHAR) LEN(10)

RTVJOBA USER(&USER)

IF COND(&USER = 'NKING') THEN(OVRPRTF +
   FILE(SUPERPRTF) PRTTXT('Nosmo King') +
   OUTQ(PRODCTL3))
ELSE CMD(IF COND(&USER = 'MORON') THEN(OVRPRTF +
   FILE(SUPERPRTF) OUTQ(ACCTSPAY)))
ELSE CMD(IF COND(&USER = 'DTRUCK') THEN(OVRPRTF +
   FILE(SUPERPRTF) PRTTXT('Donald Truck') +
   COPIES(2)))

CALL PGM(SUPERPGM)

DLTOVR FILE(SUPERPRTF)
I’ve seen a lot of this. Chances are good that you have too. How can we move this override information into the database? Here’s one way.
First, let’s create a table of override commands.
create table overrides
  ( UserID    char(10),
    ReportID  char(10),
    Override  varchar(350) not null with default,
    primary key (UserID, ReportID))

insert into overrides
 values ('NKING',  'SUPERPRTF', 'PRTTXT(''Nosmo King'') OUTQ(PRODCTL3)'),
        ('MORON',  'SUPERPRTF', 'OUTQ(ACCTSPAY)'),
        ('DTRUCK', 'SUPERPRTF', 'PRTTXT(''Donald Truck'') COPIES(2)')
The overrides are in the database. How do I access them?
I could add Declare File (DCLF) and Receive File (RCVF) commands to all such CL programs, but that’s for the birds. I’m not even going to show you the code to make that happen. A better way is to write a program — such as this GETOVR example — to retrieve the override.
ctl-opt option(*srcstmt: *nodebugio);

dcl-f overrides keyed rename(overrides: OvrRec);

dcl-pr GETOVR extpgm('GETOVR');
   inUser      char(10) const;
   inReport    char(10) const;
   ouOverride  char(350);
end-pr GETOVR;

dcl-pi GETOVR;
   inUser      char(10) const;
   inReport    char(10) const;
   ouOverride  char(350);
end-pi GETOVR;

chain (inUser: inReport) OvrRec;
if %found();
   ouOverride = Override;
else;
   ouOverride = *blanks;
endif;

return;
Modify the CL programs to call this program.
DCL VAR(&USER)      TYPE(*CHAR) LEN(10)
DCL VAR(&OVERPARMS) TYPE(*CHAR) LEN(350)
DCL VAR(&OVERRIDE)  TYPE(*CHAR) LEN(350)
DCL VAR(&OLENGTH)   TYPE(*DEC) LEN(15 5) VALUE(350)

RTVJOBA USER(&USER)

CALL GETOVR (&USER SUPERPRTF &OVERPARMS)

IF COND(&OVERPARMS *NE ' ') THEN(DO)
   CHGVAR &OVERRIDE ('OVRPRTF SUPERPRTF' *BCAT &OVERPARMS)
   CALL QCMDEXC (&OVERRIDE &OLENGTH)
ENDDO

CALL PGM(SUPERPGM)

IF COND(&OVERPARMS *NE ' ') +
   THEN(DLTOVR FILE(SUPERPRTF))
It is much easier to add a CALL than I/O commands.
I did not invent this idea. I've seen it used many times. In fact, some people get pretty fancy with it. For instance, some systems I have used define separate fields for each override parameter instead of one long command string. It is also common to allow generic values or special values like *ALL in the key fields. But even the bare-bones code in my example gets the data out of the program and into the database, where it belongs.
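The *ALL idea amounts to a lookup with a fallback key. Here is a small sketch of that behavior in Python (a dictionary stands in for the OVERRIDES table; the *ALL convention is from the article, while the helper function and the sample default override are my own illustration):

```python
# Stand-in for the OVERRIDES table, keyed by (UserID, ReportID).
# A '*ALL' row supplies a default when no user-specific row exists.
overrides = {
    ('NKING', 'SUPERPRTF'): "PRTTXT('Nosmo King') OUTQ(PRODCTL3)",
    ('*ALL',  'SUPERPRTF'): "OUTQ(QPRINT)",  # hypothetical default row
}

def get_override(user, report):
    """Try the specific user first, then fall back to the *ALL row."""
    for key in ((user, report), ('*ALL', report)):
        if key in overrides:
            return overrides[key]
    return ''  # no override: the caller skips the OVRPRTF entirely
```

In SQL terms, the same effect comes from reading with the user's key first and, on a not-found condition, reading again with *ALL in the user position.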
If your shop is already data-centric, spectacular! If you’re still moving in that direction, magnificent! If your shop has yet to take step zero, get busy!