• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: In Pursuit Of Reasonable Data

    March 9, 2020 Ted Holt

    My awareness of the indifference of computers toward the reasonableness of data dawned when I enrolled in the university. The band director told me to sign up for band and jazz band for zero semester hours. He said the stupid computer wouldn’t know that no one takes a course for zero hours, wouldn’t flag me to obtain permission from the dean to carry an overload, yet would grant me the credit. He was right.

    Love ‘em or hate ‘em, we live with them, we learn to get around them when they limit us, and we deal with them when they make mistakes. Except that you and I know that computers don’t make mistakes. As someone has said, “He who blames the computer is making two mistakes, one of which is blaming the computer.”

    For a computer to do its job properly, two things are necessary: reliable, dependable, bug-free software, and reasonable data. I wrote about the latter recently, and I wish to continue with that topic today.

    Domains

    You have undoubtedly heard the general database terms that IBM encourages us to use these days: table instead of physical file, row instead of record, column instead of field. Let me introduce another relational database term: domain. To quote a textbook I found online, “a domain is a set of acceptable values that a column is allowed to contain.” I don’t believe anyone could give a better definition of domain.

    Enforcing a set of acceptable values is easy when those acceptable values are few in number. We need only add an appropriate constraint, and the database will not accept invalid data. Notice the ID and Type columns in the following table definition.

    create or replace table EmployeeMaster
       for system name EmpMast
      ( ID      dec(5) primary key check (ID > 0),
        Name    varchar(16),
        Type    char(1)    check (Type in ('S', 'H', 'T', 'P')),
        Salary  dec(9,2),
        Wage    dec(5,2)
       )
    

    The acceptable values for the ID and Type columns can easily be listed. Given enough time, I could list all the positive five-digit whole numbers. I could list the four permitted values of Type much more quickly.

    But what about a column for which the number of acceptable values is impractical or impossible to list? What we do about that?

    We look for a pattern.

    Let’s say that we work for a company that maintains inventory. Each inventory item has an ID number. The first question to ask ourselves is “what is the domain of item numbers?” Or to say it in plain English, “What does an item number look like?”

    In our pretend company, an item number is one of the letters A, B, or C, followed by four digits.

    Some valid item numbers Some invalid item numbers
    A2157 2A157
    B3200 C32
    C0000 BR549

    The following check constraint enforces this pattern:

    create or replace table . . .
     item char(5)  
       check (translate(item, 'AA000000000','BC123456789') = 'A0000'),
    . . .
    

    The TRANSLATE function returns the first value after the characters in the second value have replaced the corresponding characters in the third value. In this example, all Bs and Cs are converted to As and digits 1 through 9 are converted to zeros. Everything else, including As and zeros, remains unconverted. If the converted value is A0000, the item number is in the domain.

    Suppose item numbers have two patterns. Some follow the convention just mentioned. Others consist of two of the letters A, B and C in any combination followed by three digits. Here’s the check constraint:

    item char(5) 
     check (translate(item, 'AA000000000','BC123456789')
                in ('A0000', 'AA000'))
    

    TRANSLATE converts the characters to the corresponding characters and the result can be either of two values.

    Let’s consider one more example. Let’s check a name for reasonableness. What’s the domain for a name of a person?

    In English, names consist of letters and two punctuation marks: hyphen and apostrophe. Some people have names with an embedded blank. There’s no way to check for all possible names, but we can check for a pattern. Here’s one possibility:

    name char(25)
    check (translate(name, ' ',
      'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-''')
        = ' '),
    

    TRANSLATE replaces all uppercase letters, all lowercase letters, hyphens and apostrophes with blanks. Since the second parameter is shorter than the third parameter, the system pads the second parameter with enough blanks to make it as long as the third parameter. If the name has a character that is not found in the third parameter, TRANSLATE will not convert that character to blank and TRANSLATE will return a non-blank value.

    This doesn’t completely enforce reasonableness, of course. It would allow a name like JXnD’pt—-k, but the computer can only do so much.

    Is the Best Yet to Come?

    Some other relational database management systems allow the use of regular expressions in constraints. For example, you might do this to check for the A0000 pattern:

    item char(5) check (regexp_like(item,'[A-Z][0-9]{4}'))
    

    While DB2 for i supports regular expressions in other contexts, it doesn’t let us use them in constraints — yet. I suspect that it’s only a matter of time until IBM adds this support. Regular expressions would allow us to be even more specific about pattern matches than TRANSLATE does, but TRANSLATE will have to do for now.

    Midrange shops have traditionally enforced the reasonableness of data through validation routines in file maintenance and data entry programs. This is not an effective strategy for keeping an accurate database. If it were, I would not have seen so much bad data over the past decades. We need constraints and triggers in addition to the validation routines.

    RELATED STORIES

    Guru: Better Check Constraints

    Database Design – 2nd Edition – Chapter 7 The Relational Data Model (Adrienne Watt)

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, FHG, Four Hundred Guru, IBM i

    Sponsored by
    Raz-Lee Security

    Start your Road to Zero Trust!

    Firewall Network security, controlling Exit Points, Open DB’s and SSH. Rule Wizards and graphical BI.

    Request Demo

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    As I See It: In the Shadow of The West Wing Thoroughly Modern: Giving IBM i Developers A Helping Hand

    Leave a Reply Cancel reply

TFH Volume: 30 Issue: 17

This Issue Sponsored By

  • ProData Computer Services
  • Fresche Solutions
  • Fresche Solutions
  • WorksRight Software
  • Raz-Lee Security

Table of Contents

  • How GraphQL Can Improve IBM i APIs
  • Thoroughly Modern: Giving IBM i Developers A Helping Hand
  • Guru: In Pursuit Of Reasonable Data
  • As I See It: In the Shadow of The West Wing
  • Big Blue Cuts Deals On Entry Power Systems Iron

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • FAX/400 And CICS For i Are Dead. What Will IBM Kill Next?
  • Fresche Overhauls X-Analysis With Web UI, AI Smarts
  • Is It Time To Add The Rust Programming Language To IBM i?
  • Is IBM Going To Raise Prices On Power10 Expert Care?
  • IBM i PTF Guide, Volume 27, Number 20
  • POWERUp 2025 –Your Source For IBM i 7.6 Information
  • Maxava Consulting Services Does More Than HA/DR Project Management – A Lot More
  • Guru: Creating An SQL Stored Procedure That Returns A Result Set
  • As I See It: At Any Cost
  • IBM i PTF Guide, Volume 27, Number 19

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle