• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: The Uncertainty of Redundant Row Selection

    February 8, 2021 Ted Holt

    Is redundancy good or bad? I say it depends. According to Nassim Nicholas Taleb, “Redundancy is ambiguous because it seems like a waste if nothing unusual happens. Except that something unusual happens — usually.” I have seen some unusual behavior when joining database tables, but try as I might, I can’t figure out what that unusual behavior depends on. Let me show you what I mean.

    First, we need some data for illustration. Let’s say that our company uses an ERP system that was designed for make-to-stock manufacturing. What the factory builds goes into the warehouse, and customer orders are filled from the stock in the warehouse. There is no link between sales orders and manufacturing orders.

    However, the Powers That Be decided years ago, before we came onto the scene, that the business should operate on a make-to-order basis, which requires manufacturing orders to be linked to sales orders. That presented a problem for IT (or MIS or Data Processing or whatever they were calling business computing in those days). Some enterprising programmer suggested they store sales order and sales order line number in an unused text field for notes in the manufacturing order, and it’s been that way since.

    Here are the tables.

    create table sohdr
      ( OrderNbr    dec(5)  primary key,
        CustNbr     dec(5),
        OrderDate   dec(8));
        
    create table sodtl
      ( OrderNbr    dec(5),
        LineNbr     dec(2),
        ItemNbr     char(6),
        Quantity    dec(3),
        Price       dec(5,2),
       primary key (OrderNbr, LineNbr));
       
    create table mo
      ( OrderNbr    dec(5)   primary key,
        Item        char(6),
        Quantity    dec(3),
        SchedDate   dec(8),
        Notes       char(30));
        
    insert into sohdr values
    ( 100, 220, 20210115),
    ( 101, 318, 20210115),
    ( 102, 190, 20210116);
            
    insert into sodtl values
    ( 100,   1, 'AB-101',  12,   1.00),        
    ( 100,   2, 'DA-033',  10,  20.00),        
    ( 101,   1, 'AB-101',   6,   1.25),
    ( 102,   1, 'JH-712',   9,   2.00); 
    
    insert into mo values
    ( 751, 'AB-101', 12, 20210118, '0010001');
    

    There is a one-to-many relationship between sales order header and sales order detail, and a one-to-one relationship between sales order detail and manufacturing order.

    Now it’s time to query the database. Someone we serve wants to know the customer we’re selling to, the items they’re purchasing, and the amount of money per item that we expect to get from the customer for sales order number 100.

    -- Query 1
    select h.OrderNbr, h.CustNbr, d.ItemNbr,
           d.Quantity * d.Price as Extended
      from sohdr as h
      join sodtl as d
        on h.OrderNbr = d.OrderNbr
     where h.OrderNbr = 100;
    

    The query selects one row (record) from the sales order header table and one or more rows from the sales order detail. The desired row is indicated by the where clause.

    Of course, the query will also only select detail rows for order 100, due to the nature of the equi-join operation. We don’t have to explicitly tell SQL that we only want detail rows for order 100, but we could, like this:

    -- Query 2
    select h.OrderNbr, h.CustNbr, d.ItemNbr,
           d.Quantity * d.Price as Extended
      from sohdr as h
      join sodtl as d
        on h.OrderNbr = d.OrderNbr
     where h.OrderNbr = 100
       and d.OrderNbr = 100;
    

    What do you think of the new WHERE clause? Not only did I test the header for order 100, I tested the detail table as well. I call this a redundant test. If you know of a formal term that’s already used for an unnecessary test against a secondary table, please let me know.

    Does the redundant test improve the query? When I ran this code, I got worse performance with the redundant test than I did with Query 1. Visual Explain showed an additional logic step for Query 2. That strikes me as strange.

    Next, I tried moving the redundant test into the JOIN.

     -- Query 3
    select h.OrderNbr, h.CustNbr, d.ItemNbr,
           d.Quantity * d.Price as Extended
      from sohdr as h
      join sodtl as d
        on h.OrderNbr = d.OrderNbr
       and d.OrderNbr = 100
     where h.OrderNbr = 100;

    Performance of Query 3 was even worse than that of Query 2. Query 1, the simplest of the three, performed best.

    Now let’s add the manufacturing order to the mix. The recipient of our efforts has requested that we add the manufacturing order number and the date the product is to be built. They also only want to know about line 1 of the sales order.

     -- Query 4
    select h.OrderNbr, h.CustNbr, d.ItemNbr,
           d.Quantity * d.Price as Extended,
           m.OrderNbr, m.SchedDate
      from sodtl as d
      join sohdr as h
        on h.OrderNbr = d.OrderNbr
      join mo as m
        on d.OrderNbr = dec(substr(m.Notes,1,5),5,0)
       and d.LineNbr  = dec(substr(m.Notes,6,2),2,0)
     where d.OrderNbr = 100
       and d.LineNbr  = 1;
    

    How do you like that join from sales order detail to manufacturing order? I wish I could say that that sort of join is rare, but I’ve seen it many, many times over the years.

    What will a redundant test do for this query?

     -- Query 5
    select h.OrderNbr, h.CustNbr, d.ItemNbr,
           d.Quantity * d.Price as Extended,
           m.OrderNbr, m.SchedDate
      from sodtl as d
      join sohdr as h
        on h.OrderNbr = d.OrderNbr
      join mo as m
        on d.OrderNbr = dec(substr(m.Notes,1,5),5,0)
       and d.LineNbr  = dec(substr(m.Notes,6,2),2,0)
     where d.OrderNbr = 100
       and d.LineNbr  = 1
       and m.Notes    = '0010001';
    

    Believe it or not, performance improves!

    I had a situation like this in the not too distant past. When I told Run SQL Statements to run the query without the redundant test, the system churned until I finally canceled the query in disgust. When I added the redundant test, the result set popped up immediately.

    We human beings, it seems to me, are comforted by hard-and-fast rules. We like to hear “Always do this!” and “Never do that!” But life has led me to believe that the best we can hope for in most cases are guidelines. This certainly applies to query optimization. So here are your guidelines for redundant row-selection tests.

    • A redundant test may improve the performance of a query.
    • A redundant test may degrade the performance of a query.

    I wish I could give you something better, friend, but I don’t have it. The good news is that you don’t have to tune every query. Just tune the ones that don’t run fast enough. A redundant test may not help, but it’s worth a shot.

    RELATED STORIES

    Redundant Join Criteria: Good or Bad Idea?

    It’s Worth a Shot

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, FHG, Four Hundred Guru, IBM i

    Sponsored by
    Midrange Dynamics North America

    With MDRapid, you can drastically reduce application downtime from hours to minutes. Deploying database changes quickly, even for multi-million and multi-billion record files, MDRapid is easy to integrate into day-to-day operations, allowing change and innovation to be continuous while reducing major business risks.

    Learn more.

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Feeling Insecure About The Weak Security At Most IBM i Shops Thoroughly Modern: Understanding Your IBM i Web Application Needs With Application Discovery

    One thought on “Guru: The Uncertainty of Redundant Row Selection”

    • Anoop says:
      February 12, 2021 at 10:56 am

      Hi Ted,
      I expect this difference in behaviour because of a simple reason. For columns where indexes are likely to exist, even temporary ones, there is no benefit to adding a redundant where clause.

      For query5, an index is unlikely to be created on substrings of m.Notes. So, the entire mo table has to scanned and parsed to find a matching. When m.Notes = ‘0010001’ was added, the lookup became a whole lot easier as it could just scan through any temp indexes created for m.Notes column.

      If feasible, you can create indexes on substr(m.Notes,6,2) and verify the performance.

      Regards,
      Anoop

      Reply

    Leave a Reply Cancel reply

TFH Volume: 31 Issue: 10

This Issue Sponsored By

  • Precisely
  • Fresche Solutions
  • Trinity Guard
  • IBM
  • Raz-Lee Security

Table of Contents

  • IBM i Groups Cautiously Adding In-Person Events To Schedules
  • Thoroughly Modern: Understanding Your IBM i Web Application Needs With Application Discovery
  • Guru: The Uncertainty of Redundant Row Selection
  • Feeling Insecure About The Weak Security At Most IBM i Shops
  • The Humongous Investment In IBM i People

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Big Blue Raises IBM i License Transfer Fees, Other Prices
  • Keep The IBM i Youth Movement Going With More Training, Better Tools
  • Remain Begins Migrating DevOps Tools To VS Code
  • IBM Readies LTO-10 Tape Drives And Libraries
  • IBM i PTF Guide, Volume 27, Number 23
  • SEU’s Fate, An IBM i V8, And The Odds Of A Power13
  • Tandberg Bankruptcy Leaves A Hole In IBM Power Storage
  • RPG Code Generation And The Agentic Future Of IBM i
  • A Bunch Of IBM i-Power Systems Things To Be Aware Of
  • IBM i PTF Guide, Volume 27, Numbers 21 And 22

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle