• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: Another Red Flag – Joining On Expressions

    August 30, 2021 Ted Holt

    One would think that a given datum, for example a sales order number, would be defined identically in the various database tables in which it is found within an organization, but one might be wrong. I have on many occasions faced the challenge of joining two or more tables on unmatched data types.

    But it gets worse than that. Sometimes joins involve expressions, which may consist of mathematical operations and/or invocations of functions, both intrinsic and user-written. As with the word DISTINCT in a SELECT, the presence of expression in a join sends up a little red flag that tells me that I should review the query. Who knows? There may be a performance problem that needs to be addressed.

    Ideally all joins would be between identical data types. Consider two tables: sales order header and sales order detail.

    create table sordh
      ( OrderNbr     dec   (5),
        CustomerID   dec   (5),
        DueDate      date,
        primary key (OrderNbr));
    
    insert into sordh values
    ( 320, 20260, '2021-07-15'),
    ( 321, 33750, '2021-07-12'),
    ( 322, 19455, '2021-07-25');
        
    create table sordd
      ( OrderNbr     dec   (5),
        LineNbr      dec   (3),
        Item         char  (6),
        Quantity     dec   (3),
        primary key ( OrderNbr, LineNbr ));
    
    insert into sordd values
    ( 320, 1, 'AB-101', 12),
    ( 320, 2, 'BC-202',  1),
    ( 321, 1, 'XZ-411',  2),
    ( 322, 1, 'JR-399',  6);
    

    These two tables obviously join on order number, which is defined as five digits packed decimal in both tables.

    Now let’s bring in the manufacturing order table.

    create table mord
      ( OrderNbr     dec   (5),
        Item         char  (6),
        Quantity     dec   (3),
        DueDate      date,
        Comment      char (24),
        primary key ( OrderNbr )); 
    
    insert into mord values
    ( 725, 'C-9091', 2, '2021-07-01', 'SABGR7AA00005 DE00320001'),
    ( 726, 'K-3432', 1, '2021-07-01', 'TLD  7AB00003X  00320001'),
    ( 727, 'D-8850', 1, '2021-07-05', 'MENO 7TB00121 4 00321001'),
    ( 728, 'F-2855', 1, '2021-06-30', 'PLOTZ6  00009   00320002');
    

    Let’s invent a bit of history. This organization uses an ERP package that was designed for a make-to-stock operation. As such, there is no way to tie a manufacturing order to a sales order line. When the business decided to make to order, the enterprising IT department staff decided to put order number and order line number in the last eight positions of the COMMENT field.

    Here’s the join between manufacturing order and sales order header.

    select m.OrderNbr, m.Item, m.DueDate, soh.CustomerID
      from mord as m
      left join sordh as soh
        on substr(m.Comment, 17, 5) = soh.OrderNbr;
    
    Manufacturing order Item Due date Customer ID
    725 C-9091 2021-07-01 20260
    726 K-3432 2021-07-01 20260
    727 D-8850 2021-07-05 33750
    728 F-2855 2021-06-30 20260

    This is pretty much guaranteed to perform less than stellar. The system will have to read every row of MORD to see what’s in bytes 17 through 21 of the COMMENT column.

    One way that should accelerate this query is to create an index over that piece of COMMENT.

    create index mord1 on mord (substr(Comment, 17, 5));
    

    I say should because I can’t be sure the query engine will use the index. I’ve even seen the query engine ignore indexes that I had built upon the recommendation of the index advisor.

    There’s another way, the one we used to use before indexes could be built over expressions. We can create a cross-reference table.

    Let’s say that sales order numbers are stored in three formats in our imaginary system. You’ve just seen two of them. The third is in a shipments table. We don’t like the way our ERP handles shipping, so we use someone else’s shipping software, and it has a spot for an eight-character sales order number.

    create table shipments
      ( ID          dec(7),
        Date        date,
        Carrier     char(3),
        SalesOrd    char(8),
        primary key (ID));
    
    create index shipment1 on shipments (cast (SalesOrd as dec(8)));
    
    insert into shipments values
    ( 184, '2021-07-15', 'ABC', '00000320'),
    ( 185, '2021-07-15', 'ABC', '00000322');
    

    Here’s a join between shipments and sales order headers.

    select sh.ID, sh.Date, sh.Carrier,
           so.OrderNbr, so.CustomerID, so.DueDate
      from shipments  as sh
      left join sordh as so on sh.SalesOrd = so.OrderNbr
     order by sh.ID;
    
    Shipment ID Ship date Carrier Sales order Customer ID Due date
    184 2021-07-15 ABC 320 20260 2021-07-15
    185 2021-07-15 ABC 322 19455 2021-07-25

    All is well and good. There are no expressions in that join. Or are there? The sales order number is five digits in one table and eight characters in the other. There is an implicit conversion.

    OK, so back to the cross-reference table. Let’s create a table that has one row per sales order. Each row has the sales order number in the three formats that we use.

    create table soxref
      ( OrderNbr   dec(5),
        OrderC8    char(8),
        OrderC5    char(5),
        primary key (OrderNbr));
    
    create index soxref1 on soxref (OrderC8);
    create index soxref2 on soxref (OrderC5);
    
    insert into soxref values
    ( 320, '00000320', '00320'),
    ( 321, '00000321', '00321'),
    ( 322, '00000322', '00322');
    

    Let’s look at that shipments join again.

    select sh.ID, sh.Date, sh.Carrier,
           so.OrderNbr, so.CustomerID, so.DueDate
      from shipments as sh
      left join soxref as x  on sh.SalesOrd = x.OrderC8
      left join sordh as so  on x.OrderNbr = so.OrderNbr
     order by sh.ID;
    
    Shipment ID Ship date Carrier Sales order Customer ID Due date
    184 2021-07-15 ABC 320 20260 2021-07-15
    185 2021-07-15 ABC 322 19455 2021-07-25

    Adding the cross-reference table to the query removed all data conversions from the joins.

    But, is there really a need for a cross-reference table? We indexed the sales order column in each table. Isn’t that enough?

    Yes, it is. But creating the cross-reference, in addition to eliminating the data conversions, opens up some other possibilities.

    One, we can duplicate frequently-requested data from the other tables into the cross reference. For example, suppose it is very common to include customer number in queries. We might add customer number to the cross reference to eliminate the need to include the sales order header in a query when no other header columns are needed. I don’t advocate this technique. I try to adhere strongly to normalization to keep anomalies out of the database. However, if used properly, duplication can be effective.

    Two, we can add additional columns that are functionally dependent on order number, but don’t have a place in the database. In this example, the sales order header comes from a software vendor. Unless this vendor has a way for us to add our own data to the database, it would probably be a mistake for us to add additional fields to the SORDH table. We could add our own data to the cross reference instead.

    This brings me to a really good example. In some of the shops where I’ve worked there was a calendar file, also called a date dimension file. Such a file (or table) has one record (row) for each day. The file has fields (columns) for all kinds of values, such as:

    • the date in various date formats and data types
    • day of the week in one or more formats
    • flags that indicate whether the date is a work day, weekend day, holiday, and so forth
    • the value returned by the DAYS function
    • the first and last days of the month
    • accounting period information
    • and so on

    Here’s the definition of a simple calendar table.

    create or replace table Calendar
     ( BaseDate        date,
       YYMD            numeric(8),
       MDY_Edited      char(8),
       MDYY_Edited     char(10),
       MDY             numeric(6),
       MDYY            numeric(8),
       DayOfWeek       numeric(1),
       DOWAbbr         char(3),
       DayOfWeekName   varchar(9),
       primary key (BaseDate));
       
    create index Calendar1 on Calendar (yymd);   
    

    Here are a few rows by way of example.

    BaseDate YYMD MDY Edited MDYY Edited MDY MDYY Day Abbrev Day name
    2021-08-27 20210827 08/27/21 08/27/2021 82721 8272021 6 FRI Friday
    2021-08-28 20210828 08/28/21 08/28/2021 82921 8282021 7 SAT Saturday
    2021-08-29 20210829 08/29/21 08/29/2021 82921 8292021 1 SUN Sunday
    2021-08-30 20210830 08/30/21 08/30/2021 83021 8302021 2 MON Monday

    Using such a table removes date-conversion functions from queries, not only in the JOIN clause, but also in the SELECT, WHERE, and perhaps other clauses.

    When I started work on this article a few months ago, I intended to devote a good bit of space to this example because of the usefulness of having a table of dates in the database. However, in the meantime, Kent Milligan published an article on this topic on his blog. Rather than repeat what he’s said, let me refer you to his article for more information about joining with a calendar table. While I’m at it, let me encourage you to subscribe to Kent’s blog if you haven’t done so already.

    I’d also like to refer you to this article, in which Aaron Bertrand writes about using a calendar table in a Microsoft SQL Server database. He goes wild with the possibilities.

    But back to the topic at hand. Using expressions to join tables certainly works. The query engine is powerful and can retrieve the results you need. I’m not saying that you should never join with expressions. If performance is acceptable, then leave the query alone and find a more useful task to spend your time on.

    And if performance is unacceptable, but creating an index is out of the question, hold your nose and let the query run.

    RELATED STORIES

    Guru: DISTINCT Can Hide A Performance Problem

    SQL Implicit Cast of Character Strings and Numeric Values

    Speed Dating Your Legacy Dates

    Creating a date dimension or calendar table in SQL Server

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, FHG, Four Hundred Guru, IBM i, SQL

    Sponsored by
    Raz-Lee Security

    Start your Road to Zero Trust!

    Firewall Network security, controlling Exit Points, Open DB’s and SSH. Rule Wizards and graphical BI.

    Request Demo

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    As I See It: Anatomy of Failure Software Change Management Starts – And Ends – With Security

    One thought on “Guru: Another Red Flag – Joining On Expressions”

    • Jonathan Heinz says:
      August 30, 2021 at 3:31 am

      Hello
      just to let you know, the link to Aaron Bertrand article points to Kent Milligan’s article instead.

      …John

      Reply

    Leave a Reply Cancel reply

TFH Volume: 31 Issue: 55

This Issue Sponsored By

  • IBM
  • PTC Implementer
  • RPG & DB2 Summit
  • ARCAD Software
  • WorksRight Software

Table of Contents

  • GraphQL’s Emerging Role in Modernization of Monolithic Applications
  • Software Change Management Starts – And Ends – With Security
  • Guru: Another Red Flag – Joining On Expressions
  • As I See It: Anatomy of Failure
  • Forecast: Systems Spending Steady, Up For Services And Software

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • FAX/400 And CICS For i Are Dead. What Will IBM Kill Next?
  • Fresche Overhauls X-Analysis With Web UI, AI Smarts
  • Is It Time To Add The Rust Programming Language To IBM i?
  • Is IBM Going To Raise Prices On Power10 Expert Care?
  • IBM i PTF Guide, Volume 27, Number 20
  • POWERUp 2025 –Your Source For IBM i 7.6 Information
  • Maxava Consulting Services Does More Than HA/DR Project Management – A Lot More
  • Guru: Creating An SQL Stored Procedure That Returns A Result Set
  • As I See It: At Any Cost
  • IBM i PTF Guide, Volume 27, Number 19

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle