• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: DISTINCT Can Hide A Performance Problem

    September 28, 2020 Ted Holt

    When I see the word DISTINCT in an SQL query, a little red flag goes up inside my head. Not literally, of course. But it does make me pause and scrutinize the query more closely. I have found that poorly designed queries sometimes include the word DISTINCT as a final act of redemption to forcibly return the proper result set.

    The purpose of DISTINCT is to remove duplicate rows from a result set. As the DB2 for i SQL reference puts it:

    The keyword DISTINCT is not considered an argument of the function, but rather a specification of an operation that is performed before the function is applied. If DISTINCT is specified, redundant duplicate values are eliminated.

    And that’s great. DISTINCT is powerful. And useful. And abused.

    To illustrate the abuse, we need a couple of tables. Here is some information about manufacturing operations:

    select OrderNbr, Operation, JobOn, EmpID
      from mfgopers
     order by OrderNbr, Operation
    
    ORDERNBR OPERATION JOBON EMPID
    1001 10 2020-09-28 09:00:04 3
    1001 20 2020-09-28 09:12:38 4
    1001 25 2020-09-28 10:02:41 3
    1002 12 2020-09-28 09:01:10 1
    1003 10 2020-09-28 09:05:15 5

    And here’s some information about employees.

    select e.clock, e.name
      from emps as e
     order by e.clock;
    
    CLOCK NAME
    1 Billy Rubin
    2 Van Tastick
    3 Polly Fonnick
    4 Sal Monella
    5 Will D. Beaste

    Now to solve a real problem! Find the clock numbers and names of all the employees who worked on order 1001.

    select distinct op.empid, e.name
      from mfgopers as op
      join emps     as e   on op.empid = e.clock
     where op.ordernbr = 1001
     order by op.empid
    
    EMPID NAME
    3 Polly Fonnick
    4 Sal Monella

    The result set is accurate, so what’s the problem? Just this: the query joins before eliminating duplicates. For this little Mickey Mouse example, that’s no big deal, but this example is not a real-world problem. The queries that I see that use this technique often access half a dozen tables or more.

    Here’s the same query without DISTINCT.

    select op.empid, e.name
      from mfgopers as op
      join emps     as e   on op.empid = e.clock
     where op.ordernbr = 1001
     order by op.empid
    
    EMPID NAME
    3 Polly Fonnick
    3 Polly Fonnick
    4 Sal Monella

    What has typically happened is that the “designer” of the query realized he had almost what he wanted and saw DISTINCT as an easy way to weed out the duplicates.

    The better approach is to eliminate the duplicates before joining. Depending on the number of tables involved, the number of rows in each table, and the availability of indexes, this can make a significant difference in performance.

    One of my favorite ways to remove the duplicates is to put DISTINCT in a common table expression, like this:

    with SelectEmps as
      (select distinct op.empid
         from mfgopers as op
        where op.ordernbr = 1001)
    select s.empid, e.name
     from SelectEmps as s
     join emps       as e on s.empid = e.clock
    order by 1
    

    Depending on which tables the columns are selected from, you may be able to use a subquery.

    select e.clock, e.name
      from emps as e
     where e.clock in (select empid
                         from MfgOpers
                        where ordernbr = 1001)
     order by 1
    

    What happened to DISTINCT? The fact is that you can include it or not in the subquery.

    where e.clock in (select distinct empid
    

    I’ve always been told that DISTINCT doesn’t matter when using the IN predicate with a subquery.

    The lesson here is that DISTINCT is not a quick fix for a bad join. When you see DISTINCT in a query, you may want to double-check that it’s not covering up a performance problem.

    RELATED STORY

    IBM Knowledge Center — Aggregate Functions

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, DB2 for i, FHG, Four Hundred Guru, IBM i, SQL

    Sponsored by
    Midrange Dynamics North America

    With MDRapid, you can drastically reduce application downtime from hours to minutes. Deploying database changes quickly, even for multi-million and multi-billion record files, MDRapid is easy to integrate into day-to-day operations, allowing change and innovation to be continuous while reducing major business risks.

    Learn more.

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    As I See It: IT And The Other Pandemic Is Information Overload Hurting IBM i Security?

    Leave a Reply Cancel reply

TFH Volume: 30 Issue: 59

This Issue Sponsored By

  • TL Ashford
  • ProData
  • Datanational Corporation
  • RPG & DB2 Summit
  • WorksRight Software

Table of Contents

  • Max Thread Room
  • Is Information Overload Hurting IBM i Security?
  • Guru: DISTINCT Can Hide A Performance Problem
  • As I See It: IT And The Other Pandemic
  • MAGiC to Host In-Person User Conference

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Big Blue Raises IBM i License Transfer Fees, Other Prices
  • Keep The IBM i Youth Movement Going With More Training, Better Tools
  • Remain Begins Migrating DevOps Tools To VS Code
  • IBM Readies LTO-10 Tape Drives And Libraries
  • IBM i PTF Guide, Volume 27, Number 23
  • SEU’s Fate, An IBM i V8, And The Odds Of A Power13
  • Tandberg Bankruptcy Leaves A Hole In IBM Power Storage
  • RPG Code Generation And The Agentic Future Of IBM i
  • A Bunch Of IBM i-Power Systems Things To Be Aware Of
  • IBM i PTF Guide, Volume 27, Numbers 21 And 22

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle