• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: DISTINCT Can Hide A Performance Problem

    September 28, 2020 Ted Holt

    When I see the word DISTINCT in an SQL query, a little red flag goes up inside my head. Not literally, of course. But it does make me pause and scrutinize the query more closely. I have found that poorly designed queries sometimes include the word DISTINCT as a final act of redemption to forcibly return the proper result set.

    The purpose of DISTINCT is to remove duplicate rows from a result set. As the DB2 for i SQL reference puts it:

    The keyword DISTINCT is not considered an argument of the function, but rather a specification of an operation that is performed before the function is applied. If DISTINCT is specified, redundant duplicate values are eliminated.

    And that’s great. DISTINCT is powerful. And useful. And abused.

    To illustrate the abuse, we need a couple of tables. Here is some information about manufacturing operations:

    select OrderNbr, Operation, JobOn, EmpID
      from mfgopers
     order by OrderNbr, Operation
    
    ORDERNBR OPERATION JOBON EMPID
    1001 10 2020-09-28 09:00:04 3
    1001 20 2020-09-28 09:12:38 4
    1001 25 2020-09-28 10:02:41 3
    1002 12 2020-09-28 09:01:10 1
    1003 10 2020-09-28 09:05:15 5

    And here’s some information about employees.

    select e.clock, e.name
      from emps as e
     order by e.clock;
    
    CLOCK NAME
    1 Billy Rubin
    2 Van Tastick
    3 Polly Fonnick
    4 Sal Monella
    5 Will D. Beaste

    Now to solve a real problem! Find the clock numbers and names of all the employees who worked on order 1001.

    select distinct op.empid, e.name
      from mfgopers as op
      join emps     as e   on op.empid = e.clock
     where op.ordernbr = 1001
     order by op.empid
    
    EMPID NAME
    3 Polly Fonnick
    4 Sal Monella

    The result set is accurate, so what’s the problem? Just this: the query joins before eliminating duplicates. For this little Mickey Mouse example, that’s no big deal, but this example is not a real-world problem. The queries that I see that use this technique often access half a dozen tables or more.

    Here’s the same query without DISTINCT.

    select op.empid, e.name
      from mfgopers as op
      join emps     as e   on op.empid = e.clock
     where op.ordernbr = 1001
     order by op.empid
    
    EMPID NAME
    3 Polly Fonnick
    3 Polly Fonnick
    4 Sal Monella

    What has typically happened is that the “designer” of the query realized he had almost what he wanted and saw DISTINCT as an easy way to weed out the duplicates.

    The better approach is to eliminate the duplicates before joining. Depending on the number of tables involved, the number of rows in each table, and the availability of indexes, this can make a significant difference in performance.

    One of my favorite ways to remove the duplicates is to put DISTINCT in a common table expression, like this:

    with SelectEmps as
      (select distinct op.empid
         from mfgopers as op
        where op.ordernbr = 1001)
    select s.empid, e.name
     from SelectEmps as s
     join emps       as e on s.empid = e.clock
    order by 1
    

    Depending on which tables the columns are selected from, you may be able to use a subquery.

    select e.clock, e.name
      from emps as e
     where e.clock in (select empid
                         from MfgOpers
                        where ordernbr = 1001)
     order by 1
    

    What happened to DISTINCT? The fact is that you can include it or not in the subquery.

    where e.clock in (select distinct empid
    

    I’ve always been told that DISTINCT doesn’t matter when using the IN predicate with a subquery.

    The lesson here is that DISTINCT is not a quick fix for a bad join. When you see DISTINCT in a query, you may want to double-check that it’s not covering up a performance problem.

    RELATED STORY

    IBM Knowledge Center — Aggregate Functions

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, DB2 for i, FHG, Four Hundred Guru, IBM i, SQL

    Sponsored by
    ARCAD Software

    [Webinar] Trends for 2026: ARCAD Software’s strategic vision

    Between the acceleration of artificial intelligence, constant pressure to modernize existing systems, and ever-increasing security requirements, 2026 is shaping up to be a decisive year for legacy platforms.

    At the start of this new year, this webinar offers strategic insight into the future of these critical environments, which are at the heart of information systems.

    Join Philippe Magne, CEO of ARCAD Software, as he shares his analysis of the major trends and structural issues facing organizations:

    • DevSecOps: What are the current trends in DevOps transformation?
    • Generative artificial intelligence: What are the concrete use cases and measurable benefits for application development and maintenance?
    • Critical application security: How to respond to growing and sophisticated threats?
    • Cloud and hybridization: How do legacy applications fit into current cloud strategies?

    Save your seat for March 24 at 11 AM EDT!

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    As I See It: IT And The Other Pandemic Is Information Overload Hurting IBM i Security?

    Leave a Reply Cancel reply

TFH Volume: 30 Issue: 59

This Issue Sponsored By

  • TL Ashford
  • ProData
  • Datanational Corporation
  • RPG & DB2 Summit
  • WorksRight Software

Table of Contents

  • Max Thread Room
  • Is Information Overload Hurting IBM i Security?
  • Guru: DISTINCT Can Hide A Performance Problem
  • As I See It: IT And The Other Pandemic
  • MAGiC to Host In-Person User Conference

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • No Joke: Big Memory And Flash Price Hikes Coming April 1
  • Strategic Topics To Think About For 2026, Part 2
  • Guru: IBM i Job Log Detective Brings Structure To Job Log Analysis In VS Code
  • IBM Launches Hybrid Cloud Backup Product With Cobalt Iron
  • IBM i PTF Guide, Volume 28, Number 10
  • Why You Need To Think About Offsite Data Protection
  • IBM Gets Bob 1.0 Off The Ground
  • You Store The Crown Jewels In A Safe, Not In A Bucket
  • More Power Systems Withdrawals, And Some From Red Hat, Too
  • Price Increases Are Here, Or Pending, And For Sure For Memory

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle