• The Four Hundred
  • Guru: DISTINCT Can Hide A Performance Problem

    September 28, 2020 Ted Holt

    When I see the word DISTINCT in an SQL query, a little red flag goes up inside my head. Not literally, of course. But it does make me pause and scrutinize the query more closely. I have found that poorly designed queries sometimes include the word DISTINCT as a final act of redemption to forcibly return the proper result set.

    The purpose of DISTINCT is to remove duplicate rows from a result set. As the DB2 for i SQL reference puts it in its discussion of aggregate functions:

    The keyword DISTINCT is not considered an argument of the function, but rather a specification of an operation that is performed before the function is applied. If DISTINCT is specified, redundant duplicate values are eliminated.

    And that’s great. DISTINCT is powerful. And useful. And abused.

    To illustrate the abuse, we need a couple of tables. Here is some information about manufacturing operations:

    select OrderNbr, Operation, JobOn, EmpID
      from mfgopers
     order by OrderNbr, Operation
    
    ORDERNBR  OPERATION  JOBON                EMPID
        1001         10  2020-09-28 09:00:04      3
        1001         20  2020-09-28 09:12:38      4
        1001         25  2020-09-28 10:02:41      3
        1002         12  2020-09-28 09:01:10      1
        1003         10  2020-09-28 09:05:15      5

    And here’s some information about employees.

    select e.clock, e.name
      from emps as e
     order by e.clock;
    
    CLOCK  NAME
        1  Billy Rubin
        2  Van Tastick
        3  Polly Fonnick
        4  Sal Monella
        5  Will D. Beaste
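
    If you want to follow along, here is a minimal sketch that builds both tables. Only the column names and the data come from the listings above. The column types, lengths, and NOT NULL constraints are my assumptions, so adjust them to suit your system.

    ```sql
    -- Assumed definitions: the types and constraints are guesses,
    -- only the column names and rows come from the listings above.
    create table mfgopers
      (OrderNbr   integer    not null,
       Operation  integer    not null,
       JobOn      timestamp  not null,
       EmpID      integer    not null);

    create table emps
      (clock  integer      not null,
       name   varchar(20)  not null);

    insert into mfgopers values
      (1001, 10, '2020-09-28 09:00:04', 3),
      (1001, 20, '2020-09-28 09:12:38', 4),
      (1001, 25, '2020-09-28 10:02:41', 3),
      (1002, 12, '2020-09-28 09:01:10', 1),
      (1003, 10, '2020-09-28 09:05:15', 5);

    insert into emps values
      (1, 'Billy Rubin'),
      (2, 'Van Tastick'),
      (3, 'Polly Fonnick'),
      (4, 'Sal Monella'),
      (5, 'Will D. Beaste');
    ```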

    Now to solve a real problem! Find the clock numbers and names of all the employees who worked on order 1001.

    select distinct op.empid, e.name
      from mfgopers as op
      join emps     as e   on op.empid = e.clock
     where op.ordernbr = 1001
     order by op.empid
    
    EMPID  NAME
        3  Polly Fonnick
        4  Sal Monella

    The result set is accurate, so what’s the problem? Just this: the query joins every qualifying row of mfgopers to emps and only then eliminates the duplicates. For this little Mickey Mouse example, that’s no big deal, but this example is not a real-world problem. The queries that I see that use this technique often access half a dozen tables or more.

    Here’s the same query without DISTINCT.

    select op.empid, e.name
      from mfgopers as op
      join emps     as e   on op.empid = e.clock
     where op.ordernbr = 1001
     order by op.empid
    
    EMPID  NAME
        3  Polly Fonnick
        3  Polly Fonnick
        4  Sal Monella

    What has typically happened is that the “designer” of the query realized he had almost what he wanted and saw DISTINCT as an easy way to weed out the duplicates.

    The better approach is to eliminate the duplicates before joining. Depending on the number of tables involved, the number of rows in each table, and the availability of indexes, this can make a significant difference in performance.

    One of my favorite ways to remove the duplicates is to put DISTINCT in a common table expression, like this:

    with SelectEmps as
      (select distinct op.empid
         from mfgopers as op
        where op.ordernbr = 1001)
    select s.empid, e.name
      from SelectEmps as s
      join emps       as e on s.empid = e.clock
     order by 1
    

    Depending on which tables the columns are selected from, you may be able to use a subquery.

    select e.clock, e.name
      from emps as e
     where e.clock in (select empid
                         from MfgOpers
                        where ordernbr = 1001)
     order by 1
    

    What happened to DISTINCT? The fact is that you can include it or not in the subquery.

    where e.clock in (select distinct empid
    

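    I’ve always been told that DISTINCT doesn’t matter when using the IN predicate with a subquery. That makes sense as far as the result set goes: IN only tests whether a value appears in the subquery’s output, so duplicate values there cannot change which rows qualify. Whether it makes any difference to performance is up to the query optimizer.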
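
    A closely related form is a correlated subquery with EXISTS. This is not one of the techniques from the tip itself, just a sketch of an alternative that returns the same two employees for the sample data. Like IN, EXISTS only asks whether a matching row is there, so duplicates never enter the picture.

    ```sql
    -- Alternative sketch: keep an employee if at least one
    -- operation row for order 1001 carries that clock number.
    select e.clock, e.name
      from emps as e
     where exists (select 1
                     from mfgopers as op
                    where op.empid = e.clock
                      and op.ordernbr = 1001)
     order by e.clock
    ```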

    The lesson here is that DISTINCT is not a quick fix for a bad join. When you see DISTINCT in a query, you may want to double-check that it’s not covering up a performance problem.
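
    One way to do that double-check is to take DISTINCT out and count how many times each row comes back. The query below is a sketch of that idea applied to the first example. The column name row_count is my own choice.

    ```sql
    -- Diagnostic sketch: list the rows that the join produces
    -- more than once, along with how often each one appears.
    select op.empid, e.name, count(*) as row_count
      from mfgopers as op
      join emps     as e   on op.empid = e.clock
     where op.ordernbr = 1001
     group by op.empid, e.name
    having count(*) > 1
     order by op.empid
    ```

    With the sample data, this returns a single row: employee 3, Polly Fonnick, with a count of 2. Any rows at all tell you that the join is producing duplicates and that DISTINCT is cleaning up after it.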

    RELATED STORY

    IBM Knowledge Center — Aggregate Functions


TFH Volume: 30 Issue: 59


Copyright © 2025 IT Jungle