• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Distinctly Speeding Up DISTINCT

    May 26, 2010 Ted Holt

    The DISTINCT keyword is a great example of the power of the SQL SELECT statement. Add this powerful word to any SELECT command and voilà! Duplicate rows magically disappear. Comments from readers have alerted me to the fact that some people find use of this feature confusing. Here are a brief discussion of DISTINCT and a performance tip.

    Given a table (physical file) of sales order information and a table of customer information, how do I go about finding the account numbers of customers who have orders in the database? We could try this:

    select s.companyno, s.customerno
      from salesordh as s
     order by 1, 2
    

    And we would get something like this:

    COMPANYNO  CUSTOMERNO
         1       34567
         1       34567
         1       45678
         1       56789
         1       77777
         2       12345
         2       23456
         2       56789
    

    Marvelous, but do I really need to have each customer listed once per order? In this short example, only one customer—34567–has more than one order. In a production database, containing hundreds or thousands of orders, there could be many duplicates. We can eliminate duplicates by adding the word DISTINCT:

    select distinct s.companyno, s.customerno
      from salesordh as s
     order by 1, 2
    

    And we would get this instead:

    COMPANYNO  CUSTOMERNO
         1       34567
         1       45678
         1       56789
         1       77777
         2       12345
         2       23456
         2       56789
    

    Let’s take it a step farther. Suppose I also want the customer’s name. For that, I must go to the customer master file. Here’s the same query, but I’ve added the customer master file.

    select distinct s.companyno, s.customerno, c.cusnam
      from salesordh as s
      join customers as c
        on c.companyno = s.companyno
       and c.customerno = s.customerno
     order by 1, 2
    

    This works. I get this:

    COMPANYNO  CUSTOMERNO  CUSNAM
         1       34567     Polly Unsaturated
         1       45678     Molly Coddle
         1       56789     R. U. Furreal
         1       77777     Herman Nootix
         2       12345     Cal E. Phornya
         2       23456     Billy Rubin
         2       56789     Sally Mander
    

    But notice something. Notice that all three selected fields are in the customer master file. Notice also that two of the fields–company number and customer number–are the key fields that uniquely identify each customer. In other words, why join, which creates multiple rows for each customer, then throw away duplicates? It’s unnecessary. Here’s the rewritten query:

    select c.companyno, c.customerno, c.cusnam
      from customers as c
     where exists
        (select * 
           from salesordh as s
          where s.companyno  = c.companyno
            and s.customerno = c.customerno)
     order by 1, 2
    

    DISTINCT is gone. The main query reads CUSTOMERS only. The sales order table has been relegated to a subquery. The resulting data set is the same, but performance should be better.

    To sum it up, use DISTINCT when you’re retrieving data from one table (or view) only. When two or more tables and/or views are involved, and all selected columns are from one table, you can usually get a performance advantage by converting the query to include a subquery and throwing away DISTINCT.



                         Post this story to del.icio.us
                   Post this story to Digg
        Post this story to Slashdot

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    ARCAD Software

    Embrace VS Code for IBM i Development

    The IBM i development landscape is evolving with modern tools that enhance efficiency and collaboration. Ready to make the move to VS Code for IBM i?

    Join us for this webinar where we’ll showcase how VS Code can serve as a powerful editor for native IBM i code and explore the essential extensions that make it possible.

    In this session, you’ll discover:

    • How ARCAD’s integration with VS Code provides deep metadata insights, allowing developers to assess the impact of their changes upfront.
    • The role of Git in enabling seamless collaboration between developers using tools like SEU, RDi, and VS Code.
    • Powerful extensions for code quality, security, impact analysis, smart build, and automated RPG conversion to Free Form.
    • How non-IBM i developers can now contribute to IBM i projects without prior knowledge of its specifics, while ensuring full control over their changes.

    The future of IBM i development is here. Let ARCAD be your guide!

    Watch Now

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    IBS:  Free e-book: The Six Margin Killers in Wholesale Distribution
    WorksRight Software:  ZIP codes, area codes, Canadian postal codes, CASS certification, and more
    COMMON:  Join us at the Fall 2010 Conference & Expo, Oct. 4 - 6, in San Antonio, Texas

    IT Jungle Store Top Book Picks

    Easy Steps to Internet Programming for AS/400, iSeries, and System i: List Price, $49.95
    The iSeries Express Web Implementer's Guide: List Price, $49.95
    The System i RPG & RPG IV Tutorial and Lab Exercises: List Price, $59.95
    The System i Pocket RPG & RPG IV Guide: List Price, $69.95
    The iSeries Pocket Database Guide: List Price, $59.00
    The iSeries Pocket SQL Guide: List Price, $59.00
    The iSeries Pocket Query Guide: List Price, $49.00
    The iSeries Pocket WebFacing Primer: List Price, $39.00
    Migrating to WebSphere Express for iSeries: List Price, $49.00
    Getting Started With WebSphere Development Studio Client for iSeries: List Price, $89.00
    Getting Started with WebSphere Express for iSeries: List Price, $49.00
    Can the AS/400 Survive IBM?: List Price, $49.00
    Chip Wars: List Price, $29.95

    Saudi Distributor Taps VAI for Logistics Improvement nuBridges Eases i/OS Integration for Tokenized Data

    Leave a Reply Cancel reply

Volume 10, Number 17 -- May 26, 2010
THIS ISSUE SPONSORED BY:

WorksRight Software
SEQUEL Software
Botz & Associates, Inc.

Table of Contents

  • Five Steps To Monitoring Your Server Log on IBM i
  • Distinctly Speeding Up DISTINCT
  • Hey! What Happened To My Last Used Dates

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Liam Allan Shares What’s Coming Next With Code For IBM i
  • From Stable To Scalable: Visual LANSA 16 Powers IBM i Growth – Launching July 8
  • VS Code Will Be The Heart Of The Modern IBM i Platform
  • The AS/400: A 37-Year-Old Dog That Loves To Learn New Tricks
  • IBM i PTF Guide, Volume 27, Number 25
  • Meet The Next Gen Of IBMers Helping To Build IBM i
  • Looks Like IBM Is Building A Linux-Like PASE For IBM i After All
  • Will Independent IBM i Clouds Survive PowerVS?
  • Now, IBM Is Jacking Up Hardware Maintenance Prices
  • IBM i PTF Guide, Volume 27, Number 24

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle