• The Four Hundred
  • Guru: Phonetic Functions In SQL, Part 1

    September 17, 2018, by Paul Tuohy

    In my next two articles I am going to discuss the use of phonetic functions in SQL. You can use phonetic functions to select or order rows based on the phonetic sound of a string as opposed to the actual characters in the string. The obvious use of phonetic functions is with names, but they can be used with any string columns.

    I must admit that this touches on one of my pet peeves — the spelling of my surname. I have lost count of the number of times I have had to spell my name two, three, or four times for someone on the phone — and they still get it wrong. (Why is it that when I spell TUO, they hear TOU?) I have also lost count of the number of times I have had to dig out an account number because I “don’t appear to have an account.” All of this could be avoided if the person at the other end was using a “fuzzy” search with a phonetic function.

    In order to demonstrate the use of phonetic functions, I created the following table:

    CREATE OR REPLACE TABLE PHONETIC ( 
        TESTNO INTEGER     NOT NULL DEFAULT,
        BASE   VARCHAR(20) NOT NULL DEFAULT,
        NAME   VARCHAR(20) NOT NULL DEFAULT );
    

    Each test is identified by TESTNO and consists of a number of rows. Within a test, every row has the same BASE value and a different value for NAME.
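
    To follow along, the table needs a few rows. The statement below is only a sketch: the article does not list its test data, so every name in it is a hypothetical stand-in chosen to resemble the spellings discussed later.

    ```sql
    -- Hypothetical sample rows (assumed values, not the article's actual test data)
    INSERT INTO PHONETIC (TESTNO, BASE, NAME)
        VALUES (1, 'Tuohy', 'Tuohy'),
               (1, 'Tuohy', 'Touhy'),
               (1, 'Tuohy', 'Toohey'),
               (4, 'Smith', 'Smith'),
               (4, 'Smith', 'Smolen'),
               (4, 'Smith', 'Smyth');
    ```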

    Before getting into the nitty gritty, you should be aware that phonetic functions may not give you what you expect. There are so many variables to take into account — language, national pronunciation, and regional pronunciation, to name a few. Therefore, you should not treat the result of a phonetic function in the same way as you would standard column values.

    SOUNDEX

    The starting point for phonetic functions is SOUNDEX, a scalar function that returns a four-character phonetic code for a string: the first letter followed by three digits. Although SOUNDEX is available in most relational databases, the algorithm itself is much older and was patented in 1918. If you have ever searched a genealogy database, there was probably a SOUNDEX search option available.
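
    A quick way to get a feel for the function is to call it on literals. This is a minimal sketch, assuming Db2 for i, where SYSIBM.SYSDUMMY1 is the usual one-row table for this kind of query. The two spellings are illustrative values, not rows from the test table.

    ```sql
    -- Compare the phonetic codes of two spellings without touching any table data
    SELECT SOUNDEX('Smith') SMITH_SX,
           SOUNDEX('Smyth') SMYTH_SX
    FROM SYSIBM.SYSDUMMY1;
    ```

    Under the classic Soundex rules, vowels and the letters H, W, and Y are skipped after the first character, so both columns should come back with the same code.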

    Let’s start by looking at what SOUNDEX gives us with some of the various spellings of my name, using the following SELECT statement.

    SELECT BASE, NAME,
            CASE 
                WHEN SOUNDEX(BASE) = SOUNDEX(NAME) THEN 'Match'
                ELSE 'No Match'
            END SOUNDEX,
            SOUNDEX(BASE) BASE_SX,
            SOUNDEX(NAME) NAME_SX
    FROM PHONETIC
    WHERE TESTNO = 1
    ORDER BY NAME;
    

    The result set is as follows:

    This is just a small sample of different spellings of my name that I have received in correspondence over the years. The phonetic sound of my name is 2E (which I got as well). See why I get peeved? So many different spellings of my name return the same SOUNDEX phonetic code.
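
    The same comparison can drive the kind of "fuzzy" search described at the start of the article. A minimal sketch follows. The literal 'Tuohy' is an assumed search value standing in for whatever the user typed, which in a real program would more likely arrive as a host variable or parameter.

    ```sql
    -- Return every row whose NAME sounds like the search value
    SELECT NAME
    FROM PHONETIC
    WHERE SOUNDEX(NAME) = SOUNDEX('Tuohy')
    ORDER BY NAME;
    ```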

    Another use of SOUNDEX is when it comes to ordering rows. This statement orders rows by NAME:

    SELECT NAME 
    FROM PHONETIC
    WHERE TESTNO = 4
    ORDER BY NAME;
    

    The “problem” with the resulting sequence is that the “Smiths” are separated (phonetically) by other rows.

    If, on the other hand, we had ordered the rows using SOUNDEX:

    SELECT NAME 
    FROM PHONETIC
    WHERE TESTNO = 4
    ORDER BY SOUNDEX(NAME);
    

    The resulting rows will be ordered by their phonetic code:
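
    Rows that share a phonetic code have no guaranteed order among themselves when SOUNDEX is the only sort key. One possible variation, sketched here rather than taken from the original tests, adds NAME as a second key and shows the code beside each name so the grouping is visible.

    ```sql
    -- Group phonetically first, then alphabetically within each phonetic code
    SELECT NAME,
           SOUNDEX(NAME) NAME_SX
    FROM PHONETIC
    WHERE TESTNO = 4
    ORDER BY SOUNDEX(NAME), NAME;
    ```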

    Apart from the fact it is based on U.S. English, the major problem with SOUNDEX is that the first character of the string is always returned as the first character of the phonetic code. This means that we lose certain combinations, such as words beginning with ‘PH’. The result set for test 2 highlights this problem:

    The first three rows show as “No Match” although they are phonetically the same. If you look at the SOUNDEX codes for the two columns you will note that the only difference is with the first character.
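
    The first-character behavior is easy to reproduce with literals. In this sketch the two spellings are hypothetical examples rather than rows from test 2, and SYSIBM.SYSDUMMY1 is again assumed as the one-row table.

    ```sql
    -- Two spellings that sound alike but start with different letters
    SELECT SOUNDEX('Philip') PH_SX,
           SOUNDEX('Filip')  F_SX
    FROM SYSIBM.SYSDUMMY1;
    ```

    The two codes should differ only in the leading letter, which is enough to make an equality test on SOUNDEX fail.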

    DIFFERENCE

    One approach to the SOUNDEX problem is to use the DIFFERENCE function. The DIFFERENCE function returns a value from 0 to 4 representing the difference between the sounds of two strings, based on applying the SOUNDEX function to each string. A value of 4 is the best possible match and 0 is the worst. If we apply DIFFERENCE to the test 2 data, as follows:

    SELECT BASE, NAME,
            CASE 
                WHEN SOUNDEX(BASE) = SOUNDEX(NAME) THEN 'Match'
                ELSE 'No Match'
            END SOUNDEX,
            CASE DIFFERENCE( BASE, NAME)
                WHEN 4 THEN 'Hit'
                WHEN 3 THEN 'Ballpark'
                WHEN 2 THEN 'Middle'
                WHEN 1 THEN 'Faint Hope'
                ELSE 'No Hope'
            END DIFFERENCE,  
            SOUNDEX(BASE) BASE_SX,
            SOUNDEX(NAME) NAME_SX,
            DIFFERENCE( BASE, NAME) DIFFVALUE
    FROM PHONETIC
    WHERE TESTNO = 2
    ORDER BY NAME;
    

    We see that DIFFERENCE gives us a value of 3 for the three rows that SOUNDEX missed:

    Unfortunately, DIFFERENCE is just performing a calculation between the resulting SOUNDEX codes. This means that you end up with some surprising results, as shown with the result set for test 3.
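
    With that caveat in mind, DIFFERENCE can still widen a search beyond exact SOUNDEX matches. The query below is a sketch only. The threshold of 3 is an assumption chosen for illustration, and a sensible cut-off depends on how many false positives the data can tolerate.

    ```sql
    -- Keep rows that score 3 or 4, best matches first
    SELECT BASE, NAME,
           DIFFERENCE(BASE, NAME) DIFFVALUE
    FROM PHONETIC
    WHERE TESTNO = 2
      AND DIFFERENCE(BASE, NAME) >= 3
    ORDER BY DIFFVALUE DESC, NAME;
    ```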

    Basic Functionality

    The standard SOUNDEX and DIFFERENCE functions provide basic functionality when it comes to phonetic processing. But what if you want to use a language other than U.S. English or really take care of that static first character in SOUNDEX? Why, you write your own function, which we will look at in the next article.

    Paul Tuohy, IBM Champion and author of Re-engineering RPG Legacy Applications, is a prominent consultant and trainer for application modernization and development technologies on the IBM Midrange. He is currently CEO of ComCon, a consultancy firm in Dublin, Ireland, and partner at System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Jon Paris.

    Tags: 400guru, FHG, Four Hundred Guru, IBM i, SQL

    One thought on “Guru: Phonetic Functions In SQL, Part 1”

    • Wim says:
      November 8, 2018 at 11:30 am

      Very cool mr 2e! Thanks.

TFH Volume: 28 Issue: 61

Copyright © 2025 IT Jungle