The Four Hundred
Guru: Phonetic Functions In SQL, Part 1

    September 17, 2018 Paul Tuohy

    In my next two articles I am going to discuss the use of phonetic functions in SQL. You can use phonetic functions to select or order rows based on how a string sounds rather than on the actual characters in the string. The obvious use of phonetic functions is with names, but they can be used with any string column.

    I must admit that this touches on one of my pet peeves — the spelling of my surname. I have lost count of the number of times I have had to spell my name two, three, or four times for someone on the phone — and they still get it wrong. (Why is it that when I spell TUO, they hear TOU?) I have also lost count of the number of times I have had to dig out an account number because I “don’t appear to have an account.” All of this could be avoided if the person at the other end was using a “fuzzy” search with a phonetic function.

    In order to demonstrate the use of phonetic functions, I created the following table:

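    -- One row per spelling; all rows with the same TESTNO form one test
    CREATE OR REPLACE TABLE PHONETIC (
        TESTNO INTEGER     NOT NULL DEFAULT,  -- test identifier
        BASE   VARCHAR(20) NOT NULL DEFAULT,  -- reference spelling, the same for every row in a test
        NAME   VARCHAR(20) NOT NULL DEFAULT   -- variant spelling compared with BASE
    );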
    

    Each test is identified by TESTNO and has several rows. All rows in a test have the same BASE value, and each row has a different NAME value.

    Before getting into the nitty-gritty, you should be aware that phonetic functions may not give you what you expect. There are many variables to take into account: language, national pronunciation, and regional pronunciation, to name a few. Therefore, you should treat the result of a phonetic function as an approximation, not as an exact value in the way you would treat a standard column value.

    SOUNDEX

    The starting point for phonetic functions is SOUNDEX, a scalar function that returns a four-character phonetic code for a string. Although it is a standard function in most databases, the SOUNDEX algorithm was actually patented in 1918. If you have ever searched a genealogy database, there was probably a SOUNDEX search option available.
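    To show how such a four-character code can be built, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to the digits 1 to 6, drop vowels and Y, skip H and W, collapse adjacent letters that share a digit, and pad or truncate to four characters. It is an assumption that Db2's built-in SOUNDEX follows exactly these rules; its handling of edge cases (H and W, non-letters, empty strings) may differ. The spellings in the usage lines are illustrative, not the article's test data.

```python
# Minimal sketch of the classic American Soundex rules (an assumption, not Db2's own code).
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
DIGIT = {ch: d for letters, d in GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Return a four-character Soundex code; characters outside A-Z are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""                      # this sketch's choice; Db2 may behave differently
    code = letters[0]                  # the first letter is kept unchanged
    prev = DIGIT.get(letters[0], "")   # a following letter with the same digit is not repeated
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W are skipped and do not separate equal digits
        digit = DIGIT.get(ch, "")      # vowels and Y have no digit
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break                  # only the first letter plus three digits are kept
        prev = digit                   # a vowel resets prev, so a repeated digit after it counts again
    return code.ljust(4, "0")          # pad short codes with zeros


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))  # R163 R163
    for spelling in ["Tuohy", "Touhy", "Tuohey", "Toohey"]:
        print(spelling, soundex(spelling))       # every spelling gives T000
```

    Because vowels carry no digit, every one of these spellings reduces to just its first letter, T, padded to T000.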

    Let’s start by looking at what SOUNDEX gives us with some of the various spellings of my name, using the following SELECT statement.

    SELECT BASE, NAME,
            CASE 
                WHEN SOUNDEX(BASE) = SOUNDEX(NAME) THEN 'Match'
                ELSE 'No Match'
            END SOUNDEX,
            SOUNDEX(BASE) BASE_SX,
            SOUNDEX(NAME) NAME_SX
    FROM PHONETIC
    WHERE TESTNO = 1
    ORDER BY NAME;
    

    The result set is as follows:

    This is just a small sample of the different spellings of my name that I have received in correspondence over the years. Phonetically, my name sounds like “2E” (a spelling I have also received). See why I get peeved? So many different spellings of my name return the same SOUNDEX phonetic code.

    SOUNDEX is also useful for ordering rows. This statement orders rows by NAME:

    SELECT NAME 
    FROM PHONETIC
    WHERE TESTNO = 4
    ORDER BY NAME;
    

    The “problem” with the resulting sequence is that the “Smiths”, which all sound alike, are separated by other rows.

    If, on the other hand, we order the rows using SOUNDEX:

    SELECT NAME 
    FROM PHONETIC
    WHERE TESTNO = 4
    ORDER BY SOUNDEX(NAME);
    

    The resulting rows will be ordered by their phonetic code:
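    The effect of the two ORDER BY clauses can be mimicked in memory with a minimal Soundex sketch (classic American rules, assumed rather than taken from Db2) and a hypothetical list of names, not the article's test 4 rows. Sorting by name splits Schmidt from the other Smiths; sorting by code groups them.

```python
# Minimal Soundex sketch (classic American rules; an assumption, not Db2's code).
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
DIGIT = {ch: d for letters, d in GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Four-character Soundex code; characters outside A-Z are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code, prev = letters[0], DIGIT.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal digits
        digit = DIGIT.get(ch, "")  # vowels and Y have no digit
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit
    return code.ljust(4, "0")


# Hypothetical sample names.
NAMES = ["Smith", "Sanders", "Schmidt", "Simpson", "Smyth", "Smithe"]

by_name = sorted(NAMES)  # comparable to ORDER BY NAME
# Comparable to ORDER BY SOUNDEX(NAME), NAME: the name breaks ties within a code.
by_sound = sorted(NAMES, key=lambda n: (soundex(n), n))

if __name__ == "__main__":
    print(by_name)   # ['Sanders', 'Schmidt', 'Simpson', 'Smith', 'Smithe', 'Smyth']
    print(by_sound)  # ['Simpson', 'Schmidt', 'Smith', 'Smithe', 'Smyth', 'Sanders']
```

    The second sort key is a design choice of the sketch. ORDER BY SOUNDEX(NAME) on its own does not guarantee any order among rows that share a code, so adding NAME as a second sort key in the SQL would make the sequence within each group repeatable.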

    Apart from the fact that it is based on U.S. English, the major problem with SOUNDEX is that the first character of the string is always returned as the first character of the phonetic code. This means that words that sound alike but start with different letters, such as words beginning with ‘PH’ and words beginning with ‘F’, never get the same code. The result set for test 2 highlights this problem:

    The first three rows show “No Match” even though they are phonetically the same. If you look at the SOUNDEX codes for the two columns, you will see that they differ only in the first character.
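    The same minimal Soundex sketch (classic American rules, assumed rather than taken from Db2) reproduces the pattern with hypothetical name pairs: a leading PH keeps the letter P while a leading F keeps F, even though the digits that follow agree. In the middle of a word the problem disappears, because P and F both map to the digit 1.

```python
# Minimal Soundex sketch (classic American rules; an assumption, not Db2's code).
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
DIGIT = {ch: d for letters, d in GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Four-character Soundex code; characters outside A-Z are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code, prev = letters[0], DIGIT.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal digits
        digit = DIGIT.get(ch, "")  # vowels and Y have no digit
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit
    return code.ljust(4, "0")


# Hypothetical pairs that sound alike.
PAIRS = [("Philip", "Filip"), ("Phelps", "Felps"), ("Stephen", "Steven")]

if __name__ == "__main__":
    for base, name in PAIRS:
        print(base, soundex(base), name, soundex(name))
    # Philip P410 Filip F410    -> only the first character differs
    # Phelps P412 Felps F412    -> only the first character differs
    # Stephen S315 Steven S315  -> PH in the middle matches F/V, since P, F and V all map to 1
```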

    DIFFERENCE

    One approach to the SOUNDEX problem is to use the DIFFERENCE function. DIFFERENCE returns a value from 0 to 4 that represents the difference between the sounds of two strings, based on applying the SOUNDEX function to both strings. A value of 4 is the closest possible match. If we apply DIFFERENCE to the test 2 data, as follows:

    SELECT BASE, NAME,
            CASE 
                WHEN SOUNDEX(BASE) = SOUNDEX(NAME) THEN 'Match'
                ELSE 'No Match'
            END SOUNDEX,
            CASE DIFFERENCE(BASE, NAME)
                WHEN 4 THEN 'Hit'
                WHEN 3 THEN 'Ballpark'
                WHEN 2 THEN 'Middle'
                WHEN 1 THEN 'Faint Hope'
                ELSE 'No Hope'
            END DIFFERENCE,
            SOUNDEX(BASE) BASE_SX,
            SOUNDEX(NAME) NAME_SX,
            DIFFERENCE(BASE, NAME) DIFFVALUE
    FROM PHONETIC
    WHERE TESTNO = 2
    ORDER BY NAME;
    

    We see that DIFFERENCE gives us a value of 3 for the three rows that SOUNDEX missed:
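    The article does not spell out how the 0 to 4 score is calculated. One simple model, assumed here and not verified against Db2, counts the positions at which the two four-character SOUNDEX codes agree. Under that assumption, a hypothetical PH/F pair whose codes differ only in the first character scores 3, in line with the behavior described above. The `difference` and `label` helpers are illustrative names, and the labels mirror the CASE expression in the SELECT.

```python
# Minimal Soundex sketch (classic American rules; an assumption, not Db2's code).
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
DIGIT = {ch: d for letters, d in GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Four-character Soundex code; characters outside A-Z are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code, prev = letters[0], DIGIT.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal digits
        digit = DIGIT.get(ch, "")  # vowels and Y have no digit
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit
    return code.ljust(4, "0")


def difference(a: str, b: str) -> int:
    """Hypothetical model of DIFFERENCE: count matching positions (0-4) in the two codes."""
    return sum(1 for x, y in zip(soundex(a), soundex(b)) if x == y)


def label(score: int) -> str:
    """Same labels as the CASE expression in the article's SELECT."""
    return {4: "Hit", 3: "Ballpark", 2: "Middle", 1: "Faint Hope"}.get(score, "No Hope")


if __name__ == "__main__":
    for base, name in [("Philip", "Filip"), ("Robert", "Rupert")]:
        score = difference(base, name)
        print(base, name, score, label(score))
    # Philip Filip 3 Ballpark
    # Robert Rupert 4 Hit
```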

    Unfortunately, DIFFERENCE is only performing a calculation on the two resulting SOUNDEX codes, not on the strings themselves. This means that you can end up with some surprising results, as shown by the result set for test 3.
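    Using the same assumed position-by-position model (a hypothetical stand-in for Db2's calculation, not a confirmed description of it), a few hypothetical pairs show how scoring the codes rather than the sounds can mislead in both directions. A single extra consonant shifts every later digit, so two spellings that sound identical can score low, while names with few coded consonants collapse to codes full of zeros and can score high.

```python
# Minimal Soundex sketch (classic American rules; an assumption, not Db2's code).
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
DIGIT = {ch: d for letters, d in GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Four-character Soundex code; characters outside A-Z are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code, prev = letters[0], DIGIT.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal digits
        digit = DIGIT.get(ch, "")  # vowels and Y have no digit
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit
    return code.ljust(4, "0")


def difference(a: str, b: str) -> int:
    """Hypothetical model of DIFFERENCE: count matching positions (0-4) in the two codes."""
    return sum(1 for x, y in zip(soundex(a), soundex(b)) if x == y)


def label(score: int) -> str:
    """Same labels as the CASE expression in the article's SELECT."""
    return {4: "Hit", 3: "Ballpark", 2: "Middle", 1: "Faint Hope"}.get(score, "No Hope")


# Hypothetical pairs chosen to show how a purely code-based score can mislead.
PAIRS = [("Thompson", "Thomson"), ("Tuohy", "Toyota"), ("Tuohy", "Tao")]

if __name__ == "__main__":
    for a, b in PAIRS:
        score = difference(a, b)
        print(a, soundex(a), b, soundex(b), score, label(score))
    # Thompson T512 Thomson T525 2 Middle   (the extra P shifts the later digits)
    # Tuohy T000 Toyota T300 3 Ballpark
    # Tuohy T000 Tao T000 4 Hit
```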

    Basic Functionality

    The standard SOUNDEX and DIFFERENCE functions provide basic phonetic processing. But what if you want to use a language other than U.S. English, or properly deal with that fixed first character in SOUNDEX? Why, you write your own function, which we will look at in the next article.

    Paul Tuohy, IBM Champion and author of Re-engineering RPG Legacy Applications, is a prominent consultant and trainer for application modernization and development technologies on the IBM Midrange. He is currently CEO of ComCon, a consultancy firm in Dublin, Ireland, and partner at System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Jon Paris.


TFH Volume: 28 Issue: 61

Copyright © 2022 IT Jungle

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.