• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: Phonetic Functions In SQL, Part 1

    September 17, 2018 Paul Tuohy

    In my next two articles I am going to discuss the use of phonetic functions in SQL. You can use phonetic functions to select or order rows based on the phonetic sound of a string as opposed to the actual characters in the string. The obvious use of phonetic functions is with names, but they can be used with any string columns.

    I must admit that this touches on one of my pet peeves — the spelling of my surname. I have lost count of the number of times I have had to spell my name two, three, or four times for someone on the phone — and they still get it wrong. (Why is it that when I spell TUO, they hear TOU?) I have also lost count of the number of times I have had to dig out an account number because I “don’t appear to have an account.” All of this could be avoided if the person at the other end was using a “fuzzy” search with a phonetic function.

    In order to demonstrate the use of phonetic functions, I created the following table:

    CREATE OR REPLACE TABLE PHONETIC ( 
        TESTNO INTEGER NOT NULL DEFAULT ,
        BASE   VARCHAR(20) NOT NULL DEFAULT,
        NAME   VARCHAR(20) NOT NULL DEFAULT );
    

    There will be a number of rows for a test (identified by TESTNO). Each row, for a test, will have the same BASE value and a different value for NAME.

    Before getting into the nitty gritty, you should be aware that phonetic functions may not give you what you expect. There are so many variables to take into account — language, national pronunciation, and regional pronunciation, to name a few. Therefore, you should not treat the result of a phonetic function in the same way as you would standard column values.

    SOUNDEX

    The starting point for phonetic functions is SOUNDEX: it is a scalar function which returns a four-character phonetic code for a string. Although a standard function in every database, the SOUNDEX algorithm was actually patented in 1918. If anyone has ever searched a genealogy database, there was probably a SOUNDEX search option available.

    Let’s start by looking at what SOUNDEX gives us with some of the various spellings of my name, using the following SELECT statement.

    SELECT BASE, NAME,
            CASE 
                WHEN SOUNDEX(BASE) = SOUNDEX(NAME) THEN 'Match'
                ELSE 'No Match'
            END SOUNDEX,
            SOUNDEX(BASE) BASE_SX,
            SOUNDEX(NAME) NAME_SX
    FROM PHONETIC
    WHERE TESTNO = 1
    ORDER BY NAME;
    

    The result set is as follows:

    This is just a small sample of different spellings of my name that I have received in correspondence over the years. The phonetic sound of my name is 2E (which I got as well). See why I get peeved? So many different spellings of my name return the same SOUNDEX phonetic code.

    Another use of SOUNDEX is when it comes to ordering rows. This statement orders rows by NAME:

    SELECT NAME 
    FROM PHONETIC
    WHERE TESTNO = 4
    ORDER BY NAME;
    

    The “problem” with the resulting sequence is that the “Smiths” are separated (phonetically) by other rows.

    If, on the other hand, we had ordered the rows using SOUNDEX:

    SELECT NAME 
    FROM PHONETIC
    WHERE TESTNO = 4
    ORDER BY SOUNDEX(NAME);
    

    The resulting rows will be ordered by their phonetic code:

    Apart from the fact it is based on U.S. English, the major problem with SOUNDEX is that the first character of the string is always returned as the first character of the phonetic code. This means that we lose certain combinations, such as words beginning with ‘PH’. The result set for test 2 highlights this problem:

    The first three rows show as “No Match” although they are phonetically the same. If you look at the SOUNDEX codes for the two columns you will note that the only difference is with the first character.

    DIFFERENCE

    One approach to the SOUNDEX problem is us the DIFFERENCE function. The DIFFERENCE function returns a value from 0 to 4 representing the difference between the sounds of two strings based on applying the SOUNDEX function to the strings. A value of 4 is an exact match. If we apply DIFFERENCE to the test 2 data, as follows:

    SELECT BASE, NAME,
            CASE 
                WHEN SOUNDEX(BASE) = SOUNDEX(NAME) THEN 'Match'
                ELSE 'No Match'
            END SOUNDEX,
            CASE DIFFERENCE( BASE, NAME)
                WHEN 4 THEN 'Hit'
                WHEN 3 THEN 'Ballpark'
                WHEN 2 THEN 'Middle'
                WHEN 1 THEN 'Faint Hope'
                ELSE 'No Hope'
            END DIFFERENCE,  
            SOUNDEX(BASE) BASE_SX,
            SOUNDEX(NAME) NAME_SX,
            DIFFERENCE( BASE, NAME) DIFFVALUE
    FROM PHONETIC
    WHERE TESTNO = 2
    ORDER BY NAME;
    

    We see that DIFFERENCE gives us a value of 3 for the three rows that SOUNDEX missed:

    Unfortunately, DIFFERENCE is just performing a calculation between the resulting SOUNDEX codes. This means that you end up with some surprising results, as shown with the result set for test 3.

    Basic Functionality

    The standard SOUNDEX and DIFFERENCE functions provide basic functionality when it comes to phonetic processing. But what if you want to use a language other than U.S. English or really take care of that static first character in SOUNDEX? Why, you write your own function, which we will look at in the next article.

    Paul Tuohy, IBM Champion and author of Re-engineering RPG Legacy Applications, is a prominent consultant and trainer for application modernization and development technologies on the IBM Midrange. He is currently CEO of ComCon, a consultancy firm in Dublin, Ireland, and partner at System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Jon Paris.

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, FHG, Four Hundred Guru, IBM i, SQL

    Sponsored by
    UCG Technologies – Vault400

    Do the Math When Looking at IBM i Hosting for Cost Savings

    COVID-19 has accelerated certain business trends that were already gaining strength prior to the start of the pandemic. E-commerce, telehealth, and video conferencing are some of the most obvious examples. One example that may not be as obvious to the general public but has a profound impact on business is the shift in strategy of IBM i infrastructure from traditional, on-premises environments to some form of remote configuration. These remote configurations and all of their variations are broadly referred to in the community as IBM i hosting.

    “Hosting” in this context can mean different things to different people, and in general, hosting refers to one of two scenarios. In the first scenario, hosting can refer to a client owned machine that is housed in a co-location facility (commonly called a co-lo for short) where the data center provides traditional system administrator services, relieving the client of administrative and operational responsibilities. In the second scenario, hosting can refer to an MSP owned machine in which partition resources are provided to the client in an on-demand capacity. This scenario allows the client to completely outsource all aspects of Power Systems hardware and the IBM i operating system and database.

    The scenario that is best for each business depends on a number of factors and is largely up for debate. In most cases, pursuing hosting purely as a cost saving strategy is a dead end. Furthermore, when you consider all of the costs associated with maintaining and IBM i environment, it is typically not a cost-effective option for the small to midsize market. The most cost-effective approach for these organizations is often a combination of a client owned and maintained system (either on-prem or in a co-lo) with cloud backup and disaster-recovery-as-a-service. Only in some cases of larger enterprise companies can a hosting strategy start to become a potentially cost-effective option.

    However, cost savings is just one part of the story. As IBM i expertise becomes scarce and IT resources run tight, the only option for some firms may be to pursue hosting in some capacity. Whatever the driving force for pursing hosting may be, the key point is that it is not just simply an option for running your workload in a different location. There are many details to consider and it is to the best interest of the client to work with an experienced MSP in weighing the benefits and drawbacks of each option. As COVID-19 rolls on, time will tell if IBM i hosting strategies will follow the other strong business trends of the pandemic.

    When we say do the math in the title above, it literally means that you need to do the math for your particular scenario. It is not about us doing the math for you, making a case for either staying on premises or for moving to the cloud. There is not one answer, but just different levels of cost to be reckoned which yield different answers. Most IBM i shops have fairly static workloads, at least measured against the larger mix of stuff on the public clouds of the world. How do you measure the value of controlling your own IT fate? That will only be fully recognized at the moment when it is sorely missed the most.

    CONTINUE READING ARTICLE

    Please visit ucgtechnologies.com/IBM-POWER9-systems for more information.

    800.211.8798 | info@ucgtechnologies.com

    Article featured in IT Jungle on April 5, 2021

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    As I See It: Estimable Hiring IBM Tweaks Power Iron, Pulls Software, Adds Proactive Support

    One thought on “Guru: Phonetic Functions In SQL, Part 1”

    • Wim says:
      November 8, 2018 at 11:30 am

      Very cool mr 2e! Thanks.

      Reply

    Leave a Reply Cancel reply

TFH Volume: 28 Issue: 61

This Issue Sponsored By

  • Maxava
  • HelpSystems
  • ARCAD Software
  • LUG
  • WorksRight Software

Table of Contents

  • Adventures In IBM i Encryption
  • IBM Tweaks Power Iron, Pulls Software, Adds Proactive Support
  • Guru: Phonetic Functions In SQL, Part 1
  • As I See It: Estimable Hiring
  • Where There Is A (Steve) Will, There’s An (IBM i) Way

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Query Supervisor Gives Database Engineers New Power
  • IBM Unveils New and Improved IBM i Services
  • 3 Takeaways from the 2021 PowerTech Security Report
  • Four Hundred Monitor, April 14
  • IBM i PTF Guide, Volume 23, Number 15
  • Big Blue Unveils Spring 2021 IBM i Technology Refreshes
  • Thoroughly Modern: Innovative And Realistic Approaches To IBM i Modernization
  • Guru: Web Services, DATA-INTO and DATA-GEN, Part 2
  • Back To The Future With A New IBM i Logo
  • IBM i PTF Guide, Volume 23, Number 14

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2021 IT Jungle

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.