• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: Phonetic Functions In SQL, Part 2

    September 24, 2018 Paul Tuohy

    In my previous article I discussed the use of the standard SOUNDEX and DIFFERENCE functions for phonetic processing. I also identified the major problems with SOUNDEX (based on U.S. English and the first character is static) and DIFFERENCE (loose because it is based on SOUNDEX).

    In this article I will discuss how to tackle these problems by writing a customized phonetic function to use in place of or in conjunction with SOUNDEX. The good news is that you do not have to become an expert in phonetics — others have already done the job for you. There are a number of algorithms available on the web, such as caverphone, cologne_phonetic, metaphone, double_metaphone and refined_soundex.

    I will be using the METAPHONE algorithm to demonstrate how you can write your own phonetic function.

    As described in the first article, this is the table I created in order to demonstrate the use of phonetic functions:

    CREATE OR REPLACE TABLE PHONETIC ( 
        TESTNO INTEGER NOT NULL DEFAULT ,
        BASE   VARCHAR(20) NOT NULL DEFAULT,
        NAME   VARCHAR(20) NOT NULL DEFAULT );
    

    There will be a number of rows for a test (identified by TESTNO). Each row, for a test, will have the same BASE value and a different value for NAME.

    The METAPHONE Function

    The Metaphone algorithm was created by Lawrence Philips and was published in an article in “Computer Language” in December, 1990. Since then, Lawrence Philips has gone on to develop double_metaphone (for which the code is freely available) and Metaphone 3 (which is a commercial product sold as source code).

    Although not a standard function in DB2, you will find a Metaphone function in some databases and as a standard function in some programming languages (such as PHP). Metaphone is different from SOUNDEX in that, instead of translating a string to a code, Metaphone returns a string of consonant sounds.

    I have done my best to translate the Metaphone algorithm into SQL PL, but, please note, that the code has not been fully tested. Let the user beware!

    This is the code to create the METAPHONE function:

    CREATE OR REPLACE FUNCTION METAPHONE(stringInput varchar(70))
    RETURNS varchar (25)
    	LANGUAGE SQL 
    	SPECIFIC  METAPHONE 
    	DETERMINISTIC 
    	CONTAINS SQL 
    	RETURNS NULL ON NULL INPUT 
    	NO EXTERNAL ACTION 
    	NOT FENCED 
    	SET OPTION  ALWBLK = *ALLREAD , 
    	ALWCPYDTA = *OPTIMIZE , 
    	COMMIT = *NONE , 
    	DECRESULT = (31, 31, 00) , 
    	DFTRDBCOL = *NONE , 
    	DYNDFTCOL = *NO , 
    	DYNUSRPRF = *USER , 
    	SRTSEQ = *HEX   
    
    /*
    Metaphone Function
    	
    Created by Lawrence Philips. 
    Published in an article in "Computer Language" December 1990 issue.
    	
    */
    
    BEGIN
    declare returnString    varchar(25);
    declare processString   varchar(70);
    declare currentChar     char(1);
    declare twoChar         char(2);
    declare threeChar       char(3);
    declare previousChar    char(1);
    declare nextChar        char(1);
    declare addChar         char(1);
    declare totalLength     smallint;
    declare processCount    smallint;
    
    set processString = ltrim(lower(stringInput));
    set totalLength = length(processString);
    set processCount = 1;
    set returnString = '';
    
    --Process beginning exceptions
    set twoChar = left(processString,2);
    if (twoChar in ('ae', 'gn', 'kn', 'pn', 'wr')) then
    	set processString = right(processString , totalLength—1);
    	set totalLength = totalLength—1;
    elseif (twoChar = 'wh') then	
    	set processString 	= 'w' concat right(processString , totalLength—2);
    	set totalLength = totalLength—1;
    end if;
    
    set currentChar = left(processString,1);
    
    if (currentChar	= 	'x') then 	
    	set processString = 's' concat right(processString , totalLength—1);
    end if;
    
    if (currentChar in ('a','e','i','o','u')) then
    	set processString 	= right(processString , totalLength—1);
    	set totalLength = totalLength—1;
    	set	returnString	= currentChar;
    end if;
    
    set currentChar = ' ';
    while processCount <= totalLength do
        set previousChar = currentChar;
    	set currentChar = substring(processString,processCount,1);
        set addChar = '';	
        
        if (processCount < totalLength) then
            set nextChar = substring(processString,processCount + 1,1);
        else 
            set nextChar = '';
        end if;
        
    	if (nextChar <> currentChar or currentChar = 'c') then
    		
    		set twoChar = substring(processString,processCount,2);
    		set threeChar = substring(processString,processCount,3);
    
    		if (currentChar in ('f','j','l','m','n','r')) then	
    			set addChar =  currentChar;
    		elseif (currentChar = 'q') then set addChar =  'k';
    		elseif (currentChar = 'v') then set addChar =  'f';
    		elseif (currentChar = 'x') then set addChar =  'ks';
    		elseif (currentChar = 'z') then set addChar =  's';
    		elseif (currentChar = 'b') then
                set addChar =  'b';
                if (processCount = totalLength and
                    prevoiusChar = 'm') then
    				    set addChar =  '';
                end if;
    		elseif (currentChar = 'c') then
    			if 	((twoChar = 'ch') or
                     (threeChar = 'cia')) then
    				    set addChar =  'x';
    			elseif ((twoChar in ('ci','ce','cy')) and 
                        (previousChar <> 's') ) then 
    					set addChar =  's';
            	else set addChar =  'k';
                end if;                
    		elseif (currentChar = 'd') then
    			if (threeChar in ('dge','dgy','dgi')) then
    				set addChar =  'j';
    			else set addChar =  't';
                end if;
    		elseif (currentChar = 'g') then 
                set addChar =  'k';
                if ( ( (twoChar = 'gh') and
                       (processCount < (totalLength—1)) and
                       (threeChar not in ('gha','ghe','ghi','gho','ghu') )
                      ) or
                      (twoChar = 'gn') or
                      ( (previousChar = 'd') and
                        (twoChar not in ('ga','ge','gi','go','gu') )
                      )
                    ) then
                    set addChar =  '';
    			elseIf ( (previousChar <> 'g') or
                         (twoChar in ('ge','gi','gy') )
                        ) then	
                    set addChar =  'f';
                end if;
    		elseif (currentChar = 'h') then
                if not ( ((previousChar in ('a','e','i','o','u')) and
                          (twoChar not in ('ha','he','hi','ho','hu'))
                         ) or
                         (previousChar in ('c','s','p','t','g'))
                        ) then
    		       set addChar =  'h';
                end if;
    		elseif ((currentChar = 'k') and	
    				(previousChar <> 'c') ) then
    				set addChar =  'k';
    		elseif (currentChar = 'p') then	
    			if (twoChar = 'ph') then
    				set addChar =  'f';
    			else
    				set addChar =  'p';
                end if;
    		elseif (currentChar = 's') then
    			if ((threeChar in ('sia','sio')) or 
                    (twoChar = 'sh') ) then
    				set addChar =  'x';
    			else 
                    set addChar =  's';
                end if;
    		elseif (currentChar = 't') then
                set addChar = 't';
    			if (threeChar in ('tia','tio')) then
    				set addChar =  'x';
    			elseif (twoChar = 'th') then
    				set addChar =  '0';
    			elseif (threeChar = 'tch') then
    				set addChar =  't';
                end if;    
    		elseif ((currentChar = 'w') and
    				(twoChar in ('wa','we','wi','wo','wu')) ) then
    					set addChar =  'w';
    		elseif ((currentChar = 'y') and
    				(twoChar in ('ya','ye','yi','yo','yu')) ) then
    					set addChar =  'y';
            end if;
        end if;
    
        set returnString = returnString concat trim(addChar);
        set processCount = processCount + 1;
    end while;
    
    return returnstring;
    END
    

    Without going through the code in detail, the basic principle is that the input string is processed character by character and character combinations are “translated” to a consonant sound (e.g. ‘z’ becomes ‘s’, ‘ph’ becomes ‘f’, all vowels are dropped).

    Using the METAPHONE Function

    This statement compares the result of the METAPHONE function with SOUNDEX and DIFFERENCE:

    SELECT BASE, NAME,
            CASE 
                WHEN SOUNDEX(BASE) = SOUNDEX(NAME) THEN 'Match'
                ELSE 'No Match'
            END SOUNDEX,
            CASE DIFFERENCE( BASE, NAME)
                WHEN 4 THEN 'Hit'
                WHEN 3 THEN 'Ballpark'
                WHEN 2 THEN 'Middle'
                WHEN 1 THEN 'Faint Hope'
                ELSE 'No Hope'
            END DIFFERENCE,  
            CASE 
                WHEN METAPHONE(BASE) = METAPHONE(NAME) THEN 'Match'
                ELSE 'No Match'
            END METAPHONE,
            SCHEMAPT.PT_METAPHONE(BASE) BASE_MP,          
            SCHEMAPT.PT_METAPHONE(NAME) NAME_MP,          
            SOUNDEX(BASE) BASE_SX,
            SOUNDEX(NAME) NAME_SX,
            DIFFERENCE( BASE, NAME) DIFFVALUE
    FROM PHONETIC
    WHERE TESTNO = 2
    ORDER BY NAME;
    

    Our METAPHONE function provides a match on all rows, including those that caused problems for SOUNDEX. The result set also shows the difference in the values returned by the METAPHONE and SOUNDEX functions.

    There Is Always A BUT

    Although the METAPHONE function solves some of the issues with SOUNDEX, you quickly realize that METAPHONE also stricter that SOUNDEX. When we run the same statement against test 1 (my name), we see that SOUNDEX returns more hits:

    Roll Your Own

    There is no perfect solution! But, since you are writing the code, you can program in any combination you wish. There is nothing to stop you from changing or adding to the algorithm. A change to two lines would include more hits in the previous example

    And what about languages other than English? All of the extra phonetic algorithms have open source projects for different languages — most of which are available on Git hub.

    Although phonetic functions are not something that I use on an everyday basis, they are something that I include as part of a “fuzzy” search wherever names are listed. I hope you find a use for them — they are fun!

    Paul Tuohy, IBM Champion and author of Re-engineering RPG Legacy Applications, is a prominent consultant and trainer for application modernization and development technologies on the IBM Midrange. He is currently CEO of ComCon, a consultancy firm in Dublin, Ireland, and partner at System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Jon Paris.

    RELATED STORY

    Guru: Phonetic Functions In SQL, Part 1

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: DB2, FHG, Four Hundred Guru, IBM i, Metaphone, PHP

    Sponsored by
    Maxava

    Migrate IBM i with Confidence

    Tired of costly and risky migrations? Maxava Migrate Live minimizes disruption with seamless transitions. Upgrading to Power10 or cloud hosted system, Maxava has you covered!

    Learn More

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    IBM Global Financing Deals To Push Power E950 And E980 Sales Domino On IBM i Poised For A Comeback

    Leave a Reply Cancel reply

TFH Volume: 28 Issue: 63

This Issue Sponsored By

  • Fresche Solutions
  • ProData Computer Services
  • ASNA
  • COMMON
  • WorksRight Software

Table of Contents

  • IBM’s Own Positioning Of Power Systems Revealed
  • Domino On IBM i Poised For A Comeback
  • Guru: Phonetic Functions In SQL, Part 2
  • IBM Global Financing Deals To Push Power E950 And E980 Sales
  • Disaster Recovery Can Cover Your ASPs

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • POWERUp 2025 –Your Source For IBM i 7.6 Information
  • Maxava Consulting Services Does More Than HA/DR Project Management – A Lot More
  • Guru: Creating An SQL Stored Procedure That Returns A Result Set
  • As I See It: At Any Cost
  • IBM i PTF Guide, Volume 27, Number 19
  • IBM Unveils Manzan, A New Open Source Event Monitor For IBM i
  • Say Goodbye To Downtime: Update Your Database Without Taking Your Business Offline
  • i-Rays Brings Observability To IBM i Performance Problems
  • Another Non-TR “Technology Refresh” Happens With IBM i TR6
  • IBM i PTF Guide, Volume 27, Number 18

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle