Guru: Phonetic Functions In SQL, Part 2
September 24, 2018 Paul Tuohy
In my previous article I discussed the use of the standard SOUNDEX and DIFFERENCE functions for phonetic processing. I also identified the major problems with SOUNDEX (based on U.S. English and the first character is static) and DIFFERENCE (loose because it is based on SOUNDEX).
In this article I will discuss how to tackle these problems by writing a customized phonetic function to use in place of or in conjunction with SOUNDEX. The good news is that you do not have to become an expert in phonetics — others have already done the job for you. There are a number of algorithms available on the web, such as caverphone, cologne_phonetic, metaphone, double_metaphone and refined_soundex.
I will be using the METAPHONE algorithm to demonstrate how you can write your own phonetic function.
As described in the first article, this is the table I created in order to demonstrate the use of phonetic functions:
CREATE OR REPLACE TABLE PHONETIC (
TESTNO INTEGER NOT NULL DEFAULT ,
BASE VARCHAR(20) NOT NULL DEFAULT,
NAME VARCHAR(20) NOT NULL DEFAULT );
There will be a number of rows for a test (identified by TESTNO). Each row, for a test, will have the same BASE value and a different value for NAME.
The METAPHONE Function
The Metaphone algorithm was created by Lawrence Philips and was published in an article in “Computer Language” in December, 1990. Since then, Lawrence Philips has gone on to develop double_metaphone (for which the code is freely available) and Metaphone 3 (which is a commercial product sold as source code).
Although not a standard function in DB2, you will find a Metaphone function in some databases and as a standard function in some programming languages (such as PHP). Metaphone is different from SOUNDEX in that, instead of translating a string to a code, Metaphone returns a string of consonant sounds.
I have done my best to translate the Metaphone algorithm into SQL PL, but, please note, that the code has not been fully tested. Let the user beware!
This is the code to create the METAPHONE function:
CREATE OR REPLACE FUNCTION METAPHONE(stringInput varchar(70))
RETURNS varchar (25)
LANGUAGE SQL
SPECIFIC METAPHONE
DETERMINISTIC
CONTAINS SQL
RETURNS NULL ON NULL INPUT
NO EXTERNAL ACTION
NOT FENCED
SET OPTION ALWBLK = *ALLREAD ,
ALWCPYDTA = *OPTIMIZE ,
COMMIT = *NONE ,
DECRESULT = (31, 31, 00) ,
DFTRDBCOL = *NONE ,
DYNDFTCOL = *NO ,
DYNUSRPRF = *USER ,
SRTSEQ = *HEX
/*
Metaphone Function
Created by Lawrence Philips.
Published in an article in "Computer Language" December 1990 issue.
*/
BEGIN
declare returnString varchar(25);
declare processString varchar(70);
declare currentChar char(1);
declare twoChar char(2);
declare threeChar char(3);
declare previousChar char(1);
declare nextChar char(1);
declare addChar char(1);
declare totalLength smallint;
declare processCount smallint;
set processString = ltrim(lower(stringInput));
set totalLength = length(processString);
set processCount = 1;
set returnString = '';
--Process beginning exceptions
set twoChar = left(processString,2);
if (twoChar in ('ae', 'gn', 'kn', 'pn', 'wr')) then
set processString = right(processString , totalLength—1);
set totalLength = totalLength—1;
elseif (twoChar = 'wh') then
set processString = 'w' concat right(processString , totalLength—2);
set totalLength = totalLength—1;
end if;
set currentChar = left(processString,1);
if (currentChar = 'x') then
set processString = 's' concat right(processString , totalLength—1);
end if;
if (currentChar in ('a','e','i','o','u')) then
set processString = right(processString , totalLength—1);
set totalLength = totalLength—1;
set returnString = currentChar;
end if;
set currentChar = ' ';
while processCount <= totalLength do
set previousChar = currentChar;
set currentChar = substring(processString,processCount,1);
set addChar = '';
if (processCount < totalLength) then
set nextChar = substring(processString,processCount + 1,1);
else
set nextChar = '';
end if;
if (nextChar <> currentChar or currentChar = 'c') then
set twoChar = substring(processString,processCount,2);
set threeChar = substring(processString,processCount,3);
if (currentChar in ('f','j','l','m','n','r')) then
set addChar = currentChar;
elseif (currentChar = 'q') then set addChar = 'k';
elseif (currentChar = 'v') then set addChar = 'f';
elseif (currentChar = 'x') then set addChar = 'ks';
elseif (currentChar = 'z') then set addChar = 's';
elseif (currentChar = 'b') then
set addChar = 'b';
if (processCount = totalLength and
prevoiusChar = 'm') then
set addChar = '';
end if;
elseif (currentChar = 'c') then
if ((twoChar = 'ch') or
(threeChar = 'cia')) then
set addChar = 'x';
elseif ((twoChar in ('ci','ce','cy')) and
(previousChar <> 's') ) then
set addChar = 's';
else set addChar = 'k';
end if;
elseif (currentChar = 'd') then
if (threeChar in ('dge','dgy','dgi')) then
set addChar = 'j';
else set addChar = 't';
end if;
elseif (currentChar = 'g') then
set addChar = 'k';
if ( ( (twoChar = 'gh') and
(processCount < (totalLength—1)) and
(threeChar not in ('gha','ghe','ghi','gho','ghu') )
) or
(twoChar = 'gn') or
( (previousChar = 'd') and
(twoChar not in ('ga','ge','gi','go','gu') )
)
) then
set addChar = '';
elseIf ( (previousChar <> 'g') or
(twoChar in ('ge','gi','gy') )
) then
set addChar = 'f';
end if;
elseif (currentChar = 'h') then
if not ( ((previousChar in ('a','e','i','o','u')) and
(twoChar not in ('ha','he','hi','ho','hu'))
) or
(previousChar in ('c','s','p','t','g'))
) then
set addChar = 'h';
end if;
elseif ((currentChar = 'k') and
(previousChar <> 'c') ) then
set addChar = 'k';
elseif (currentChar = 'p') then
if (twoChar = 'ph') then
set addChar = 'f';
else
set addChar = 'p';
end if;
elseif (currentChar = 's') then
if ((threeChar in ('sia','sio')) or
(twoChar = 'sh') ) then
set addChar = 'x';
else
set addChar = 's';
end if;
elseif (currentChar = 't') then
set addChar = 't';
if (threeChar in ('tia','tio')) then
set addChar = 'x';
elseif (twoChar = 'th') then
set addChar = '0';
elseif (threeChar = 'tch') then
set addChar = 't';
end if;
elseif ((currentChar = 'w') and
(twoChar in ('wa','we','wi','wo','wu')) ) then
set addChar = 'w';
elseif ((currentChar = 'y') and
(twoChar in ('ya','ye','yi','yo','yu')) ) then
set addChar = 'y';
end if;
end if;
set returnString = returnString concat trim(addChar);
set processCount = processCount + 1;
end while;
return returnstring;
END
Without going through the code in detail, the basic principle is that the input string is processed character by character and character combinations are “translated” to a consonant sound (e.g. ‘z’ becomes ‘s’, ‘ph’ becomes ‘f’, all vowels are dropped).
Using the METAPHONE Function
This statement compares the result of the METAPHONE function with SOUNDEX and DIFFERENCE:
SELECT BASE, NAME,
CASE
WHEN SOUNDEX(BASE) = SOUNDEX(NAME) THEN 'Match'
ELSE 'No Match'
END SOUNDEX,
CASE DIFFERENCE( BASE, NAME)
WHEN 4 THEN 'Hit'
WHEN 3 THEN 'Ballpark'
WHEN 2 THEN 'Middle'
WHEN 1 THEN 'Faint Hope'
ELSE 'No Hope'
END DIFFERENCE,
CASE
WHEN METAPHONE(BASE) = METAPHONE(NAME) THEN 'Match'
ELSE 'No Match'
END METAPHONE,
SCHEMAPT.PT_METAPHONE(BASE) BASE_MP,
SCHEMAPT.PT_METAPHONE(NAME) NAME_MP,
SOUNDEX(BASE) BASE_SX,
SOUNDEX(NAME) NAME_SX,
DIFFERENCE( BASE, NAME) DIFFVALUE
FROM PHONETIC
WHERE TESTNO = 2
ORDER BY NAME;
Our METAPHONE function provides a match on all rows, including those that caused problems for SOUNDEX. The result set also shows the difference in the values returned by the METAPHONE and SOUNDEX functions.

There Is Always A BUT
Although the METAPHONE function solves some of the issues with SOUNDEX, you quickly realize that METAPHONE also stricter that SOUNDEX. When we run the same statement against test 1 (my name), we see that SOUNDEX returns more hits:

Roll Your Own
There is no perfect solution! But, since you are writing the code, you can program in any combination you wish. There is nothing to stop you from changing or adding to the algorithm. A change to two lines would include more hits in the previous example
And what about languages other than English? All of the extra phonetic algorithms have open source projects for different languages — most of which are available on Git hub.
Although phonetic functions are not something that I use on an everyday basis, they are something that I include as part of a “fuzzy” search wherever names are listed. I hope you find a use for them — they are fun!
Paul Tuohy, IBM Champion and author of Re-engineering RPG Legacy Applications, is a prominent consultant and trainer for application modernization and development technologies on the IBM Midrange. He is currently CEO of ComCon, a consultancy firm in Dublin, Ireland, and partner at System i Developer. He hosts the RPG & DB2 Summit twice per year with partners Susan Gantner and Jon Paris.

