Guru: Making Sense of Codes
October 9, 2017 Ted Holt
I have most likely never seen your database, yet I can tell you with confidence that it is full of codes. We can’t live without them. Codes give us shortcuts for all sorts of types and categories. They consume less storage than the values they represent. They help us keep the database clean and consistent within itself.
But they surely can be hard to read. Some codes are obvious. M for male and F for female, for instance. My experience is that most are not so. Look at this and see how much sense you can make of it.
select d.ID, d.Name, d.Breed, d.Sex, d.Size from dogs as d
Time and time again, I have been amazed and impressed that the people I have served were able to work from reports and screens like this. How human beings can memorize so many code values and account numbers is astonishing.
Nevertheless, adding clarity when required is not difficult, especially if you use SQL.
The best way (in my opinion, of course) to explain codes is to create lookup tables. In this example, you could create a table with American Kennel Club breed codes and descriptions.
create table breeds ( ID integer primary key, Name varchar(24) ); insert into breeds values ( 108, 'Labrador retriever'), ( 408, 'Dachshund'), ( 503, 'Chihuahua'), ( 810, 'Collie')
You can join the lookup table to retrieve the breed description.
select d.ID, d.Name, d.Breed, b.Name, d.Sex, d.Group, d.Size, d.Coat from dogs as d left join breeds as b on d.breed = b.ID
But what if you don’t have a lookup table and aren’t allowed to create one? What if there are only a few values? What if there are only two?
One alternative to a lookup table is the CASE statement. This is an easy way to explain code values. Here’s an example:
select d.ID, d.Name, d.Breed, b.Name, case d.Sex when 1 then '1=Dog' when 2 then '2=Bitch' else char(d.Sex) concat '=Unknown' end as Sex, case d.Size when 1 then '1=Miniature' when 2 then '2=Standard' when 3 then '3=Toy' else char(d.Size) concat '=Unknown' end as Size from dogs as d left join breeds as b on d.breed = b.ID
I could have put the sex and size descriptions in columns of their own. That’s what I usually do. But I like this format too. Either way, showing the code itself and the meaning of the code gives the user all pertinent information.
Another way to describe codes without permanent database tables is to use common table expressions. This technique lets you build descriptions “on the fly.” This query uses common table expressions to interpret sex and size codes.
with Sexes as ( select * from ( values (1, 'Dog'), (2, 'Bitch') ) as sx (ID, Description) ), Sizes as ( select * from ( values (1, 'Miniature'), (2, 'Standard'), (3, 'Toy') ) as sz (ID, Description) ) select d.ID, d.Name, d.Breed, b.Name, d.Sex, x.Description, d.Size, z.Description from dogs as d left join breeds as b on d.breed = b.ID left join Sexes as x on d.Sex = x.ID left join Sizes as z on d.Size = z.ID
One technique that database gurus generally frown upon is the One True Lookup Table (OTLT), which is a table of miscellaneous values. Here’s an example:
KEYTYPE KEYVALUE DATA ----------- --------- ----------------------- SEX M Male SEX F Female COLOR 1 Red COLOR 2 Blue COLOR 3 Green COLOR 4 Yellow COLOR 5 Purple TSHIRTSIZE XS Extra small TSHIRTSIZE S Small TSHIRTSIZE M Medium TSHIRTSIZE L Large TSHIRTSIZE XL Extra large TSHIRTSIZE XXL Extra extra large DRESSSIZE 2 Small DRESSSIZE 4 Small DRESSSIZE 6 Medium DRESSSIZE 8 Medium DRESSSIZE 10 Large ITEMTYPE 1 Finished good ITEMTYPE 2 Assembly ITEMTYPE 3 Purchased finished good ITEMTYPE 4 Raw material
Joining to such a table means not only joining on the proper code value, but on the value type as well. Such a table was probably not a bad design in the pre-SQL years, when reference data was accessed through random reads. Using such a table in an SQL query leads to all sorts of problems, which I will not address here. If this topic interests you, you can find plenty of discussion of OTLT on the Web.
What’s the best method for defining codes? I recommend creating lookup tables, even for only a few values. Yes, it means that you may have a lot of itsy-bitsy tables in the database, but relational databases are all about tables, and SQL handles them well. If a lookup table is out of the question, CASE and common table expressions provide a good alternative. The downside to these latter two techniques, however, is that you may have a lot of queries to modify when your organization defines another code value.
Regardless of how we interpret codes, we have the power to clarify data for the benefit of the people we serve. Let’s make life as easy for them as we can.
Although the overhead can be increased, I have used a UDF for a number of these lookups. The UDF eliminates the multiple joins and is easily read/understood.
For individual lookup tables:
Select ShirtStyle, getShirtColor(ShirtColorCode) from ShirtFile.
This works for your One True Lookup Table as mentioned. Just pass the key type.
Select ShirtStyle, getLookupValue(‘SHIRTCOLOR’, ShirtColorCode) from ShirtFile.
Unless I’m missing something, using one table for all the codes (OTLT) should not be any more difficult using SQL than any other method. You could use common table expressions to subset the lookup table:
With Breeds as
( select keyvalue as ID, data as Name
from LookupTable where keytype = ‘BREED’ ),
( select keyvalue as ID, data as Description
from LookupTable where keytype = ‘SEX’ ),
( select keyvalue as ID, data as Description
from LookupTable where keytype = ‘SIZE’ )
select d.ID, d.Name,
from dogs as d
left join breeds as b
on d.breed = b.ID
left join Sexes as x
on d.Sex = x.ID
left join Sizes as z
on d.Size = z.ID
Joining to a OTLT is not impossible, just messy. Consider that all the key values have to be character, so joining to a table with numeric keys always requires conversion. You can find plenty of pros and cons on the Web.