Guru: Three Ways To Manage Unmatched Data
April 24, 2017 Ted Holt
Heaven forbid that I would ignore a failed RPG CHAIN (random read) operation. I always take appropriate action. Which action I take depends on the situation. The same applies to outer joins that don’t find matching data in a secondary table. Here are three ways to deal with unmatched data in an outer join using SQL.
To illustrate, let’s use three tables from an overly simplified general ledger system. The first is a table of departments into which the business is divided. The second is a chart of accounts. The third is a transaction file that feeds the general ledger. Those tables look like this:
select ID, Name from dept
ID | NAME |
1 | Accounting |
2 | Sales |
3 | Shipping |
4 | Receiving |
select ID, Name from acct
ID | NAME |
1000 | Cash in bank |
1010 | Petty cash |
1020 | Accounts receivable |
2000 | Accounts payable |
2010 | Notes payable |
3000 | Owner’s equity |
4000 | Sales |
5000 | Salaries |
5010 | Employee benefits |
5020 | Taxes |
5030 | Office expenses |
select * from glxacts
ID | DATE | DEPT | ACCOUNT | AMOUNT |
1 | 2017-04-24 | 1 | 5030 | 100.00 |
2 | 2017-04-24 | 3 | 5030 | 150.00 |
3 | 2017-04-24 | 5 | 5090 | 300.00 |
4 | 2017-04-24 | 2 | 5030 | 200.00 |
5 | 2017-04-24 | 1 | 5000 | 1,000.00 |
6 | 2017-04-24 | 2 | 5000 | 2,000.00 |
7 | 2017-04-24 | 3 | 5000 | 1,500.00 |
8 | 2017-04-24 | 4 | 5000 | 1,250.00 |
Notice that the department ID and account number in transaction 3 do not match any rows in the department and chart of accounts tables.
Method 1: Deal With Null Values
By default, the database manager returns nulls for the columns of a secondary table when no row matches a row in the primary table. Here’s such a query:
select t.ID, t.date, t.dept, d.name as Department, t.account, a.name as Account, t.amount from glxacts as t left join dept as d on t.dept = d.ID left join acct as a on t.account = a.ID
ID | DATE | DEPT | DEPARTMENT | ACCOUNT | ACCOUNT | AMOUNT |
1 | 2017-04-24 | 1 | Accounting | 5030 | Office expenses | 100.00 |
2 | 2017-04-24 | 3 | Shipping | 5030 | Office expenses | 150.00 |
3 | 2017-04-24 | 5 | null | 5090 | null | 300.00 |
4 | 2017-04-24 | 2 | Sales | 5030 | Office expenses | 200.00 |
5 | 2017-04-24 | 1 | Accounting | 5000 | Salaries | 1,000.00 |
6 | 2017-04-24 | 2 | Sales | 5000 | Salaries | 2,000.00 |
7 | 2017-04-24 | 3 | Shipping | 5000 | Salaries | 1,500.00 |
8 | 2017-04-24 | 4 | Receiving | 5000 | Salaries | 1,250.00 |
The department name and account name are null for row 3 in the result set. There is nothing wrong with this. You must consider that the DEPARTMENT and ACCOUNT columns may have null values, and program accordingly. RPG programs require the use of null indicator variables, which are five-digit integers. A negative value means that the corresponding column is null.
Method 2: Fill In Default Values
Dealing with nulls can be a nuisance. You can use the COALESCE, VALUE, and IFNULL functions to convert nulls to non-null values. These three functions return the first non-null value in the list of arguments (parameters). If all of the arguments are null, these functions return the null value. I prefer COALESCE because it is the most standard of the three and because it can accept more than two arguments.
select t.ID, t.date, t.dept, coalesce(d.name, '**INVALID**') as Department, t.account, coalesce(a.name, '**INVALID**') as Account, t.amount from glxacts as t left join dept as d on t.dept = d.ID left join acct as a on t.account = a.ID
ID | DATE | DEPT | DEPARTMENT | ACCOUNT | ACCOUNT | AMOUNT |
1 | 2017-04-24 | 1 | Accounting | 5030 | Office expenses | 100.00 |
2 | 2017-04-24 | 3 | Shipping | 5030 | Office expenses | 150.00 |
3 | 2017-04-24 | 5 | **INVALID** | 5090 | **INVALID** | 300.00 |
4 | 2017-04-24 | 2 | Sales | 5030 | Office expenses | 200.00 |
5 | 2017-04-24 | 1 | Accounting | 5000 | Salaries | 1,000.00 |
6 | 2017-04-24 | 2 | Sales | 5000 | Salaries | 2,000.00 |
7 | 2017-04-24 | 3 | Shipping | 5000 | Salaries | 1,500.00 |
8 | 2017-04-24 | 4 | Receiving | 5000 | Salaries | 1,250.00 |
Method 3: Raise An Error
You may tell a query to cancel itself when it finds unmatched data. After all, why return multitudes of rows if the result set will be unusable?
select t.ID, t.date, t.dept, coalesce(d.name, raise_error('88001', 'Bad department ' concat varchar(t.dept) concat ' for ID ' concat varchar(t.ID ))) as Department, t.account, coalesce(a.name, raise_error('88002', 'Bad account ' concat varchar(t.account) concat ' for ID ' concat varchar(t.ID ))) as Account, t.amount from glxacts as t left join dept as d on t.dept = d.ID left join acct as a on t.account = a.ID
The RAISE_ERROR function forces an SQL statement to end abnormally. It has two arguments: the SQL state to be returned and the message text to describe the error. I have placed this function as the second argument in the COALESCE functions. If the department column of the transaction table matches the department ID in at least one row of the departments table, the result set contains the department name. Otherwise, the database manager cancels the query with a SQL state value of 88001. An unmatched account number cancels with SQL state 88002. The SQL code value in those cases is -438. In the job log you’ll see messages CPF503E:
User-defined function error on member GLXACTS. The external program or service program returned SQLSTATE 88001. The text message returned from the program is: Bad department 5 for ID 3.
And SQL0438:
Message Bad department 5 for ID 3 returned from SIGNAL, RESIGNAL, or RAISE_ERROR.
These are all good ways to deal with unmatched data. It’s good to have options.