Create A Generic Auditing Trigger With SQL
September 25, 2013

Hey, Mike:
Note: The document accompanying this article is available for download here.
I am writing a trigger receiver program using RPG with embedded SQL that will go through the before and after record images and create an audit record in an audit file for each field that was changed on an update operation.
I am developing a “shell” trigger program that I’d like to be able to modify slightly, and use the shell as a basis to write these audit records for each file with a trigger on it.
Instead of writing multiple (IF this field changed THEN DO…) for each field in each file, I was wondering if I could use a SQL Descriptor (or some other way in SQL) to do part of this. I want to create a dynamic insert statement to insert the audit record, and have SQL be able to retrieve the contents of the field name in the trigger buffer, and not have to hardcode every column name.
For example, if I detect a change in a city field (name is ACCITY), I want to load the contents of the “before” buffer’s ACCITY field into the BEFOREDATA field in the inserted audit record, and also load the contents of the “after” buffer’s ACCITY field into the AFTERDATA field in the inserted audit record.
Do you have any ideas on how to do that? Would a SQL descriptor help me with this? I’m hoping there is an easy way to do this rather than hardcoding a bunch of IF/THEN statements for each field in a file that needs to have changes tracked.
I am currently working on a machine with V7R1 installed.
The good news is, being on IBM i 7.1, there is a way to implement this type of trigger. The bad news is, the performance of this technique will be relatively slow. Therefore this trigger sample will be useful for auditing changes on master tables that don’t change frequently. For auditing a high volume transaction table, I would skip this generic trigger method in favor of coding a trigger specifically for the oft-used table.
I’ll demonstrate how to code this trigger in SQL. I chose not to use an RPG-based trigger because there is no way to dynamically iterate through the columns in the BEFORE/AFTER buffers. SQL Descriptors are not available to use with the buffer data passed to the trigger.
The sample code provides a generic template that can accommodate any table with up to 100 columns. (This maximum can be increased.) The base table is assumed to have columns that can be represented as single-byte character variables up to 32,000 characters in length. Columns with binary or large object data types will require additional coding; in this code, columns with these problem data types, if present, are ignored. Additionally, depending on the CCSID, certain CHARACTER, NCHARACTER, and GRAPHIC data types may need additional coding. IBM i 7.1 is required, and installing the latest PTFs is highly recommended.
The trigger is defined on the “AFTER” event and uses “statement” granularity, meaning that the trigger is only executed after the underlying data change from a native I/O operation, UPDATE, or MERGE statement has completed. Further, recall that the transient tables provided by DB2 to the trigger contain the BEFORE and AFTER picture of every row updated by the statement. If 100 rows are updated by a single SQL statement, the trigger has a copy of each row’s before and after values.
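As a sketch, the shell of such a statement-level trigger looks like this (the trigger name AUDIT_QCUSTCDT and the transient table correlation names OLD_ROWS/NEW_ROWS are names I've assumed for illustration):

```sql
-- Sketch of the trigger shell; the body logic is described in the article.
CREATE TRIGGER QIWS.AUDIT_QCUSTCDT
AFTER UPDATE ON QIWS.QCUSTCDT
REFERENCING OLD TABLE AS OLD_ROWS
            NEW TABLE AS NEW_ROWS
FOR EACH STATEMENT MODE DB2SQL
BEGIN
   -- Declare cursors over OLD_ROWS and NEW_ROWS,
   -- describe them into SQL descriptors, and
   -- compare column values row by row.
END;
```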
When the trigger is invoked, cursors are opened against both the BEFORE and AFTER transient tables. The DESCRIBE CURSOR statements populate the BEFORE and AFTER SQL Descriptors with information about the table’s columns including number of columns, names, data types, nullability, etc. This information is used to process columns dynamically. Because cursors are relatively expensive and DESCRIBE CURSOR supplies the needed info on the table definition, I opted to make the trigger fire once for all rows (statement level) instead of repeatedly firing on individual rows.
When the cursors are opened, the before image of the first updated row is fetched into its SQL descriptor, as is the after image. Once the SQL descriptors contain the fetched row data, GET DESCRIPTOR is called once for each column in the table to return the current row's column values. This dynamic retrieval eliminates the need to write an IF statement for each column.
Each column value is placed in a single-byte character variable for both the BEFORE and AFTER rows. Once the BEFORE/AFTER values are stored in variables, they can be compared. If a change is detected, an audit entry can be written in the desired format (plain text, XML, etc.). In the sample code, data changes are formatted as plain text and placed in a simple audit log table.
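Putting those steps together, the heart of the compare loop looks roughly like this (a sketch using the article's @-prefixed variable naming; the cursor and indicator variable names are my own, while the descriptor names match the code excerpt later in the article):

```sql
-- Sketch: fetch one before/after row pair, then compare each column.
FETCH C_BEFORE INTO SQL DESCRIPTOR 'SQLDS_DATA_BEFORE';
FETCH C_AFTER  INTO SQL DESCRIPTOR 'SQLDS_DATA_AFTER';
SET @column=1;
WHILE @column<=@columns DO
   -- Retrieve each column's value along with its NULL indicator
   GET DESCRIPTOR 'SQLDS_DATA_BEFORE' VALUE @column
       @column_data_before=DATA, @ind_before=INDICATOR;
   GET DESCRIPTOR 'SQLDS_DATA_AFTER' VALUE @column
       @column_data_after=DATA, @ind_after=INDICATOR;
   -- A negative indicator means the value is NULL
   IF @ind_before<0 THEN SET @column_data_before=NULL; END IF;
   IF @ind_after<0  THEN SET @column_data_after=NULL;  END IF;
   IF COALESCE(@column_data_before,'(null)')<>
      COALESCE(@column_data_after,'(null)') THEN
      -- A change was detected; write an audit entry here
   END IF;
   SET @column=@column+1;
END WHILE;
```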
The sample trigger is built on the ancient QIWS/QCUSTCDT sample table so anyone can use it.
The trigger assumes a simple log table called QIWS.LOG exists to record the changes made to QIWS.QCUSTCDT:
CREATE TABLE QIWS.LOG (
    KEY_DATA     VARCHAR(256),
    DATA_CHANGES VARCHAR(30000),
    ADDSTAMP     TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
I used VARCHAR(30000) instead of a CLOB because CLOBs require commitment control. This trigger code assumes commitment control is inactive.
Table QCUSTCDT has a handful of columns including STATE and CITY. The primary key is CUSNUM. To show the trigger’s handiwork, for row CUSNUM=583990, assume the current STATE value is “MO” and the current CITY is “ISLE”. After running this UPDATE statement:
UPDATE QIWS.QCUSTCDT SET City='Island', State='MO' WHERE CUSNUM=583990;
Because the STATE value did not actually change, only the CITY change is recorded: the LOG table will hold an audit entry with the row's key (built from CUSNUM) and the before and after CITY values.
Notes On Modifying The Trigger Template
There are only a few items to change in this template when using it for your app’s tables.
First, I suggest placing an ORDER BY on both the BEFORE and AFTER cursor SELECT statements. This is mostly caution: I have no guarantee that DB2 will return the rows from the two cursors in the same order. The columns in the ORDER BY should match the table's PRIMARY KEY or UNIQUE constraint column(s). In the sample, I specified an ORDER BY on column CUSNUM, the table's sole unique identifier column.
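For example, assuming the transient tables were exposed as OLD_ROWS and NEW_ROWS in the trigger's REFERENCING clause (names chosen here for illustration), the cursor declarations would carry matching ORDER BY clauses:

```sql
-- Both cursors ordered on the key column so row N of one
-- cursor corresponds to row N of the other.
DECLARE C_BEFORE CURSOR FOR
   SELECT * FROM OLD_ROWS ORDER BY CUSNUM;
DECLARE C_AFTER CURSOR FOR
   SELECT * FROM NEW_ROWS ORDER BY CUSNUM;
```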
One other place that needs attention is the code that records a modified row’s key values. The following IF statement’s IN predicate should be updated with the PRIMARY KEY or UNIQUE constraint column name(s).
-- Modify this to build one or more key columns
-- to (if possible) uniquely identify the row
-- that was changed
IF @column_name IN ('CUSNUM') THEN
   SET @Key_Data=@Key_Data ||
       COALESCE(@column_name,'(null)') || ': ' ||
       COALESCE(@column_data_after,'(null)') || ' ';
END IF;
Other than that, just change the program name and underlying table name and you're good to go. Of course, this trigger only handles UPDATE operations. Auditing INSERT and DELETE operations is much simpler because there are no before and after values to compare.
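For instance, a delete audit can be a one-statement, row-level trigger. This sketch (the trigger and correlation names are mine) simply records the key of each deleted row in the same LOG table:

```sql
-- Sketch: no before/after comparison is needed for a DELETE.
CREATE TRIGGER QIWS.AUDIT_QCUSTCDT_DEL
AFTER DELETE ON QIWS.QCUSTCDT
REFERENCING OLD ROW AS O
FOR EACH ROW MODE DB2ROW
   INSERT INTO QIWS.LOG (KEY_DATA, DATA_CHANGES)
   VALUES ('CUSNUM: ' || TRIM(CHAR(O.CUSNUM)), 'Row deleted');
```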
Coding Points of Interest
All data types in the table are converted to character and placed in the @column_data_before and @column_data_after variables (both defined as VARCHAR(32000)). Normally, when retrieving a parameter or column value with GET DESCRIPTOR, the receiver variable’s data type is required to match the parameter or column data type. However, this can be overridden.
Just after the cursors are opened, the SET DESCRIPTOR statement is used to override the expected host variable data type for all columns to VARCHAR(32000) so that generic VARCHAR variables can receive and compare just about any data type value:
WHILE @column<=@columns DO
   GET DESCRIPTOR 'SQLDS_DATA_BEFORE' VALUE @column
       @column_name=NAME, @data_type=TYPE;
   -- Override host variable data type
   SET DESCRIPTOR 'SQLDS_DATA_BEFORE' VALUE @column
       TYPE=12, LENGTH=32000;
   SET DESCRIPTOR 'SQLDS_DATA_AFTER' VALUE @column
       TYPE=12, LENGTH=32000;
   -- Advance to the next column
   SET @column=@column+1;
END WHILE;
All of this data conversion is handled courtesy of DB2. Without this “override” step, the trigger would need multiple variables defined with the exact same data types as the table columns. Not exactly a “generic” solution.
One drawback of using SET DESCRIPTOR to override data types in this manner is that it erases some of the column information stored in the descriptor (column name, etc.). Therefore, the trigger is saving off the column information in a temp table so that it can be retrieved later. I covered this topic in more detail in my article Override Default Data Types in the SQL Descriptor.
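The temp table itself can be as simple as this sketch (the names are assumed), populated from GET DESCRIPTOR before the overrides are applied:

```sql
-- Holds each column's name and type, captured before
-- SET DESCRIPTOR erases them from the descriptor.
DECLARE GLOBAL TEMPORARY TABLE SESSION.COLUMN_INFO
   (COLUMN_NO   INT,
    COLUMN_NAME VARCHAR(128),
    COLUMN_TYPE INT)
   WITH REPLACE;
```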
Within the loop where the row data is fetched and processed, GET DESCRIPTOR is called for each column to pull the column values into the generic variables for comparison. There are no FETCH ... INTO host-variable statements as you'd normally see; GET DESCRIPTOR returns the values one at a time.
For the purpose of this example, any column changes are dumped to a table called "LOG". When NULL values are encountered, the special value "(null)" is placed in the audit log. Personally, I think XML is a better way to store this kind of data change, but XML has an additional cost to compose and parse.
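The audit write itself can be a plain INSERT. One possible shape, using the article's variables and its "(null)" convention (the exact DATA_CHANGES format is up to you):

```sql
-- Sketch: record one changed column as plain text.
INSERT INTO QIWS.LOG (KEY_DATA, DATA_CHANGES)
VALUES (@Key_Data,
        COALESCE(@column_name,'(null)') ||
        ' Before: ' || COALESCE(@column_data_before,'(null)') ||
        ' After: '  || COALESCE(@column_data_after,'(null)'));
```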
Since this is a set-based AFTER trigger, it is possible that data can be changed without the trigger firing. Consider the case where an UPDATE statement that targets 20 rows in a table is running without a transaction. Say the UPDATE statement updates 10 rows, encounters a decimal data error and then stops due to the error. In this case, 10 rows were changed, there is no rolling back of the partially updated data, and because of the error, the trigger never fired.
Journaling is often a good way to capture data change events. If you’re already using journaling, you may want to consider getting this “change” information from your journal receivers.
If you want to supercharge your auditing, you can enhance the trigger to add references to the new client special registers (CURRENT CLIENT_APPLNAME, CURRENT CLIENT_WRKSTNNAME, etc.), or get the IP address of the TCP/IP host running the statement using the QSYS2.TCPIP_INFO view.
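For example, the client special registers can be captured directly into the audit row (a sketch; the TCPIP_INFO view's column names vary by release, so check the QSYS2 catalog on your system before using it):

```sql
-- Sketch: pick up client identity from the special registers.
SELECT CURRENT CLIENT_APPLNAME,
       CURRENT CLIENT_USERID,
       CURRENT CLIENT_WRKSTNNAME
  FROM SYSIBM.SYSDUMMY1;
```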
This trigger was built on an IBM i 7.1 machine with DB2 for i group PTF level 24 installed. Along the way I encountered a few quirks that may be my own ignorance or DB2 bugs.
In summary, this template can be used for about any table. Only the trigger name, table name, key column names, and CURSOR ORDER BY clauses need to be modified. Of course, you’ll need to change the auditing code to suit your own table(s).
To be fancy, you could use the technique shown in my article, Retrieve The Call Stack In DB2 For i, to dynamically retrieve the underlying table name to get the primary or unique key columns on the table. From there, it is possible to use a descriptor or system catalog to dynamically determine the key column names and thereby alleviate the need to hardcode the key columns. However, this adds additional overhead and complexity.
The beauty of this technique is that the SQL descriptors allow the column comparison to be dynamic. If columns are added to or removed from the table, or if a data type is changed, the trigger is smart enough to compare the values. Also, since SQL casts all of the data values to character, this generic template handles most data types without lots of obnoxious code. The downside is that this dynamic approach comes with a processing tax, so consider carefully its appropriateness for your environment.