Stuff
OS/400 Edition
Volume 1, Number 18 -- October 10, 2002

Referential Integrity: Ensuring Your Database Contains Only Valid Data


by Jim Coker

Referential integrity is a feature that is included in DB2 Universal Database for the iSeries. While it has been present for several releases of OS/400, this article will discuss referential integrity for OS/400 V5R1. The underlying premise of referential integrity is that a developer can define business rules that govern data stored in a database. However, unlike coding these rules into a high-level language, such as RPG IV, the rules exist in the database itself and are in effect all of the time.

display

Types of Constraints

Referential integrity is made up of two basic elements: referential constraints and check constraints. Let's take a look at these constraints and how they differ from each other.

Referential constraints

A referential constraint is a rule that defines the relationship between two tables. When designing a relational database, the developer may deploy a one-to-many, or "parent-child," relationship. In such a relationship, the application must be designed to avoid creating database anomalies. For example, there can never be a valid entry in which "orphaned" records--"child" records with no associated "parent" records--appear. This relationship must be enforced at all times, regardless of whether the application is performing an insert, update, or delete operation to either of the files. This is precisely the mission of a referential constraint: to guarantee that no table operations violate the defined relationship between the tables. If an application attempts to create a situation that does not conform to the rule, a database error is generated and returned to the application without allowing the operation to be completed.

Check Constraints

Unlike referential constraints, a check constraint only enforces the values that go into one table. For example, there might be a restriction on the number of values that may be placed in a column. Perhaps a direction column can contain only N, NE, E, SE, S, SW, W, or NW. A check constraint could be used to make sure a value of EW is not inserted in the direction column. Or, instead of allowing one or more discrete values, the constraint may enforce a range of values. Perhaps the direction column can contain a value between 1 and 360 degrees. Or a percentage column might range from -100 to 100. A check constraint can also compare a column value with another column value or constant to make sure it is EQ, NE, GT, LT, GE, or LE. These are all situations in which a check constraint could be deployed.

Other Terms and Definitions

Some other basic terms and definitions are used when discussing referential integrity.

A primary key constraint is a unique constraint with some special attributes. In particular, it is a field or set of fields in a database file that must be unique, ascending, and cannot contain null values. The primary key constraint can be used as the parent key when adding a referential constraint. When you assign a unique key to a file, you are implementing a primary key constraint. An example of a primary key constraint would be defining the customer number field as the unique key of the customer master file.

A unique constraint is a field or set of fields in a database file that must be unique and ascending. Unlike a primary key constraint, a unique constraint may contain null values. An example might be assigning an invoice number to an order once it ships. An order that has not shipped would have a null value in the invoice number field. After it ships, it would be assigned a unique invoice number from a sequential list. The invoice number would then qualify as a unique constraint.

A parent key is a primary key constraint or unique constraint that is used to add a referential constraint to the dependent file. For example, assume that we have an order header file keyed by order number. The order number field would be the parent key in the parent file. Adding a referential constraint to the order detail file, keyed by order number and line number, would require using the parent key "order number" in the order header file.

A foreign key is a field or set of fields in which each non-null value must match a value in the parent key of the related parent file. The attributes of the foreign key must match the attributes of the parent key. A foreign key might be found in an order header file in which the customer number value appears in the customer master file.

A parent file is the file in a referential constraint relationship that contains the parent key. This would be the file that contains "one record" in the one-to-many relationship. For example, the order header would be the parent file in a referential constraint relationship with order detail.

A dependent file is the file in a referential constraint relationship that contains the foreign key. This file is dependent upon the parent file. For every non-null value in the foreign key of the dependent file, there must be a corresponding non-null value in the parent key of the parent file. In our example, "order detail" would be the dependent file in a referential constraint relationship with the order header file.

Why Use Constraints?

Many iSeries applications interface with heterogeneous systems or applications running on platforms such as Windows, Windows NT, Linux, or Unix. For example, it is not uncommon to pass data into DB2 from a shop-floor data collection device or perhaps receive data from a vendor application over an extranet. In many environments, it is no longer safe to assume that only your RPG code will be updating a database. Therefore, it is also no longer safe to presume that you can define your business rules in your RPG code and entrust that code to enforce the rules. There are several other tools and methods, both within the iSeries and outside of it, that permit updating the database. A person could use DFU or one of the third-party tools to go into the specific table and update one or more rows. Query Manager/400 (QMQUERY) will permit you to issue SQL INSERT, UPDATE, and DELETE statements against one or more tables. SQL/400 will do the same. Going outside of the iSeries, you don't have to look far to find a variety of client programs that can connect to DB2/400 with a simple and readily available ODBC or JDBC connection. Operations Navigator itself has an SQL interface that allows you to modify data.

The point is that there are many paths to DB2 data that do not go through traditional RPG programs. Bypassing that code means that the business rules are not necessarily being enforced through all of these other access methods. Referential integrity can be deployed to make sure the rules are, in fact, enforced 100 percent of the time, regardless of which method is used to insert, modify, or delete the rows in your database!

There Is No Free Lunch

So if these rules are being examined for each and every database transaction, what is that doing to my system performance and response time? The answer is that it depends. Several things such as the volume of transactions and the types of constraints defined will affect performance. If you define cascading deletes across nine related tables, you are going to see a lag in response time while the database determines how many rows must be deleted from each table. This will also multiply the performance hit of other database features, such as journaling. So keep in mind that, yes, there is a cost for referential integrity in terms of system performance.

On the other hand, you might experience an improvement in performance, since the rules and relationships are enforced at the system level in the database. Instructions executed at this level run more efficiently than similar logic placed in a high-level language. Just as you would weigh the pros and cons of creating additional indices over your database, you should also consider the factors associated with adding referential integrity.

My advice is that you take time to understand all of the effects of referential integrity--both good and bad--before you define it over your entire ERP system or some other mission-critical database.

Only the Beginning

Now that we have covered some basic groundwork for referential integrity, you're probably foaming at the mouth for more. Good. Look for my next article, in which I will dig a little deeper and step through a couple of examples.


Sponsored By
LANSA

LANSA Integrator Simplifies Business

Apria Healthcare, Celtic Insurance, John Wiley & Sons and others have used LANSA Integrator to automate their business operations and reduce transaction costs with their trading partners.

See what LANSA Integrator can do for your business.

www.lansa.com/products/integratoroverview.htm


THIS ISSUE
SPONSORED BY:

T.L. Ashford
Magic Software
LANSA
ASNA
Profound Logic Software
WorksRight Software


BACK ISSUES

TABLE OF CONTENTS
Performing Surgery on the Windows Registry

Back to Basics: Side-by-Side Subfiles

Referential Integrity: Ensuring Your Database Contains Only Valid Data

Qshell String Manipulation

Exploring Data-Type Acronyms

The Client Access Express Toolkit


Editors
Shannon O'Donnell
Kevin Vandever

Managing Editor
Shannon Pastore

Contributing Editors:
Howard Arner
Joe Hertvik
Ted Holt
David Morris
Richard Shaler

Publisher and
Advertising Director:

Jenny Thomas

Contact the Editors
Do you have a gripe, inside dope or an opinion?
Email the editors:
editors@itjungle.com



Last Updated: 10/10/02
Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.