|
|||||||
|
|
![]() |
|
|
Database Normalization, Part 2 by Jim Coker The first article in this series discussed what is meant by the term database normalization and how it pertains to development on an iSeries. In this article, we will take our database design to second normal form, to eliminate redundant data, and then to third normal form, to address potential database anomalies. The last article gave an example of a poorly designed student database that was then modified to "first normal form." That design appears below. (For consistency between articles, I will use the same diagrams that were used in the first article.)
Table B1: Student Information (1NF)
Table B2: Student Class (1NF) In this article, we will take our database design to second normal form and then to third normal form. To begin, let's consider what's wrong with the 1NF design. There are two problems with the present design: update anomalies and delete anomalies. So what the heck are those? An anomaly indicates that there is inconsistent data in the database. Let's take a closer look. Update Anomalies Suppose several students are enrolled in a class. The class ID is WD, and the class title is "Web Development." Then the class is renamed to "Java Programming." The first challenge is that our application must go through all of the rows in the student class table to see if the class ID is WD and, where necessary, change the class title from "Web Development" to "Java Programming." If a row is not updated, there will be a problem. Some rows have a class ID of WD, with a class title of "Java Programming," while other rows have a class ID of WD, with the old class title of "Web Development." So there is now inconsistent data defined for class ID WD. That is what's known as an update anomaly. Delete Anomalies Let's use the same example outlined above. Suppose there are students enrolled in a class with a class ID of WD and a class title of "Web Development." Now suppose that every student were to drop out of the class. Removing the rows containing class ID WD created a new problem: There is no longer any trace of a class with a class ID of WD! By removing the students enrolled in the class, we inadvertently removed the class itself. That is what's called a delete anomaly. Second Normal Form, or 2NF We have a mess on our hands with these potential anomalies floating around in the database. So what shall we do about it? Go to second normal form, of course! Rule: Eliminate redundant data. If an attribute depends on only part of a multi-value key, remove it to a separate table. Every non-key attribute must be related to the key. If the key consists of more than one field, every non-key field must be related to the entire composite key. This rule might seem a little intimidating at first. But we will use an example to step you through it. Our first task is to eliminate redundant data. This is done by looking for any attribute that depends on only part of a multi-valued key. The student class table has a multi-value key that consists of Student ID and Class ID. However, it has some non-key attributes (or fields), class title and instructor, which are related only to Class ID and have nothing to do with Student ID. The 2NF rule instructs us to remove these attributes and put them in a separate table. So we will now create a third table in the database called Class.
Table C1: Student Information (2NF)
Table C2: Student Class (2NF)
Table C3: Class (2NF) Notice that we only moved the non-key attributes out of the student class table into the new table. Class ID remains part of the key in Student Class because there are, in fact, attributes (namely the grade earned in the class) that are based on the entire composite key of Student ID and Class ID. We have solved the problem of update anomalies because we can now be certain that there is only one class title for any given class ID, since there is exactly one row for that class ID in the new table. We have also addressed the issue of delete anomalies, since a class can still be defined in the new table even if there are no students enrolled in it. We are in second normal form, and finally everything is great, right? We have eliminated the redundant data and anomalies. Or have we? Taking another look at the 2NF design, there is still the possibility of an update or delete anomaly. Reviewing the attributes of the student information table, you see the name and address of the school. While the school information is clearly "related" to the student, it is not "dependent" on the student. Therefore, we can once again introduce inconsistent values for the school name. Also, the school cannot exist in the database unless there is at least one student attending it. Once again, we are faced with update and delete anomalies. Third Normal Form, or 3NF But wait! The reason we went to 2NF was to eliminate anomalies. So how is it that we are in 2NF and still have anomalies? That's because we only addressed those attributes based on part of a composite key. The remaining problems exist because the school information is not dependent on any of the key fields. To solve this problem, we need to take the design to third normal form. Rule: Eliminate columns not dependent on the key. If attributes do not contribute to a description of the key, remove them to a separate table. All non-key fields must be exclusively dependent on the key field. This means it is time to break the school information out to a separate table. The school address, city, state, and ZIP code are all related to the school, not to the student. Therefore, those fields do not contribute to a description of the student ID and, in turn, do not belong in the student information table. The following table shows the revised database in third normal form.
Table D1: Student Information (3NF)
Table D2: Student Class (3NF)
Table D3: Class (3NF)
Table D4: School (3NF) Looking at each of the four tables in the database, you find in each case that all non-key attributes are related to the entire key. There is a wonderful summary of what it takes to declare a table is in third normal form. In their book DB2: Design and Development Guide, Wiorkowski and Kull boiled it down to one statement: "The rules leading to and including the third normal form can be summed up in a single statement: Each attribute must be a fact about the key, the whole key, and nothing but the key." By measuring all tables in the database with this one test, we can confirm that the design is in third normal form, which, by definition, means we are also in second normal form and, in turn, first normal form. That is a lot of validation resulting from one easy phrase! Don't Give Up Now. We're Almost There! In this article, we started with a database that was in first normal form. We moved all attributes that were related to part of a composite key to a separate table. That eliminated some anomalies and took us to second normal form. We then moved all the attributes that were not related to any of the key fields to a separate table. That eliminated the remaining anomalies and left our design in third normal form. In many cases, it is acceptable to bring your design to 3NF, and that will provide adequate flexibility for a robust, maintainable application. In the third and final article in this series, I will reveal the less commonly used fourth and fifth normal forms. We will then examine some trade-offs and discuss how much normalization is appropriate for a given application design. Jim Coker is the IT director for Behr Process Corp. in Santa Ana, California. He has worked on the AS/400 and iSeries since 1989. E-mail: jcoker@behr.com
|
Editors
Contact the Editors |
| Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved. |