
Stuff
OS/400 Edition
Volume 2, Number 16 -- August 14, 2003

Improved XML Validation with Schemas


by David Morris

[The code for this article is available for download.]

A little over a year ago, I described how to validate XML documents with a document type definition, or DTD. At that point, the XML schema specification was new and few programmers were taking advantage of schemas for XML validation. The benefits of schemas over DTDs are clear, and now that XML schema support has stabilized, application programmers should start using them to validate XML data.

Developers are moving to XML schemas for validation because they are much more powerful than DTDs, and some Web services technologies, like SOAP, do not allow DTDs. A DTD validates the structure of an XML document but doesn't allow you to validate its content. For example, with a DTD you can make sure that an employee element contains a hire date element, but you cannot ensure that the hire date contains a valid date. With XML schema validation, you can check both structure and content to ensure date elements contain valid dates and that numeric elements contain numbers. XML schemas can even check for specific values.

Using Web services, FTP, e-mail, or similar technology to exchange information with a business partner is now common. XML schemas provide a clear, concise, and flexible definition of the data being sent. With an XML schema, programmers can be sure that the information they are sending meets certain criteria without resorting to trial and error, because the validation can be easily performed on the source system before sending the data.

An XML schema is itself an XML document. You use an XML parser to validate an XML document against an XML schema. Parser support for XML schemas is growing, and several parsers that run on the iSeries now support schema validation.

The Schema Standard

The World Wide Web Consortium, or W3C, developed the XML schema language to provide an XML-based description of the allowed structure, semantics, and content of XML documents. The XML schema language is described in three documents: XML Schema Part 0: Primer, XML Schema Part 1: Structures, and XML Schema Part 2: Datatypes.

Developers are rapidly adopting the XML Schema 1.0 recommendation. Companies like IBM and Microsoft are using XML schemas throughout their products and development toolsets.

Work continues on the XML schema, and requirements are being collected by the XML Schema Working Group at the World Wide Web Consortium. Version 1.1 will fix some bugs, add some improvements, and, as much as possible, be backward-compatible with Version 1.0.

The Reason for Schemas

There are many advantages to using schemas. A schema provides a clear definition of what is acceptable in an XML document. With a schema, a developer can generate shared information and be sure that that information will be understood. There are plenty of tools to validate XML documents against an XML schema, and many of them are free. Because of this, XML schemas are useful for data exchanges that used to be handled by more costly EDI solutions.

Another reason for the popularity of XML schemas is the applications they support. Both JAXB (Java Architecture for XML Binding) and SOAP (Simple Object Access Protocol) use schemas. JAXB uses schemas to generate Java objects that represent XML content. SOAP uses schemas to ensure that data is correctly formatted, and actually forbids the use of a DTD.

Schema Building Blocks

An XML schema is itself an XML document, and like any XML document it should start with an XML declaration. I generally store XML documents on the iSeries with a Coded Character Set ID (CCSID) of 819 and specify ISO-8859-1 encoding, which is also known as Latin-1. You can create an empty document with that CCSID in Qshell using the touch command, for example touch -C 819 myxmldocument.xml. The declaration looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>

XML schemas use namespaces. A namespace identifies parts of an XML document as belonging to a particular entity and prevents name collisions. For example, I might use a sequence element in an XML document containing a list of assembly instructions. XML schemas also use a sequence element, but the two have entirely different meanings. The XML schema sequence is associated with the XML schema namespace, which uses the uniform resource identifier (URI) http://www.w3.org/2001/XMLSchema, whereas my hypothetical assembly instruction sequence might be associated with the URI http://www.itjungle.com/examples/xml.
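To make the name-collision point concrete, here is a minimal sketch using the DOM parser that ships with current Java runtimes (an assumption; it is not the toolchain this article was written against). The doc wrapper element and the NamespaceDemo class are hypothetical. Both kinds of element share the local name sequence, and only the namespace URI tells them apart.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class NamespaceDemo {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";
    static final String ASM_NS = "http://www.itjungle.com/examples/xml";

    // Hypothetical document: one schema-style sequence, two assembly-style sequences.
    static final String XML =
        "<doc xmlns:xsd='" + XSD_NS + "' xmlns:asm='" + ASM_NS + "'>"
      + "<xsd:sequence/><asm:sequence/><asm:sequence/>"
      + "</doc>";

    // Counts the elements named "sequence" that belong to the given namespace.
    static int countSequences(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(namespaceUri, "sequence").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("schema sequences:   " + countSequences(XSD_NS));
        System.out.println("assembly sequences: " + countSequences(ASM_NS));
    }
}
```

The prefixes xsd and asm are only shorthand; a document could bind any prefix to either URI and the counts would not change.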

The following XML schema snippet shows the root element and namespace declaration for an XML schema. Elements beginning with xsd: are then associated with the http://www.w3.org/2001/XMLSchema namespace.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

Everyone has learned that comments in code are good, and XML schemas carry on that tradition with support for annotations. An annotation can hold two kinds of child elements: documentation and appinfo. A documentation element holds a description meant for human readers, and an appinfo element supplies information for applications that process the schema. The following example shows a documentation element.

<xsd:element ref="state">
  <xsd:annotation>
    <xsd:documentation>State or 
                   province abbreviation</xsd:documentation>
  </xsd:annotation>
</xsd:element>

Now that we have covered the introductory parts of an XML schema, it is time to look at the most important part of an XML schema, which is the element declaration. An element declaration defines the allowed content and attributes of an element. In its simplest form, an element declaration looks like this:

<xsd:element name="name" type="xsd:string" />

This element declaration defines a name element containing a string value. In addition to string, there are over 40 built-in types supported by XML schema, including date, duration, language, decimal, float, and boolean. An XML schema also allows you to derive your own simple types.
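As a rough illustration of what a built-in type buys you, the sketch below validates a hireDate element declared as xsd:date. It assumes a Java runtime that includes the javax.xml.validation API (Java 5 or later), which is a different route from the JDOM program discussed later in this article. The HireDateCheck class and the one-element schema are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class HireDateCheck {
    // Hypothetical schema: a single element whose content must be an xsd:date.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:element name='hireDate' type='xsd:date'/>"
      + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // Returns true when the document is well-formed and accepted by the schema.
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or rejected by the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<hireDate>2003-08-14</hireDate>")); // ISO 8601 form
        System.out.println(isValid("<hireDate>08/14/2003</hireDate>")); // not an xsd:date
    }
}
```

Note that xsd:date expects the ISO 8601 form (year, month, day), so a locale-style date such as 08/14/2003 is rejected even though a person would read it as a date.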

Simple types work in conjunction with facets, which allow you to further refine the allowed range of values. Examples of facets are length, maxLength, enumeration, totalDigits, and fractionDigits. The following derived simple type allows string values with up to 50 characters.

<xsd:simpleType name='customerName'>
  <xsd:restriction base='xsd:string'>
    <xsd:maxLength value='50'/>
  </xsd:restriction>
</xsd:simpleType>
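The maxLength facet can be exercised with a short sketch like the following, which again assumes the javax.xml.validation API in a current Java runtime. The customer element that uses the customerName type, and the CustomerNameCheck class, are hypothetical additions for the demonstration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class CustomerNameCheck {
    // The derived type from the article, plus a hypothetical element that uses it.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='customerName'>"
      + "    <xsd:restriction base='xsd:string'>"
      + "      <xsd:maxLength value='50'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='customer' type='customerName'/>"
      + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // Wraps a name of the given length in a customer element and validates it.
    static boolean nameOfLengthIsValid(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'x');
        String xml = "<customer>" + new String(chars) + "</customer>";
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // rejected by the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("50 characters: " + nameOfLengthIsValid(50));
        System.out.println("51 characters: " + nameOfLengthIsValid(51));
    }
}
```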

As you may have guessed, there are also complex types. A complex type typically specifies an element that contains other elements. An employee element that has id and name attributes and contains supervisor and homeAddress elements would look like this:

<xsd:element name="employee">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="supervisor" />
      <xsd:element ref="homeAddress" />
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:string" use="required" />
    <xsd:attribute name="name" type="xsd:string" use="optional" />
  </xsd:complexType>
</xsd:element>

The employee element declared in the code example must contain a supervisor element followed by a homeAddress element, in that order. Both are required because minOccurs defaults to 1 when it is not specified.
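The ordering rule that xsd:sequence imposes can be checked with a sketch along these lines. It assumes the javax.xml.validation API in a current Java runtime. The employee declaration is the one shown above; the supervisor and homeAddress declarations (plain strings here) and the EmployeeOrderCheck class are hypothetical, since the references need something to point at.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class EmployeeOrderCheck {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        // Hypothetical targets for the two ref attributes below.
      + "  <xsd:element name='supervisor' type='xsd:string'/>"
      + "  <xsd:element name='homeAddress' type='xsd:string'/>"
      + "  <xsd:element name='employee'>"
      + "    <xsd:complexType>"
      + "      <xsd:sequence>"
      + "        <xsd:element ref='supervisor'/>"
      + "        <xsd:element ref='homeAddress'/>"
      + "      </xsd:sequence>"
      + "      <xsd:attribute name='id' type='xsd:string' use='required'/>"
      + "      <xsd:attribute name='name' type='xsd:string' use='optional'/>"
      + "    </xsd:complexType>"
      + "  </xsd:element>"
      + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // rejected by the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String inOrder = "<employee id='1'><supervisor>A</supervisor>"
                       + "<homeAddress>B</homeAddress></employee>";
        String swapped = "<employee id='1'><homeAddress>B</homeAddress>"
                       + "<supervisor>A</supervisor></employee>";
        System.out.println("declared order: " + isValid(inOrder));
        System.out.println("swapped order:  " + isValid(swapped));
    }
}
```

Leaving out the optional name attribute is fine, but omitting the required id attribute or either child element makes the document invalid.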

Another useful feature of the XML schema is the capability to define relationships between elements and to establish unique element values. This capability is similar to that provided by a relational database. The following defines a unique constraint that ensures each employee element within an employees element has a unique id attribute.

<xsd:element name="employees">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element maxOccurs="unbounded"
                 minOccurs="1" ref="employee" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:unique name="unique_employee">
    <xsd:selector xpath="employee" />
    <xsd:field xpath="@id" />
  </xsd:unique>
</xsd:element>

One other thing to note in this example is that I have specified that the employees element must contain at least one employee element.
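Here is a minimal sketch of the unique constraint in action, assuming the javax.xml.validation API in a current Java runtime. The employees declaration is the one shown above; the simplified employee declaration (an id attribute and nothing else) and the UniqueEmployeeCheck class are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class UniqueEmployeeCheck {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        // Hypothetical, simplified employee: just a required id attribute.
      + "  <xsd:element name='employee'>"
      + "    <xsd:complexType>"
      + "      <xsd:attribute name='id' type='xsd:string' use='required'/>"
      + "    </xsd:complexType>"
      + "  </xsd:element>"
      + "  <xsd:element name='employees'>"
      + "    <xsd:complexType>"
      + "      <xsd:sequence>"
      + "        <xsd:element maxOccurs='unbounded' minOccurs='1' ref='employee'/>"
      + "      </xsd:sequence>"
      + "    </xsd:complexType>"
      + "    <xsd:unique name='unique_employee'>"
      + "      <xsd:selector xpath='employee'/>"
      + "      <xsd:field xpath='@id'/>"
      + "    </xsd:unique>"
      + "  </xsd:element>"
      + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // rejected by the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("distinct ids:  "
            + isValid("<employees><employee id='12345'/><employee id='11345'/></employees>"));
        System.out.println("duplicate ids: "
            + isValid("<employees><employee id='12345'/><employee id='12345'/></employees>"));
        System.out.println("no employees:  " + isValid("<employees/>"));
    }
}
```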

The parts of XML schema that I have shown represent the majority of schema constructs that you will use. Keep in mind that there are many other constructs available, so refer to the XML schema specification as you build your schemas for further information.

Defining a Schema

The easiest way to see how XML schemas work is to look at an example. The following XML document contains information about employee securities purchases.

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<espp xsi:noNamespaceSchemaLocation="espp.xsd" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <employee id="12345" name="David Morris">
    <securities>
      <security>
        <ticker>IBS</ticker>
        <name>International Bonds</name>
        <investmentAmount currency="USD">50.00</investmentAmount>
      </security>
    </securities>
    <address>
      <street>33 East Main</street>
      <city>Whitefish</city>
      <state>MT</state>
      <postalCode>59937</postalCode>
    </address>
  </employee>
  <employee id="11345" name="Bill Gates">
    <securities> 
      <security>
        <ticker>EOSF</ticker>
        <name>Emerging OS Fund</name>
        <investmentAmount currency="USD">75.00</investmentAmount>
      </security>
    </securities>
    <address>
      <street>1 Gates Way</street>
      <city>Seattle</city>
      <state>WA</state>
      <postalCode>89725</postalCode>
    </address>
  </employee>
</espp>

In this case, an employee benefits provider would receive this XML document and purchase securities on the employee's behalf. First, XML documents must be well-formed. For a review of what well-formed means, see "Getting Started with XML." Beyond not being well-formed, there are quite a few errors that can occur in an XML document like this that could prevent a successful exchange.

Here is the full XML schema for the employee-securities-purchase XML document. Using this schema, a developer can understand both the structure of the XML document and the restrictions placed on its content.

Looking at this schema (espp.xsd), you may be wondering how to decipher the structure imposed on an XML document that refers to this schema. Without a graphical tool, it can be difficult to envision the structure imposed by a schema. With this schema, the root element is espp, which contains one or more employee elements. The XML and XML schema tools that are part of the WebSphere Development Studio client provide the capability to view a schema graphically and to create context-sensitive XML content based on an existing schema.

Beyond structure, this document imposes several restrictions on content. For example, the investmentAmount element must be a decimal value and have a currency attribute. I kept this example simple, but I could have also imposed length, decimal place, and value restrictions in this schema.
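The investmentAmount restriction could be declared roughly as in the sketch below. This is not the declaration from espp.xsd, which is not reproduced here; it is one plausible way to express "a decimal value with a currency attribute," and it assumes the attribute is required. The InvestmentAmountCheck class is hypothetical, and the code assumes the javax.xml.validation API in a current Java runtime.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class InvestmentAmountCheck {
    // Assumed declaration: decimal content extended with a required currency attribute.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:element name='investmentAmount'>"
      + "    <xsd:complexType>"
      + "      <xsd:simpleContent>"
      + "        <xsd:extension base='xsd:decimal'>"
      + "          <xsd:attribute name='currency' type='xsd:string' use='required'/>"
      + "        </xsd:extension>"
      + "      </xsd:simpleContent>"
      + "    </xsd:complexType>"
      + "  </xsd:element>"
      + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // rejected by the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<investmentAmount currency='USD'>50.00</investmentAmount>"));
        System.out.println(isValid("<investmentAmount currency='USD'>fifty</investmentAmount>"));
        System.out.println(isValid("<investmentAmount>50.00</investmentAmount>"));
    }
}
```

Length, decimal place, and value restrictions of the kind mentioned above would be added by deriving a restricted simple type with facets such as totalDigits and fractionDigits and using it as the extension base.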

Building a Schema

Because an XML schema is itself an XML document, you can use any text or XML editor to create one. One of the best tools for editing XML schemas is the WebSphere Development Studio client. If you have an iSeries system running OS/400 V5R1 or later, it is likely that you are entitled to WDSc at no additional cost. For more information on getting and installing WDSc, see "WebSphere Development Studio Client for iSeries 5.0: Worth the Wait."

Among WDSc 5.0's XML tools is an excellent XML schema editor. The XML schema editor has three views: design, source, and graph. The design view provides a hierarchical view of an XML schema, along with context-sensitive help and validation to guide you through creating or maintaining an XML schema (see the main XML Schema design panel). The graph view allows you to see the relationships among schema components, and the source view allows you to directly edit schema source.

If all you have is an XML document, you can use the "create schema" option in WDSc. This option will generate a basic XML schema describing your XML document. You can use the generated schema as a starting point and refine it as needed using the design panel. To generate an XML schema from an existing document, double-click the XML document to open it in the XML editor. Now click the XML drop-down menu and select "generate XML schema."

Applying a Schema

Once you have an XML schema, you will want to associate it with an XML document and then parse the XML document to see if it is valid. There are two ways to associate an XML schema with an XML document. The first is to include a reference to the XML schema in the XML document. Here is an example:

<espp xsi:noNamespaceSchemaLocation="espp.xsd"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

Once you have associated an XML schema with an XML document, IDEs like WDSc will allow you to modify and verify the XML document's content based on the associated schema.

You can also associate a schema with an XML document when the XML document is parsed. Late last year I showed a Midrange Guru reader how to do this using JDOM and the Xerces parser, in "Ensuring Valid XML Content." There are many variations of this, but JDOM is still one of the easiest. Here is an updated version of that Java program that accepts the name of an XML document and an XML schema and parses the XML document.

To run this program, you have to install JDOM and Xerces on your system. I described the installation process for JDOM and Xerces in "Configuring Applications with XML." After you have these installed, run ValidDocument in Qshell (QSH) with the following commands:

cd /directorycontainingdemo
export -s CLASSPATH=.:/java/xml/jdom/build/jdom.jar:/java/xml/xerces/xercesImpl.jar:/java/xml/xerces/xml-apis.jar
java ValidDocument /tmp/myxmldoc.xml /tmp/myxmlschema.xsd

If your XML document is invalid according to the XML schema you pass to ValidDocument, you will receive error messages that describe the problem.
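For readers without JDOM installed, the same idea (supplying the schema at parse time and collecting the error messages) can be sketched with the javax.xml.validation API that ships with Java 5 and later. This is not the ValidDocument program; it is a swapped-in alternative under that assumption, and the SchemaCheck class and its method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaCheck {
    // Validates xml against xsd and returns one message per problem (empty when valid).
    static List<String> validate(Source xml, Source xsd) {
        final List<String> problems = new ArrayList<>();
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(xsd)
                .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    // warnings do not make a document invalid, so they are ignored
                }
                public void error(SAXParseException e) {
                    problems.add(describe("error", e)); // keep going to collect more
                }
                public void fatalError(SAXParseException e) throws SAXException {
                    problems.add(describe("fatal", e));
                    throw e; // parsing cannot continue after a fatal error
                }
            });
            validator.validate(xml);
        } catch (SAXException e) {
            // Covers a schema that does not compile and anything not already recorded.
            if (problems.isEmpty()) {
                problems.add("error: " + e.getMessage());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java SchemaCheck document.xml schema.xsd");
            return;
        }
        List<String> problems = validate(new StreamSource(new File(args[0])),
                                         new StreamSource(new File(args[1])));
        if (problems.isEmpty()) {
            System.out.println("Document is valid.");
        }
        for (String problem : problems) {
            System.out.println(problem);
        }
    }
}
```

Because the error handler records ordinary validation errors instead of throwing, a single run can report several problems in the same document rather than stopping at the first one.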

A Schema Demon

There is no question that XML is becoming a very important part of data exchanges and that XML schemas are the best way to ensure data integrity. XML schemas can be complex for a developer to understand, but in most cases they are interpreted by software that is oblivious to that complexity. The power of schemas makes them a much better choice than DTDs for validating data, and the number of tools supporting schemas has caught up with DTD tools.

If you are a developer, it is likely that you will be assigned to work on a project that uses XML schemas. Web services, JAXB, and other technologies rely on schemas, and the tool support is very good. When you do start working with XML schemas, take the time to order and install WDSc, which has excellent support for XML and XML schemas.


David Morris is a software architect at Plum Creek Timber Company, and started the iSeries-toolkit open-source project. E-mail: dmorris@itjungle.com.





Editors
Shannon O'Donnell
Kevin Vandever

Managing Editor
Shannon Pastore

Contributing Editors:
Howard Arner
Raymond Everhart
Joe Hertvik
Ted Holt
Marc Logemann
David Morris

Publisher and
Advertising Director:

Jenny Thomas

Advertising Sales Representative
Kim Reed



Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.