Guild Companies, Inc.  
 
Midrange Programmer - How-To Advice & Free Code
OS/400 Edition
Volume 1, Number 6 - March 28, 2002

Getting Started with XML

by David Morris

Extensible Markup Language (XML) is a simplified subset of Standard Generalized Markup Language (SGML). The World Wide Web Consortium (W3C) created XML to provide a simple alternative to SGML for describing data exchanged between software applications. Although XML is simpler than SGML, it is powerful enough to describe almost any data, in a format that almost every computer in use today can process.

One difference between XML and other markup languages is that XML is a meta-markup language. In other words, XML defines the rules for creating markup but does not prescribe the tags for any particular domain. This allows XML to be adapted to more requirements than fixed-vocabulary markup languages such as HTML. The ability to adapt to new functions or services makes XML very powerful.

Documents marked up with XML have many benefits over documents that are stored as plain text or are marked up using a less capable markup such as HTML. Those benefits include:

  • Data marked up with XML is self-describing
  • XML relies on simple text, which eliminates most compatibility issues
  • You can extend XML to fit almost any domain

These are just a few of XML's benefits. Just like Java, XML is sometimes over-hyped and presented as the solution to all problems. It is not the solution to all problems, but it is a key technology that addresses many of today's computing problems.

Defining an XML Document

With XML, you define document elements and attributes that are used to mark up your information. An element can represent some piece of information, such as an address, telephone number, or a person's name. Attributes are associated with an element and identify information that is typically not printed or displayed. A payment type might be stored as an attribute for a payment element.

Example 1 shows how a payment transaction might be defined using XML:

<payment type="ET">
    <amount>500.00</amount>
    <unit>USD</unit>
</payment>

In the example, there are three elements: payment, amount, and unit. The payment root element contains the amount and unit elements and has a type attribute. This example also shows how XML elements are nested. Unlike HTML, XML requires every element to be closed, either with a matching closing tag or with the empty-element form, such as <unit/>. Nested elements must also be closed in order: the unit element, which is nested inside payment, has to be closed before payment is closed. Another difference between HTML and XML is that XML is case sensitive: a <Unit> tag is not the same as a <unit> tag.
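To make these rules concrete, here is a minimal sketch in Java that reads Example 1 with the JDK's standard DOM parser (JAXP). The class and method names (PaymentReader, parse, firstText) are hypothetical; only the javax.xml.parsers and org.w3c.dom calls are real JDK APIs. The last lookup shows case sensitivity in action: asking for Unit finds nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: read Example 1 with the JDK's DOM parser (JAXP).
// PaymentReader, parse, and firstText are illustrative names, not a standard API.
final class PaymentReader {
    static final String PAYMENT_XML =
        "<payment type=\"ET\">"
      + "<amount>500.00</amount>"
      + "<unit>USD</unit>"
      + "</payment>";

    // Parse a string into a DOM tree; malformed input becomes an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Text of the first descendant element with this exact (case-sensitive) name, or null.
    static String firstText(Element parent, String name) {
        NodeList matches = parent.getElementsByTagName(name);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element payment = parse(PAYMENT_XML).getDocumentElement();
        System.out.println(payment.getTagName());           // payment
        System.out.println(payment.getAttribute("type"));   // ET
        System.out.println(firstText(payment, "amount"));   // 500.00
        System.out.println(firstText(payment, "unit"));     // USD
        System.out.println(firstText(payment, "Unit"));     // null (names are case sensitive)
    }
}
```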

eXtending with XML

Unlike most other markup languages, including HTML, which restrict you to a fixed set of tags, XML allows you to create new tags. Because XML is a meta-markup language, you can use it to describe your own domain-specific elements, that is, elements that are useful for describing content in a specific subject area. For example, the Chemical Markup Language is a domain-specific vocabulary that allows chemists to describe data in a way that facilitates the exchange of chemical information.

XML's extensibility makes it very powerful. In example 1, the payment root element contains the amount and unit elements that describe a payment. Example 2, which follows, adds a received-date element to the payment transaction, demonstrating that you can extend XML to fit your needs:

<payment type="ET">
   <amount>500.00</amount>
   <unit>USD</unit>
   <received-date>
      <year>2001</year>
      <month>10</month>
      <day>29</day>
   </received-date>
</payment>

Now the example contains a received-date element. Notice that I broke the received-date element down into separate year, month, and day elements. I did this because an XML element should contain data, not structure: rather than packing a date into a single string whose internal format every program would have to know and pick apart, let the markup itself express the structure. Keeping structure out of element content allows maximum flexibility when working with XML documents.
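Breaking the date into parts pays off when a program consumes the document. In this Java sketch (JDK DOM parser; the class and method names are hypothetical), each part is read as an integer and handed to java.time.LocalDate, so no date-string format has to be agreed on, and the order of year, month, and day inside received-date does not matter.

```java
import java.io.StringReader;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: turn Example 2's received-date into a java.time.LocalDate.
// ReceivedDate and its methods are illustrative names.
final class ReceivedDate {
    static final String PAYMENT_XML =
        "<payment type=\"ET\">"
      + "<amount>500.00</amount><unit>USD</unit>"
      + "<received-date><year>2001</year><month>10</month><day>29</day></received-date>"
      + "</payment>";

    static LocalDate receivedDate(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Element date = (Element) root.getElementsByTagName("received-date").item(0);
            // Each part is its own element, so no date-string parsing is needed.
            return LocalDate.of(intValue(date, "year"), intValue(date, "month"), intValue(date, "day"));
        } catch (Exception e) {
            // Covers malformed XML, a missing element, or an impossible date.
            throw new IllegalArgumentException("No usable received-date", e);
        }
    }

    private static int intValue(Element parent, String name) {
        return Integer.parseInt(parent.getElementsByTagName(name).item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        System.out.println(receivedDate(PAYMENT_XML)); // 2001-10-29
    }
}
```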

Creating XML Documents

An XML document is plain text, typically stored in an ASCII-based character set. To create an XML document, you combine character data and markup tags. You can simply start your favorite text editor; WordPad will work fine for most simple XML documents. There are also several specialized XML editors. The best XML editors I have used are SoftQuad's XMetal and IBM's WebSphere Studio Site Developer, which is currently in beta.

On the iSeries, you can use the EDTF command to type in XML documents, or you can type them in on your PC and transfer them to your iSeries system. XML documents can use just about any encoding, including Unicode, but on the iSeries it is best to stick with the standard ISO 8859 character sets (ASCII-based sets defined by the International Organization for Standardization) and their corresponding iSeries coded character set IDs (CCSIDs). For example, in the United States you would use ISO-8859-1, which has a CCSID of 819; for Cyrillic, you would use ISO-8859-5, which has a CCSID of 915.
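A quick way to see why the declared encoding matters is to hand a parser raw bytes. The Java sketch below (JDK JAXP parser) encodes a tiny document as ISO-8859-1 bytes, once with a matching encoding declaration and once with a mismatched UTF-8 declaration. The class name, method names, and the sample species value are hypothetical, and the sketch does not cover matching the file's CCSID on the iSeries itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: the parser decodes bytes using the encoding named in the XML declaration.
final class EncodingCheck {
    // Build a one-element document whose bytes are always ISO-8859-1,
    // but whose declaration claims 'declaredEncoding'.
    static byte[] latin1Bytes(String declaredEncoding, String species) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                   + "<species>" + species + "</species>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Let the parser decode the bytes; null means the bytes did not match the declaration.
    static String readSpecies(byte[] bytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // throw on fatal errors, print nothing
            return builder.parse(new ByteArrayInputStream(bytes)).getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String larch = "M\u00e9l\u00e8ze"; // hypothetical value containing non-ASCII letters
        System.out.println(larch.equals(readSpecies(latin1Bytes("ISO-8859-1", larch)))); // true
        System.out.println(readSpecies(latin1Bytes("UTF-8", larch)));                     // null
    }
}
```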

For XML documents that you want to build in a program, you have several choices. You can use the Unix file APIs to create XML documents from an ILE program; the open-source iSeries-toolkit has a Unix module that provides this type of support. Another option is to use a parser, such as the XML Interface for RPG and Procedural Languages (XML4PR), which lets your RPG applications play along in this open data transportation game and will be covered in more detail in an upcoming article. A parser is useful when you want to validate the format of your XML document or manipulate individual elements.
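The options above are ILE-based. For comparison, here is a minimal sketch of building Example 1 in Java, using the JDK's DOM and javax.xml.transform APIs. The class and method names are hypothetical. If you write the resulting string to a file, write it in the encoding the declaration names (ISO-8859-1 here).

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: build Example 1 in memory with DOM and serialize it as text.
// PaymentWriter and its methods are illustrative names.
final class PaymentWriter {
    static String buildPayment(String type, String amount, String unit) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element payment = doc.createElement("payment");
            payment.setAttribute("type", type);
            doc.appendChild(payment);
            appendTextElement(doc, payment, "amount", amount);
            appendTextElement(doc, payment, "unit", unit);

            // An identity transform writes the DOM tree out as XML text.
            Transformer writer = TransformerFactory.newInstance().newTransformer();
            writer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
            writer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            writer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build payment XML", e);
        }
    }

    // setTextContent stores raw text; the serializer escapes & and < when writing it out.
    private static void appendTextElement(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        System.out.println(buildPayment("ET", "500.00", "USD"));
    }
}
```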

A complete XML document should start with an XML declaration, which specifies the version of XML and the encoding. Comments are enclosed between <!-- and -->. Example 3, which follows, is a complete XML document that describes a log delivery transaction.

<?xml version="1.0" encoding="ISO-8859-1"?>

<!-- Sample log delivery XML document -->
<deliveries>
   <load scale-type="DTL">
      <scale-ticket>12345</scale-ticket>
      <weight>42168</weight>
      <weight-uom>LBS</weight-uom>
      <scale-uom>US</scale-uom>
      <delivered-date>
         <month>10</month>
         <day>29</day>
         <year>2001</year>
      </delivered-date>
      <log>
         <species>WESTERN LARCH</species>
         <grade>PEELER</grade>
         <large-end-diameter>15</large-end-diameter>
         <small-end-diameter>12</small-end-diameter>
         <length>32</length>
      </log>
      <log>
         <species>WESTERN LARCH</species>
         <grade>PEELER</grade>
         <large-end-diameter>13</large-end-diameter>
         <small-end-diameter>9</small-end-diameter>
         <length>32</length>
      </log>
   </load>
</deliveries>
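Because Example 3 is a complete document, a program can read it directly. This Java sketch streams through it with the JDK's SAX parser instead of building a DOM tree, which keeps memory use low for large files. The class name, its fields, and the assumption of one load per document are illustrative, not part of the original example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: stream through Example 3 with SAX and summarize the load.
// Assumes one load per document, as in Example 3; names are illustrative.
final class DeliverySummary extends DefaultHandler {
    static final String DELIVERIES_XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
      + "<!-- Sample log delivery XML document -->"
      + "<deliveries><load scale-type=\"DTL\">"
      + "<scale-ticket>12345</scale-ticket><weight>42168</weight>"
      + "<weight-uom>LBS</weight-uom><scale-uom>US</scale-uom>"
      + "<delivered-date><month>10</month><day>29</day><year>2001</year></delivered-date>"
      + "<log><species>WESTERN LARCH</species><grade>PEELER</grade>"
      + "<large-end-diameter>15</large-end-diameter><small-end-diameter>12</small-end-diameter>"
      + "<length>32</length></log>"
      + "<log><species>WESTERN LARCH</species><grade>PEELER</grade>"
      + "<large-end-diameter>13</large-end-diameter><small-end-diameter>9</small-end-diameter>"
      + "<length>32</length></log>"
      + "</load></deliveries>";

    String scaleTicket;
    int weight;
    String weightUom;
    int logCount;
    final List<String> species = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting this element's character data
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "scale-ticket": scaleTicket = value; break;
            case "weight":       weight = Integer.parseInt(value); break;
            case "weight-uom":   weightUom = value; break;
            case "species":      species.add(value); break;
            case "log":          logCount++; break;
            default:             break;
        }
        text.setLength(0);
    }

    static DeliverySummary summarize(String xml) {
        try {
            DeliverySummary handler = new DeliverySummary();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse delivery XML", e);
        }
    }

    public static void main(String[] args) {
        DeliverySummary s = summarize(DELIVERIES_XML);
        // prints: Ticket 12345: 2 logs, 42168 LBS
        System.out.println("Ticket " + s.scaleTicket + ": " + s.logCount + " logs, "
            + s.weight + " " + s.weightUom);
    }
}
```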

All XML documents consist of XML text, which is made up of character data and markup. Markup is everything but your content and includes start tags, end tags, comments, and entity references. Delimiters surround markup. The most commonly used are tag delimiters, which are the less-than (<) and greater-than (>) symbols, and entity delimiters, which are the ampersand (&) and semicolon (;).

These are the main components you will use when creating an XML document:

  • Elements made up of tags like <log> and </log>
  • Attributes that add information to an element, like menuitem="Y"
  • Entities, placeholders for text or binary files, referenced as &REPLACEMENT;
  • Processing instructions to embed non-XML information
  • Comments that describe an XML document
  • Text, which supplies the most common form of XML content
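To see these components as a parser sees them, the Java sketch below parses a small hypothetical document containing a comment, a processing instruction, an element with an attribute, and text with an entity reference, then records which kinds of DOM nodes it finds. The class name and the print-options processing instruction are made up for illustration. The predefined entity reference &amp; does not survive as a separate node; the parser replaces it with the character it stands for.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: list the kinds of XML components a DOM parser reports for a small document.
final class ComponentTour {
    static final String XML =
        "<?xml version=\"1.0\"?>"
      + "<!-- Comments describe the document -->"
      + "<?print-options copies=\"2\"?>"
      + "<menu menuitem=\"Y\">Eggs &amp; toast</menu>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    static Set<String> componentKinds(Node root) {
        Set<String> kinds = new TreeSet<>(); // sorted, so printing is predictable
        collect(root, kinds);
        return kinds;
    }

    private static void collect(Node node, Set<String> kinds) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: kinds.add("element"); break;
            case Node.TEXT_NODE: kinds.add("text"); break;
            case Node.COMMENT_NODE: kinds.add("comment"); break;
            case Node.PROCESSING_INSTRUCTION_NODE: kinds.add("processing instruction"); break;
            default: break; // for example, the Document node itself
        }
        NamedNodeMap attributes = node.getAttributes(); // null for non-element nodes
        if (attributes != null && attributes.getLength() > 0) {
            kinds.add("attribute");
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, kinds);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(componentKinds(doc)); // [attribute, comment, element, processing instruction, text]
        System.out.println(doc.getDocumentElement().getTextContent()); // Eggs & toast
    }
}
```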

All XML documents use some combination of these components. XML supplies strict rules that describe how and where these components may be used.

Notice that the XML document in example 3 does not describe presentation. The syntax provided by XML allows you to describe the content of a document. This capability allows you to describe the content of any document and, just as importantly, allows you to separate the content of a document from the document's presentation. Style is another term for the presentation format.

Adding Style to Your XML

After creating an XML document, you might want to display the contents in a Web browser. Unlike HTML, XML has no built-in style, so you have to combine your XML document with a stylesheet. Style allows you to describe formatting for data contained in an XML document. Style describes layout, fonts, color, and behavior for the elements in an XML document.

Two popular ways to describe style for an XML document are Cascading Style Sheets (CSS), which is widely supported in browsers, and Extensible Stylesheet Language (XSL). XSL, which is itself written in XML, consists of two parts. The first part, XSL Transformations (XSLT), is a language for transforming XML documents from one format to another. The second part, often called XSL Formatting Objects (XSL-FO), provides a vocabulary for specifying formatting semantics.

In this brief overview, I won't get into the details of CSS and XSL, but if you need to present XML data in a browser, you will need to use one of them. CSS is mature and widely supported by browsers; however, it is very limited, particularly when you need to restructure data into a list or table. Browser support for XSL, on the other hand, is still very limited. At this point, the best way to use XSL is to transform XML to HTML on your iSeries server using XSLT.
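Server-side XSLT can be sketched in a few lines of Java with the JDK's javax.xml.transform API. The stylesheet below, which turns Example 1's payment into a small HTML table, and the class name are hypothetical; a real stylesheet would normally live in its own .xsl file rather than in a Java string. Note the xsl:text element: a bare space between two xsl:value-of elements would be stripped from the stylesheet as insignificant white space.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: transform Example 1 into HTML on the server with XSLT.
final class PaymentToHtml {
    static final String PAYMENT_XML =
        "<payment type=\"ET\"><amount>500.00</amount><unit>USD</unit></payment>";

    // Hypothetical stylesheet: one table row per payment.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/payment\">"
      + "<table><tr><th>Type</th><th>Amount</th></tr>"
      + "<tr><td><xsl:value-of select=\"@type\"/></td>"
      + "<td><xsl:value-of select=\"amount\"/><xsl:text> </xsl:text>"
      + "<xsl:value-of select=\"unit\"/></td></tr></table>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String toHtml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(PAYMENT_XML));
    }
}
```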

How's My Form?

One important feature of XML is that it provides built-in assurance that the form and content of information are correct and reliable. There are several ways that XML provides this assurance:

  • Documents must be well-formed, adhering to XML's syntax
  • Documents can conform to a DTD or Schema
  • XML processors may not fix or interpret malformed documents

Although XML provides the flexibility to create new tags and content as necessary, all XML documents and extensions must conform to XML's rules. There are two sets of rules: the first ensures that the basic syntax and structure of a document are correct; the second set of rules is provided by a Document Type Definition or Schema and applies domain-specific validation.

When a document's basic syntax and structure are correct, the document is considered to be well formed. Programs that process XML documents check for conformance to these basic rules and must report any errors they find; the XML specification specifically prohibits correcting errors in, or continuing normal processing of, any document that is not well formed. In other words, programs that process XML documents cannot ignore errors.

The following list describes some of XML's rules:

  • If included, the XML declaration must come first, with nothing (not even white space) before it
  • Every XML document must have exactly one root element
  • All elements must have matching begin and end tags
  • Elements must be properly nested
  • XML is case sensitive
  • White space outside of elements is ignored
  • Attribute values must be enclosed in single (') or double (") quotes
  • Use &amp; for the ampersand (&) and &lt; for the less-than (<) symbol in text and attribute values
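The last rule matters most when a program builds XML by gluing strings together. A hypothetical helper such as the following Java sketch escapes reserved characters before text goes into an element or attribute. Only & and < are strictly required in element text; the sketch also escapes > and both quote characters so the same output is safe inside either kind of quoted attribute value.

```java
// Sketch: escape text before inserting it into XML markup.
// XmlEscaper is an illustrative name, not a library class.
final class XmlEscaper {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // required
                case '<':  out.append("&lt;");   break; // required
                case '>':  out.append("&gt;");   break; // optional, harmless
                case '"':  out.append("&quot;"); break; // needed inside "..." attributes
                case '\'': out.append("&apos;"); break; // needed inside '...' attributes
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("<note>" + escape("Tom & Jerry < 3") + "</note>");
        // prints: <note>Tom &amp; Jerry &lt; 3</note>
    }
}
```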

If an XML document does not follow these rules, it is not well formed and cannot be processed. Part of the job of an XML parser is to check an XML document to make sure that it is well formed.
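The easiest way to check these rules is to let a parser do it. This Java sketch uses the JDK's SAX parser with a plain DefaultHandler, whose fatalError method throws, so any well-formedness error ends the parse without any attempt at repair. The method name isWellFormed is hypothetical, and each sample string breaks one rule from the list.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: report whether a string is a well-formed XML document.
final class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser reports the error and stops; it never repairs the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<payment type=\"ET\"><amount>500.00</amount></payment>", // well formed
            "<payment><unit>USD</Unit></payment>",  // case mismatch in end tag
            "<a><b></a></b>",                       // improper nesting
            "<a/><b/>",                             // two root elements
            "<payment type=ET/>",                   // unquoted attribute value
            "<note>AT&T</note>",                    // bare ampersand
            " <?xml version=\"1.0\"?><a/>"          // white space before the declaration
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```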

Validating XML Documents

The next level of validation for XML documents uses a document type definition (DTD) or Schema to apply domain-specific validation. A DTD describes which elements and attributes are valid in an XML document and may specify the order of elements and other relationships. DTDs have been around a long time and are widely used and understood. DTDs are not written in XML; they use their own declaration syntax, with content models written in a notation similar to Extended Backus-Naur Form (EBNF) that has its roots in Standard Generalized Markup Language (SGML). A document that meets the criteria of a DTD or Schema is both well formed and valid; a validating parser performs these optional validity checks. I won't go into detail on DTDs, but example 4, which follows, shows a simple DTD that describes a recipe element and its contained elements.

<!ELEMENT recipe (name, ingredient+, instruction+)>
<!ELEMENT name (#PCDATA) >
<!ELEMENT ingredient (quantity?, description) >
<!ELEMENT quantity (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT instruction (#PCDATA) >
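To see a validating parser at work, the following Java sketch embeds Example 4's declarations as an internal DTD subset and parses with validation turned on. The class and method names are hypothetical. One detail worth knowing: SAX reports validity problems as recoverable errors, which DefaultHandler silently ignores, so the sketch overrides error() to turn them into failures.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: validate a recipe element against Example 4's DTD.
final class RecipeValidator {
    // Example 4's DTD, used here as an internal DTD subset.
    static final String DTD =
        "<!ELEMENT recipe (name, ingredient+, instruction+)>"
      + "<!ELEMENT name (#PCDATA)>"
      + "<!ELEMENT ingredient (quantity?, description)>"
      + "<!ELEMENT quantity (#PCDATA)>"
      + "<!ELEMENT description (#PCDATA)>"
      + "<!ELEMENT instruction (#PCDATA)>";

    static boolean isValidRecipe(String recipeElement) {
        String doc = "<?xml version=\"1.0\"?><!DOCTYPE recipe [" + DTD + "]>" + recipeElement;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            factory.newSAXParser().parse(new InputSource(new StringReader(doc)), new DefaultHandler() {
                // Validity problems arrive as recoverable errors; treat them as failures.
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or not valid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRecipe("<recipe><name>Toast</name>"
            + "<ingredient><description>bread</description></ingredient>"
            + "<instruction>Toast the bread.</instruction></recipe>"));
        System.out.println(isValidRecipe("<recipe><name>Toast</name></recipe>"));
    }
}
```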

Schemas are XML documents and are more powerful than DTDs. Whereas a DTD checks the structure and order of elements, a schema can also make sure that the data contained within an element conforms to certain rules. The XML Schema specification became a W3C Recommendation in 2001, so support for Schemas is still spotty. For recipes, DTDs are fine, but in time, Schemas should replace DTDs, particularly for the types of business applications that are popular on the iSeries. The Schema snippet in example 5, which follows, validates the log element from example 3.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Validating the log element -->
<xs:element name="log" type="logType"/> 
<xs:complexType name="logType"> 
   <xs:sequence> 
      <xs:element name="species" type="xs:string"/> 
      <xs:element name="grade" type="xs:string"/>
      <xs:element name="large-end-diameter" type="xs:positiveInteger"/>
      <xs:element name="small-end-diameter" type="xs:positiveInteger"/>
      <xs:element name="length" type="xs:positiveInteger"/>
   </xs:sequence> 
</xs:complexType>
</xs:schema> 

From the example, you can see that Schemas allow you to check many things. Like a DTD, Schemas allow you to check the structure of an XML document. Schemas also allow you to verify whether an element contains the right type of data, such as string, positive integer, decimal, and date. You can even check for a list or range of values. If you decide to validate data on the iSeries with Schemas, you are best off using the latest version of the Apache Software Foundation's Xerces parser.
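For a feel of what Schema validation looks like in code, here is a minimal sketch using the JDK's built-in javax.xml.validation API with Example 5's schema. It is not the standalone Apache Xerces distribution recommended above; it only shows the shape of the calls. The class and method names are hypothetical. Note that xs:positiveInteger rejects both non-numeric text and zero.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Sketch: validate a log element against Example 5's schema.
final class LogSchemaValidator {
    static final String LOG_XSD =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
      + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"log\" type=\"logType\"/>"
      + "<xs:complexType name=\"logType\"><xs:sequence>"
      + "<xs:element name=\"species\" type=\"xs:string\"/>"
      + "<xs:element name=\"grade\" type=\"xs:string\"/>"
      + "<xs:element name=\"large-end-diameter\" type=\"xs:positiveInteger\"/>"
      + "<xs:element name=\"small-end-diameter\" type=\"xs:positiveInteger\"/>"
      + "<xs:element name=\"length\" type=\"xs:positiveInteger\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:schema>";

    // Compile the schema once; a broken schema is a programming error, not a bad document.
    private static final Schema SCHEMA = compile(LOG_XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("Invalid schema", e);
        }
    }

    // True if the document is well formed and its root log element matches the schema.
    static boolean isValidLog(String xml) {
        try {
            // With no ErrorHandler set, the Validator throws on the first error.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidLog("<log><species>WESTERN LARCH</species><grade>PEELER</grade>"
            + "<large-end-diameter>15</large-end-diameter><small-end-diameter>12</small-end-diameter>"
            + "<length>32</length></log>"));
    }
}
```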

XML on the iSeries

Even on the iSeries platform, which tends to wait for new technologies to mature before embracing them, XML is beginning to make headway in areas like Electronic Data Interchange and Web publishing. IBM is pushing Web Services as a way to process XML-based transactions. Products like WebSphere use XML extensively to store configuration data.

Many of the XML technologies, including Web Services, are not mature enough to be in widespread use. A few companies such as Microsoft, Sun, and IBM have standards-based solutions that are not entirely open. Evolving standards from the World Wide Web Consortium Web Services Activity groups should resolve some of the compatibility issues.

Java supplies the most complete and up-to-date XML support. On the iSeries, there is some built-in XML support for RPG and C, but you will find more Java-based examples and programs that run on the iSeries. The IBM Information Center's "XML Tools for OS/400" page describes the tools IBM provides for the iSeries.

<conclusion>

Use of XML is growing quickly. Many products like WebSphere use XML to store configuration data, and new products and services increasingly use XML to define data exchanged between disparate computers. Because the foundation of XML is simple text, XML is able to bridge the gap between applications running on almost any computer system.

Unlike HTML, XML describes data and not presentation. Because data is described more precisely than with HTML, its intended use is clearer. XML defers presentation to stylesheets, which provide quite a bit of formatting flexibility. HTML is no longer being extended, so today's capabilities are all that will ever exist. XML formatting begins where HTML leaves off, and XSL stylesheets already provide more formatting options than HTML and CSS.

XML is a meta-markup language, which means XML describes information about markup. Because of this, you can extend XML to fit almost any problem domain. The potential of XML is still developing. Programmers are finding clever ways to use XML in their applications that make use of standard XML tools. On the iSeries, Java is the language of choice for programmers using XML, but RPG and COBOL programmers have decent support through the XML4PR parser and Unix APIs.

In future articles, I will take a more in-depth look at XML. I plan to provide more information on parsers, XML document validation, XSL and XSLT, Web Services, and other XML technologies. </conclusion>

David Morris is a software architect at Plum Creek Timber Company and started the iSeries-toolkit open-source project. He can be contacted by e-mail at dmmorris@itjungle.com.

  Last Updated: 3/27/02
Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.