  • Ensuring Valid XML Content

    December 13, 2002 Timothy Prickett Morgan

    Hey, David:

    I will preface this email by saying that my knowledge of XML is limited, but I’ve been able to use the Xerces parser to move XML data to a physical file using QSH/Java.

    Now, if you’re receiving an XML file from a customer, how do you guarantee that the customer’s XML document references your external DTD? What would prevent the customer from embedding an “invalid” DTD in the XML document?

    — Chris

    The problem you describe is one of the limitations you have to deal with when you use a Document Type Definition (DTD) to validate XML documents. A DTD enables you to do some high-level checking of an XML document. With a DTD, you can check the basic structure of an XML document, but you cannot check the actual content.

    You cannot easily ignore a DTD embedded in XML content. In addition to providing validation, DTDs are used to supply default information. For example, you can define an entity in a DTD that provides a replacement value for entity references in an XML document.
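    As a minimal sketch of that entity mechanism, the following program parses a small document whose internal DTD defines a company-name entity and prints the attribute value after the parser has expanded the reference. It assumes the JAXP parser bundled with the JDK; the class name EntityDefaultDemo and the method companyOf are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
public class EntityDefaultDemo {

    // Parses the XML text and returns the value of the root
    // element's "company" attribute after entity expansion.
    static String companyOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("company");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The internal DTD supplies the replacement text for &company-name;
        String xml = "<!DOCTYPE personnel [\n"
            + "  <!ENTITY company-name \"Big Company\">\n"
            + "]>\n"
            + "<personnel company=\"&company-name;\"/>";
        System.out.println(companyOf(xml)); // prints "Big Company"
    }
}
```

    The parser resolves the reference before the application ever sees the attribute, which is why simply discarding the DTD changes the data you receive.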

    In this situation, an XML schema, which is more flexible than a DTD, might work better to validate the data you receive. Schemas can coexist with DTDs and allow you to check an XML document’s structure and content. The ability to check content allows you to type-check the elements and attributes contained in an XML document. For example, you can make sure that start_date elements contain a date, and age elements contain a positive integer. In addition, you can use a schema to check that an attribute’s values fall in a certain range or contain certain values.
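    The kind of content checking described above can be sketched with the javax.xml.validation API that ships with the JDK. The start_date and age elements come from the paragraph above; the employee element, the grade attribute, its 1 to 5 range, and the class name TypeCheckDemo are assumptions made only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original article.
public class TypeCheckDemo {

    // Illustrative schema: a date, a positive integer, and an
    // attribute restricted to the range 1 through 5.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='employee'>"
      + "  <xs:complexType>"
      + "   <xs:sequence>"
      + "    <xs:element name='start_date' type='xs:date'/>"
      + "    <xs:element name='age' type='xs:positiveInteger'/>"
      + "   </xs:sequence>"
      + "   <xs:attribute name='grade'>"
      + "    <xs:simpleType>"
      + "     <xs:restriction base='xs:int'>"
      + "      <xs:minInclusive value='1'/>"
      + "      <xs:maxInclusive value='5'/>"
      + "     </xs:restriction>"
      + "    </xs:simpleType>"
      + "   </xs:attribute>"
      + "  </xs:complexType>"
      + " </xs:element>"
      + "</xs:schema>";

    // Returns true when the document satisfies the schema. A SAXException
    // (schema compilation problem or validation error) yields false.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<employee grade='3'><start_date>2002-12-13</start_date>"
            + "<age>41</age></employee>";
        String badDate = "<employee grade='3'><start_date>13/12/2002</start_date>"
            + "<age>41</age></employee>";
        String badAge = "<employee grade='3'><start_date>2002-12-13</start_date>"
            + "<age>-5</age></employee>";
        String badGrade = "<employee grade='9'><start_date>2002-12-13</start_date>"
            + "<age>41</age></employee>";
        System.out.println(isValid(good));     // true
        System.out.println(isValid(badDate));  // false
        System.out.println(isValid(badAge));   // false
        System.out.println(isValid(badGrade)); // false
    }
}
```

    A DTD could confirm that start_date and age appear in the right order, but it could not reject any of the three bad documents shown here.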

    If you decide that DTDs are more trouble than they are worth, you might also want to consider using Simple Object Access Protocol (SOAP). The SOAP specification specifically prohibits the use of DTDs, and SOAP is gaining popularity, so adopting it lets you sidestep this issue without appearing too rigid.
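    If you want the same no-DTD rule without adopting SOAP, one option is to have the parser reject any document that carries a DOCTYPE declaration. The sketch below assumes a parser that recognizes the Xerces feature http://apache.org/xml/features/disallow-doctype-decl, such as the JAXP implementation bundled with current JDKs; older Xerces releases may not support it. The class name DoctypeGuard and the method accepts are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
public class DoctypeGuard {

    // Xerces feature that turns any DOCTYPE declaration into a fatal error.
    // Assumption: the parser in use recognizes this feature.
    static final String DISALLOW_DOCTYPE =
        "http://apache.org/xml/features/disallow-doctype-decl";

    // Returns true if the document parses and has no DOCTYPE declaration.
    // Malformed documents are also rejected, since they raise the same
    // SAXParseException.
    static boolean accepts(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(DISALLOW_DOCTYPE, true);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            // Configuration or I/O problem rather than a rejected document
            throw new IllegalStateException("Parser setup failed", e);
        }
    }

    public static void main(String[] args) {
        String plain = "<personnel company='Big Company'/>";
        String withDtd = "<!DOCTYPE personnel ["
            + "<!ENTITY company-name \"Big Company\">]>"
            + "<personnel company='&company-name;'/>";
        System.out.println(accepts(plain));   // true
        System.out.println(accepts(withDtd)); // false
    }
}
```

    The trade-off is the one described earlier: a customer who relies on DTD entities or defaults can no longer send that document, so this approach only fits when you can require DTD-free input.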

    I created an XML document with an internal DTD and validated it against a schema using JDOM beta 8 (b8). I like to use JDOM because it simplifies processing of XML documents. Under the covers, JDOM uses a parser that you can specify. I specified the Xerces 2.2.1 parser from the Apache Software Foundation because it has the best schema support.

    In the example, the XML document has an internal DTD with an entity that supplies a company name. If you ignore or remove the DTD from this document, the entity reference in the personnel element's company attribute (“&company-name;”) will no longer be replaced with “Big Company.” Here is that XML document, which you should save as personal.xml.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE personnel [
      <!ENTITY company-name "Big Company">
      <!ELEMENT family (#PCDATA)>
      <!ELEMENT given (#PCDATA)>
      <!ELEMENT name (family,given)>
      <!ELEMENT nationality (#PCDATA)>
      <!ELEMENT person (name,nationality)>
      <!ATTLIST person id CDATA #IMPLIED>
      <!ELEMENT personnel (person+)>
      <!ATTLIST personnel company CDATA #IMPLIED>
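      <!ATTLIST personnel xmlns:xsi CDATA #IMPLIED>
      <!ATTLIST personnel xsi:noNamespaceSchemaLocation CDATA #IMPLIED>
    ]>
    
    <personnel company="&company-name;" 
      xsi:noNamespaceSchemaLocation="personal.xsd" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">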
      <person id="Big.Boss">
        <name><family>Boss</family> <given>Big</given></name>
        <nationality>Roman</nationality>
      </person>
    
      <person id="one.worker">
        <name><family>Worker</family> <given>One</given></name>
        <nationality>Greek</nationality>
      </person>
    
      <person id="two.worker">
        <name><family>Worker</family> <given>Two</given></name>
        <nationality>Phoenician</nationality>
      </person>
      
      <person id="three.worker">
        <name><family>Worker</family> <given>Three</given></name>
        <nationality>Greek</nationality>
      </person>
    
      <person id="four.worker">
        <name><family>Worker</family> <given>Four</given></name>
        <nationality>Greek</nationality>
      </person>
    </personnel>
    

    The schema supplied with the example checks to see that the XML document is properly structured. In addition to structure, the schema ensures that the ID and name element values are unique and that the nationality specified is either Greek or Roman. Here is the schema I used, which you should save as personal.xsd.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
    
     <xs:element name="personnel">
      <xs:complexType>
       <xs:sequence>
         <xs:element ref="person" minOccurs='1'
                maxOccurs='unbounded'/>
       </xs:sequence>
       <xs:attribute name="company" type="xs:string"
                 use="required"/>
      </xs:complexType>
    
      <xs:unique name="unique1">
       <xs:selector xpath="person"/>
       <xs:field xpath="name/given"/>
       <xs:field xpath="name/family"/>
      </xs:unique>
      <xs:key name='empid'>
       <xs:selector xpath="person"/>
       <xs:field xpath="@id"/>
      </xs:key>
     </xs:element>
    
     <xs:element name="person">
      <xs:complexType>
       <xs:sequence>
         <xs:element ref="name"/>
         <xs:element ref="nationality"/>
       </xs:sequence>
       <xs:attribute name="id"  type="xs:ID" use='required'/>
      </xs:complexType>
     </xs:element>
    
     <xs:element name="name">
      <xs:complexType>
       <xs:all>
        <xs:element ref="family"/>
        <xs:element ref="given"/>
       </xs:all>
      </xs:complexType>
     </xs:element>
    
     <xs:element name="family" type='xs:string'/>
    
     <xs:element name="given" type='xs:string'/>
    
     <xs:element name="email" type='xs:string'/>
    
     <xs:element name="nationality">
        <xs:simpleType>
         <xs:restriction base = "xs:string">
           <xs:enumeration value="Roman"/>
           <xs:enumeration value="Greek"/>
         </xs:restriction>
        </xs:simpleType>
     </xs:element>
    </xs:schema>
    

    The Java program I wrote applies the schema to the XML document. Save the source in a demo directory as ValidDocument.java. Next, edit the program so that it specifies the correct location for the personal.xsd file. Here is the source for the program:

    package demo;
    
    import org.jdom.Document;
    import org.jdom.JDOMException;
    
    import org.jdom.input.SAXBuilder;
    
    import org.jdom.output.XMLOutputter;
    
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    
    
    /**
     * Class ValidDocument provides an XML document validated
     * against the personal.xsd schema.
     * @author David Morris
     */
    public class ValidDocument {
        // XML file to read
        File file;
    
        public ValidDocument(File file) {
            this.file = file;
        }
    
        public Document build()
            throws JDOMException, IOException {
            // Create a validating SAXBuilder that uses the Xerces parser
            SAXBuilder builder = 
            new SAXBuilder("org.apache.xerces.parsers.SAXParser", true);
    
            // Uncommenting the following lines ensures that
            // the document received stands alone
            // builder.setFeature("http://apache.org/xml/features/"
            //     + "nonvalidating/load-external-dtd", false);

            // Turn on schema validation and point the parser at the schema
            builder.setFeature("http://apache.org/xml/features/"
                + "validation/schema", true);
            builder.setProperty("http://apache.org/xml/properties/schema/"
                + "external-noNamespaceSchemaLocation",
                "file:///C:/projects/examples/src/xml/personal.xsd");
    
            Document doc = builder.build(new FileInputStream(file));
    
            return doc;
        }
    
        public Document validate(Document doc)
            throws JDOMException, IOException {
            return doc;
        }
    
        public static void main(String[] args) {
            try {
                File file = new File(args[0]);
                ValidDocument validDocument = new ValidDocument(file);
                Document doc = validDocument.build();
    
                // Output the document to System.out
                XMLOutputter outputter = new XMLOutputter();
                outputter.output(doc, System.out);
            }
            catch (Exception e) {
                e.printStackTrace();
            }
    
            System.exit(0);
        }
    }
    

    Before compiling or running this program, add xercesImpl.jar and jdom.jar to your CLASSPATH. On my system, I stored the documents and program in the /temp directory. I switched to the temp directory and used the following commands to compile and run the program:

    javac demo/ValidDocument.java
    java demo.ValidDocument personal.xml
    

    Running the program resulted in the following output:

    Value 'Phoenician' is not facet-valid with respect to enumeration '[Roman, Greek]'

    I have covered a lot of ground here, but there is no easy way to deal with DTDs embedded in XML content.

    — David

    Tags: mgo_rc, Volume 2, Number 93 -- December 13, 2002

MGO Volume: 2 Issue: 93

    Copyright © 2025 IT Jungle