So Where Is PureXML for DB2/400?
March 12, 2007 Timothy Prickett Morgan
With much fanfare last week, IBM rolled out its "Viper" database technology in the version of DB2 for its z/OS-based mainframes. Viper is not just a relational database, which stores information in the traditional row-and-column format that is friendly to SQL and other query methods. It can also store information in XML format, and do so natively.
IBM calls the native XML capability PureXML. Unlike past approaches, including prior releases and versions of DB2 for the mainframe and for Unix/Linux/Windows as well as current approaches with DB2/400, PureXML allows XML documents to be stored without losing their inherent hierarchical structure, while still allowing the elements within those documents to be queried. This technology has been in development for five years and was first released in the DB2 V9 releases for Windows, Unix, and Linux in July 2006. IBM holds a patent on the PureXML approach, by the way.
According to Bernie Spang, director of data servers at IBM (data server being the new-fangled name for a database management system at Big Blue these days, not to be confused with a database hardware appliance), past approaches to dealing with XML data were not exactly ideal. One approach, he explained, is to store a whole XML document inside a single database field. While this works, SQL queries only work across cells, not within them, which means you could not easily index XML documents for fast searches and queries.
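To make that limitation concrete, here is a minimal sketch in Python. It assumes the standard library's sqlite3 module as a stand-in relational database (this is not DB2, and sqlite has no XML support) and uses made-up purchase-order documents. Because each document sits in one cell as an opaque string, the database cannot filter or index on the values inside it, so every row has to be pulled out and parsed by the application.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase-order documents, used only for illustration.
DOCS = [
    "<order id='1'><customer>Acme</customer><total>120.50</total></order>",
    "<order id='2'><customer>Globex</customer><total>75.00</total></order>",
]

def store_whole_documents(conn):
    """Store each XML document, unparsed, in a single text column."""
    conn.execute("CREATE TABLE orders_xml (doc TEXT)")
    conn.executemany("INSERT INTO orders_xml (doc) VALUES (?)", [(d,) for d in DOCS])

def customers_over(conn, threshold):
    """SQL sees each document as an opaque string, so the test on <total>
    cannot be expressed in the query or indexed; every row is fetched
    and parsed in application code instead."""
    result = []
    for (doc,) in conn.execute("SELECT doc FROM orders_xml"):
        root = ET.fromstring(doc)
        if float(root.findtext("total")) > threshold:
            result.append(root.findtext("customer"))
    return result

conn = sqlite3.connect(":memory:")
store_whole_documents(conn)
print(customers_over(conn, 100))  # ['Acme']
```

Retrieving a whole document is cheap here (one cell), but any search on its contents costs a full scan plus a parse per row.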
The other approach, called shredding, parses the XML document into chunks and stores those chunks in different columns within the same row of the relational database. This allows the elements of the XML document to be queried, but the data is no longer a hierarchical XML document that can be used in production. Recomposing the document on the fly from the database obviously takes far more resources than hitting a single cell to retrieve the whole XML document, as the first approach allows.
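Shredding flips the trade-off. The sketch below again assumes sqlite3 as a stand-in database and a made-up order schema; the function names are illustrative, not any IBM API. Once elements live in their own columns they can be indexed and filtered in plain SQL, but getting the XML back means rebuilding it from the pieces on every request.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document, used only for illustration.
DOC = "<order id='1'><customer>Acme</customer><total>120.50</total></order>"

def shred(conn, doc):
    """Parse the document and store each element in its own column."""
    root = ET.fromstring(doc)
    conn.execute(
        "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
        (int(root.get("id")), root.findtext("customer"), float(root.findtext("total"))),
    )

def recompose(conn, order_id):
    """Rebuild the XML from the columns; the original markup was discarded
    at shred time, so it must be regenerated on every request."""
    row_id, customer, total = conn.execute(
        "SELECT id, customer, total FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    root = ET.Element("order", id=str(row_id))
    ET.SubElement(root, "customer").text = customer
    ET.SubElement(root, "total").text = f"{total:.2f}"
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("CREATE INDEX orders_total ON orders (total)")  # element values are now indexable
shred(conn, DOC)
print(conn.execute("SELECT customer FROM orders WHERE total > 100").fetchall())  # [('Acme',)]
print(recompose(conn, 1))
```

Note that even in this toy the recomposed document is not byte-for-byte the original (the attribute quoting changes), a small hint of what is lost once the hierarchy is taken apart.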
With PureXML, IBM has figured out a way to store the XML document intact, index the data within documents and across documents, and allow a single SQL query to scan documents. This capability is clearly important for the service-oriented architecture (SOA) approach to application development, which uses XML as a kind of superglue between disparate and often incompatible systems and applications to create individual services and then composite applications built from those services.
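IBM's patented storage format is not described here, so the following is only a toy analogy of the general idea, written in Python: documents are kept whole, and a separate index maps values found at a given element path to the documents that contain them. The documents, the `build_path_index` helper, and the `customer` path are all hypothetical. A lookup can then span many documents without shredding them and return the originals without recomposition.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents, stored intact with their hierarchy.
DOCS = {
    1: "<order id='1'><customer>Acme</customer><total>120.50</total></order>",
    2: "<order id='2'><customer>Globex</customer><total>75.00</total></order>",
    3: "<order id='3'><customer>Acme</customer><total>40.00</total></order>",
}

def build_path_index(docs, path):
    """Map each value found at `path` to the ids of the documents containing it.
    The documents themselves are left untouched."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for elem in ET.fromstring(text).iterfind(path):
            index[elem.text].add(doc_id)
    return index

customer_index = build_path_index(DOCS, "customer")
# A query across documents consults the index, then returns whole documents.
hits = sorted(customer_index["Acme"])
print(hits)  # [1, 3]
print(DOCS[hits[0]])  # the original text, no recomposition needed
```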
“With SOA, you can’t just say, ‘Whatever. I just need an SQL data server,'” says Spang. “The decision about what data server you use is every bit as important as the application server you choose.”
Various industries are standardizing the XML documents they intend to use for services, such as ACORD in financial services, HIPAA in healthcare, and ARTS in retail, and the PureXML capability is something they are looking for, especially since the same DB2 database can also store the plain old row and column data used by traditional transaction processing systems. But perhaps more importantly, the DB2 Viper family of products is showing that it can handle XML data with anywhere from three to eight times the performance of prior DB2 versions, which used the two methods described above to cope with XML documents.
Now that PureXML is available on the mainframe as well as on Unix, Linux, and Windows servers, the obvious question is: When will the DB2/400 database at the heart of i5/OS get the PureXML technology? I contacted IBM's press relations people and received a curt response that "there's no release date on that yet." I asked for a more concrete statement about whether it was in development, and whether I could talk to the good people in the Rochester labs about if, when, and how the remaining strategic IBM server platform would get PureXML support. So far, my phone line has been real quiet. But I will start poking around.
My guess is that the future i5/OS V5R5 or i/OS V6R1 (whatever IBM calls the next operating system version or release for the future Power6-based System i servers) will have PureXML support. And if there are reasons to push that operating system release out to early 2008, then adding PureXML to the database is one good reason to take a little extra time. Then again, for all I know, IBM is happy for customers to keep using DB2 XML Extender, a tool for grafting XML support onto DB2/400 that has been around for several years.