Composing An XML Document From Relational Data, Part 1
August 8, 2012 Hey, Mike
My boss asked me if I know of a cheap or free tool that can automatically export a DB2 (for i) table to an XML file. I told him that I could write a Java program pretty easily that could accomplish this for a specific table, but I’m not aware of any way to “generically” do it for any table.
select * from deleteme TheKey SomeText 1 Row Number 1 2 Row Number 2
That code should produce output something like this:
<?xml version="1.0" encoding="UTF-8" ?> <deleteme> <row> <TheKey>1</TheKey> <SomeText>Row Number 1</SomeText> </row> <row> <TheKey>2</TheKey> <SomeText>Row Number 2</SomeText> </row> </deleteme>
I’m pretty sure the DB2 utility from QSH could probably do this, but I don’t know how to do it. Do you know of an easy way to accomplish this on DB2 for i? Thanks!
I’m not aware of any free tool to accomplish this. In fact, even the latest IBM iAccess tools (including the file transfer utility and the RunSQL “save results as”) fail to provide this kind of save as XML option.
I believe part of the reason IBM hasn’t given us a way to do this is because offering XML as an export option opens up a whole can of worms. Think about it: Do you want to export your data as elements? How about as attributes or a combination of both? What do you want to call your root element and what case should it be?
IBM could probably make this work for single table flat data, but what if you have multiple tables in a query? What if you’re joining customers with sales orders and you want the sales data nested within the customer data? With XML, the export options are endless and having the user configure an XML export could be a little daunting.
However, DB2 for i 7.1 includes full support for creating XML documents almost any way you want. While there is no generic utility in DB2 to do this type of conversion automatically, the SQL required to build an XML is a simple pattern that can be molded into a generic utility.
For now, I’ll cover how to use the new XML features to build a simple XML from relational data. Below is the base SQL data retrieval statement:
-- Base Query SELECT PRODUCTID,NAME,PRODUCTNUMBER,COLOR FROM ADVWORKS.PRODUCT ORDER BY NAME
The resulting XML should look like this:
<?xml version="1.0" encoding="UTF-8" ?> <Products> <Product> <ProductId>1</ProductId> <Name>Adjustable Races</Name> <ProductNumber>AR-5381</ProductNumber> </Product> <Product> <ProductId>352</ProductId> <Name>Front Derailleur Linkage</Name> <ProductNumber>FL-2301</ProductNumber> <Color>Silver</Color> </Product> </Products>
Author’s Note: Column COLOR is nullable and hence doesn’t appear in the XML when it is null.
And now we’ll transform the base query to build the XML document. The first thing to do when converting table row data to XML elements is to use the XMLROW function.
-- Step 1: Convert a tabular row to an equivalent "XMLROW" SELECT XMLROW(PRODUCTID AS "ProductId",NAME AS "Name",PRODUCTNUMBER AS "ProductNumber",COLOR AS "Color" OPTION ROW "Product") AS XML_ROW FROM ADVWORKS.PRODUCT ORDER BY NAME
XMLROW accepts a list of columns and returns an XML fragment. Each column name passed to the function will be converted to an XML element with its data as the element value. In this case, I want the element names to be mixed case so I aliased the column names within the function. Finally, OPTION ROW is used to define a parent element. The element “Product” will be a “parent element,” and all of the column elements are “child elements.”
The result of this query now looks like this (only a few sample rows are shown here):
At this point, there is only one column in the result set (called XML_ROW) that contains the requested data as XML elements. But since the goal is for all of the information to be placed in one XML document, the query should return only one row with one XML column.
So the next step is to combine all of the XML rows into a single row and column. Aggregate function XMLAGG (XML Aggregate) can combine the XML data from all of the product rows to create a single column:
-- Step 2: Aggregate all rows (ORDER BY goes in nested table expression) SELECT XMLAGG(XMLROW(PRODUCTID AS "ProductId",NAME AS "Name", PRODUCTNUMBER AS "ProductNumber",COLOR AS "Color"OPTION ROW "Product") ) AS XML_AGG FROM (SELECT * FROM ADVWORKS.PRODUCT ORDER BY NAME) PRODUCT
Since XMLAGG is an aggregate function it can be used just like any other aggregate function with a GROUP BY. Though in this case, aggregating the XML of all the rows doesn’t require a GROUP BY. The input to the XMLAGG function is the result of the XMLROW function; each XMLROW is concatenated into one large string. The abridged result set containing only one row and one column looks like this:
Author’s Note: There is no white space inserted in the XML; I inserted the line breaks for readability.
With an aggregate query, the ORDER BY cannot specify a column that is not in the SELECT list. Therefore, to maintain the ordering of the data in the XML, the base query was converted to a nested table expression.
An XML document has only one root element. The next step to polish off the XML is to add a root element named “Products”. The XMLELEMENT function can be used to define an element. This function accepts two inputs: an element name and its value. In the query below, the name given to our root element is “Products” and the value is going to be the result of the XMLAGG aggregate function:
-- Step 3: Add root element SELECT XMLELEMENT(NAME "Products", XMLAGG(XMLROW(PRODUCTID AS "ProductId",NAME AS "Name",PRODUCTNUMBER AS "ProductNumber",COLOR AS "Color" OPTION ROW "Product")) ) AS XML_DOCUMENT FROM (SELECT * FROM ADVWORKS.PRODUCT ORDER BY NAME) PRODUCT
The abridged result looks like this (the same result as the prior query with the exception that there is now a root element):
Again, the line breaks were added to the XML for readability. The actual result is one long string with no line breaks.
The XML is now close to the one shown above. The last item to add is the XML declaration (i.e., <?xml version=”1.0″ encoding=”UTF-8″?>) to the beginning of the XML document. The XMLSERIALIZE function will perform this task for us:
-- Step 4: Add the XML declaration SELECT XMLSERIALIZE(XMLELEMENT(NAME "Products", XMLAGG(XMLROW(PRODUCTID AS "ProductId",NAME AS "Name", PRODUCTNUMBER AS "ProductNumber",COLOR AS "Color" OPTION ROW "Product"))) AS CLOB(1M) INCLUDING XMLDECLARATION) AS XML_DOCUMENT FROM (SELECT * FROM ADVWORKS.PRODUCT ORDER BY NAME) PRODUCT
XMLSERIALIZE accepts an XML input parameter, and can optionally add an XML declaration to it and cast the entire result to a string or binary value. In this case the input is everything we’ve already done to this point (an aggregate of XML fragments for each row in the Products table combined with a root element). The abridged result now looks like this:
Because of the function nesting, the SQL statements that generate XML can get a little messy. However, it’s pretty easy to understand at each step how these functions work and it is definitely worth the small time investment to learn how to construct XML with DB2 for i. IBM has done a great job in giving developers a versatile approach to building XML.
Attribute Based XML
One simple variation I want to introduce is how to create an attribute based XML result instead of an element based XML. The code is exactly the same except for the addition of the AS ATTRIBUTES option. Backing up to the first step in the query:
-- Step 1A (Alternative) - Attribute Based SELECT XMLROW(PRODUCTID AS "ProductId",NAME AS "Name",PRODUCTNUMBER AS "ProductNumber",COLOR AS "Color" OPTION ROW "Product" AS ATTRIBUTES) AS XML_ROW
The abridged result now looks like this:
Every column given to XMLROW is now rendered as an attribute value instead of an element. Attribute based XML is smaller than element based XML because the ending element tag isn’t required. For my small data set, the XML size was reduced from about 63K (element based) to 42K (attribute based).
For reusability, it’s possible to create a view producing the desired XML as follows:
CREATE VIEW ADVWORKS.PRODUCT_XML AS SELECT XMLSERIALIZE(XMLELEMENT(NAME "Products", XMLAGG(XMLROW(PRODUCTID AS "ProductId",NAME AS "Name", PRODUCTNUMBER AS "ProductNumber",COLOR AS "Color" OPTION ROW "Product"))) AS CLOB(1M) INCLUDING XMLDECLARATION) AS XML_DOCUMENT FROM (SELECT * FROM ADVWORKS.PRODUCT ORDER BY NAME) PRODUCT
From here, the QShell DB2 utility can be used to dump the results of the XML to an IFS file:
db2 select xml_document from advworks.product_xml >/tmp/products.xml
However, the DB2 utility includes headings and a row count that must be stripped out to make the result useful. Therefore, I actually favor using RPG and either the CLOB_FILE SQL data type or the IFS APIs to dump the XML data directly to the IFS.
While it’s not the prettiest thing to look at, it is possible to construct a utility that accepts a simple SQL statement and adds the necessary XML functions in order to get the desired XML result.
In a future tip, I’ll demonstrate how to construct an XML document from multiple tables where nesting is required.