An Introduction To Processing XML With RPG, Part 1: The Basics
August 4, 2015 Jon Paris
Note: The code accompanying this article is available for download here.
RPG IV’s built-in XML support has been available for some time now, having been originally introduced with V5R4 back in 2006. Due to deficiencies in that release, many RPG’ers chose not to use RPG to process XML. Fortunately, IBM has corrected those deficiencies, and RPG provides a smart way to read XML. Here’s how it’s done.
In that first release, the most powerful of the op codes, XML-INTO, had some gaps in its support that sometimes made it difficult to use without pre-processing the XML. In addition, the size limits on variables, Data Structures, and arrays prior to V6 often made it difficult to process all but the smallest of XML documents. As a result, the number of RPGers who adopted it was relatively small. Some switched to using XML-INTO’s little brother, XML-SAX. Others reverted to using Java, PHP, or third-party tools.
All this began to change when V6 came along and relaxed the various size limits that had been a barrier for many. Subsequently, most of the remaining issues were remedied via PTFs in both the V6 and V7 releases. The result is a powerful and easy-to-use method for processing XML documents, whether they are received in the form of a file or as variables in response to a web service call.
In this series of tips I am going to start with the basics and then, as we progress, illustrate how to handle the more difficult situations that can present themselves. That said, let’s get started.
XML-INTO: A “Move The Mountain” Op-Code
The heart of RPG’s XML support is XML-INTO. On the surface it is a deceptively simple op-code, using only two factors. The first identifies the target for the extracted data, and the second supplies information about the XML source. As you will see, it does a lot under the covers, so much so that I put it into the “move the mountain” category along with other powerful operations such as EXFMT.
This is the basic syntax:
XML-INTO xmlTarget %XML( xmlSource : processingOptions );
“xmlTarget” is where the extracted data will be placed. It can be almost anything, from a simple field, to an array, to a Data Structure (DS), to a DS array; the list goes on. The reason there are so many possibilities is that the “shape” (i.e., structure) of the target must match that of the original XML document. RPG determines that shape from the names of the variables and their hierarchy. This is one of those cases where it is far easier to explain what I mean with a couple of simple examples.
Suppose that the XML document contains address information for a number of customers and looks like this:
<Customers>
  <Customer>
    <Name>Brown and Sons</Name>
    <City>San Jose</City>
    <State>CA</State>
  </Customer>
  <Customer>
    <Name>Smith and Jones Inc. </Name>
    ....
</Customers>
Then the target would have to look like this:
     d customer        ds                  Dim(99) Qualified
     d   name                        40a
     d   city                        40a
     d   state                        2a
Because the <Customer> element repeats, it has to be represented as an array. And because it is a compound element (consisting of the three elements Name, City, and State) it must, in RPG terms, be represented as a DS. Note that in this example I have placed the name, city, and state fields in the same sequence in the DS as they were in the XML document. This is not essential but in this case makes sense. In XML-INTO terms, the only requirement is that the fields be in the same hierarchical position, i.e. subordinate to customer. You’ll see this in action in the second example, where I deliberately changed the sequence.
If we assume that the XML document is contained within the IFS file Customers.xml in directory XMLDocs, then the XML-INTO operation needed to process the document would be:
XML-INTO customer %XML( fileLocn: 'doc=file case=any');
Where the character variable fileLocn contains the value /XMLDocs/Customers.xml.
Note that I have used two processing options here. The first, “doc=file”, tells the compiler that the variable “fileLocn” contains the name of the file to be processed. Without this option the RPG compiler would assume that the variable contained the XML. You will forget to add this option (we all do) and when you do there will be a run time error indicating that the XML document does not appear to be valid. Not surprising, since it is trying to process a file name as if it were XML!
The second option, “case=any”, is one that you will have to use just about every time you use XML-INTO. Why? Because XML element names are case sensitive and in order to match them up to the RPG definitions the names must first be converted to upper-case since that is how all RPG names are seen by the compiler.
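To make the difference between the two kinds of XML source concrete, here is a minimal sketch that runs XML-INTO once against a variable holding the XML itself and once against the IFS file. The variable xmlData, its 500-byte length, and the literal XML assigned to it are assumptions made purely for illustration; only fileLocn and the customer DS come from the example above.

```rpgle
     D customer        ds                  Dim(99) Qualified
     D   name                        40a
     D   city                        40a
     D   state                        2a

     D fileLocn        s            100a   Varying
     D                                     Inz('/XMLDocs/Customers.xml')
      // xmlData is a hypothetical variable used only in this sketch
     D xmlData         s            500a   Varying

      /Free
        // The variable contains the XML itself, so no doc option
        // is needed (doc=string is the default)
        xmlData = '<Customers><Customer>'
                + '<Name>Brown and Sons</Name>'
                + '<City>San Jose</City><State>CA</State>'
                + '</Customer></Customers>';
        XML-INTO customer %XML( xmlData : 'case=any' );

        // The variable contains the name of the file holding the XML
        XML-INTO customer %XML( fileLocn : 'doc=file case=any' );

        *InLR = *On;
      /End-Free
```

Each XML-INTO here loads the same customer array, so the second operation simply replaces the results of the first.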
Once the XML-INTO operation completes, all that remains is to process the data that was retrieved. To do that we will almost certainly need to determine how many Customer elements were found. RPG supplies a very simple solution by placing a count of the number of elements filled in positions 372 – 379 of the program status data structure (PSDS) as a 20-digit integer. This count can be used to loop through each of the filled elements and process them. You will see this in action in my sample programs. The count field is only valid when, as in our example, the target of the XML-INTO is an array. In the original release of the XML support, this was the only kind of element count supported. You’ll learn about the enhanced support in the next part of this series.
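As a rough illustration of the count in use, the sketch below defines the PSDS subfield and loops through the elements that were filled. The names progStatus, xmlElements, and i are arbitrary choices for this sketch, and using DSPLY to show each name is just a stand-in for real processing; the downloadable sample programs may well do it differently.

```rpgle
     D customer        ds                  Dim(99) Qualified
     D   name                        40a
     D   city                        40a
     D   state                        2a

      // Program status data structure: positions 372 - 379 hold the
      // count of elements filled by XML-INTO (subfield name is arbitrary)
     D progStatus     sds
     D   xmlElements         372    379i 0

     D fileLocn        s            100a   Varying
     D                                     Inz('/XMLDocs/Customers.xml')
     D i               s             10i 0

      /Free
        XML-INTO customer %XML( fileLocn : 'doc=file case=any' );

        // Process only the array elements that XML-INTO filled
        For i = 1 to xmlElements;
          Dsply customer(i).name;
        EndFor;

        *InLR = *On;
      /End-Free
```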
A More Complex Example
By now you are certain to have realized that very few XML documents will be this simple. For example, what if the Customer ID was included in the document as an attribute, and perhaps the City and State were part of a compound element named Address? XML for such a document would look something like this:
<Customers>
  <Customer Id="B012345">
    <Name>Brown and Sons</Name>
    <Address>
      <City>San Jose</City>
      <State>CA</State>
    </Address>
  </Customer>
Since a compound element maps to a DS in RPG terms, we now need to nest one DS (Address) inside another (Customer). Luckily RPG gave us this facility back in V5R2. But how to handle the Id attribute? Turns out this is very simple. Basically, an attribute of an element is treated as being at the same hierarchical level as a child of that element. So the way we code the RPG structure is exactly the same as if the XML had been:
<Customers>
  <Customer>
    <Id>B012345</Id>
The changes needed to process this revised format are shown here:
     d customer        ds                  Dim(99) Qualified
     d   id                           7a
     d   name                        40a
     d   address                           LikeDS(address_T)

       // For a V5R4 system remove the keyword "Template"
     d address_T       ds                  Template
     d   state                        2a
     d   city                        40a
As you can see, by using the LikeDS keyword I was able to nest the “address” DS within the original “customer” DS. I have also added the “id” field to the structure.
For those of you unfamiliar with nested data structures such as these, I will just point out that in order to reference the ID field for a particular customer element you would refer to customer(index).id. To reference the state field in the address you would code customer(index).address.state. For more information on nested DS and DS arrays see this article.
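The references described above could be put to work along these lines. This is only a sketch: it assumes the revised document is stored in the same /XMLDocs/Customers.xml file as the first example, the names progStatus, xmlElements, and i are arbitrary, and DSPLY again stands in for whatever processing you actually need. It also uses the Template keyword, so it assumes a V6 or later compiler.

```rpgle
     D address_T       ds                  Template
     D   state                        2a
     D   city                        40a

     D customer        ds                  Dim(99) Qualified
     D   id                           7a
     D   name                        40a
     D   address                           LikeDS(address_T)

      // Count of elements filled by XML-INTO (subfield name is arbitrary)
     D progStatus     sds
     D   xmlElements         372    379i 0

      // Assumed location of the revised document
     D fileLocn        s            100a   Varying
     D                                     Inz('/XMLDocs/Customers.xml')
     D i               s             10i 0

      /Free
        XML-INTO customer %XML( fileLocn : 'doc=file case=any' );

        For i = 1 to xmlElements;
          // The attribute is addressed like any other subfield, and the
          // nested DS adds one more level of qualification
          Dsply ( customer(i).id + ' ' + customer(i).address.state );
        EndFor;

        *InLR = *On;
      /End-Free
```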
Notice that in my example I used the keyword “Template” to describe the “address_T” structure. Without this the address_T DS would simply waste memory. Template was introduced in V6. If you need to compile this example for V5R4 simply remove the keyword.
If you would like to play with these examples you can download the code here.
In my next tip, I will be taking a look at how to deal with repeating elements that occur within the body of the XML document and the basics of how to deal with optional elements.
Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.