An Introduction to Processing XML With RPG, Part 2
August 18, 2015 Jon Paris
Note: The code accompanying this article is available for download here.
In the first part of this series I introduced you to the basics of using RPG’s XML-INTO op-code. In that tip I showed how the count that RPG provides in the PSDS can be used to determine how many occurrences of a repeating element were processed.
However, as I noted at the time, this can only be used when handling a repeating outer element. But what if there is a repeating element within each of those outer elements? In this second part of the series we will be studying how to handle those situations and in particular how the V6 enhancements filled a gap in the initial support.
Take a look at the sample XML below:
<Customer Id="S987654">
  <Name>Smith and Jones Inc. </Name>
  <Address>
    <City>Jonesboro</City>
    <State>GA</State>
  </Address>
</Customer>
<Customer Id="B012345">
  <Name>Brown and Sons</Name>
  <Address Type="Shipping">
    <City>San Jose</City>
    <State>CA</State>
  </Address>
  <Address Type="Mailing">
    <City>San Francisco</City>
    <State>CA</State>
  </Address>
</Customer>
...
As you can see, it is very similar to the examples we used in part 1. I have just taken it to the next level. First, I have allowed for a customer to have more than one address. In addition, each individual <Address> element can now have a Type attribute associated with it to indicate whether it represents a mailing or shipping address. I demonstrated how to handle attributes in part 1, but this time I am also allowing for the Type to be omitted, in which case the program logic will default it to “Mailing”. You can see the changes I have made to the data structures below. Note in particular that I have allowed for a maximum of 10 addresses per customer.
     d customer        ds                  Dim(99) Qualified
     d   id                           7a
     d   name                        40a
     d   address                           LikeDS(address_T)
     d                                     Dim(10)

     d address_T       ds                  Template
     d   type                         8a
     d   state                        2a
     d   city                        40a
If you were to leave the rest of the program unchanged it would compile just fine, but if you tried to run it you would get the error: “The XML document does not match the RPG variable; reason code 2.” And if you were to press F1 on this message you would see that the problem arose when processing the address portion of the first customer.
So why did this happen? If you remember, in part 1 I said that the names and hierarchy of the fields within the variable that you use to receive the data must match those in the XML document. What I didn’t point out then was that, by default, the rules for what is considered a “match” are very strict.
That is to say, there must be an exact one-for-one correspondence between the XML document and each and every field in the DS. To put it another way, the XML cannot contain an element for which there is no “home” in the DS, and there must be an XML element to supply data for each and every occurrence of every field in the DS.
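To make those defaults concrete, here is a minimal sketch of the XML-INTO call with the two matching options written out explicitly. It assumes the call otherwise uses only case=any, as in the other examples in this tip. Note that allowextra is not covered in this tip; it is shown only because it is the option behind the "no home in the DS" half of the rule.

```rpgle
// Sketch only: the strict defaults, spelled out.
// allowmissing=no : every subfield and array element must receive XML data
// allowextra=no   : every XML element or attribute must have a matching subfield
XML-INTO customer %XML( xmlSource:
                        'case=any allowmissing=no allowextra=no' );
```

Coding the options this way should behave the same as omitting them; it simply documents the rules that the reason code 2 error is enforcing.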
Remember the Type attribute? If you study the XML you will see that I omitted it for the first customer. This is the data that RPG considers to be missing. So one possible solution would be to modify the XML to include a type for the first customer. That would fix things, right? Sadly, as those of you who rushed ahead and tried have discovered, the answer is no. RPG still thinks that there is data missing. What data? Remember I said that by default the XML has to supply “data for each and every occurrence”? We told the compiler that there would be 10 occurrences of the address element, but in our test data we don’t have that many. As a result RPG considers the rest to be “missing”.
The compiler folks anticipated this issue and provided a processing option that allows us to deal with it. The keyword for this is allowmissing, and if we give it a value of yes then RPG will not signal an error under these circumstances. This is what our XML-INTO statement looks like in sample program XMLINTOB3:
XML-INTO customer %XML( xmlSource: 'case=any allowmissing=yes');
If we are forced to use this option, and at V5R4 it is the only solution we have, then we must test for a blank field in the address array to determine when we have reached the end. Similarly we can also test for a blank type field and set the default value when one is encountered. You can see the resulting logic from XMLINTOB3 here:
for i = 1 to count;
  Dsply ( 'Id ' + customer(i).Id + ': ' + %TrimR( customer(i).name ));
  a = 1;
  // Loop through addresses
  DoU ( customer(i).address(a).city = *Blanks );
    // Default blank address type to Mailing
    If ( customer(i).address(a).type ) = *Blanks;
      customer(i).address(a).type = 'Mailing';
    EndIf;
    Dsply ( customer(i).address(a).type + ': ' +
            %TrimR(customer(i).address(a).city ) );
    a += 1;
  EndDo;
EndFor;
So, what is wrong with this solution? Basically there is no granularity. I would like to be able to say that it is OK if the type attribute and some of the address elements are missing. But if a customer name were missing then that is indeed an error. But the minute I say allowmissing=yes then anything goes. The XML document could effectively be completely empty and RPG would say “looks good to me!”
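To illustrate what that lack of granularity costs you, here is a rough sketch of the kind of hand-written check that allowmissing=yes forces on the program. It is a fragment, not a complete program: it assumes customer, xmlSource, count and i are declared as in XMLINTOB3, and that the name subfield still contains blanks when no <Name> element was supplied. The message text is purely illustrative.

```rpgle
// Sketch only: relies on the declarations from XMLINTOB3
XML-INTO customer %XML( xmlSource: 'case=any allowmissing=yes' );

For i = 1 to count;
  // allowmissing=yes will not flag a missing <Name>,
  // so the program has to detect it for itself
  If ( customer(i).name = *Blanks );
    Dsply ( 'Id ' + customer(i).Id + ': name is missing' );
    Iter;   // skip this customer and move on to the next
  EndIf;
  // ... process the customer and its addresses as before ...
EndFor;
```

Every required element needs a test like this, which is exactly the kind of validation we would rather leave to RPG.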
Clearly that is not satisfactory and there has to be a better way. And indeed in the V6 and later releases IBM provided an alternative that gives us the control we need. The option that controls this is countprefix and it allows us to specify the name of a field into which the compiler can place a count of the number of repeating items loaded. In other words we can specify a count for arrays at any level, in addition to the one that RPG supplies in the PSDS.
This is the modified XML-INTO used in program XMLINTOB4:
XML-INTO customer %XML( xmlSource: 'case=any countprefix=count_' );
Now all we have to do is to modify our data structures to include count fields wherever we need them. Here’s the result:
     d customer        ds                  Dim(99) Qualified
     d   id                           7a
     d   name                        40a
     d   address                           LikeDS(address_T) Dim(10)
     d   count_address...
     d                                5i 0

     d address_T       ds                  Template
     d   type                         8a
     d   count_type                   5i 0
     d   state                        2a
     d   city                        40a
Since we want to count the number of address elements loaded, we create a count field by prefixing the element name with our chosen prefix. So the name of the field in this case is count_address, and it must be specified at the same hierarchical level as the item it is counting. If you are paying attention, you may have also noticed that I added a field count_type to the address_T DS template. The countprefix support not only allows us to count the number of elements loaded into an array, but it can also be used to check the presence or absence of any optional element. So we can use it to determine if the type attribute was present or not rather than have to test for blanks.
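If your compiler supports free-form declarations (they arrived with 7.1 TR7), the same layout can be written without the column-sensitive D-specs. The sketch below is my free-form rendering of the data structures described above, not code taken from the sample programs. It assumes a fully free-form source member, and it places the template first purely for readability.

```rpgle
**free
// Template for one address; count_type receives the number of
// Type attributes found for that address (0 when it was omitted)
dcl-ds address_T template;
  type       char(8);
  count_type int(5);
  state      char(2);
  city       char(40);
end-ds;

// Up to 99 customers, each with up to 10 addresses;
// count_address receives the number of <Address> elements loaded
dcl-ds customer dim(99) qualified;
  id            char(7);
  name          char(40);
  address       likeds(address_T) dim(10);
  count_address int(5);
end-ds;
```

The count fields follow the same rule in either notation: the name is the countprefix value followed by the name of the subfield being counted, and it sits at the same level as that subfield.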
Courtesy of these two new count fields we can now simplify the processing logic to take advantage of them like so:
For i = 1 to count;
  Dsply ( 'Id ' + customer(i).Id + ': ' + %TrimR( customer(i).name ));
  // Loop through addresses
  For a = 1 to customer(i).count_address;
    // Default address type to Mailing if not supplied
    If ( customer(i).address(a).count_type ) = 0;
      customer(i).address(a).type = 'Mailing';
    EndIf;
    Dsply ( customer(i).address(a).type + ': ' +
            %TrimR(customer(i).address(a).city ) );
  EndFor;
EndFor;
This is by no means a comprehensive introduction to the processing of XML with RPG’s built-in support. There are many more features that you may need from time to time and in future tips we will be exploring those. In the meantime, if there are any specific issues that you have with XML-INTO please let me know and we’ll prioritize those in future tips.
Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.