An Introduction to Processing XML with RPG, Part 4
November 17, 2015 Jon Paris
Note: The code accompanying this article is available for download here.
In the first three parts of this series, I focused on the basics of using RPG’s XML-INTO. In this episode I want to wrap things up by covering two of the more recent additions to this support.
Let’s start with namespaces. I haven’t got time to go into all the whys and wherefores of namespaces. For now let’s just say that they allow you to avoid name collisions by qualifying element and attribute names. Since you will encounter them in many documents, it is important to understand the options you have in RPG for dealing with them. (You can learn more about namespaces here.)
The XML extract below shows the simple usage of a namespace.
<p400:OrderDetail p400_OrderNumber="12345" p400_Date="2015-11-14" >
  <p400:Name>James Smith</p400:Name>
Notice that all the element names are prefixed by the characters “p400:”. This is the shorthand notation for the namespace and it is associated with a URI. That association is made through an xmlns:p400 declaration in the full document.
As I discussed earlier in the series, in order for XML-INTO to process the XML, the element and attribute names in the document must match the names and hierarchy in the RPG data structure used to receive the data. Teeny tiny problem: The colon (“:”) is not a valid character in an RPG name.
RPG’s solution is to provide options to either remove the namespace prefix or to convert the colon to a valid RPG name character. This is done via the ns option. Specifying ns=remove as an option to the %XML BIF will simply strip the namespace qualifier completely, so p400:Name is matched as Name. ns=merge, on the other hand, will retain the qualifier and convert the colon to an underscore, so p400:Name is matched as p400_Name.
Below are extracts from the sample programs that demonstrate the two options. The first example shows how the DS needs to be structured when the ns=remove option is specified.
     d OrderDetail...
     d                 DS                  Qualified
     d  OrderNumber                   5a
     d  Date                         10a
     d  Address                            LikeDS(Address_T)
     d                                     Dim(2)
      ...
       xml-into OrderDetail %xml( xmlFileName:
                                  'case=any doc=file +
                                   ns=remove');
This second example shows the changes required when ns=merge is used.
     d p400_OrderDetail...
     d                 DS                  Qualified
     d  p400_OrderNumber...
     d                                5a
     d  p400_Date                    10a
     d  p400_Address                       LikeDS(p400_Address_T)
     d                                     Dim(2)
      ...
       xml-into p400_OrderDetail %xml( xmlFileName:
                                       'case=any doc=file +
                                        ns=merge ');
So which option should you choose? In most cases the remove option will be the best choice as it simplifies the variable names, which makes them both easier to type and easier to read. Of course this only works in cases where there is no duplication of names within the document. Luckily that is normally the case. This option also simplifies the processing of standardized XML documents that only differ in the namespace used. But there is a drawback to this.
Suppose that you receive the same basic document from two or three different suppliers. The remove option allows you to use the same data structures etc. to process the document, but what if you need to vary the processing based on the source of the document? Luckily IBM thought of this and supplied us with an option to capture the value of the namespace prefix that is being stripped off. You do this by specifying the nsprefix option. This operates in a very similar fashion to the countprefix option, which we covered in part 2. If we specify nsprefix=ns_, then adding a field to the receiver DS whose name is ns_ followed by the element name captures the prefix that was removed from that element.
In the example below I have modified the %XML BIF to include the nsprefix option and added the variable ns_OrderNumber in order to capture the prefix associated with the OrderNumber element. Notice that, as with the countprefix option, the variable to receive the prefix must appear at the same level in the hierarchy as the element whose prefix is to be captured.
     d OrderDetail...
     d                 DS                  Qualified
     d  ns_OrderNumber...
     d                                4a
     d  OrderNumber                   5a
       xml-into OrderDetail %xml( xmlFileName:
                   'case=any doc=file ns=remove nsprefix=ns_');
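To make the "vary the processing by source" idea concrete, here is a minimal free-form sketch of testing the captured prefix. It is only an illustration under stated assumptions: the inline XML string, the URI http://www.example.com/p400, and the messages are all made up for this example, and OrderNumber is coded as a child element. I have also added allowextra=yes so that anything in the document without a matching subfield is simply ignored.

```rpgle
**free
// Sketch only: the XML, the URI and the messages below are
// hypothetical and are not taken from the downloadable samples.
dcl-s xmlDoc varchar(300);

dcl-ds OrderDetail qualified;
  ns_OrderNumber char(4);   // receives the prefix stripped from OrderNumber
  OrderNumber    char(5);
end-ds;

xmlDoc = '<p400:OrderDetail xmlns:p400="http://www.example.com/p400">'
       + '<p400:OrderNumber>12345</p400:OrderNumber>'
       + '</p400:OrderDetail>';

// doc=string is the default, so the first operand is the XML itself.
// allowextra=yes: ignore document content that has no matching subfield.
xml-into OrderDetail %xml(xmlDoc:
   'case=any ns=remove nsprefix=ns_ allowextra=yes');

// Branch on the captured prefix to vary processing by document source.
select;
  when OrderDetail.ns_OrderNumber = 'p400';
    dsply ('p400 order ' + OrderDetail.OrderNumber);
  other;
    dsply ('Unrecognized prefix ' + OrderDetail.ns_OrderNumber);
endsl;

*inlr = *on;
return;
```

In a real program you would more likely call a supplier-specific subprocedure from each WHEN branch rather than display a message.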
Before I wrap this up . . .
There is one more processing option that we haven’t discussed that you will find you need quite often. Just as the colon is not valid in an RPG name, nor are a number of other characters that are valid in XML names. In North America, and most English speaking countries, the most common one that you will encounter is the hyphen. While names including a hyphen are valid in COBOL (for example) they are not valid in RPG. RPG deals with this by allowing us to specify an extension to the case processing option which will cause any hyphens to be converted to underscores. So with case=convert specified, an element name such as Street-Address would be converted to Street_Address as an RPG name.
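As a small illustration of the hyphen handling just described, the sketch below parses an element named Street-Address into a subfield named Street_Address. The Customer document and its content are invented for the example; only the case=convert option comes from the discussion above.

```rpgle
**free
// Sketch only: the Customer document is a made-up example.
dcl-s xmlDoc varchar(200);

dcl-ds Customer qualified;
  Street_Address char(40);   // matches the element <Street-Address>
end-ds;

xmlDoc = '<Customer>'
       + '<Street-Address>123 Main Street</Street-Address>'
       + '</Customer>';

// case=convert turns the hyphen into an underscore before the
// XML name is matched against the RPG subfield name.
xml-into Customer %xml(xmlDoc: 'case=convert');

dsply Customer.Street_Address;

*inlr = *on;
return;
```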
Outside of the English-speaking world it is also possible to have XML element names that include accented and other characters that are not valid in RPG names, for example characters such as é, ö, and á. With the convert option in use, these would all be converted to their unaccented upper case equivalents, i.e., E, O, and A. But in some cases there is no simple upper case equivalent, for example the “double” characters æ and œ. In these cases the characters are replaced by underscores.
It may occur to you at this point that, by the time all these conversions are completed, the resulting name may contain multiple consecutive underscores. That would make such names very hard to type accurately, since the underscores would tend to appear on the screen as a single long line. RPG deals with this by converting all consecutive underscores to a single one. Thus A___B__C would become A_B_C. In theory this could result in duplicate names, but it is highly unlikely to occur.
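The underscore merging can be sketched the same way. In this hypothetical example the element name Ship--To contains two consecutive hyphens; going by the rules described above, each hyphen becomes an underscore and the pair is then merged, so the subfield is coded as Ship_To. The Shipment document is made up for illustration.

```rpgle
**free
// Sketch only: the Shipment document is a made-up example.
dcl-s xmlDoc varchar(200);

dcl-ds Shipment qualified;
  Ship_To char(30);   // Ship--To -> Ship__To -> Ship_To
end-ds;

xmlDoc = '<Shipment>'
       + '<Ship--To>James Smith</Ship--To>'
       + '</Shipment>';

// case=convert replaces each hyphen with an underscore and then
// merges consecutive underscores into one before matching names.
xml-into Shipment %xml(xmlDoc: 'case=convert');

dsply Shipment.Ship_To;

*inlr = *on;
return;
```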
There are a few other XML-INTO options that you may have need of from time to time, but hopefully I have covered all of the ones you are likely to encounter in your daily work. As before, I have supplied both fixed and free-form examples of the sample programs from which the extracts above were taken. If you download them you will find that I also used the countprefix option in addition to the ns options described here. So that you can “play” with the code, I have also included the XML used in the examples. You can download the zip file here.
If you have any questions relating to processing XML with RPG, please let me know and we’ll address them in future tips.
Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.