The Path To XML-INTO Happiness, Part 3

September 1, 2015 Jon Paris

Note: The code accompanying this article is available for download here.

In part 1 and part 2 of my XML series, I introduced you to the basics of using RPG’s XML support. In this tip we begin to explore some of the challenges that you may face when processing commercial XML documents, and the support RPG offers to handle them.

Take a look at the snippet of an XML document below:

<ItemsXRef >
    <Header RefId="xxxxx" TimeStamp="2011-11-30T00:06:06.643Z">
        <to id="nnnnnn" name="nnnnnn"/>
        <from id="nnnn" name="A Company in Canada"/>
        <TransactionType>ItemXref</TransactionType>
    </Header>
    <Items>
        <SKU>
            <SKUID>10050322</SKUID>
            <UPC>6866261486</UPC>
            <WIN>30269675</WIN>
            <StatusCode>A</StatusCode>
        </SKU>
        <SKU>
               ... <snip> ...
        </SKU>
        <EnterpriseCode>CAN</EnterpriseCode>
        <RecordCount>46</RecordCount>
    </Items>
</ItemsXRef>

As you can see, it is not a particularly complex document, but it demonstrates a common issue: part of the data (the <Header> section in this case) is of no real use to us, yet capturing it would require us to code data definitions for content that we would simply ignore. In this particular case that would not have been an arduous task, but when industry-standard documents are used the required definitions can be far more complex. And at the end of the day the data will still be “thrown away.”

The magic to processing documents such as this is to make use of the path option of the %XML built-in function. This effectively allows us to specify the position in the document at which the XML parser should begin its work, thereby skipping over unwanted elements.

If you study the hierarchy of the sample XML above, you will see that at the top level we have the <ItemsXRef> element. This in turn has two child elements, <Header> and <Items>. It is <Items> that contains the actual data that we want to process, namely the SKU, EnterpriseCode, and RecordCount elements. <Header> also has subordinate elements, but they are of no interest to us.

To ignore the header information, all we need to do is direct the parser to begin its work with the <Items> element. We can do this by specifying a path that details the nodes to be traversed in order to arrive at the first element that we actually want. These paths are specified just as a path in the IFS (or, for that matter, on a Windows or Mac system) would be. The only difference is that we are referencing the element name hierarchy rather than a directory name hierarchy.

So to tell the parser to start processing at the <Items> element we simply use the directive path=ItemsXRef/Items. Note that the actual element name beginning (<) and ending (>) markers are not included in the path specification.

This particular document was submitted by a reader, and in their case they were only interested in the SKU element details and did not need to capture the EnterpriseCode and RecordCount data. So we simply had to have the path “dive down” one more level to position to the first of the SKU elements. This was achieved by simply adding /SKU to the end of the path, so the full path directive became path=itemsXref/Items/SKU. You can see it in action at (C) below.

Since we were only interested in the SKU data, the required data definitions are also really easy. They consist of a simple DS array, as you can see at (B) below. Because we are targeting an array, we are able to take advantage of the RPG-supplied element count in the PSDS (A), which tells us how many array elements XML-INTO actually loaded.

The resulting program (XMLPATH1) is, as you can see, very simple indeed.

     D progStatus     SDS
(A)  D   xmlElements                 20i 0 Overlay(progStatus: 372)

(B)  D sku             ds                  Dim(9999) Qualified
     D  skuid                        15a
     D  upc                          15a
     D  win                          15a
     D  statuscode                    1a

     D pause           s              1a

      /free
       xml-into sku
(C)          %xml('/partner400/XMLStuff/ITEMXREF.xml':
                  'case=any doc=file path=itemsXref/Items/SKU');

       Dsply (%Char(xmlElements) + ' SKU records loaded.' ) ' '
               pause;

       // End the program rather than looping back through the RPG cycle
       *InLR = *On;
      /end-free

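To illustrate how the PSDS element count might be used once the array has been loaded, here is a minimal sketch in the same style as XMLPATH1. The variables i, loaded, and activeCount, and the idea of counting SKUs with a status code of A, are assumptions added purely for illustration; they are not part of the original program.

```rpgle
     D progStatus     SDS
     D   xmlElements                 20i 0 Overlay(progStatus: 372)

     D sku             ds                  Dim(9999) Qualified
     D  skuid                        15a
     D  upc                          15a
     D  win                          15a
     D  statuscode                    1a

      // Hypothetical work fields (not in the original XMLPATH1)
     D i               s             10i 0
     D loaded          s             10i 0
     D activeCount     s             10i 0

      /free
       xml-into sku
             %xml('/partner400/XMLStuff/ITEMXREF.xml':
                  'case=any doc=file path=itemsXref/Items/SKU');

       // Save the count right away, before any later XML-INTO changes it
       loaded = xmlElements;

       // Only the first "loaded" elements of the array contain data
       For i = 1 to loaded;
         If sku(i).statuscode = 'A';
           activeCount += 1;
         EndIf;
       EndFor;

       Dsply (%Char(activeCount) + ' of ' + %Char(loaded)
             + ' SKUs active.');

       *InLR = *On;
      /end-free
```

Copying the count into a work field first is a defensive choice: the PSDS value reflects the most recent XML-INTO, so a later parse in the same program would replace it.
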
You may be wondering what changes would be required had the reader also needed to extract the EnterpriseCode and RecordCount data. It is really very simple, but does require that we apply the countprefix option that I described in XML-INTO part 2 and, of course, modify the path option so that parsing starts at <Items> rather than at <SKU>. The IFS file name stays the same.

Here is the modified version of the program (XMLPATH2) that demonstrates the data declarations and additional logic that were needed. The most significant changes are the new items data structure, the countprefix option, and the shorter path.

     D items           ds                  Qualified
     D   sku                               LikeDS(sku_T) Dim(9999)
     D   count_sku                    5i 0
     D   enterpriseCode... 
     D                                3a
     D   recordCount                  5i 0

     D sku_T           ds                  Template
     D  skuid                        15a
     D  upc                          15a
     D  win                          15a
     D  statuscode                    1a


      /free
       xml-into items
             %xml('/partner400/XMLStuff/ITEMXREF.xml':
                  'case=any doc=file countprefix=count_ +
                  path=itemsXref/Items');

       If items.count_sku = items.recordCount;
         Dsply ('Counts match - ' + %Char(items.recordCount) 
               + ' processed' );
       Else;
         Dsply ('Count Mismatch - Actual: ' + %Char(items.count_sku) 
               + ' Expected: ' + %Char(items.recordCount));
       EndIf;

       // End the program rather than looping back through the RPG cycle
       *InLR = *On;
      /end-free

As with my previous tips on XML-INTO, I have supplied fully free-form versions of the sample programs for those of you on V7.1 and later releases.

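For readers who do not have the download to hand, the following is a sketch of how XMLPATH2 might look with free-form declarations. It is not the downloadable version itself, just a rendering of the declarations and logic shown above using dcl-ds and end-ds, and it assumes a compiler level that supports free-form definitions. The code starts in column 8, so no **FREE directive is assumed.

```rpgle
       // Sketch only: a free-form rendering of the XMLPATH2 listing
       dcl-ds sku_T template;
         skuid      char(15);
         upc        char(15);
         win        char(15);
         statuscode char(1);
       end-ds;

       dcl-ds items qualified;
         sku            likeds(sku_T) dim(9999);
         count_sku      int(5);     // filled in via countprefix=count_
         enterpriseCode char(3);
         recordCount    int(5);
       end-ds;

       // Start parsing at <Items>, skipping the <Header> element entirely
       xml-into items
             %xml('/partner400/XMLStuff/ITEMXREF.xml':
                  'case=any doc=file countprefix=count_ +
                  path=itemsXref/Items');

       if items.count_sku = items.recordCount;
         dsply ('Counts match - ' + %char(items.recordCount)
               + ' processed');
       else;
         dsply ('Count Mismatch - Actual: ' + %char(items.count_sku)
               + ' Expected: ' + %char(items.recordCount));
       endif;

       *inlr = *on;
```

The subfield names and sizes are carried over unchanged from the fixed-form listing; only the declaration syntax differs.
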
If you have a particular problem with the use of XML-INTO then please send it to me and I will try to address it in a future tip.

Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.

5 thoughts on “The Path To XML-INTO Happiness, Part 3”

ARUN says:

April 6, 2017 at 2:24 am

Hi Sir,

Thanks for your great explanation. But I want to perform an XML-INTO operation on a 50-megabyte file. In RPG, the maximum size of a data structure is 16 megabytes. Could you please suggest how to process a 50 MB XML file?

Regards,
Arun

- lean says:
  
  August 22, 2018 at 10:50 pm
  
  Same problem. Was this query answered? Thank you.
  
- lean says:
  
  August 22, 2018 at 11:04 pm
  
  Hi, I have this same problem.
  When my array data structure is too big and exceeds 16 MB, I get an error when compiling the program.
  
  Can I please have your suggestions on this?
  
  - Jon Paris says:
    
    August 27, 2018 at 1:54 pm
    
    There’s an answer there now – sorry for the delay.