Guru: Dealing With Non-Normalized Data

    April 9, 2018 Jon Paris

    From time to time, many of us have to find solutions for handling our old non-normalized tables in an efficient manner. It would be nice to have the luxury of redesigning and normalizing these databases, but real life is not like that. This is particularly true when the tables in question are part of an application package where you have no control over the file layout.

    I should point out that by “efficient” in this context I don’t necessarily mean processing speed, but rather in terms of the number of lines of code needed to perform the necessary manipulations and/or how obvious (and therefore maintainable) the resulting code is.

    The classic example, and one that I often reference when teaching this topic, is where some form of monthly sales data is stored as a series of 12 consecutive fields in the record. For example, the file might contain a product code, followed by January sales, February sales, and so on all the way to December.

    This story contains code, which you can download here.

    If the requirement is to produce the total sales for the year from such a record, how would I go about that in an RPG program? Of course I could do it field by field, i.e. totalSales = JANSALES + FEBSALES + … + DECSALES;

    But if I could redefine the record so that I can treat the monthly sales values as an array (which after all is what they really are) then I can simply say totalSales = %XFoot(salesByMonth);

    Now wouldn’t that be a whole lot nicer? Plus of course I could also reference individual months by using the month number as an index. Much more flexible.

    So how do we go about doing that? It is very simple really, as you can see in this code extract.

    dcl-f  salesHist  keyed;

    // List all sales value fields from the history file in sequence
    //   No size or type is needed - this will be derived from the file
    dcl-ds productItems;
      JANSALES;
      FEBSALES;
      MARSALES;
      ...
      NOVSALES;
      DECSALES;

      // Now simply define an array over the fields
      salesByMonth  Like(JANSALES)  Dim(12)  Pos(1);
    end-ds;

    dcl-s  totalSales  Like(JANSALES);

    // Now we can sum all the months' sales values with ...
    totalSales = %XFoot(salesByMonth);
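    For readers who think in C terms, the same "redefine the record as an array" idea can be sketched with a union. This is an analogy only, not how RPG implements it: the field names are illustrative, and it assumes all twelve fields share one type so the two layouts line up byte for byte.

    ```c
    #include <assert.h>

    /* A record with 12 consecutive monthly sales fields, as in the
       legacy file layout (names illustrative, not from any real file). */
    struct SalesRecord {
        int janSales, febSales, marSales, aprSales, maySales, junSales,
            julSales, augSales, sepSales, octSales, novSales, decSales;
    };

    /* Redefine the same storage as a 12-element array -
       the equivalent of salesByMonth with Dim(12) Pos(1). */
    union SalesView {
        struct SalesRecord byName;
        int byMonth[12];
    };

    /* The %XFoot equivalent: sum every element of the array view. */
    int totalSales(const union SalesView *r) {
        int total = 0;
        for (int m = 0; m < 12; m++)
            total += r->byMonth[m];
        return total;
    }

    int main(void) {
        union SalesView r = { .byName = { .janSales = 100,
                                          .febSales = 200,
                                          .decSales = 50 } };
        assert(r.byMonth[0] == 100);   /* JANSALES via the array view */
        assert(totalSales(&r) == 350);
        return 0;
    }
    ```

    In RPG the overlay comes for free because the DS fields and the array occupy the same storage; the union makes that sharing explicit.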

    While this provides a neat solution to this particular problem, often such legacy files are more complex than this. I was reminded of this during a recent series of emails with an RPGer who is faced with such requirements every day. In his case he was faced not with a simple one field “array,” but with a repeated series of fields. There was also a “twist” in his requirements that I will get to in a moment.

    In the past I have most commonly encountered this type of record layout in apparel applications where a single record may hold (for example) pricing information for a number of sizes. So, a simplified version of such a file is what I will be using for this example.

    The basic layout of my file is that a product code is followed by fields for size, cost, and selling price. These three fields are repeated, in the case of my test file, a total of four times. In reality I have seen them repeat eight to ten times or more. These days we would not, hopefully, ever design a table like this but . . . .

    The simplest approach might be to use the same technique I demonstrated above and create three separate data structures, one for each of the field types. But the result is somewhat clumsy and the relationship between the group of three fields is not immediately obvious. Instead I’m going to demonstrate how group fields can be used to achieve the same thing a little more cleanly.

    The basic principle is the same, I simply list the columns in the order I want them like this:

    dcl-ds productItems;
      SIZE1;
      COSTPR1;
      SELLPR1;
      SIZE2;
      COSTPR2;
      SELLPR2;
      ... // fields are repeated until
      SELLPR4;

      // Now define a group field to contain the array
      //   Note the field sequence _must_ match
(A)   prodDetail  Dim(4)  Pos(1);
        prodSize  Like(SIZE1)    Overlay(ProdDetail);
        prodCost  Like(COSTPR1)  Overlay(ProdDetail: *Next);
        sellPr    Like(SELLPR1)  Overlay(ProdDetail: *Next);
    end-ds;

    The difference begins at (A) where I define the group field prodDetail. What makes it a “group field”? Simply the fact that it does not have any length or type definition in and of itself, but rather is defined by the combined length of the fields that overlay it. In this case it “groups” the fields prodSize, prodCost and sellPr. Even though it does not have a length specified, it can be declared as an array and — this is the important bit — that in turn allows each of the three individual fields to be treated as an array. That means I can reference any of the groups (e.g. prodDetail(1) ) or any of the individual fields (e.g. prodCost(1) ). Because prodDetail is defined as starting in Pos(1), i.e. the beginning of the data structure, prodSize(1) maps SIZE1, and sellPr(4) maps SELLPR4.
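    As a rough analogy (again not RPG's actual mechanics, and with made-up names), the group field arrangement corresponds in C to overlaying the flat record with an array of structs. Uniform field types are assumed here so that no struct padding creeps in and the layouts match exactly.

    ```c
    #include <assert.h>

    /* Flat record as stored in the file: size/cost/sell repeated
       four times (illustrative names, int throughout so the two
       layouts are byte-identical). */
    struct ProductRecord {
        int size1, costPr1, sellPr1;
        int size2, costPr2, sellPr2;
        int size3, costPr3, sellPr3;
        int size4, costPr4, sellPr4;
    };

    /* The "group": one element per size/cost/sell triple. */
    struct ProdDetail { int prodSize; int prodCost; int sellPr; };

    /* The equivalent of prodDetail Dim(4) Pos(1): the array of
       groups overlays the flat record from position 1. */
    union ProductView {
        struct ProductRecord flat;
        struct ProdDetail    detail[4];
    };

    int main(void) {
        union ProductView p = {0};
        p.flat.size1  = 8;  p.flat.costPr1 = 500; p.flat.sellPr1 = 999;
        p.flat.size4  = 14; p.flat.sellPr4 = 1299;

        /* prodSize(1) maps SIZE1, sellPr(4) maps SELLPR4 */
        assert(p.detail[0].prodSize == 8);
        assert(p.detail[3].sellPr   == 1299);
        return 0;
    }
    ```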

    I mentioned earlier that my correspondent had a “twist” in his requirements. Simply put, he needed to copy the fields in the record to corresponding, but differently sized, typed, and named, fields in display, print, and disk files. So, for example, a field that was packed 5,2 was perhaps packed 7,2 in another table and zoned 9,2 in display and print files. Even if he placed the fields in a DS, the size and type differences precluded simply copying one DS to another. Plus of course the name differences precluded the use of EVAL-CORR to copy the data.

    My approach was to create group field arrays as shown above for both the source and target records and then to simply assign each of the input arrays to their corresponding target array. The target array definitions (B) look like this:

    dcl-ds printItems  Inz;
      ITEMSIZE1;
      ITEMCOST1;
      ITEMSELL1;
      ITEMSIZE2;
      ... // fields are repeated until
      ITEMSELL4;

(B)   itemDetail  Dim(4)  Pos(1);
        size  Like(ITEMSIZE1)  Overlay(ItemDetail);
        cost  Like(ITEMCOST1)  Overlay(ItemDetail: *Next);
        sell  Like(ITEMSELL1)  Overlay(ItemDetail: *Next);
    end-ds;

    Now that the arrays are defined, copying data from the input record to the target is simply a matter of coding:

       size = prodSize;
       cost = prodCost;
       sell = sellPr;

    RPG takes care of any size and type differences and effectively copies the fields correctly one by one to the target — with just three lines of code!

    Interestingly this approach also takes care of another issue my correspondent had, which was that sometimes the source record was from a file with only (say) three repeats but needed to be presented in the same reports/displays as files with five or six repeats. Again RPG takes care of this because the array with the smaller number of entries governs the number of fields copied.
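    In C you would have to write out the element-by-element loop that RPG generates behind the scenes. This sketch (hypothetical names and types) shows both halves of the behavior: the per-element type conversion, and the way the smaller of the two arrays governs how many elements are copied.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Source fields are (say) narrow ints; the differently named
       targets are wider doubles. The loop handles the conversion,
       and min(nSrc, nTgt) caps the count - the smaller array governs. */
    static void copyPrices(const int *src, size_t nSrc,
                           double *tgt, size_t nTgt) {
        size_t n = nSrc < nTgt ? nSrc : nTgt;
        for (size_t i = 0; i < n; i++)
            tgt[i] = (double)src[i];
    }

    int main(void) {
        int    prodCost[3] = { 100, 200, 300 };  /* file with only 3 repeats  */
        double cost[5]     = { 0 };              /* report expects up to 5    */
        copyPrices(prodCost, 3, cost, 5);
        assert(cost[0] == 100.0 && cost[2] == 300.0);
        assert(cost[3] == 0.0);                  /* unused entries untouched  */
        return 0;
    }
    ```

    In RPG all of this collapses into the single assignment cost = prodCost;, which is exactly the appeal of the technique.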

    Advantages?

    I have already mentioned a number of advantages of this approach but just to summarize:

    • Because the record field names are used in the DS the arrays are populated the minute a record is read. No need to code individual field moves.
    • Copying of a series of fields requires just one line of code.
    • Any differences in data type and size are taken care of by RPG.
    • The arrays can be sorted on any of their component fields. For example, SORTA prodCost will put the groups of three fields into order based on the product cost.
    • A group of associated fields can be referenced via the group field array. For example, itemDetail(4) references fields ITEMSIZE4, ITEMCOST4 and ITEMSELL4.
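    The SORTA behavior in the fourth point can be mimicked in C with qsort over the group structs, keyed on a single member (illustrative names again): the comparator looks only at prodCost, yet whole size/cost/sell triples move together, just as SORTA prodCost reorders the full groups in RPG.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    struct ProdDetail { int prodSize; int prodCost; int sellPr; };

    /* Comparator keyed on prodCost alone; qsort still swaps whole
       structs, so each group of three fields stays intact. */
    static int byCost(const void *a, const void *b) {
        int ca = ((const struct ProdDetail *)a)->prodCost;
        int cb = ((const struct ProdDetail *)b)->prodCost;
        return (ca > cb) - (ca < cb);
    }

    int main(void) {
        struct ProdDetail d[4] = {
            {  8, 300, 500 }, { 10, 100, 250 },
            { 12, 400, 700 }, { 14, 200, 350 }
        };
        qsort(d, 4, sizeof d[0], byCost);
        assert(d[0].prodCost == 100 && d[0].prodSize == 10);
        assert(d[3].prodCost == 400 && d[3].sellPr == 700);
        return 0;
    }
    ```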

    If you are interested in “playing” with this technique you can download the sample programs and data here. Got a similarly annoying problem you’re looking for a “fix” for? Or if you simply have questions or comments, I’d love to hear from you.

    Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner (also an expert in System i programming). Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference.

    Tags: 400guru, FHG, Four Hundred Guru, IBM i, RPG


TFH Volume: 28 Issue: 27
