Guru: Teraspace To The Rescue
August 14, 2017 Jon Paris
My team has been trying to resolve a problem for the last few weeks and they are running into several obstacles. The main one has to do with RPG restrictions for data structure length.
We use a third-party program to process transactions. In the latest version, the vendor has increased the length of the fields in the data structure that we have to pass them. After our programs were changed to accommodate this increase, the RPG program would not compile because the DS exceeds 16,773,104 bytes. The vendor has told us to use teraspace, but by itself that does not do anything about the RPG data structure limitation. Can you suggest a solution to this problem?
You are correct in thinking that, while the RPG compiler allows you to use teraspace, it still limits the maximum size of any item you can define to 16 MB. Luckily for you, the DS (data structure) you need to create is effectively a DS array. As long as we define just a single element and work with that, we can take full advantage of teraspace storage and its huge capacity. In many ways the resulting code is similar in effect to using the old Multiple Occurrence Data Structures (MODS) that we had to use before we had DS arrays. The major difference is that we have to use pointer math to move from occurrence to occurrence rather than simply use the OCCUR op-code. Apart from that, the process is quite straightforward.
While the technique described here addresses the writer’s specific problem, it can also be used to create dynamically sized arrays with no pre-set number of elements, whether teraspace capacity is needed or not. Sorting and searching such arrays is beyond the scope of this tip but I’ll address that aspect in a future tip.
I have written a small program that demonstrates the process. Note that I am only using a small DS (40 characters per occurrence) but the principle is the same regardless of the size. Limiting the size makes it possible to study the mechanics in the debugger. More on this in a moment.
Let’s take a look at the code.
The first critical step is to tell the compiler that we want all dynamic storage to be allocated in teraspace. This is done at (A) with the control option ALLOC(*TERASPACE). The DS we will be using is defined at (B). Notice that it is defined as being based on the pointer p_currentPosn. This is the pointer we will be manipulating to move through the different occurrences of the DS.
The next definitions (C) specify the pointer p_bigDS, which will hold the address of the beginning of the allocated memory, and the field bigView, which I use while testing to study the workings of the process. Because bigView is based on p_bigDS, it lets me view multiple instances of my 40-character DS at once. I used a size of 5,000 bytes, which should be more than big enough. If you EVAL this field in the debugger you will be able to see the first 1K. To see more, simply type EVAL bigView:c nnnn, where nnnn is the number of bytes you want to view.
These are followed (D) by the definitions for the field max and the constant increment. The field max holds the current maximum capacity of the storage area. The constant increment provides the value to be used when the current memory is used up and we need to increase the allocation. In other words, we are starting off with capacity for 20 elements and, should we need more, we will increase that maximum to 30, then 40, and so on. Of course, if you know from the outset the specific number of elements that you require, you can just work with that number and forget about the increment.
    (A) ctl-opt ALLOC(*TERASPACE);

    (B) dcl-ds bigDS Based( p_currentPosn );
          field1 char(20);
          field2 char(20);
        end-ds;

    (C) dcl-s p_bigDS pointer;
        dcl-s bigView char(5000) based( p_bigDS );

    (D) dcl-s max int(10) Inz(20);   // Maximum number of elements
        dcl-c increment 10;          // Increment size when "array" full
Now that we have the data definitions in place it is time to take a look at the logic needed to implement this dynamic array.
(E) We begin by allocating enough memory to hold the current maximum number of elements desired. We simply multiply the size of a single element by the current maximum and request %ALLOC to give us a pointer to a piece of storage large enough. We then set the current array position to that address (F). For the purposes of testing the process we then enter a loop that will create 100 DS entries (G). At (H) we load values into the DS’s fields.
Next comes the critical part: advancing the DS pointer to the next instance. The first item of business (I) is to test whether we have room left in the current allocation for at least one more entry. If we do, then we simply advance the pointer by the size of an element and we are done.
If we need more room (J) then we first increase the current maximum by the increment value, then use %REALLOC to request that the memory allocation associated with p_bigDS be increased to the new capacity. The critical part of this process is to recalculate the current position pointer (p_currentPosn). Why do we need to do this? Simply because there is no guarantee that the pointer returned by %REALLOC will be to the same memory as our original allocation. It may be that the system cannot give us the amount of memory that we are requesting at that location. When this situation occurs there is no fear of data loss as the system copies all of our existing data to the new location before it returns the pointer to that new location.
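The recalculation at (J) uses the loop counter to work out the new position. Where no counter is to hand, one alternative is to note how far the current position is from the start of the storage before calling %REALLOC, and then reapply that offset afterwards. The following is a minimal sketch of that idea and is not part of the original program; the names entry, p_entry, p_base, and offset are made up for the illustration, and the element counts are arbitrary.

```rpgle
**free
// Sketch only: entry, p_entry, p_base and offset are illustrative names
ctl-opt alloc(*teraspace);

dcl-ds entry based( p_entry );    // One element of the "array"
  field1 char(20);
  field2 char(20);
end-ds;

dcl-s p_base pointer;             // Start of the allocated storage
dcl-s offset int(20);             // Bytes from p_base to p_entry

// Room for 20 elements; position on the last one and store a value
p_base  = %alloc( %size( entry ) * 20 );
p_entry = p_base + ( %size( entry ) * 19 );
field1  = 'Slot 20';

// Record where we are relative to the start of the storage ...
offset = p_entry - p_base;

// ... because %ReAlloc may hand back a different address
p_base  = %realloc( p_base : %size( entry ) * 30 );
p_entry = p_base + offset;        // Same element, possibly a new location

dsply field1;                     // Shows the value stored before the %ReAlloc

dealloc p_base;                   // Free the storage and null both pointers
p_base  = *null;
p_entry = *null;

*inlr = *on;
return;
```

Either approach gives the same address; the offset version simply avoids depending on a counter that happens to match the element position.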
That is all there is to it really. The array has been loaded and we can perform any processing we need to do with it before finally (L) releasing the memory back to the system once we are finished with it. Don’t get carried away and release the storage before you have finished processing the data! For reasons I have never understood, a plain DEALLOC will not null your pointer; you have to ask for that with the N operation extender, DEALLOC(N). So it is a good idea to null the pointer yourself, as I do here. After DEALLOC the pointer still holds an address that looks valid, but the memory is no longer ours and it can (and will, according to Murphy) change behind your back! Note that even if DEALLOC had nulled out the pointer, it would still be your responsibility to make sure you did not use any related pointers, such as p_currentPosn in my example. By nulling them manually we ensure that, should we accidentally try to use that memory, we would get a run-time error.
In case you are wondering, the memory would have been released anyway when the job (or activation group) ended, but it is good practice to always manually release it. Otherwise, one of these days you’ll use a technique like this in a never-ending program and will eventually manage to run out of memory! Yes, it is possible.
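As an alternative to nulling the allocation pointer by hand, the DEALLOC op-code accepts the N operation extender, which sets the pointer to *NULL once the storage has been freed. The sketch below is a stand-alone illustration rather than part of the demonstration program; the allocation size and the offset of 40 bytes are arbitrary. Note that the extender only affects the pointer named on the DEALLOC, so any related pointer still has to be nulled manually.

```rpgle
**free
// Sketch only: sizes and offsets are arbitrary illustration values
ctl-opt alloc(*teraspace);

dcl-s p_bigDS       pointer;      // Start of the allocated storage
dcl-s p_currentPosn pointer;      // A second pointer into the same storage

p_bigDS       = %alloc( 4000 );
p_currentPosn = p_bigDS + 40;

dealloc(n) p_bigDS;               // Free the storage and set p_bigDS to *NULL
p_currentPosn = *null;            // Related pointers are still up to us

if p_bigDS = *null and p_currentPosn = *null;
  dsply 'Both pointers are null';
endif;

*inlr = *on;
return;
```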
For the original questioner, “processing” of the data involved passing the loaded DS to the third party’s software. In order for that to work, it is necessary to reset the base address of the DS we are going to pass as the parameter back to the root of our allocated storage. That is the purpose of the assignment at (K). Once that is done we can then call the program passing bigDS as the parameter. This process works because when passing parameters to programs all we are ever doing is passing the address (i.e. a pointer) to the parameter’s storage. So by resetting the basing pointer for bigDS, we ensure that the called routine sees the beginning of the DS.
Some of you may be concerned by the fact that our definition of the DS is much shorter than the “real thing”. But this doesn’t matter because only the pointer is passed, and the length actually used is up to the called program’s interpretation. If this is news to you, find out more in this tip.
    (E) p_bigDS = %Alloc( %Size( bigDS ) * max );
    (F) p_currentPosn = p_bigDS;   // Set current position to beginning

        // Fake out 100 data elements
    (G) for count = 1 to 100;
          // Store test values in the DS
    (H)   field1 = 'Pos ' + %Char( count );
          field2 = 'Max ' + %Char( max );

    (I)   if count < max;   // Still space remaining in current allocation ?
            p_currentPosn += %Size( bigDS );   // If so advance to next element
          else;
    (J)     max += increment;
            p_bigDS = %ReAlloc( p_bigDS: %Size( bigDS ) * max );
            p_currentPosn = p_bigDS + ( %Size( bigDS ) * ( count ) );
          endif;
        endfor;

    (K) p_currentPosn = p_bigDS;   // Set address of DS back to byte 1 of storage
        callProg( bigDS );         // And pass as parm to vendor program

    (L) DeAlloc p_bigDS;           // Free allocated storage
        p_bigDS = *Null;           // and null out pointers
        p_currentPosn = *Null;
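If you want to process the loaded elements yourself rather than hand them to another program, the same pointer math lets you position on any element directly: element n starts ( n - 1 ) element lengths past the start of the storage. The following is a minimal stand-alone sketch under that assumption. The constant loaded and the index i are hypothetical names introduced for the illustration, and the element count is kept small so the results are easy to follow.

```rpgle
**free
// Sketch only: loaded and i are illustrative names
ctl-opt alloc(*teraspace);

dcl-ds bigDS based( p_currentPosn );
  field1 char(20);
  field2 char(20);
end-ds;

dcl-s p_bigDS pointer;            // Start of the allocated storage
dcl-c loaded 5;                   // Hypothetical number of elements
dcl-s i int(10);                  // Element number, starting at 1

p_bigDS = %alloc( %size( bigDS ) * loaded );

// Load: element i lives ( i - 1 ) element lengths past the start
for i = 1 to loaded;
  p_currentPosn = p_bigDS + ( %size( bigDS ) * ( i - 1 ) );
  field1 = 'Pos ' + %char( i );
  field2 = *blanks;
endfor;

// Read the elements back in reverse order to show direct positioning
for i = loaded downto 1;
  p_currentPosn = p_bigDS + ( %size( bigDS ) * ( i - 1 ) );
  dsply field1;
endfor;

dealloc p_bigDS;                  // Free allocated storage
p_bigDS       = *null;            // and null out pointers
p_currentPosn = *null;

*inlr = *on;
return;
```

Because each position is computed from p_bigDS, this style of access keeps working after a %REALLOC moves the storage, provided p_bigDS has been updated with the pointer that %REALLOC returned.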
Before I close, one more point to make. In the case of the questioner, the DS actually contains a fixed-length component ahead of the repeating area. Not surprisingly, this area contains, among other data, a count of the number of active DS elements. In order to accommodate this I modified the code slightly as shown below.
First I modified the base DS (bigDS) to contain only the header information (M) and added a new DS (currentPosn) to contain the original DS array content. Next I had to adapt the pointer calculations to accommodate the size of the header at the beginning of the space (N) and (O). Apart from that, the code is basically the same.
    (M) dcl-ds bigDS Based( p_bigDS );
          header1 char(12);
          count int(10);
        end-ds;

        dcl-ds currentPosn Based( p_currentPosn );
          field1 char(20);
          field2 char(20);
        end-ds;
        .....
    (N) p_bigDS = %Alloc( %Size( bigDS ) + ( %Size( currentPosn ) * max ) );
        .....
    (O) p_bigDS = %ReAlloc( p_bigDS: %Size( bigDS ) + ( %Size( currentPosn ) * max ) );
        p_currentPosn = p_bigDS + %Size( bigDS ) + ( %Size( currentPosn ) * count );
If you would like to experiment with this approach you will find annotated copies of the code used here. In a future tip, I will explore how dynamic arrays such as these can be searched and sorted. In the meantime if you have any questions or need further explanation, please email me via Ted Holt through the IT Jungle Contact page.