bsearch: Partial Key Searches and More
February 22, 2012 Jon Paris
Note: The code accompanying this article is available for download here.
In bsearch: A Better %LOOKUP, I mentioned that bsearch permits us to do a partial key lookup (i.e., a lookup that is based on just a portion of the search field). Since bsearch performs a binary search, it will not necessarily return the first matching element in the array. If we need to handle all matching elements then we have to be able to locate the first match in the array, and then process subsequent matching elements.
Before we jump into the mechanics of how to do this with bsearch, I’m going to start with an example that uses %LookUp. One of the advantages of using the %LookUp BIF over its predecessor op-code LOOKUP is that it performs a high speed binary search whenever an array specifies the ASCEND keyword. Because of this it presents the same problems of locating the first matching element as bsearch.
It is not until the V7 release that SORTA and %LookUp permit sorting and searching of data structure arrays. As a result, I have had to code the array slightly differently. If you are not familiar with this coding technique, you can find more about it here.
d customerData ds Qualified d customerInfo Dim(1000) Ascend d name 30a Overlay(customerInfo) d addr 40a Overlay(customerInfo: *Next) d city 30a Overlay(customerInfo: *Next) d state 2a Overlay(customerInfo: *Next) d zip 10a Overlay(customerInfo: *Next)
This technique allows me to sort on any of the individual fields and have its associated data come along with it. In this example I am going to sort on the city field using the following code:
SortA %SubArr(customerData.city: 1: customerCount);
Notice the use of the %SubArr BIF to constrain the sort to just the active array elements. This saves having to load the unused elements with high vaIues and wasting time sorting those unused elements. Having sorted the array, I then use %LookUp to locate the element that I need.
element = %LookUp(searchString: customerData.city: 1: customerCount);
Once we have found a match, we must then “walk” backward through the array to identify the first element that matches the search key. The logic is quite simple, the critical thing is to ensure that we don’t “fall off” the top of the array when testing for a match. This is why we test that the current element number is greater than 1 before we attempt to test for a matching key.
If element <> 0; // We have a match DoW ( element > 1) And ( customerData.city(element - 1 ) = searchString ); element -= 1; // Move back to previous element EndDo;
We will exit this DO loop once we have located the first matching element, or reached the beginning of the array. Either way we are now set to loop through the elements, starting from this position, until we have processed all of those that match. Like this:
DoW ( customerData.city(element) = searchString ); Dsply ( 'Name is: ' + customerData.name(element)); Dsply ( 'City: ' + customerData.city(element)); element += 1; EndDo;
That’s all there is to the process. But I promised you I would show you how bsearch can use only a part of the search key. Since the success of the search relies on the array being in sequence, we can’t just use any arbitrary part of the key, but we can certainly perform our comparison tests using just the leading characters.
Searching for Partial Keys
If we are dealing with a partial key, then there is obviously a high probability that we will get duplicates, so we will need to be able to locate the first matching entry as we did in the SortA/ %LookUp example above. The problem is that bsearch returns a pointer to the first byte of the matched element, not an array index. Luckily, converting a pointer value to its related index is a relatively simple task. All we need to do is:
Let’s look at a simple example to see how this works.
The starting address of element 1 is byte 1, element 2 starts at byte 5, 3 at byte 9 and so on. If we assume that bsearch returns the address 13, then we can calculate its element number as follows:
13 - 1 = 12 12 / 4 = 3
Unlike most languages, RPG arrays use a base number of 1 and not zero, so we have to add 1 to the result of the calculation to get the correct result. If we were programming in another language, adding 1 would not be necessary, as most start their array indices at zero. The RPG code to perform the calculation looks like the example below. Notice that I made use of the %Div BIF in the calculation to make it obvious that I only want the integer result of the division:
distance = pointerFrombsearch - %Addr(startOfData); element = %Div( distance : %Size(singleElement) ) + 1;
Once we have calculated the element number, we can use the same basic logic we used previously to step back to the first matching element. We’ll discuss the minor differences in a moment.
So how do we go about searching for a partial key? The first thing to remember is that with bsearch, you write the logic to determine if you have a match. So if you choose to match against only the first n characters of the search string, that is up to you. Here’s a bsearch routine that does just this.
p SearchForName b d pi 10i 0 d searchName Like(compareString) d candidate LikeDS(customerData) /Free If searchName > %Subst(candidate.name: 1: %Len(searchName)); Return HIGH; ElseIf searchName < %Subst(candidate.name: 1: %Len(searchName)); Return LOW; Else; Return EQUAL; EndIf;
There are a couple of points worth noting here. First, the comparison variable searchName is defined to be like compareString, which is a variable-length field. By using a variable-length data, we can easily capture the length of the string we are going to compare to. You can see this in play in the IF statements above, where the length of the comparison string (%Len(searchName) controls the length of the substring being compared (%Subst).
The mainline logic that loads the value into compareString, passes it to bsearch, and then calculates the element number is shown below. Notice that the only real difference in the logic that determines the first matching element is that we restrict the comparisons to the appropriate substring of the key, just as we did in the bsearch routine.
compareString = %TrimR(searchString); p_element = bsearch(%Addr(compareString) : %Addr(customerData) : customerCount : %Size(customerData) : %PAddr(SearchForName) ); If p_element >< *null; // Did we find a hit? // If so then how many bytes from start of array to hit location distance = p_element - %Addr(customerData); // Then determine the array entry number that represents element = %Div( distance : %Size(customerData) ) + 1; DoW ( element > 1) And (%Subst(customerData(element - 1).name: 1: %Len(compareString)) = compareString ); element -= 1; // Move back to previous element EndDo; DoW (%Subst(customerData(element).name: 1: %Len(compareString)) = compareString ); Dsply ( 'Name is: ' + customerData(element).name); element += 1; EndDo;
That’s really all there is to it. There’s a lot more you can do with arrays once you master the basics of using qsort and bsearch. With the V6 increases RPG’s maximum array size, techniques like these become even more valuable.
One final point for those of you are stuck at V5R4 or earlier: You can use qsort and bsearch to overcome RPG’s size limitations by treating any piece of memory as an array, you just can’t use conventional array indices to access the elements. Contact me via Guru Editor Ted at IT Jungle if you would like information on how to do this.
Jon Paris is one of the world’s most knowledgeable experts on programming on the System i platform. Paris cut his teeth on the System/38 way back when, and in 1987 he joined IBM’s Toronto software lab to work on the COBOL compilers for the System/38 and System/36. He also worked on the creation of the COBOL/400 compilers for the original AS/400s back in 1988, and was one of the key developers behind RPG IV and the CODE/400 development tool. In 1998, he left IBM to start his own education and training firm, a job he does to this day with his wife, Susan Gantner–also an expert in System i programming. Paris and Gantner, along with Paul Tuohy and Skip Marchesani, are co-founders of System i Developer, which hosts the new RPG & DB2 Summit conference. Send your questions or comments for Jon to Ted Holt via the IT Jungle Contact page.