Enable Row Set Paging in a Client/Server Environment Using SQL

November 11, 2009 Michael Sansoterra

One of the general paradigms of client/server programming is keeping the server side processing “fat” while keeping the client side processing “thin.” That is, a client/server application should let a server do as much work as possible while making as few assumptions as possible about how much processing power, memory, network bandwidth, etc., the client will have. Of course, sometimes certain applications, such as those requiring a graphics intensive environment, must be heavy on the horse power, but in general it’s good practice to keep network, memory, and processing requirements minimal on the client side.

One of the areas where it can be difficult to keep client-side processing at a minimum is with row sets returned from a query. Sometimes users can legitimately ask for very large results. (Other times, it’s wise to set a limit on the number of rows returned!) Whether it is a Web-based, desktop-based, PDA-based, or even a DRDA green screen-based application, it can be a programming chore to minimize network utilization and memory usage, particularly on the client side. Because these applications are usually “stateless” (that is, they do not rely on a constant connection to the server to utilize its resources), it’s not advisable to leave open a server-side cursor and page through the cursor results when re-connecting to the server.

One common solution to this problem is to give the user all the rows from their query at one time. To help manage large result sets, often a grid control of some kind (it could also be a subfile) is used to allow a user to page through the results. Consider Figure 1 below, which shows a grid control with paging implemented. This grid shows five rows per grid page, indicates the total number of rows available, and gives the user the ability to jump to a specific page within the results. Some grids even allow a client-side search or filter on the results.

Figure 1–This is one of many grids that demonstrate the concept of paging. This picture was taken from a demo on obout.com’s Web site.

These grids provide a nice user interface. However, if we’re dealing with a potentially large number of rows, do we really want to load all the results at one time (and tax the client with large data transfers and a large memory footprint), or should we only request one page at a time as the user requests them (and hence tax the database server)?

This tip will operate under the assumption that the client processing requirements should be minimal and that a paging operation should be done on the server with only the desired page returned to the client when requested. This will keep client-side resource utilization at a minimum.

The Origin of this Tip

The fact is, a grid control can be used to retrieve one page at a time while still giving the users the appearance that they have all rows available. I was recently working with a Microsoft .NET/SQL Server application where the client code used a feature of the .NET Framework called Language Integrated Query (LINQ). The LINQ code was automatically generating SQL statements on behalf of the grid control to retrieve one page of data at a time. Once I understood how LINQ was generating the “page at a time queries” for SQL Server, it was easy to convert the SQL Server T-SQL statements to the syntax required by DB2 for i. This SQL code requires the i/OS level to be at V5R4 or higher because it will make use of the ROW_NUMBER OLAP Ranking function.

The Paging Technique Using SQL

While tracing the .NET application’s “retrieve a specific page” technique, I noticed everything boils down to running a few SQL statements, which are detailed in abstract form here:

Statement 1–Retrieve the row set count for the specified application query. You’ll need to know how many rows are being returned and what the page size of the grid is in order to calculate how many pages are available. If a query returns 1,000 rows and the grid shows 25 rows per page, then the grid will have 40 available pages. This statement is necessary so that the grid can show the available pages even though we’re only going to retrieve a single page as requested.
Statement 2–Retrieve the first n rows from the application query, where n is the number of rows in the grid page. This SQL statement will only be submitted when page number one is requested. Also, this statement must have an ORDER BY clause to run correctly.
Statement 3–Retrieve rows x through y in the query’s row set where x is the first row on request page n and y is the last available row on the same page. So, if there are 1,000 rows returned from a query, the grid shows 25 rows per page and the user requests page four, x would be set to 76 and y would be set to 100. The formula for calculating x is:

x=(Requested Page Number-1)*Page Size + 1.

The formula for y is either:

y=x + Page Size - 1

y=Requested Page Number * Page Size.

In summary, for the first page request, the client will run statements 1 and 2. Statement 1 will get the row count so that the client can calculate the number of pages available. Statement 2 will fetch the first page’s rows. For a page request other than page one, the client will run statements 1 and 3 to get the total number of rows followed by the rows for the requested page. Statement 1 will need to be re-run in case additional rows are inserted to the result set.

A Sample Query

Now let’s move from the abstract to the concrete. Let’s say the application query our user runs looks like this, and that a client side grid should display 25 rows. (I’m using the * for simplicity–columns should be explicitly named.)

Select * From MyTable Order By MyColumn

With this as our base query, the three abstract statements described above to control paging can be created in the following form using nested Select statements:

Statement 1–Retrieve the row count:

Select Count(*)
  From (Select MyTable.*From MyTable) Temp

The base query (without the ORDER BY) is put in a Nested Select to retrieve the total row count.

Statement 2–Retrieve the first page of data:

     Select MyTable.*
       From MyTable 
   Order By MyColumn 
Fetch First 25 Rows

The base query simply limits the query results to be restricted to the first 25 rows.

Statement 3–Retrieve the second page of data:

Select *
  From (
Select Row_Number() Over (Order By MyColumn) As Row_Number,
       MyTable.*
  From MyTable 
  Order By MyColumn
Fetch First 50 Rows Only  -- Only fetch the max number necessary
) BaseData
Where Row_Number Between 26 And 50  -- Only get the rows on the page

The Row_Number OLAP function is now returned with the base query results. This function will assign a unique incremental number to every row in the set based on the value in the function’s ORDER BY (which should match the query’s ORDER BY). The FETCH FIRST clause will limit the result set to only the maximum number of rows needed to fulfill the request. Since this example is requesting page two (rows 26-50), we don’t need to have DB2 retrieve and number any more than 50 rows. Finally, the modified base query is sandwiched within a nested Select where the resulting rows returned are limited to those row numbers that appear on the specific requested page.

Of course, the hard-coded values given in statements 2 and 3 (FETCH and WHERE clauses) should be replaced with appropriate parameterized values. This simple example can be used with just about any base query.

That being said, keep a few things in mind:

The base query you’re using should be able to run through the SQE. The CQE will not process the OLAP functions. Check the SQE restrictions for your particular version of i/OS to see what query options are not available with the SQE. Fortunately, many SQE query restrictions were lifted in V6R1 (a.k.a. 6.1).
Don’t include clauses such as FOR READ ONLY and OPTIMIZE FOR n ROWS in the base query because these aren’t allowed in a Nested Select.
These queries are server intensive. Not only does this technique require two queries per request, for page requests other than page one, DB2 must take the time to number the result set rows before returning the specific rows on the requested page number. Use tools such as Visual Explain to make sure the base query is running optimally before letting users execute this type of query.
If the application users are likely to page through an entire result set, it may defeat the purpose of making the server do all the paging work when the client will need all the data anyway. This example is beneficial when a user may get the initial portion of a large result set and browse through a few of the pages.

Summary

Overall this technique is simple to implement by using the power of the Nested Select and the Row_Number function. It can greatly aid in the development of thin client applications by returning only the appropriate rows as the user requests them.

Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

New in V5R4: OLAP Ranking Specifications

                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot