Enable Row Set Paging in a Client/Server Environment Using SQL
November 11, 2009 Michael Sansoterra
One of the general paradigms of client/server programming is keeping the server side processing “fat” while keeping the client side processing “thin.” That is, a client/server application should let a server do as much work as possible while making as few assumptions as possible about how much processing power, memory, network bandwidth, etc., the client will have. Of course, sometimes certain applications, such as those requiring a graphics intensive environment, must be heavy on the horse power, but in general it’s good practice to keep network, memory, and processing requirements minimal on the client side.
One of the areas where it can be difficult to keep client-side processing at a minimum is with row sets returned from a query. Sometimes users can legitimately ask for very large results. (Other times, it’s wise to set a limit on the number of rows returned!) Whether it is a Web-based, desktop-based, PDA-based, or even a DRDA green screen-based application, it can be a programming chore to minimize network utilization and memory usage, particularly on the client side. Because these applications are usually “stateless” (that is, they do not rely on a constant connection to the server to utilize its resources), it’s not advisable to leave open a server-side cursor and page through the cursor results when re-connecting to the server.
One common solution to this problem is to give the user all the rows from their query at one time. To help manage large result sets, often a grid control of some kind (it could also be a subfile) is used to allow a user to page through the results. Consider Figure 1 below, which shows a grid control with paging implemented. This grid shows five rows per grid page, indicates the total number of rows available, and gives the user the ability to jump to a specific page within the results. Some grids even allow a client-side search or filter on the results.
These grids provide a nice user interface. However, if we’re dealing with a potentially large number of rows, do we really want to load all the results at one time (and tax the client with large data transfers and a large memory footprint), or should we only request one page at a time as the user requests them (and hence tax the database server)?
This tip will operate under the assumption that the client processing requirements should be minimal and that a paging operation should be done on the server with only the desired page returned to the client when requested. This will keep client-side resource utilization at a minimum.
The Origin of this Tip
The fact is, a grid control can be used to retrieve one page at a time while still giving the users the appearance that they have all rows available. I was recently working with a Microsoft .NET/SQL Server application where the client code used a feature of the .NET Framework called Language Integrated Query (LINQ). The LINQ code was automatically generating SQL statements on behalf of the grid control to retrieve one page of data at a time. Once I understood how LINQ was generating the “page at a time queries” for SQL Server, it was easy to convert the SQL Server T-SQL statements to the syntax required by DB2 for i. This SQL code requires the i/OS level to be at V5R4 or higher because it will make use of the ROW_NUMBER OLAP Ranking function.
The Paging Technique Using SQL
While tracing the .NET application’s “retrieve a specific page” technique, I noticed everything boils down to running a few SQL statements, which are detailed in abstract form here:
x=(Requested Page Number-1)*Page Size + 1.
The formula for y is either:
y=x + Page Size - 1
y=Requested Page Number * Page Size.
In summary, for the first page request, the client will run statements 1 and 2. Statement 1 will get the row count so that the client can calculate the number of pages available. Statement 2 will fetch the first page’s rows. For a page request other than page one, the client will run statements 1 and 3 to get the total number of rows followed by the rows for the requested page. Statement 1 will need to be re-run in case additional rows are inserted to the result set.
A Sample Query
Now let’s move from the abstract to the concrete. Let’s say the application query our user runs looks like this, and that a client side grid should display 25 rows. (I’m using the * for simplicity–columns should be explicitly named.)
Select * From MyTable Order By MyColumn
With this as our base query, the three abstract statements described above to control paging can be created in the following form using nested Select statements:
Statement 1–Retrieve the row count:
Select Count(*) From (Select MyTable.*From MyTable) Temp
The base query (without the ORDER BY) is put in a Nested Select to retrieve the total row count.
Statement 2–Retrieve the first page of data:
Select MyTable.* From MyTable Order By MyColumn Fetch First 25 Rows
The base query simply limits the query results to be restricted to the first 25 rows.
Statement 3–Retrieve the second page of data:
Select * From ( Select Row_Number() Over (Order By MyColumn) As Row_Number, MyTable.* From MyTable Order By MyColumn Fetch First 50 Rows Only -- Only fetch the max number necessary ) BaseData Where Row_Number Between 26 And 50 -- Only get the rows on the page
The Row_Number OLAP function is now returned with the base query results. This function will assign a unique incremental number to every row in the set based on the value in the function’s ORDER BY (which should match the query’s ORDER BY). The FETCH FIRST clause will limit the result set to only the maximum number of rows needed to fulfill the request. Since this example is requesting page two (rows 26-50), we don’t need to have DB2 retrieve and number any more than 50 rows. Finally, the modified base query is sandwiched within a nested Select where the resulting rows returned are limited to those row numbers that appear on the specific requested page.
Of course, the hard-coded values given in statements 2 and 3 (FETCH and WHERE clauses) should be replaced with appropriate parameterized values. This simple example can be used with just about any base query.
That being said, keep a few things in mind:
Overall this technique is simple to implement by using the power of the Nested Select and the Row_Number function. It can greatly aid in the development of thin client applications by returning only the appropriate rows as the user requests them.