|
|
![]() |
|
|
New Statistics Optimize V5R2 Database Performance by David Morris One of the not readily apparent features of V5R2 is a new query optimizer that uses file statistics gathered in a background process to improve SQL performance. Initially, use of those statistics by a rewritten query optimizer is limited. In most cases, the old query uses the optimizer that existed on releases before V5R2. However, IBM is phasing in use of the file statistics and new query optimizer on subsequent releases and possibly through PTFs.
A new system value--QDBFSTCCOL--identifies the types of statistics requests that a background system job--QDBFSTCCOL--will process. You can change this system value using the Work with System Value (WRKSYSVAL) and Change System Value (CHGSYSVAL) commands or you can use iSeries Navigator (formerly Operations Navigator. With iSeries Navigator, modify this system value by selecting your system, configuration and service, system values, performance, and database. The QDBFSTCCOL system value supports four values:
In most cases, you will want to keep the system default of *ALL. This will tell the system to gather statistics in the background and allow processing of user-requested collections. To better control collections, you can call APIs to start, update, list, delete, or cancel statistics gathering. The following list describes the statistics-gathering APIs:
The main reason why you would force statistics to be built is to gather statistics for new files. One scenario might be to build statistics when new objects are moved into a production library and to delete statistics when an object is removed from production. You might also decide to force statistics to be built when data is replicated between systems. The actual statistics gathered keep track of cardinality (the number of unique values in a field), estimated value ranges, and counts for the most common values. Initially, statistics are gathered for single columns. Future releases or PTFs may enable statistics collection for combined columns. The following table gives you a rough idea of what is collected.
The query optimizer uses these statistics in several ways. The most common use is for the query optimizer to use statistics to better estimate the cost associated with a particular way of implementing a query. For example, in the example above, a query for records with a name less than or equal to Adams will only return the first two rows in the underlying file. Without statistics or an index, the query optimizer often assumes a query involving less than or greater than comparisons will return half of the records in a file. In some cases, statistics can replace the need for a specialized index. In other cases, the query optimizer uses statistics to build a temporary index more quickly. The file statistics that V5R2 gathers lay the foundation for future performance enhancements. Those enhancements have the potential to dramatically improve query performance and to reduce the amount of optimization work done by application developers. Initially, use of these statistics is so limited that their value is harder to justify. In the future, the value of these statistics will far outweigh the cost of gathering and storing this information on your system. David Morris is a software architect at Plum Creek Timber Company and started the iSeries-toolkit open source project. He can be reached at dmorris@itjungle.com.
|
Editor
Contact the Editors |
|
Last Updated: 8/28/02 Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved. |