Guru: IBM i Experience Sharing, Case 5 – Using IBM i PDI Charts To Answer Performance Questions
July 18, 2022 Satid Singkorapoom
During my long career as an IBM i performance specialist, it was natural for people to ask me performance-related questions of various kinds. But in many cases, it seemed to me that the customer could not understand my answers, and I could not provide supporting evidence due to the lack of a performance reporting tool that could produce easy-to-understand explanatory information. The situation changed for the better with the availability of the IBM i Performance Data Investigator (PDI) tool as of release 6.1. I would say that PDI provides charts that resemble those business people use to discuss the health and performance of a business. PDI succinctly illustrates the saying “A picture is worth a thousand words.”
Let’s explore some cases of the PDI tool’s persuasive power to provide clear performance-related information that leads to sensible answers.
Why Do I Need To Use More CPU Power?
An SME customer runs a Java-based core application on a one-core Power8 server. There is only one main Java application job, running 1,200 threads. The customer encounters persistently slow application performance during every month-end period, which they know is their monthly workload peak. They observe that during the periods when users complain about long response times, the Work with Active Jobs (WRKACTJOB) command shows high overall CPU % Busy, but it reaches 100 percent only sporadically and never lingers there for long. The question is whether they need to activate more CPU, and if so, why.
I produce a set of PDI performance report charts on CPU and jobs, disk performance, memory faulting, wait times, and a few more areas, and browse through them. I find that CPU rarely reaches 100 percent, disk performance is consistently decent, and the memory faulting rate is never abnormally high, but the Wait Overview chart does not look good.
In the 24-hour timeline chart in Figure 1, you can see that CPU Queuing wait time exists in substantial to overwhelming proportion against Dispatched CPU Time whenever CPU % Busy reaches 70 percent or higher (the long red horizontal line in the chart). CPU Queuing time is at its largest at the few instances when CPU % Busy hits 100 percent, but it is very small or disappears when CPU % Busy is below 70 percent.
For customers who deploy RPG, COBOL, or C-based applications as a few hundred single-threaded jobs, rather than a few multi-threaded jobs with 1,000 threads or more as in this case, I rarely see much CPU Queuing Time when CPU % Busy is below 90 percent on Power7 servers and later models. For this Java application, I see that the main cause of the sizeable CPU Queuing Time during the high-workload period is the overwhelming number of concurrently active threads competing for CPU cycles. The single Power8 core, even with its SMT-8 capability, cannot accommodate all these threads fast enough.
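The reasoning above can be sketched in code. The following Python snippet is an illustration only, using made-up interval samples rather than real Collection Services data; it flags intervals where CPU Queuing Time is a substantial fraction of Dispatched CPU Time while CPU % Busy is high, which is the pattern Figure 1 reveals.

```python
# Illustrative sketch, NOT PDI output: all sample values below are hypothetical.
# Flag intervals where CPU Queuing Time is a substantial fraction of
# Dispatched CPU Time while CPU % Busy is at or above a threshold.

def flag_cpu_queuing(samples, busy_threshold=70.0, queue_ratio=0.25):
    """Return (interval, queuing/dispatched ratio) pairs that warrant attention."""
    flagged = []
    for s in samples:
        if s["cpu_busy_pct"] >= busy_threshold and s["dispatched_sec"] > 0:
            ratio = s["queuing_sec"] / s["dispatched_sec"]
            if ratio >= queue_ratio:
                flagged.append((s["interval"], round(ratio, 2)))
    return flagged

samples = [
    {"interval": "09:00", "cpu_busy_pct": 45.0,  "dispatched_sec": 300.0, "queuing_sec": 10.0},
    {"interval": "10:00", "cpu_busy_pct": 85.0,  "dispatched_sec": 500.0, "queuing_sec": 220.0},
    {"interval": "11:00", "cpu_busy_pct": 100.0, "dispatched_sec": 600.0, "queuing_sec": 540.0},
]

print(flag_cpu_queuing(samples))  # → [('10:00', 0.44), ('11:00', 0.9)]
```

The thresholds here (70 percent busy, one quarter of dispatched time) are placeholders for the visual judgment described in the text, not fixed rules.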
The Wait by Job or Task chart reveals further that the lone core-application Java job suffers the dominant proportion of CPU Queuing wait, as shown in Figure 2. (I use the Wait by Job or Task chart here rather than Wait by Generic Job or Task, but both charts display the same performance data for this single job.)
Although the amount of CPU Queuing Time is not overwhelming against Dispatched CPU Time, it exists in substantial proportion. Combining that with the facts that (1) the customer observes degraded application performance for the peak workload day of this report and (2) all other PDI charts show no additional performance bottleneck, I see that these two wait time charts provide good evidence that more CPU power is needed.
This straightforward analysis shows that the customer needs to activate at least one more CPU core to reduce or eliminate CPU Queuing wait time and improve overall application performance. The customer agrees. Figure 3 shows the resulting month-end wait time charts after adding the second CPU core.
CPU Queuing Time is almost completely gone, and users report better application response time. The overall CPU % Busy also reduces noticeably, staying below 70 percent most of the time.
The Database Record Lock Contention Time (the light green bars in Figure 1) is addressed by creation of database indexes, which I will discuss later in this article.
In my experience, very high CPU % Busy is not by itself a sufficient reason to activate more CPU cores, unless you want to plan for future workload growth. I worked with several cases where application performance was still decent even at very high CPU % Busy; when I looked into the PDI charts, I saw no performance bottleneck in any part of the system. I was also involved in several cases where customers had performance issues and very high CPU % Busy. The customers added more CPU cores (without my involvement), only to be subsequently dismayed that it did not solve the performance issue. They had not checked for existing performance bottlenecks in other parts of the system before deciding to add cores.
I hope you now see that a good reason to add more CPU cores (other than system capacity sizing) is that CPU Queuing Time (or Machine Level Gate Serialization Time) exists for an extended period in a dominant or overwhelming proportion against Dispatched CPU Time in your core application jobs, there is no sensible way to reduce the existing number of concurrent jobs or threads, and no additional performance bottleneck is detected in other parts of the system. In this regard, PDI’s Wait Overview and Wait by Generic Job or Task charts are good places to obtain the information you need to make a sensible decision.
Do I Need To Add More Memory?
For a long time, this was a tough question for me to answer. But in early 2020, an enhancement to the PDI tool (through PTFs for IBM i 7.3 and 7.4) added a new set of charts to the “Memory” category: Memory Usage by Pool, Memory Usage Peak by Pool, Memory Usage Minimum by Pool, and Memory Available by Pool. The last chart tells you whether you need to add more memory, and if so, which memory pool should receive it. “Available” here means unused: if you see a sizeable amount in the chart, a particular memory pool has unused memory that you can move to another pool that may need it more. Let’s consider a sample (Figure 4).
In this 24-hour timeline chart on Memory Available by Pool, you can see that the *MACHINE pool (the bright pink bars) has about 5,500 MB (5.5 GB) of memory lying unused all day long! Other pools appear to have less excess memory (around 1 GB), and not all the time. You can use the chart’s zoom-in feature to look at any specific area for more detail. This is a good chart that gives you a comparative view of memory consumption for each memory pool in your system.
To determine which memory pool should receive this excess memory, we employ an “evidence of absence” approach: the bar representing a memory pool that uses up all its memory all day long will not appear in this chart at all. If a pool uses up its memory for significant periods of the day, its bars are absent most of the time across the chart, and when present they may be low in value, indicating only a modest amount of memory left unused sporadically. A memory pool that exhibits this trait is a good candidate to receive more memory.
You should look at this Memory Available by Pool chart on several peak workload days to confirm that some pools have much unused memory while other pools regularly need more. If this behavior does not appear to be regular, you may not want to adjust memory pool allocation yet. It may also be a sign that the IBM i automatic performance adjuster (the QPFRADJ system value) is moving the unused memory to the pools in need. Only when you consistently see excess memory in the same pool(s) (as in Figure 4) and a persistent absence of unused memory in other specific pool(s) over several peak workload days should you adjust the pool sizes. You may also want to learn to use the Work with Shared Storage Pools (WRKSHRPOOL) command to set minimum and maximum sizes for the pools based on what you observe in this useful chart. This article teaches about WRKSHRPOOL.
I think that, in the desired case of sufficient memory for all pools at all times, you want to see all pools display modest amounts of unused memory all day long in the charts of peak workload days. But if you see a totally empty chart with no bars at all, what does this mean? Do not be confused. Look at this chart over several days to confirm that it is blank all or most of the time. If so, your workload is consistently memory-hungry and regularly consumes all of the memory, and it is time to add more memory to your server urgently.
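The “evidence of absence” reading can be expressed as a small heuristic. This Python sketch uses hypothetical pool names and made-up available-memory samples (it is not how PDI computes anything): a pool with a consistent surplus is a donor candidate, while a pool whose bars are absent most of the day is a candidate to receive memory.

```python
# Illustrative sketch of the "evidence of absence" reading of the
# Memory Available by Pool chart. Pool names and MB values are made up.

def classify_pools(available_mb_by_pool, surplus_mb=1000, absent_fraction=0.75):
    """Donors: pools with a surplus all day. Receivers: pools whose
    available memory is zero (absent bars) most of the day."""
    donors, receivers = [], []
    for pool, series in available_mb_by_pool.items():
        if min(series) >= surplus_mb:
            donors.append(pool)          # surplus present all day long
        elif sum(1 for v in series if v == 0) / len(series) >= absent_fraction:
            receivers.append(pool)       # bars absent most of the time
    return donors, receivers

day = {
    "*MACHINE":  [5500, 5400, 5600, 5500],  # large surplus all day
    "*BASE":     [0, 0, 0, 300],            # starved most intervals
    "*INTERACT": [800, 200, 900, 400],      # neither donor nor receiver
}

print(classify_pools(day))  # → (['*MACHINE'], ['*BASE'])
```

As the text stresses, such a classification is only trustworthy when it holds across several peak workload days, not from a single day's chart.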
I Have Multiple Disk Pools (ASP) In My Server. How Does Each Perform?
When we investigate performance cases, customers may have more than one disk pool (ASP) in their servers. When we want a comparative view of disk response time among these disk pools in the PDI tool, a chart named “Disk Throughput Overview for Disk Pool” displays a timeline of disk service and wait time against data transfer rate (in MB/sec.) for each disk pool in the same chart. It is a very detailed and useful chart; I discussed using it in Figure 1 of a previous article.
But if you want a simpler view of comparative average disk response time for all pools and nothing else, a chart named “Disk Overview for Disk Pool” (Figure 5) is the one to go to. (Keep in mind that there is another chart named Disk Overview “by” Disk Pool, which is not what we want here.)
This 24-hour comparative chart tells us right away that the overall average response time of ASP 33 (EMES_IASP, the light green line) is quite bad all day long (much higher than 5 msec.), while ASP 1 (the system ASP) performs mostly in a decent range (5 msec. or lower). Since ASP 33 stores the core application database, its degraded response time may lead to undesired application performance. So, if we want to improve application performance, the analysis is that we need to improve the hardware of ASP 33 to restore decent performance. Case closed.
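To make the threshold reading concrete, here is a small sketch with made-up response-time samples that reports how often each ASP exceeds the 5 msec. range used above. The ASP labels mirror this case but the numbers are invented.

```python
# Illustrative sketch (hypothetical numbers): summarize how often each
# disk pool's average response time exceeds a "decent" 5 msec threshold,
# mirroring a reading of the Disk Overview for Disk Pool chart.

def pct_over_threshold(resp_ms_by_asp, threshold_ms=5.0):
    """Percentage of sampled intervals above the threshold, per ASP."""
    return {
        asp: round(100 * sum(1 for v in series if v > threshold_ms) / len(series))
        for asp, series in resp_ms_by_asp.items()
    }

resp_by_asp = {
    "ASP 1 (System)":     [2.1, 3.4, 4.8, 3.9, 5.2],
    "ASP 33 (EMES_IASP)": [9.7, 12.3, 8.8, 11.0, 10.4],
}

print(pct_over_threshold(resp_by_asp))
# → {'ASP 1 (System)': 20, 'ASP 33 (EMES_IASP)': 100}
```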
As an extra application performance tuning effort in this particular customer case, I look at the Wait Overview and Wait by Generic Job or Task charts and see the results in Figures 6 and 7.
The proportion of Disk Page Fault Time wait is neither large nor overwhelming compared to Dispatched CPU Time. This wait time results from memory faulting caused by running jobs; each fault causes a disk read or write. There are ways to reduce memory faulting, one of which is briefly discussed here.
The chart in Figure 7 focuses on disk page faulting.
It turns out that QZDASO* jobs (QZDASOINIT, the DB2 remote SQL server job) accumulate almost all Disk Page Fault Time in the entire 24 hours. This customer runs IBM i as a database-tier server responding to remote SQL requests (through ODBC/JDBC connections) from many web-tier servers. So, QZDASOINIT jobs carry the core application workload, as is evident from the chart. The proportion of Disk Page Fault Time is not overwhelming against Dispatched CPU Time, but it is substantial. If we can reduce this dominant wait time, we can improve overall performance of this client-server application.
All memory faulting happens in ASP 1 only. Although we saw earlier that ASP 1 provided decent response time, if we can reduce overall memory faulting rate for all QZDASOINIT jobs, it will reduce overall Disk Page Fault Time and potentially improve application performance. Since this is a case of SQL workload, we can make good use of tools in DB2 for i to deliver this improvement. How? Here is a brief discussion.
When we deal with SQL workloads (and Query for i), one common cause of a high, and mostly unnecessary, memory faulting rate is that the SQL engine resorts to table scans due to a lack of useful indexes on tables. (A table scan reads all rows in a table.) The larger the table, the higher the memory faulting from table scans, and therefore the longer the response time of the jobs running such SQL statements. In many cases, a table is scanned just to retrieve a single row. What a wasteful operation! You can prove this using Visual Explain on Plan Cache snapshot data, which I hope to discuss in the future.
I personally create indexes for tables larger than 100 MB. Collectively, index probes (directly reading the index entries of interest) and index scans cause much lower memory faulting than table scans, so it is worthwhile to be mindful of this fact and use the DB2 tools to identify and create useful indexes. The main tools for this goal are the Plan Cache snapshot, Visual Explain, Index Advisor, and Index Condenser, all available through the Navigator for i GUI. Once we create a sufficient number of useful indexes, the performance improvement is seen in the chart in Figure 8.
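As a rough illustration of how you might triage Index Advisor output against the 100 MB rule of thumb, the sketch below sorts advised indexes on large tables by how often they were advised. The row layout, column names, and values are simplified placeholders of my own, not the actual Index Advisor catalog schema.

```python
# Illustrative sketch: prioritize index-creation candidates from rows
# shaped loosely like Index Advisor output. Field names and values are
# hypothetical placeholders, not the real catalog columns.

def prioritize(advised, min_table_mb=100):
    """Keep candidates on tables larger than min_table_mb and sort them
    by how often the index was advised (most-advised first)."""
    big = [r for r in advised if r["table_mb"] > min_table_mb]
    return sorted(big, key=lambda r: r["times_advised"], reverse=True)

advised = [
    {"table": "ORDERS",   "keys": "CUSTNO, ORDDATE", "times_advised": 5120, "table_mb": 2400},
    {"table": "CODES",    "keys": "CODE",            "times_advised": 9000, "table_mb": 12},
    {"table": "ORDLINES", "keys": "ORDNO",           "times_advised": 880,  "table_mb": 7800},
]

for r in prioritize(advised):
    print(f'candidate: index on {r["table"]} ({r["keys"]})')
```

Note how the small CODES table drops out despite being advised most often: on a 12 MB table, a scan faults little memory, so the index buys little.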
Compared to the chart in Figure 7 (all charts in this section are from the same customer case), you can see that Disk Page Fault Time was reduced by about 70 percent (from about 600,000 sec. down to about 180,000 sec.), and users reported improved workload performance.
The point I would like to make here is that we can act on multiple system factors in the effort to improve application performance: in this case, improving the disk hardware of ASP 33 for better response time and creating useful indexes for the SQL statements run by the QZDASOINIT jobs. The more relevant performance factors you can identify and address, the better the improvement. PDI charts are a helping hand in this effort.
Do I Have Workload Growth Or Reduction In My Server?
Over the years, I have asked my customers this question far more often than they have asked it of themselves. To help answer it, the PDI tool has a useful chart that I would like you to be aware of, called “Resource Utilization Overview”. This PDI item actually contains two charts, and it is the second one I discuss here. See Figure 9 for an example.
In this chart, I see that the line graph for Total Logical Database I/O Per Second (the orange line) is a good representation of typical business processing workload, which heavily accesses the database. If you run a web serving workload, which mainly accesses stream files, this chart is not applicable; web serving is CPU intensive, so you would use Dispatched CPU Time in the Wait Overview chart instead.
This chart is especially useful for comparing different servers or timeframes. A good case is when you are about to upgrade your Power server to a new model: comparing the peak-workload-period charts of your old server against those of the new one gives you meaningful information. But you also need to re-adopt an ancient concept in geometry from your school days, calculating the area under a graph! (No, I’m just kidding. Visual estimation would suffice.)
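If you do want more than visual estimation, the area under the graph is easy to approximate numerically. This Python sketch applies the trapezoidal rule to made-up hourly samples of Total Logical Database I/O Per Second to estimate workload growth between an old and a new server; the sample rates are invented for illustration.

```python
# Illustrative sketch: compare the "area under the graph" of Total
# Logical Database I/O Per Second between two servers by trapezoidal
# integration over hourly samples. All sample values are made up.

def area_under(samples_per_sec, interval_sec=3600):
    """Trapezoidal estimate of total logical DB I/O across the samples."""
    total = 0.0
    for a, b in zip(samples_per_sec, samples_per_sec[1:]):
        total += (a + b) / 2 * interval_sec
    return total

old_server = [1000, 4000, 6000, 3000]   # hourly peak-day samples (I/O per sec)
new_server = [1200, 5200, 7400, 3600]

growth = area_under(new_server) / area_under(old_server) - 1
print(f"workload growth: {growth:.0%}")  # → workload growth: 25%
```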
I hope you agree with me on the power of the modern performance report charts from the IBM i PDI tool. Once you learn how to interpret them, they speak clearly about the state of system performance health so that you can take the proper remedial actions. Let the PDI tool provide the benefits you deserve to have from your server.
Satid Singkorapoom has 31 years of experience as an IBM i technical specialist, dating back to the days of the AS/400 and OS/400. His areas of expertise are general IBM i system administration and performance of server hardware, the operating system, and database. He also has an acquired taste for troubleshooting problems for his IBM i customers in the ASEAN geography.