Guru: Physical System Performance
November 15, 2021 Dawn May
Almost all IBM i shops use logical partitioning and have several partitions on a single Power box. There may be several IBM i partitions, VIOS partitions, and possibly AIX or Linux on Power partitions. Regardless of what type of operating system is running in the partition, the hypervisor collects performance metrics for all partitions. These metrics are always being collected, and you can allow a partition to access them. In the case of IBM i, this physical system performance data can be gathered by Collection Services. IBM documents this feature in "Collecting and displaying CPU utilization for all partitions."
When there are multiple partitions on a single Power box, you may be interested in understanding the interaction between those partitions, particularly when using shared, uncapped processors.
To get started, you must enable the collection of this performance data for the desired partition. You only need to collect this data on one partition, and it must be an IBM i partition if you want to view the data with the Investigate Data task. The CPU utilization information collected will reflect work done in all partitions, including VIOS, AIX, Linux on Power, and IBM i.
To enable the collection of this performance data, you must turn on the Enable Performance Information Collection option for the desired partition. To find this configuration parameter using the enhanced HMC interface, navigate through Resources → All partitions. Click the desired partition, select View Partition Properties from the Actions drop-down, then expand Advanced Settings. The screen capture in Figure 1 has the Enable Performance Information Collection option outlined in pink.
Once the performance data collection support is enabled, Collection Services will gather this additional information, starting with the next collection interval; no restart of any kind is needed. At each collection interval, Collection Services will collect partition configuration and utilization information from the hypervisor. The data is stored in files QAPMLPARH, QAPMSYSPRC, and QAPMSHRMP.
The Performance Data Investigator includes perspectives in the Collection Services content package to visualize this physical system data. It includes the charts listed below; for memory or I/O charts, your environment needs to be using the underlying hardware that applies to these charts.
- Logical Partitions Overview
- Donated Processor Time by Logical Partition
- Uncapped Processor Time Used by Logical Partition
- Virtual Shared Processor Pool Utilization
- Physical Processors Utilization by Physical Processor
- Dedicated Processor Utilization by Logical Partition
- Physical Processors Utilization by Processor Status Overview
- Physical Processor Utilization by Processor Status Detail
- Shared Memory Overview
- Full System I/O Architecture
- All 12X Loops
- All PCIe Gen2
In the Heritage Version of Navigator for i, you find these charts by expanding Collection Services then Physical System. In the New Navigator, you don’t have the hierarchical organization of the charts, so you may need the list of charts above to find them by filtering on the perspective column.
The Logical Partitions Overview chart is a good starting point to get familiar with this support. This chart displays the average partition CPU utilization for each partition, as well as the CPU entitled time used and CPU uncapped time used for all partitions on the box. The scale on the left is time and pertains to the stacked bars; the scale on the right is the average partition CPU utilization, represented by a line for each partition. Under the chart you will find a checkbox for each metric shown in the chart. I have excluded these from the screen capture in Figure 2, as they show partition names. As the number of partitions on the box increases, this chart can become somewhat busy. The ability to turn off selected metrics in PDI is useful; if you want to focus on a couple of partitions, turn the metrics off for the others.
When reviewing this chart, I generally look for those partitions using uncapped CPU time to review if that makes sense given the work each partition needs to do. If you see a partition that is consistently using a lot of uncapped CPU, it may be an indication that the entitlement for that partition should be increased.
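If you export the interval data behind the chart, the same review can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the field names (`partition`, `entitled_used`, `uncapped_used`), the thresholds, and the sample values are all made up for illustration and are not the actual Collection Services column names.

```python
from collections import defaultdict

def flag_heavy_uncapped(samples, share_threshold=0.25, interval_fraction=0.8):
    """Return partitions that rely on uncapped time in most intervals.

    A partition is flagged when uncapped time is more than share_threshold
    of its total CPU time used in at least interval_fraction of its
    intervals. Both thresholds are arbitrary illustrative defaults.
    """
    hits = defaultdict(int)    # intervals where the uncapped share was high
    totals = defaultdict(int)  # all intervals seen for the partition
    for s in samples:
        used = s["entitled_used"] + s["uncapped_used"]
        totals[s["partition"]] += 1
        if used > 0 and s["uncapped_used"] / used > share_threshold:
            hits[s["partition"]] += 1
    return sorted(p for p in totals if hits[p] / totals[p] >= interval_fraction)

# Hypothetical per-interval CPU time (seconds) for two partitions.
samples = [
    {"partition": "PROD", "entitled_used": 600.0, "uncapped_used": 400.0},
    {"partition": "PROD", "entitled_used": 600.0, "uncapped_used": 400.0},
    {"partition": "PROD", "entitled_used": 600.0, "uncapped_used": 400.0},
    {"partition": "DEV", "entitled_used": 900.0, "uncapped_used": 100.0},
    {"partition": "DEV", "entitled_used": 700.0, "uncapped_used": 300.0},
    {"partition": "DEV", "entitled_used": 1000.0, "uncapped_used": 0.0},
]

flagged = flag_heavy_uncapped(samples)
print(flagged)
```

A flagged partition is only a candidate for a closer look; whether its entitlement should change still depends on the work that partition needs to do.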
The Uncapped Processor Time Used by Logical Partition chart (Figure 3) shows which partitions are using the most uncapped processor time, along with the average partition CPU utilization. Use it to verify whether that usage is expected.
A recent client of mine discovered the value of the Logical Partitions Overview chart. They took my advice and saved their management collection objects. Many months later, an application started to experience performance issues. We restored the known-good collection and compared its Logical Partitions Overview chart with that of a current collection experiencing problems. Using these charts, we identified a partition using significantly more uncapped time in the shared processor pool during the timeframe when the application performance was suffering. We also saw that when the application performed well, its partition used uncapped CPU, but when the application struggled, the partition could not exceed 100 percent utilization because the other partition was using the uncapped processor capacity. This helped us identify workloads that conflicted in their run times and tune the partition weights so that less important work would not impact key production applications.
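The same good-versus-bad comparison can be sketched outside of PDI once you have per-partition totals from each collection. The example below is a minimal illustration, not the method we used with the client: the partition names and the uncapped-time totals are invented, and it assumes you have already summed uncapped time per partition for the same time window in both collections.

```python
def uncapped_change(baseline, current):
    """Return (partition, delta) pairs, largest increase in uncapped time first.

    baseline and current map a partition name to its total uncapped CPU
    time for the same window in the known-good and problem collections.
    A partition missing from one collection is treated as having used 0.
    """
    partitions = set(baseline) | set(current)
    deltas = {p: current.get(p, 0.0) - baseline.get(p, 0.0) for p in partitions}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical totals (seconds of uncapped time) for the two collections.
baseline = {"PROD": 1200.0, "BATCH": 300.0, "TEST": 50.0}
current = {"PROD": 200.0, "BATCH": 1500.0, "TEST": 60.0}

changes = uncapped_change(baseline, current)
for partition, delta in changes:
    print(f"{partition}: {delta:+.1f}")
```

In this made-up data, one partition's uncapped time rises sharply while the production partition's falls, which is the kind of pattern that points to competing workloads in the shared pool.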
Using this ability to compare good performance versus bad, anyone can identify key differences during troubleshooting.