Using i5/OS Performance Adjuster to Better Manage Memory

November 1, 2006 Doug Mewmaw

My boss has a saying that goes: “Do you want to get your brain surgery by a doctor that has done one operation or do you want the operation done by a doctor that has done 1,000 surgeries?” He uses this analogy to give our customers peace of mind when we analyze their performance data. After all, in 2005 we did 3,000+ capacity plans alone. As a result, we get the pleasure of helping customers understand their performance needs. Big shops, small shops: our company has seen it all with regard to performance.

I state the above not to be a commercial for my company, but merely to lay down the foundation that my company not only knows performance, but we understand how to efficiently analyze the performance data. I think sometimes I see performance data in my sleep. (Insert “get a life” here.)

As someone who talks to a lot of people in the industry, I have come to the conclusion that performance management is not an easy concept for everyone. In particular, memory analysis is a performance component that drives most people bananas. In the old days, one company had a product that predicted the impact of adding memory to the system. We also had very specific guidelines given to us by IBM for System/3X and AS/400 systems that helped us manage the memory component of our systems. To this day, I still have these guidelines (now faded to yellow) pinned to my bulletin board. What some people learned from experience, however, was that these guidelines didn’t work for all shops. The reason was simple: all applications are different. That is, 100 memory faults per second might be fine for company A, but the same 100 faults per second could bring company B’s system to its knees.

Since not everyone is proficient at memory performance management, IBM created a performance process that helps manage the memory component on the AS/400, iSeries, and System i5 servers. This process is called Performance Adjuster (QPFRADJ), and it has been a part of the operating system for years. The IBM iSeries Information Center V5R3 Experience Report states: “The iSeries server has the ability to automatically manage the shared memory pools without any user interaction.” [Emphasis added.]

However, after looking at tons of performance data, I would give this analogy about the above statement by IBM concerning Performance Adjuster: My car will go forward if I turn it on and press the accelerator. However, will it get me to my destination if I merely turn the car on? Of course not. I’m required to steer the car, add gas to it when needed, and do periodic maintenance and checkups.

In other words, managing memory with Performance Adjuster requires a lot more than merely turning it on. Let me explain. But before I do that, let’s be clear: it is not my goal to teach you about Performance Adjuster. You can get the documentation from IBM. I do, however, want to show you how important user intervention is when using Performance Adjuster.

Monitoring How QPFRADJ Is Affecting System Performance

Today, I’m constantly asked for guidelines that will help shops better manage the memory component of performance. As I said before, IBM stopped giving out guidelines because they were not applicable to all types of environments. However, since machine pool faulting is so critical to the overall performance of the box, it is one metric that we must measure against a best practice guideline. The best practice guideline is to keep machine pool faulting below 10 faults per second. When I managed my iSeries environment in my previous job, my philosophy was to be a little more aggressive. That is, it was not uncommon for my boxes to be under 5 faults per second. You will have to determine what is acceptable for your location.

With that said, let’s look at the graph below, where Performance Adjuster is merely turned on and there is no user intervention.

Figure 1: Machine Pool Faulting between 16:00 and 23:45

Here are some observations:

1. This graph has two Y axes. The left Y axis data (shown in orange) shows the machine pool faulting rate, and the right Y axis data (shown in gray) shows the QPFRADJ memory movement throughout the day.

2. This machine has a high faulting rate. Between 16:00 and 23:45, notice that there are many intervals where the machine pool faulting rate is way over the recommended best practice guideline of 10 faults per second. During this period, the average is 20 faults per second.

3. This machine has an unstable machine pool memory allocation. Between 16:00 and 23:45, notice how QPFRADJ is moving the memory throughout the day. In fact, the machine pool memory allocation is extremely unstable, fluctuating from 1.2 GB to 2.9 GB.
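
The observations above boil down to three numbers per measurement window: the average faulting rate, the intervals that exceed the guideline, and the swing in machine pool size. Here is a minimal Python sketch of that arithmetic. The interval samples are hypothetical values chosen only to resemble Figure 1 (they are not the data behind the graph), and the field layout is an assumption about how you might export your own collection data.

```python
from statistics import mean

GUIDELINE_FAULTS_PER_SEC = 10  # best practice guideline cited in this article

# Hypothetical interval samples: (end time, machine pool faults/sec, machine pool GB).
samples = [
    ("16:00", 8.0, 2.9),
    ("16:15", 24.0, 1.2),
    ("16:30", 31.0, 1.6),
    ("16:45", 17.0, 2.1),
]

def summarize(samples, guideline=GUIDELINE_FAULTS_PER_SEC):
    """Return the average faulting rate, offending intervals, and pool size swing."""
    rates = [rate for _, rate, _ in samples]
    sizes = [size for _, _, size in samples]
    return {
        "avg_faults_per_sec": mean(rates),
        # Intervals whose faulting rate is above the guideline.
        "intervals_over_guideline": [end for end, rate, _ in samples if rate > guideline],
        "pool_min_gb": min(sizes),
        "pool_max_gb": max(sizes),
        # A large swing suggests QPFRADJ keeps moving memory in and out of the pool.
        "pool_swing_gb": round(max(sizes) - min(sizes), 2),
    }

summary = summarize(samples)
```

A high average combined with a large swing is the pattern described in observations 2 and 3: the pool is both short on memory and unstable.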

Figure 2: The car is rolling down the road, but it is certainly not firing on all cylinders

Since the machine pool faulting rate affects how efficiently the system tasks are being processed, it’s imperative that there is enough memory in the pool to ensure the faulting rate is within the best practice guideline. In other words, you don’t want a system that is spending all its time simply trying to manage the OS and related system tasks. Your CIO expects you to create a stable environment so your company’s applications can run successfully. That’s why the machine pool faulting metric is so critical to your system.

In this example, we need to tell Performance Adjuster not only that the machine pool needs more memory, but also that the memory must stay there. This is done in two steps:

1. Add memory to the machine pool via the QMCHPOOL system value.

2. Increase the minimum value for the machine pool via the WRKSHRPOOL command (F11).
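
If the minimum size on your tuning display is entered as a percentage of total main storage (an assumption here; verify it on your own release), you need to convert the floor you want into a percentage before keying it in. The sketch below shows only that arithmetic. The function name, the 3 GB floor, and the 16 GB system size are hypothetical and are not taken from the example system.

```python
def minimum_pct(floor_gb, total_gb):
    """Express a desired machine pool floor as a percentage of total memory."""
    if total_gb <= 0:
        raise ValueError("total memory must be positive")
    if not 0 < floor_gb <= total_gb:
        raise ValueError("floor must be positive and no larger than total memory")
    return round(100.0 * floor_gb / total_gb, 2)

# Hypothetical example: keep at least 3 GB in the machine pool on a 16 GB system.
pct = minimum_pct(3.0, 16.0)
```

Setting a floor this way keeps Performance Adjuster from draining the pool again once the faulting rate drops.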

Remember, the goal is to have zero problems in the machine pool. Only then can your normal applications have a chance to run efficiently.

In the screenshot below, notice an environment where the system administrator has allocated enough memory to the machine pool. This not only creates a stable QPFRADJ environment (total memory in the pool is not fluctuating in huge movements), but also keeps the faulting rate within the best practice guideline.

Figure 3: Machine Pool Faulting Rate has decreased. Notice that QPFRADJ is not moving the machine pool memory up and down all afternoon

Ensuring the Machine Pool is Set Up Appropriately

Once you understand how QPFRADJ is affecting the machine pool and you make the necessary tuning changes, you must measure the machine pool service level. That is, with any tuning exercise, you must prove the change had a positive effect. In this case, we must ensure the machine pool faulting rate is within the best practice guideline:

Figure 4: Just like we measure our children’s grades in school, we must measure important performance components. In this example, we ensure our machine pool faulting is within the best practice guideline. The good news is that we are now within the guideline 99 percent of the time. The goal is to be within the guideline 95 percent of the time, with 5 percent being anomalies.
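
The service level measurement amounts to counting the intervals that fall within the guideline. A minimal Python sketch follows, assuming you have one faulting rate per collection interval. The function names and sample rates are hypothetical; the 10 faults per second guideline and the 95 percent goal come from the text.

```python
def service_level(rates, guideline=10.0):
    """Percent of intervals whose faulting rate is below the guideline."""
    if not rates:
        raise ValueError("no intervals to measure")
    within = sum(1 for rate in rates if rate < guideline)
    return 100.0 * within / len(rates)

def meets_goal(rates, guideline=10.0, goal_pct=95.0):
    """True when the service level reaches the goal (95 percent by default)."""
    return service_level(rates, guideline) >= goal_pct

# Hypothetical day: 99 intervals at 4 faults/sec and one anomaly at 14 faults/sec.
rates = [4.0] * 99 + [14.0]
level = service_level(rates)
ok = meets_goal(rates)
```

Running the same calculation before and after a tuning change gives you the proof that the change had a positive effect.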

Do a Performance Adjuster MRI on Your System

In talking to people in the industry, I have found that a lot of performance tuners simply don’t understand the big picture of how Performance Adjuster is affecting their system throughout the day. A neat trick is to simply measure the total memory in each pool throughout the day (for all pools). By doing this Performance Adjuster MRI, you can see how QPFRADJ is moving memory around on the entire box. By understanding the memory component better, you can make educated performance tuning decisions.

Figure 5: Total Memory By Percentage

In this example, it’s obvious that pool 3 (which was *INTERACT) is consuming a majority of the total memory resources. The above screenshot shows that at 8:15 a.m., just under 62 percent of the total memory is in pool 3. Also notice that pool 5 and pool 6 have had no movement throughout the day.
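
The MRI view can be sketched in a few lines of Python: convert each pool’s size to a share of total memory per interval, then flag the pools whose size never changes. The pool names and sizes below are hypothetical values picked to resemble Figure 5; they are not the actual data, and the dictionary layout is just one way you might hold your own samples.

```python
# Hypothetical samples: interval end time -> memory per pool in MB.
samples = {
    "08:00": {"machine": 2000, "base": 1800, "pool3": 5700,
              "pool4": 300, "pool5": 100, "pool6": 100},
    "08:15": {"machine": 2100, "base": 1300, "pool3": 6150,
              "pool4": 250, "pool5": 100, "pool6": 100},
}

def percent_by_pool(pools):
    """Share of total memory held by each pool, in percent."""
    total = sum(pools.values())
    return {name: round(100.0 * size / total, 1) for name, size in pools.items()}

def static_pools(samples):
    """Pools whose size is identical in every interval (no QPFRADJ movement)."""
    names = next(iter(samples.values())).keys()
    return sorted(
        name for name in names
        if len({pools[name] for pools in samples.values()}) == 1
    )

shares_0815 = percent_by_pool(samples["08:15"])
unchanged = static_pools(samples)
```

A pool that never moves is worth a second look: either its settings prevent Performance Adjuster from touching it, or nothing is running there.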

Another View of the Performance Adjuster MRI Results

End Time

Mch Pl

*BASE

Pool 3

Pool 4

Pool 5