Four Hundred Guru
Volume 7, Number 27 -- July 25, 2007

Memory Management: It's Your Fault, Now Fix It

Published: July 25, 2007

by Doug Mewmaw

This year marks my 19th year on the AS/400 platform. Wow, where did that time go? After all those years working with peers, customers, and the like, I'm convinced that the most misunderstood performance component is memory. In a previous article, I explained the importance of the machine pool, and we looked at the Performance Adjuster feature of i5/OS as well. The feedback I received from that article proved to me that we need more articles on managing memory.

I would like to share some neat tricks of the trade and best practice techniques that I've used over the years to keep my systems running smoothly with regard to memory. We'll assume that machine pool faulting is within the best practice guideline of under 10 faults per second.

Let's start with a basic question that I received in class the other day as it is a perfect starting point for our memory discussion. A student asked this: "What is a fault?"

Here is a great definition one of my mentors gave me years ago: A fault occurs when a virtual address is referenced, but it is not in main storage. When a fault occurs, a job will stop dead in its tracks and wait for an I/O, and by definition, it is a synchronous I/O.

Since I'm a big picture person, let's look at a non-technical description. My wife is a fifth grade teacher. Every year, she describes a phenomenon that I think explains faulting perfectly. A simple way to look at faulting is thinking of homework with our kids. The scenario is a child doing homework early in a new school year:

A child (our system job, by analogy) needs an answer to a homework question. The problem is that the child forgot some things they learned from the previous year (needed data not in memory). At that point, the child uses Google or Yahoo to search for it or pages through an old book until he or she finds the answer (accessing info from DASD). In other words, the info the child needed was not in their memory. The child is forced to find the answer. That process of not having the info is a fault. Now, the child may have to look for numerous answers (faulting rate). The more the child needs to look for answers, the longer their homework takes (sync I/O affecting how long a job takes to complete).

So why do we need to manage memory? Here are the short answers:

Interactive jobs: If there is a high faulting rate (Sync I/Os are occurring), the interactive job will slow down and response times will suffer.

Batch jobs: If there is a high faulting rate (Sync I/Os are occurring), the batch job simply takes a long time to complete. This phenomenon affects the nightly batch process window.
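To put rough numbers on that, here is a minimal sketch in Python (my choice of language for illustration). It assumes a simplified model in which every fault costs one synchronous disk read and the job does nothing else while it waits. The function name and the sample figures are hypothetical, not measurements from a real system.

```python
# Sketch: estimate how page faults stretch a job's elapsed time.
# Assumption: each fault costs one synchronous disk read, and the job
# simply waits for it (no overlap with other work).

def elapsed_seconds(cpu_seconds, faults, disk_response_ms):
    """Return CPU time plus time spent waiting on fault-driven sync I/O."""
    wait_seconds = faults * disk_response_ms / 1000
    return cpu_seconds + wait_seconds

# Hypothetical batch job: 600 seconds of CPU work, 5 ms average disk response.
light = elapsed_seconds(600, 20_000, 5)    # 600 s CPU + 100 s waiting
heavy = elapsed_seconds(600, 200_000, 5)   # 600 s CPU + 1000 s waiting
print(light, heavy)
```

Under these assumptions the same job does the same amount of CPU work in both cases, but the run with ten times the faults takes more than twice as long to finish.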

Where Do You Start?

Managing memory requires two very important work management prerequisites:

  1. Inventory your system to fully understand what is running in each pool
  2. Prioritize and categorize your jobs into functional areas

I'll explain each in turn.

1.    Inventorying your system pools. More often than not, I've run into situations where there were too many cooks in the kitchen. That is, jobs were not only set up by different areas within IT, but the environment was also set up many years ago. As a result, it's imperative to ensure work management is set up efficiently. I would always ask questions like:

  • Are my jobs separated by functional area? That is, Order Entry, Accounting, and so forth.
  • What jobs are most important to the business? Order entry jobs, where the business is making money, are more important than a once-a-month accounting job.
  • Am I sure that interactive and batch jobs are not running in the same pool?
  • Are my jobs running in the right type of subsystem?
  • Are my jobs running at the appropriate time of the day?

In a real-life example, notice what I encountered in a recent study where a site had performance issues.


Batch and Interactive jobs in the same pool. Note that Pool 4 was *INTERACT, but there are batch jobs in the same pool.
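A check like this is easy to automate once you have a job inventory in hand. The sketch below assumes you have already collected each job's name, type, and pool number by whatever means you prefer. The job names and the tuple layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: (job name, job type, pool number).
jobs = [
    ("ORDENT01", "INTERACTIVE", 4),
    ("ORDENT02", "INTERACTIVE", 4),
    ("NIGHTLY", "BATCH", 4),
    ("ACCTMTH", "BATCH", 3),
]

def mixed_pools(job_list):
    """Return the pool numbers that hold both interactive and batch jobs."""
    types_by_pool = defaultdict(set)
    for _name, job_type, pool in job_list:
        types_by_pool[pool].add(job_type)
    return sorted(
        pool for pool, types in types_by_pool.items()
        if {"INTERACTIVE", "BATCH"} <= types
    )

print(mixed_pools(jobs))  # [4]
```

Any pool number that comes back from this check deserves a closer look at how its subsystems and routing are set up.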


2.    Prioritize & categorize your system workload. I'm not a big fan of running all jobs out of *BASE. Instead of having all jobs run in a "hodge podge" pool, I like the idea of separating core application workloads into separate pools. To appropriately prioritize and categorize workloads, I would talk to the application teams and the business side of the house to ensure the system is set up correctly. A simple example is shown below:


Workload           Pool Type    Business Priority
Order Entry        *INTERACT    #1
Accounting         *SHRPOOL1    #4
HTTP (Web)         *SHRPOOL2    #2
Warehouse          *SHRPOOL3    #3
Everything Else    *BASE
And so on...

Once you have your system set up by core functional areas, then you have the starting point where you can begin to manage memory efficiently.

Note: While it is my philosophy to separate workloads into separate pools, I do not recommend slicing the workload into too many memory pools. What is too many? I believe workload types should be separated into no more than 10 total pools. Keep in mind that this is my personal guideline; everyone has their own number they feel comfortable with. The bottom line for me is that I don't want the process of managing memory to become a management nightmare. Too many pools also add overhead to the system when Performance Adjuster is turned on.
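If you keep the workload plan as data, both the priority order and the pool count are easy to check. This is only a sketch: the list mirrors the example table above, `None` marks a workload with no ranking, and `MAX_POOLS` encodes the personal guideline of 10, which is not a system limit.

```python
# The example workload plan as data: (workload, pool, business priority).
workloads = [
    ("Order Entry", "*INTERACT", 1),
    ("Accounting", "*SHRPOOL1", 4),
    ("HTTP (Web)", "*SHRPOOL2", 2),
    ("Warehouse", "*SHRPOOL3", 3),
    ("Everything Else", "*BASE", None),  # unranked catch-all
]

MAX_POOLS = 10  # personal guideline from the text, not a system limit

def by_priority(rows):
    """Sort ranked workloads first, most important at the top."""
    return sorted(rows, key=lambda row: (row[2] is None, row[2] or 0))

def within_guideline(rows, limit=MAX_POOLS):
    """Return True if the plan uses no more distinct pools than the limit."""
    pools = {pool for _name, pool, _priority in rows}
    return len(pools) <= limit

for name, pool, priority in by_priority(workloads):
    print(name, pool, priority)
print(within_guideline(workloads))
```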

Manage Your Memory With a Simple Methodology

Here's the methodology I use to manage memory in systems.

1.    Start with the big picture and measure total faults on the system

There is no better starting point than looking at total faults on the system. Here's why:



This graph is telling. Not only does it tell you the faulting rate for the entire system, but it points out a huge faulting rate increase starting the week of July 25. For the record, the faulting increased on the system 369 percent! Check it out:



What happened? Was a new workload added to the system? Was there a memory change on the system? Was there simply an application change? This supporting documentation proves that you need to dig further to see what changed.
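The arithmetic behind a figure like that is a plain percent change. The sketch below uses hypothetical weekly averages, chosen so that the last week shows a 369 percent jump. The data behind the graph is not reproduced here, and both function names are my own.

```python
def percent_increase(before, after):
    """Percent change from before to after (positive means an increase)."""
    return (after - before) * 100 / before

def first_jump(weekly_faults, threshold_pct):
    """Return the index of the first week whose rate rose more than
    threshold_pct over the previous week, or None if no week did."""
    for week in range(1, len(weekly_faults)):
        change = percent_increase(weekly_faults[week - 1], weekly_faults[week])
        if change > threshold_pct:
            return week
    return None

# Hypothetical average faults per second, one value per week.
weekly = [100, 104, 98, 469]
print(percent_increase(100, 469))   # 369.0
print(first_jump(weekly, 50))       # 3
```

A scan like `first_jump` only tells you when the change happened. Finding out why is still the detective work described above.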

2.    Determine your system's faulting factor

Many years ago, I saw a neat article by IBM's Mike Denney (called "Analyze This") explaining that memory analysis should start with determining the percentage of time the system spends page faulting. I wholeheartedly agree. Shortly after the article came out, my company worked with Mike with the goal of taking the concept and making a graph that would help people understand the memory component better. The end result was a neat memory graph called a Faulting Factor. Note: It is not my intention to teach you the technical aspects of how the faulting factor is calculated. Just know that the calculation uses Sync I/Os, disk response time, CPU usage, and faults. For a complete description of how the factor is calculated, see "Analyze This."

So what is a Faulting Factor? It is the percentage of time your system is faulting. It's pretty easy when you think of a clock.

For example: Your system has a faulting factor of 50 percent. What does that mean? That means in a 60 minute time frame, your system faulted for 30 minutes. Why is that important? That means that your jobs were only in the CPU for 30 minutes! See the example below:



In this real life example, notice that the faulting factor for this system has many intervals where the percentage of faulting is over 40 percent. Thinking about our one hour time clock as a measuring stick, this system had intervals where jobs were waiting almost half the time--24 minutes out of 60.
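The clock analogy is simple enough to write down. This sketch only converts a faulting factor percentage into minutes of a time window. It does not calculate the faulting factor itself, and it follows the simplification used above that time not spent faulting is time in the CPU.

```python
def minutes_waiting(faulting_factor_pct, window_minutes=60):
    """Minutes of the window spent waiting on faults."""
    return faulting_factor_pct * window_minutes / 100

def minutes_in_cpu(faulting_factor_pct, window_minutes=60):
    """Minutes left for CPU work, under the clock analogy's simplification."""
    return window_minutes - minutes_waiting(faulting_factor_pct, window_minutes)

print(minutes_waiting(50), minutes_in_cpu(50))   # 30.0 30.0
print(minutes_waiting(40), minutes_in_cpu(40))   # 24.0 36.0
```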

Remember what the goal is: To have enough memory so that your system processes jobs in the CPU efficiently. This is an example where more memory would help this system. If jobs are waiting a lot, service levels will suffer.

What is the faulting factor guideline? My personal best practice guideline is to be under 30 percent.
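Checking collected intervals against that guideline might look like the sketch below. The interval labels and percentages are hypothetical, and the 30 percent limit is the personal guideline stated above, so treat it as a parameter rather than a rule.

```python
GUIDELINE_PCT = 30  # personal best practice guideline: stay under this

# Hypothetical faulting factor (percent) per collection interval.
intervals = {"08:00": 18, "09:00": 42, "10:00": 44, "11:00": 27}

def over_guideline(samples, limit=GUIDELINE_PCT):
    """Return the intervals whose faulting factor is not under the limit."""
    return [label for label, pct in samples.items() if pct >= limit]

print(over_guideline(intervals))  # ['09:00', '10:00']
```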

3.    Measure each pool's faulting rate. I like to start with a normal day. That is, for every interval during the day, I like to see which pools are faulting the most.



In this example, each pool is measured for each interval.
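With per-interval data in hand, picking out the pool that faults the most in each interval takes only a few lines. The rates below are hypothetical faults per second, and the nested dictionary layout is just one convenient way to hold them.

```python
# Hypothetical faults per second, by interval, for each pool.
rates = {
    "09:00": {"*MACHINE": 3.0, "*BASE": 41.5, "*INTERACT": 22.0},
    "10:00": {"*MACHINE": 4.2, "*BASE": 38.0, "*INTERACT": 57.3},
}

def busiest_pool(rates_by_interval):
    """Map each interval to the pool with the highest faulting rate."""
    return {
        interval: max(pools, key=pools.get)
        for interval, pools in rates_by_interval.items()
    }

print(busiest_pool(rates))
```

With these sample numbers, *BASE leads at 09:00 and *INTERACT leads at 10:00.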

Next, I like to look at the pools historically:



Pool 2 (*BASE) is analyzed. Note that the faulting rate is increasing.



Here's Pool 3 and so on . . . .

What am I looking for? To understand the faulting rate for each pool. I ask questions like:

  • What are the growth rates for each pool?
  • Is the faulting rate increasing or decreasing?
  • Has the faulting rate for each pool changed significantly?
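One way to answer the growth questions with numbers is to fit a straight line through each pool's history and look at the slope. This is my own sketch using an ordinary least-squares slope. The graphs above may have been produced some other way, and the weekly values are hypothetical.

```python
def trend_per_period(samples):
    """Least-squares slope of the samples against their index (0, 1, 2, ...).
    Needs at least two samples."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator

def direction(samples):
    """Describe the trend as increasing, decreasing, or flat."""
    slope = trend_per_period(samples)
    if slope > 0:
        return "increasing"
    if slope < 0:
        return "decreasing"
    return "flat"

# Hypothetical weekly average faults per second for two pools.
history = {"Pool 2 (*BASE)": [20, 22, 24, 26], "Pool 3": [30, 30, 29, 27]}
for pool, samples in history.items():
    print(pool, trend_per_period(samples), direction(samples))
```

The slope is in faults per second per week, so it doubles as a rough growth rate for each pool.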

4.    Compare pool faulting and total system faults. I also like to understand how each pool is affecting the overall faulting. I ask this basic question: Are my pool percentages versus total faults constant or has one pool increased significantly?

Here are two graphs that I use. The first is a normal day, plotting pool faulting against total system faulting:



In this example, Pool 2 (*BASE) faulting rate is compared to the total system faulting rate. A neat technique is to graph the pool percentage of total faults as well (shown in green in the chart above). This shows you how the faulting changes throughout the day. This kind of data helps you manage Performance Adjuster more efficiently, too.
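The pool percentage line is just the pool's faults divided by the system total for each interval. A minimal sketch with hypothetical numbers:

```python
def pool_share_pct(pool_faults, total_faults):
    """Pool faults as a percentage of total system faults.
    Returns 0.0 when the system recorded no faults at all."""
    if total_faults == 0:
        return 0.0
    return pool_faults * 100 / total_faults

# Hypothetical (pool faults, total faults) per interval for one pool.
samples = [(25, 100), (30, 60), (0, 0)]
shares = [pool_share_pct(pool, total) for pool, total in samples]
print(shares)  # [25.0, 50.0, 0.0]
```

The same division works for faulting factors. For example, a pool factor of 5 against a system factor of 21 comes out a little under 24 percent.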

I do this for every pool on the system. Next, we break down the Faulting Factor, showing faulting factors by pool:



Since the Faulting Factor is such a big component for measuring memory successfully, I like to understand how each pool is affecting the overall system faulting factor. In this example, notice that *BASE's average faulting factor is 5 percent (in gold in the chart). The total faulting factor is 21 percent (blue). That means about 23 percent of the total faulting factor comes from *BASE. I do this for each pool.

The Bottom Line

Measuring memory does not have to be an overwhelming experience. True, it's not an exact science, and it does take a bit of detective work. But if you create a structured plan to manage the memory component, you put yourself in a great position for solid performance service levels. Here's a neat thing to take away from this article: I created the above methodology not only to help me with memory analysis, but also to help with the overall performance of my system.

Even though the prerequisites seemed to be a daunting task at first, it was well worth it when I saw performance improvements immediately. From a work management standpoint, I had peace of mind that my system was set up efficiently too. So to summarize, my methodology is as follows:

  • Inventory your system to understand what is running in each pool
  • Prioritize and categorize your system workload
  • Start with the big picture--measure total faults on the system
  • Determine your system's faulting factor
  • Measure each Pool's faulting rate
  • Compare pool faulting against total system faults

In my next article, we will drill down deeper and analyze jobs within a pool.


Doug Mewmaw is a 25-year "jack of all trades" IT veteran who currently is director of Education & Analysis at Midrange Performance Group, an iSeries business partner that specializes in performance management and capacity planning. He can be reached at DMewmaw@mpginc.com.


RELATED STORIES

Using i5/OS Performance Adjuster to Better Manage Memory

Why American Employees Are So Unhappy




Senior Technical Editor: Ted Holt
Technical Editors: Howard Arner, Joe Hertvik, Shannon O'Donnell, Kevin Vandever
Contributing Technical Editors: Joel Cochran, Wayne O. Evans, Raymond Everhart,
Bruce Guetzkow, Brian Kelly, Marc Logemann, David Morris
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.


Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
