System Performance Management Is Like Having Insurance
October 6, 2008 Doug Mewmaw
One of the things in life that drives me crazy is paying for insurance. Okay, I realize it’s necessary and the responsible thing to do, but it just feels like I pay a premium and I get nothing in return. Then one thing happened. It rained. No, it rained a lot. Here’s something you don’t see every day. It was raining in my kitchen. Some $10,000 later, our house was back to normal, and thank heavens for the insurance.
As someone who teaches performance management and capacity planning, I am often asked what software and methodologies I use when I do my analysis. Even though I’m not in sales, I am often asked to put on my sales cap and give a presentation on my processes and the like. At the risk of making a shameless plug for my company’s performance management product, I simply use real-life examples of how our product changed my former IT shop’s service levels for the better. In those engagements, I bet the product sells itself 95 percent of the time. The reason is simple. When people realize I was a customer first (using a performance management product in the trenches), it gives them peace of mind that I truly am looking out for their best interests.
But I’d like to talk about the other 5 percent, when the sale doesn’t occur. It’s hard for me to say this, but even though I think my company’s product is the best performance management software product in the industry, I can understand that there might be reasons why our solution isn’t a good fit for your IT organization. Here are a few reasons I’ve heard:
The first two are valid reasons, but the third is a reason I will never accept, and it is the reason that I’m writing this article. I love this one. At every COMMON expo I’ve attended, I’ve heard many tech support managers, CIOs, and the like proudly exclaim, “Our system is good. We don’t have any performance issues.” As someone who analyzes systems for a living, let me assure you that if you’re not using performance management software because “we don’t have any performance issues,” I will be the first to try to convince you that you are living in dangerous territory.
Let me share some real-life horror stories. Both are from prominent IT organizations in the United States using i platforms. Here’s example one, where an application programmer made one program change:
This is a CPU graph. Note that prior to the change, CPU utilization was averaging only 31 percent. After one program change, the system was pegged, averaging an astounding 97 percent. This graphic proved not only that the system had performance issues but, more importantly, it helped the system team prove to the application team that the “fix” they suggested, putting the production system back to where it was prior to the initial change, wouldn’t work. For a systems programmer who’s constantly trying to communicate what’s happening on the system, this functionality is a powerful tool to have in your systems tool belt.
Take a look at example two, from a prominent U.S. bank, where two 20-year veteran systems programmers looked at the picture below. It caused both programmers to run out of the room in a panic.
The reason was simple. There was a production issue that no one realized was occurring. In this example, a vital production process that normally consumed 4 percent of the system was now consuming 34 percent of the CPU resources (purple piece of the pie). When the programmers drilled down to isolate the problem, they uncovered the job that was the culprit. See below:
The above graphic shows the CPU ms used. This graph not only proved what day the problem started, but more importantly it indicated what program was causing the issue. The amazing part of the story is that the production issue was discovered to be in 10 other job streams. This is a real-life example where a major system issue occurred and no one knew.
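The drill-down described above can be sketched in a few lines. This is only a minimal illustration, not how any particular performance management product works: the job names, the CPU millisecond figures, and the 10-point threshold are all hypothetical, chosen so that one job's share moves from 4 percent to 34 percent as in the story.

```python
def cpu_share(cpu_ms_by_job):
    """Convert CPU milliseconds per job into each job's percentage of the total."""
    total = sum(cpu_ms_by_job.values())
    return {job: 100.0 * ms / total for job, ms in cpu_ms_by_job.items()}


def jobs_over_baseline(baseline_share, current_share, min_increase_pts=10.0):
    """Return (job, increase) pairs whose share grew by at least min_increase_pts
    percentage points over the baseline, largest increase first."""
    flagged = []
    for job, pct in current_share.items():
        delta = pct - baseline_share.get(job, 0.0)
        if delta >= min_increase_pts:
            flagged.append((job, round(delta, 1)))
    return sorted(flagged, key=lambda item: -item[1])


# Hypothetical CPU ms totals for a normal day and for the problem day.
normal_day = {"VITALJOB": 4, "NIGHTBATCH": 40, "INTERACTIVE": 56}
problem_day = {"VITALJOB": 34, "NIGHTBATCH": 30, "INTERACTIVE": 36}

culprits = jobs_over_baseline(cpu_share(normal_day), cpu_share(problem_day))
print(culprits)
```

Running the same comparison for each day of collected data would also show on which day the share first jumped.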
The above real-life example is telling. Think about it. Two veteran systems programmers with 40 years of combined experience didn’t see the issue. In fairness, their production system processes 10,000-plus jobs every day. How can anyone manage that from a green screen? I used to, but I learned a valuable lesson when I, too, uncovered some performance issues on my production systems. Only by applying the appropriate methodologies and using a performance management tool to understand your system at a deeper level can you be sure that “hidden” problems don’t exist.
Remember, even if your system is truly operating efficiently and the phones aren’t ringing off the hook, the key is to take a picture of your core performance components when things are normal. When that performance baseline is recorded, one can use that historical data to help troubleshoot future problems. Not only is performance management software insurance for your system, but it’s peace of mind that you can manage your production systems on good days and bad days.
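As a rough sketch of the baseline idea, assume you have a handful of CPU utilization samples collected on normal days. The sample values, the function names, and the three-standard-deviation rule below are illustrative assumptions, not a recommendation from any specific tool.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize 'normal' measurements as a mean and a sample standard deviation."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_abnormal(value, baseline, k=3.0):
    """Flag a measurement that is more than k standard deviations from the baseline mean."""
    return abs(value - baseline["mean"]) > k * baseline["stdev"]


# Hypothetical daily average CPU percentages recorded while things were normal.
normal_cpu_pct = [29, 30, 31, 32, 33]
baseline = build_baseline(normal_cpu_pct)

print(is_abnormal(32, baseline))  # an ordinary day
print(is_abnormal(97, baseline))  # a day like the one after the program change
```

The same pattern applies to any core component you track, such as disk or memory metrics: record the baseline while the phones are quiet, then compare new measurements against it when something looks wrong.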
Doug Mewmaw is a “jack of all trades” IT veteran who currently is director of education and analysis at Midrange Performance Group, an i business partner that specializes in performance management and capacity planning.