The Many Pros and Few Cons of iSeries Logical Partitioning
by Timothy Prickett Morgan
I have been writing about the increasingly sophisticated logical partitioning capabilities of the AS/400 and iSeries family of servers for over six years now, and all I can tell you is this: you really should consider a move to Power5 technology and the latest iteration of dynamic logical partitioning. There is a lot more to what IBM has done than meets the eye in the iSeries i5 announcement letters, product brochures, and stories such as the ones I have written.
It takes time for new ideas to sink in, and even longer for companies to figure out the finer details of any new technology. In many cases, a new technology sounds great on paper, and then when you get into the details, you find the devil lurking. With the latest implementation of logical partitioning on the i5 and p5 servers, it is quite the opposite situation. When you look into the details, there are capabilities and functions that are both technically and economically attractive--and ones that you may not have considered. I know that I had not fully thought through some of the new partitioning features that were announced with the "Squadron" Power5 servers, which is why I bothered bringing this up.
What got me off on this tangent was a discussion I had with a colleague about the overhead associated with logical partitions and virtual machine partitions on different architectures. He wanted to know if I had seen any performance information on the partitioning overhead, and I told him I had not seen such information, nor had I really given it much thought. I did trade some information of my own: according to customers I have talked to, the overhead of using a virtual machine partition such as ESX Server or GSX Server from VMware is quite high, sometimes as high as 50 percent of CPU capacity. The reason is that the VM implemented by these products has to do X86 instruction set virtualization so operating systems running in the VMs think they are talking to a real chip, and it also has to supply a hypervisor layer that allows multiple partitions to share a single processor or a processor complex implemented as an SMP or NUMA machine. The hypervisors in ESX Server and GSX Server allow dynamic allocation and re-allocation of resources. Only by doing both can you trick an operating system running inside a VM into thinking that it has a whole machine at its disposal. In future X64 processors from Intel and AMD, the X86 instruction set virtualization will be implemented largely (or maybe even completely) in the chip hardware, which means the overhead of using future ESX Server and GSX Server releases will be a lot lower. But it will certainly not be zero--that's for sure.
While I cannot fully prove this, I suspect that IBM has invented a lot of clever tricks in logical partitioning that give it much lower overhead than the VM partitioning approach. The reason I believe this to be true has to do with the capacity planning documents I read after trying to answer the question about the overhead associated with using logical partitions.
Before I get into that, though, I want to give the IBMers who work on the performance teams in the eServer group, and in the iSeries Division in particular, more than a pat on the back. I think a round of drinks is in order, and I will send Amit Dave, the senior technical staff member for the iSeries (who I talked to for this story), a case of my very own homebrew to show my appreciation to the team. (Yes, Amit, you have to share.) As IBM customers, you may not be aware of this, but other server makers are terrible when it comes to gauging and documenting the performance of their servers. To use a technical term in the computer business, they suck--royally.
When I first started following the AS/400, business partners used to feed me the System Performance Reference, a super-secret IBM document that showed the performance of all of the gadgets and software features in the AS/400. I have never seen or heard of a similar set of documents from any other server maker, and IBM has made all of its performance documents available on the Web for years now (see the Resources section at the end of this article). Moreover, IBM has always provided multiple performance metrics that span its entire product line and reach back across many (if not all) prior generations of machines: RAMP-C, CPW, and MCU for the iSeries, rPerf for the pSeries, and LSPR for the zSeries. These metrics allow customers to calculate the performance increase of moving between any two machines IBM ever sold within a product line. Hewlett-Packard and Sun Microsystems offered such numbers briefly in the mid-1990s, when they needed to show off Unix performance, but Dell has never offered relative performance metrics across its server lines and likely never will. IBM's xSeries machines do not have relative performance metrics either, and all I have to say to the xSeries team is this: You know better, and you should learn from the examples of your older siblings.
SMP Taketh, LPAR Giveth Back, Micropartitioning Taketh Again
The documents presented in the Resources section below offer a wealth of information and insight into the use of logical partitions. I am only going to go into three of the more striking effects of partitions as they relate to iSeries performance.
1. SMP scalability takes away processing power, but breaking an SMP box into multiple partitions gives some back. A single processor has a certain amount of raw performance, and adding them together in an SMP cluster to create a bigger single system image requires that some of that processing capacity get used to manage the cluster. There are memory and I/O contention issues as well, which result in degraded performance relative to the magical linear scalability line.
Let's use the i5 line to illustrate this. In the i5 line, the i5 520 with a single 1.65 GHz Power5 processor core activated and with L3 cache turned on is rated at about 3,300 CPWs. So, in a purely theoretical linear scaling world--provided that the SMP interconnections and memory and I/O bandwidth were not bottlenecks--a dual-core box should deliver 6,600 CPWs, a four-core box 13,200 CPWs, and so on up to a 64-core box with 211,200 CPWs. This, as we know, is not what happens. A two-core box is rated at 6,000 CPWs or 6,350 CPWs (a loss of 4 to 9 percent), a four-core box is rated at 12,000 CPWs (a loss of 9 percent), and a 64-core behemoth is rated at 165,000 CPWs (a loss of 22 percent).
Now, when you chop these Power5 SMP boxes up into pieces using logical partitions, the aggregate, usable amount of computing capacity in the machine goes up. The reason is simple: the overhead of running the logical partitioning hypervisor is low compared to the growing overhead of SMP clustering as processors are added to the box. According to Dave, the performance you get back tends to grow as you cut the machine into more partitions, and it keeps growing so long as you never make a partition smaller than a single processor core. On a 32-way box that IBM tested, cutting the machine into two partitions made an extra 4 percent of CPW power available--for free--to workloads running in those partitions, and cutting it into 16 partitions added 16 percent more aggregate oomph to the box, spread evenly across those partitions. IBM's zSeries mainframes have seen a similar 12 to 15 percent boost from carving up SMPs into partitions dedicated at the processor core level.
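If you want to see that arithmetic laid out, here is a minimal Python sketch. The ratings are the i5 figures cited above, and the partition gains are the 4 and 16 percent numbers IBM reported for its 32-way test; the 32-way base rating is a placeholder you would swap for your own machine's number.

    # Back-of-envelope CPW arithmetic: SMP scaling loss, then the capacity
    # that core-level logical partitioning gives back.
    SINGLE_CORE_CPW = 3300  # i5 520, one 1.65 GHz Power5 core, L3 enabled

    rated_cpw = {2: 6000, 4: 12000, 64: 165000}  # published ratings
    for cores, rated in sorted(rated_cpw.items()):
        linear = SINGLE_CORE_CPW * cores
        print(f"{cores:3d} cores: linear {linear:7,} CPW, rated {rated:7,} CPW, "
              f"SMP loss {1 - rated / linear:.0%}")

    # Give-back from carving a 32-way into core-level partitions, using the
    # gains IBM reported. The base rating here is a hypothetical stand-in.
    RATED_32WAY_CPW = 100_000  # placeholder; substitute your box's rating
    for nparts, gain in ((2, 0.04), (16, 0.16)):
        print(f"{nparts:2d} partitions: aggregate usable capacity "
              f"{RATED_32WAY_CPW * (1 + gain):9,.0f} CPW (+{gain:.0%})")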
2. Micropartitioning eats up some processor resources. This makes perfect sense if you think about it for a second. In the Power5 systems, the L3 caches are tied to specific cores, which reduces contention and is one of the reasons why breaking up the SMP machine into partitions at the core level boosts the aggregate performance of the box. But once you create micropartitions, you are asking a bunch of virtual servers to start sharing single processors, caches, and access to main memory and I/O bandwidth. That takes overhead, just like running VMware GSX Server and ESX Server does. The rule of thumb at IBM is for customers to allocate at least 10 percent extra CPW power for each micropartition beyond the actual amount of capacity a workload needs, according to Dave. This is a very conservative rule, by the way; the real overhead could turn out to be less.
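As a quick illustration of that rule of thumb, here is a Python sketch; the workload names and CPW requirements are made up for the example.

    # Sizing micropartitions with IBM's conservative rule of thumb: allocate
    # at least 10 percent more CPW than each workload actually needs.
    MICROPARTITION_OVERHEAD = 0.10

    workloads_cpw = {  # hypothetical workloads and their required capacity
        "erp_production": 2400,
        "web_serving": 800,
        "dev_and_test": 500,
    }

    total = 0.0
    for name, needed in workloads_cpw.items():
        allocated = needed * (1 + MICROPARTITION_OVERHEAD)
        total += allocated
        print(f"{name:15s} needs {needed:5,} CPW -> allocate {allocated:7,.0f} CPW")
    print(f"Total CPW to provision across micropartitions: {total:,.0f}")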
3. Uncapped i5 and p5 partitions let partitions draw on a shared pool of resources, and they let the sum of the partitions' peak capacity requirements exceed that pool by a huge amount. In plain English, here is what that means. With the Power4 servers, you had to assign a logical partition a specific amount of processor capacity (usually in 1/100ths of a processor). You could change that allocation dynamically, by hand or by using Barsa Consulting's Automatic Partition Resource Manager (APRM), but the allocation itself was fundamentally static. If you had spare capacity in one partition and a need to add capacity in another, too bad for you: you had to change the resource allocations yourself. This partition type came to be called "capped partitions" after IBM invented a much more interesting alternative.
With the Power5 machines, IBM invented something called uncapped partitions. Basically, you can still create dedicated partitions that lock in at one or more processors (or even fractions of processors, if you want), which is useful for workload isolation. But with uncapped partitions, you can also designate a shared pool of resources that many partitions can access. Now here's the neat bit: you can massively over-allocate the shared resources that these uncapped partitions tap into.
The idea is that you plan for capacity at peak workloads, and while you may have one or two dozen workloads running in as many partitions at the same time, they are not all actually hitting their peaks at the same time. So they can borrow capacity from each other. Dave says that in many shops, customers are able to mix workloads--such as production ERP, Web infrastructure, and development and testing--in a single shared pool of uncapped partitions that is over-allocated by 200, 300, or even 400 percent without affecting the performance of the workloads one bit as they run through their peaks and valleys over the course of a day, week, or month.
Of course, this over-allocation means you cannot put workloads that get hammered at the same time into the same shared pool. You can't expect everything running in partitions at month end to have access to all of its potential resources--you don't actually have that 200 to 400 percent of capacity. Keep that in mind. But if you do your workload planning carefully, you can buy a lot less machine than you might think you need to support many concurrent workloads.
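Here is a toy version of that planning exercise in Python, assuming three hypothetical workloads whose peaks are staggered across the day; every name, demand figure, and hourly profile is invented for illustration.

    # Feasibility check for an over-allocated uncapped pool: if the peaks do
    # not coincide, combined demand can stay under the physical pool even
    # when the sum of peak demands far exceeds it. All figures hypothetical.
    POOL_CPW = 6000  # physical capacity of the shared processor pool
    OFF_PEAK_FRACTION = 0.10  # assumed demand outside each workload's peak

    # Peak CPW demand and the hours (0-23) during which each workload peaks.
    workloads = {
        "erp_production": (5000, range(9, 17)),  # business hours
        "web_serving": (4000, range(18, 23)),    # evening traffic
        "batch_and_test": (5000, range(0, 6)),   # overnight runs
    }

    peak_sum = sum(peak for peak, _ in workloads.values())
    print(f"Sum of peaks: {peak_sum:,} CPW on a {POOL_CPW:,} CPW pool "
          f"({peak_sum / POOL_CPW:.0%} allocated)")

    worst = max(
        sum(peak if hour in hours else peak * OFF_PEAK_FRACTION
            for peak, hours in workloads.values())
        for hour in range(24)
    )
    print(f"Worst coincident demand: {worst:,.0f} CPW -> "
          f"{'fits' if worst <= POOL_CPW else 'does not fit'}")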
This, of course, can translate into big savings--provided you can get flexible, partition-based pricing from your software vendors. This is an area where home-grown code (whose licensing fee is called a programmer's salary) has a distinct advantage.
The main "con" in using logical partitions is the complexity of capacity planning. You have to make sure that you really don't over-commit partitions in a real, not virtual, sense. If you need to do more sophisticated capacity planning to implement logical partitions than the back of an envelope, you should check out Midrange Performance Group's Performance Navigator and BMC Software's PATROL for iSeries - Predict. Each IBM geography also has its own technical support team to cope with logical partitioning issues, which you escalate to through your IBM salesperson or business partner. There are other issues as well, such as dealing with the Hardware Management Console (which is apparently getting better but which is still far from simple) and having the hypervisor be a single point of failure in the system. But on the whole, partitioning has benefits that outweigh the drawbacks.
Resources
Performance Management for IBM eServer iSeries (a performance portal)
i5/OS LPAR Performance on Power4 and Power5 Systems
eServer i5 Uncapped Partition CPU Utilization & Queuing
IBM eServer iSeries Performance Capabilities Reference
iSeries Performance, Version 5 Release 3
Virtual Partition Manager: A Guide to Planning and Implementation (Redpaper)
Logical Partitions on the IBM PowerPC: A Guide to Working with LPAR on Power5 i5 Servers (Redbook)