Java Performance Is OS Agnostic on Power6 Gear
May 12, 2008 Timothy Prickett Morgan
Way back when, in the dawn of time–well, about a decade ago when IBM first caught the Java bug and decided that this would be the language of choice on its commercial servers and their operating systems–the software engineering teams in Rochester, Minnesota, and Toronto, Ontario, worked diligently to take advantage of the 64-bit addressing that OS/400 had and that many Unixes of the time lacked. The idea was simple: If Java was going to be the imposed lingua franca for future application development, then OS/400 would speak it fluently–and fast.
At the time, when the 64-bit memory and symmetric multiprocessing scalability of the AS/400 system were not available in X86 server architectures, and when Windows hated Java and Linux was but a mere toy of an operating system (with aspirations, no doubt), the AS/400 platform could compete toe-to-toe with RISC/Unix iron when it came to Java workloads. This was important because many of the key application software providers in the AS/400 space–which was driven by application sales and still is to this day–were opting to move from RPG to Java to take their applications to the broader Unix and someday Windows markets.
For a variety of reasons, IBM has consolidated down to a single Java Virtual Machine environment for the OS/400-i5/OS-i platform and its baby Unix brother, AIX, and has not used the single-level storage and other architectural features to differentiate the Java performance on the AS/400 and successor platforms. Which leads us to a place, more than a decade later, where it is difficult to distinguish the performance of Java applications on the new i 6.1 from that of its AIX and Linux competitors on the same Power6 iron.
The reason that AIX and i Java performance are nearly identical is simple, and I explained why when i 6.1, previously known as i5/OS V6R1, was launched in late January. (See i5/OS V6R1 and Its Java Enhancements for more on that.) It is because IBM has taken the 64-bit Java Virtual Machine created for AIX and plunked it down inside the PASE AIX runtime environment, which has hooks into the i operating system and for all intents and purposes looks like it is running inside the i platform, even if it is really in a stripped-down AIX instance sitting inside the i operating system itself. (This is distinct from running JVMs inside AIX inside logical partitions, which is an option IBM could, of course, choose, but that would not look very native and would cause application houses some headaches.) IBM says that this 64-bit PASE version of the AIX JVM has better performance than the so-called “classic” 64-bit JVM inside OS/400 and i5/OS. That classic JVM was itself slower on entry-level iron than the 32-bit JVM that IBM ported to i5/OS several years ago and that, oddly enough, IBM has used in the SPECjbb2005 benchmark tests. (I know it is counter-intuitive that a 32-bit JVM should outperform a 64-bit JVM.)
Knowing this, then, it should come as little surprise that a Power6 server has virtually the same performance on the SPECjbb2005 benchmark running i 6.1, AIX 5.3, and Linux 2.6. Linux actually performs a little under par compared to the i and AIX platforms. This conclusion is based on three benchmarks that IBM ran on the Power 570 server with its two operating systems and contrasted with the machine running Red Hat Enterprise Linux 5.1, the latest release from that company.
The SPECjbb2005 benchmark is administered by the Standard Performance Evaluation Corporation and is basically a Java implementation of the Transaction Processing Performance Council's TPC-C online transaction processing test with the substantial (some would say ridiculous) I/O hardware requirements of the TPC-C test removed. In effect, it converts a system-level benchmark test into one that stresses the processors and main memory more than it does the disk and network subsystems.
On its most recent test on the Power 570 box, IBM loaded up i 6.1, with its integrated DB2 for i database, and the Java 1.6-compliant 32-bit JVM (and, to be precise, a variant of the JVM that will not ship until June of this year). This particular box had two chassis, for a total of eight 4.7 GHz Power6 cores, and was configured with 32 GB of main memory. Each Power6 core has 4 MB of L2 cache and each pair of cores on the Power6 die shares a 32 MB L3 cache. To show you how little I/O is stressed in this box, the machine had a single 73 GB disk to run the test, and used the Integrated File System that has been built into OS/400, i5/OS, and now i since V3 more than a decade ago. This particular box was able to handle 345,809 business operations per second (BOPS) across four JVMs, or 86,452 BOPS per JVM.
Last June, when the Power6-based 570 was first launched running AIX, IBM did a similar SPECjbb2005 test on the same piece of iron but running AIX 5.3 and using the JFS file system and an older Java 1.5-compliant JVM. This setup was able to handle 346,742 BOPS on the test, or 86,686 per JVM. When equipped with RHEL 5.1 and the newer Java 1.6-compliant 32-bit JVM, this same Power 570 server could handle 335,424 BOPS, or 83,856 per JVM. That works out to 3.3 percent lower performance than the AIX 5.3 configuration and 3 percent lower than the i 6.1 configuration. Basically, the difference makes no difference.
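For readers who want to check the arithmetic, here is a minimal Python sketch that reproduces the per-JVM figures and the percentage gaps from the three Power 570 results quoted above. The function and variable names are my own, chosen for illustration; only the BOPS totals and the four-JVM count come from the benchmark numbers cited in this story.

```python
# Aggregate SPECjbb2005 results quoted above for the eight-core Power 570.
# Each configuration ran four JVMs.
RESULTS_BOPS = {
    "i 6.1": 345_809,
    "AIX 5.3": 346_742,
    "RHEL 5.1": 335_424,
}
JVMS = 4


def per_jvm(total_bops, jvms=JVMS):
    """Average throughput of one JVM, in BOPS."""
    return total_bops / jvms


def percent_lower(candidate, baseline):
    """How far candidate falls below baseline, as a percentage of baseline."""
    return (baseline - candidate) / baseline * 100


for name, total in RESULTS_BOPS.items():
    print(f"{name}: {per_jvm(total):,.0f} BOPS per JVM")

linux = RESULTS_BOPS["RHEL 5.1"]
for name in ("AIX 5.3", "i 6.1"):
    gap = percent_lower(linux, RESULTS_BOPS[name])
    print(f"RHEL 5.1 is {gap:.1f} percent below {name}")
```

Note that each gap is measured against the faster system's score, which is how the 3.3 percent and 3 percent figures above are derived.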
Because Unix and Linux shops are interested in scalability on Java workloads, IBM has also tested a heavier configuration of the Power 570 with 16 cores activated using the same software stacks, and to prove AIX’s scalability, IBM also showed SPECjbb2005 tests for Power 570 machines configured with two and four cores and on a mammoth 64-core Power 595. Presumably, i 6.1’s Java performance will not be significantly different at least as far as the SPECjbb2005 test goes on identical configurations. Here’s how it looks graphically:
Two things to note. First, notice how AIX 6.1 shipped last fall but IBM has not run benchmarks with it? Peculiar, isn’t it?
Second, Java scalability is pretty linear, so long as there are no disk or network I/O issues. On the Power 570 machine, the performance per core only goes down slightly, which is the benefit you get by chopping up a Java workload so it can run on multiple JVMs on a machine rather than trying to get one giant JVM to span the whole machine. The JVM approach to application runtimes means you can circumvent some of the overhead of the symmetric multiprocessing (SMP) scalability on this server, but you pay the price by having to use Java instead of RPG or C and by having to support complex and, maybe for some, unfamiliar Java environments. Because of the much more efficient SMP scalability in the Power 595 machine, the larger main memory, and other system-level differences, the Power 595 actually offers more Java performance per core on the SPECjbb2005 test than the Power 570, at least when running AIX 5.3. That Power 595 machine was equipped with AIX 5.3, the newer Java 1.6-compliant JVM, and the JFS2 file system; the machine used 64 of IBM’s 5 GHz Power6 cores and had 512 GB of main memory and a single 146 GB disk. This box could process 3.44 million BOPS on the SPECjbb2005 test, or 107,359 BOPS per JVM.
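To put a rough number on that per-core claim, the sketch below divides the quoted totals by core count for the eight-core Power 570 and the 64-core Power 595, both running AIX 5.3. Treat the output as approximate: the Power 595 total is the rounded 3.44 million figure, and the JVM count it prints is inferred from the two quoted numbers, not something IBM's result is cited as stating here. The names in the code are illustrative.

```python
# Figures quoted in this story. The Power 595 total is the rounded
# "3.44 million" value, so anything derived from it is approximate.
P570_AIX_BOPS = 346_742
P570_CORES = 8
P595_AIX_BOPS = 3_440_000
P595_CORES = 64
P595_BOPS_PER_JVM = 107_359


def bops_per_core(total_bops, cores):
    """Average throughput per activated core, in BOPS."""
    return total_bops / cores


def implied_jvm_count(total_bops, bops_per_jvm):
    """JVM count implied by a total and a per-JVM average (an inference)."""
    return total_bops / bops_per_jvm


p570_per_core = bops_per_core(P570_AIX_BOPS, P570_CORES)
p595_per_core = bops_per_core(P595_AIX_BOPS, P595_CORES)
advantage = (p595_per_core / p570_per_core - 1) * 100
jvms = implied_jvm_count(P595_AIX_BOPS, P595_BOPS_PER_JVM)

print(f"Power 570: {p570_per_core:,.0f} BOPS per core")
print(f"Power 595: {p595_per_core:,.0f} BOPS per core")
print(f"Power 595 per-core advantage: roughly {advantage:.0f} percent")
print(f"Implied Power 595 JVM count: about {jvms:.0f}")
```

Keep in mind that the Power 595 cores run at 5 GHz against 4.7 GHz in the Power 570, so some of the per-core gap is plain clock speed and not SMP efficiency.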
Presumably, a Power 595 box running i 6.1 would perform more or less the same; I am a little less comfortable making the same assertion about RHEL 5.1, but logically, there is no reason why Linux can’t be in the same ballpark even on this big iron, seeing how it is really just supporting a large number of 32-bit JVMs. Making RHEL span all 64 cores in the machine and support a giant database is another matter entirely, and as we know, the DB2 implementation for i5/OS has not historically been able to drive as many transactions on benchmark tests as the same iron running AIX and DB2 Universal Database.
Right now, I am building up the data and doing the analysis to make comparisons of Power6 iron to their Power4 and Power5 predecessors as well as to alternative platforms from other server makers. The SPECjbb2005 Java benchmarks will feature prominently in those comparisons, which will examine other workloads, too. Stay tuned.