Power7 Boxes Show Good Java Oomph Versus Other Iron
July 12, 2010 Timothy Prickett Morgan
Since the Power7-based machines were launched in February and March, The Four Hundred has been digging around for all of the performance information we can find on the new machines, particularly running the i 6.1 or i 7.1 operating system and filling in some gaps by making some estimates for online transaction processing on the Power7 blade servers. This week, I wanted to give you a sense of how the Power7-based machines stack up on Java application performance, in this case gauged by the SPECjbb2005 application serving test.
In March, in the wake of the launch of the midrange Power 750, 770, and 780 servers, I walked you through IBM’s initial benchmarks for the Power 750 and how this box compared to prior generations of Power5, Power5+, Power6, and Power6+ machines running i, AIX, or Linux. A month ago, I did a review of more recent SPECjbb2005 Java benchmarks that IBM did on Power 770 and 780 machines as well as on the Power 702 blade servers. Where IBM had only tested an AIX and/or Linux Power7 machine, I used the Commercial Performance Workload (CPW) ratings that Big Blue uses for i-based machines to estimate Java performance on i. (Yes, I find the third-class citizen status of the i platform very annoying.) As best I can figure, the i platform takes a performance hit of somewhere between 10 and 20 percent compared to AIX, and as Alex Woodie discussed in this newsletter back in June, part of the reason is that the new 32-bit and 64-bit JVMs that were used in the i 6.1 and i 7.1 SPECjbb2005 tests run in the PASE AIX runtime environment, which carries a performance penalty.
For fun this week, I went through the most recent SPECjbb2005 test results from the Standard Performance Evaluation Corp and picked out interesting machines based on the latest Xeon 5600 and 7500 processors from Intel and Opteron 6100 processors from Advanced Micro Devices, plus whatever other interesting boxes I could find in the list, including Oracle’s Sparc T5440 server and some interesting clusters from ScaleMP, 3Leaf Systems, and Silicon Graphics. You can see the full listing of machines tested using the SPECjbb2005 benchmark here; there are many, many more machines that have been tested than I have put into my monster SPECjbb2005 performance comparison table.
In the table I created, the machines above the light blue line are for SPECjbb2005 test results I culled for the March story, and those above the red line were outlined in the June story; all are Power-based systems. The machines detailed below the red line are various non-Power platforms. And the machines below the black line are for exotic cluster or supercomputer machines that have been put through the SPECjbb2005 tests, which I add for the sake of amusement as much as to point out there are other ways to skin the JVM cat besides building traditional SMP machines.
The first thing you will note is that the six-core Xeon 5600 processors running at 2.93 GHz yield about the same Java performance per core–as gauged in SPECjbb2005 business operations per second (BOPS)–as the Power7-based systems using 3 GHz and 3.10 GHz chips. The Xeon 5600s can only be put into two-socket systems, so they are limited in terms of system scalability, but the Power7 chips at eight cores (and four threads per core compared to the Xeon 5600’s two per core) have inherently more scalability and the clock speeds can be pushed up to 3.86 GHz or 4.14 GHz. Moreover, on the current machines, you can put 32 or 64 cores in a single system image, which means the Power7 machines scale a lot further than the Xeon 5600 platforms can.
If you jump to machines using Intel’s higher-end, eight-core Xeon 7500s, such as the four-socket BladeCenter HX5 or System x3850 X5, the Intel machine can get up to 64 threads supporting Java, but the chips in these machines, the L7555 low-voltage and X7560 standard parts, only run at 1.87 GHz or 2.27 GHz, respectively. An eight-socket Power 780 running in TurboCore mode with half its cores turned off has the same 32 cores, but 64 threads and spins at 4.14 GHz, yielding another 50 percent more or so Java performance compared to the IBM System x rack server. Granted, that Power 780 is four times as large physically and heaven only knows how much more expensive. And an eight-socket Xeon 7500 machine, as Fujitsu shows with its PrimeQuest 1800E box, can do over 3.3 million BOPS on the Java benchmark, a little more than what a Power 780 running flat out in TurboCore mode can do.
But IBM has another Java scalability trick up its sleeve. Java loves threads, cores, and cache more than anything else, so backstepping to a Power 770 gets you 64 cores running at 3.1 GHz, or to a Power 780 running at the standard 3.86 GHz clock speed with the same 64 cores. Such boxes can hit 5 million BOPS. And thus far, there is no Xeon 7500-based machine that can scale beyond eight sockets, although I keep hearing such machines are in the works. IBM can certainly build one, but must be sorely tempted to not do it and protect its Power-AIX business, and ditto for Hewlett-Packard, which is in no hurry to push Xeon 7500 machines beyond eight sockets and still doesn’t have its eight-socket ProLiant DL980 G7 machine in the field. HP does have a four-socket ProLiant DL580 that it just launched last month and will ship this month. As we go to press, this machine has not been tested, but it is hard to imagine it will perform substantially differently from the Fujitsu PrimeQuest box.
Oracle’s T5440, a four-socket box using the Sparc T2+ processor running at 1.6 GHz and sporting eight cores and 32 threads per chip, doesn’t do horribly on the Java benchmark, but it needs to be refreshed, and soon, to compete. (And it is about to have its core count doubled up to 16 and its clock speed nudged to 1.67 GHz, if Sun Microsystems’ old Sparc roadmaps are any guide.) The Sparc T5440 could handle 814,380 BOPS, and the upcoming machines using the “Rainbow Falls” Sparc T3 should do about double this, or close to 50,000 BOPS per core. That’s as good as the Xeon 7500 is doing at its low-voltage tier with the L7555 part.
The Opteron 6100s are really two six-core processors jammed into a single package and then crammed into two-socket or four-socket motherboards with AMD chipsets and improved HyperTransport 3 interconnects between processors, memory, and I/O. Socket for socket, AMD can put 50 percent more cores into a socket than Intel, but doesn’t have HyperThreading like Intel does. Given the relatively low clock speeds of the Opteron 6100s compared to the Xeon 5600 and Xeon 7500 machines it competes with–we’re talking under 40,000 BOPS per core–Intel has something of an advantage. But AMD is compensating with more cores (because HyperThreading doesn’t help all workloads and hurts some) as well as much lower processor and chipset prices. The question is whether or not IT shops will go with the bang for the buck advantage AMD is offering (which is nowhere near the advantage AMD was offering from 2004 through 2007 when Intel’s Xeon chips were just awful). They may just pay a premium for Intel’s chips and leave AMD out in the cold.
The ScaleMP machine, which is a cluster of 16 older Dell PowerEdge servers using the vSMP Foundation virtual SMP systems program, yields 128 cores and 256 threads and nearly 7 million BOPS. Moving to 16 more modern Dell boxes using the Xeon 5600s would create a machine with 192 cores and 384 threads that could, I estimate, hit about 10 million BOPS. That’s two Power 780s with TurboCore turned off and running at 3.86 GHz. Or perhaps half of a Power 795, if you want to look at it that way. And very likely for a lot less money. And quite a bit more risk, since ScaleMP is not as known a quantity as the SMP electronics inside IBM’s Power Systems.
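For anyone who wants to check the core and thread counts in that ScaleMP comparison, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and the split of the older Dell nodes into two sockets of four cores each is my assumption; the figures above only say that 16 nodes add up to 128 cores and 256 threads. The BOPS inputs are the rounded figures from the text, not official SPEC results.

```python
def cluster_shape(nodes, sockets_per_node, cores_per_socket, threads_per_core):
    """Return (total cores, total threads) for a cluster of identical nodes."""
    cores = nodes * sockets_per_node * cores_per_socket
    return cores, cores * threads_per_core


def bops_per_core(total_bops, cores):
    """Rough per-core throughput from a rounded BOPS figure."""
    return total_bops / cores


# Older Dell cluster: the 2 x 4 socket/core split is an assumption that
# matches the 128 cores and 256 threads quoted in the text.
old_cores, old_threads = cluster_shape(16, 2, 4, 2)

# Xeon 5600 cluster: two sockets, six cores per chip, two threads per core.
new_cores, new_threads = cluster_shape(16, 2, 6, 2)

print("older cluster:", old_cores, "cores,", old_threads, "threads")
print("Xeon 5600 cluster:", new_cores, "cores,", new_threads, "threads")

# "Nearly 7 million" and "about 10 million" are rounded, so treat these
# per-core numbers as ballpark only.
print("older cluster BOPS per core:", round(bops_per_core(7_000_000, old_cores)))
print("estimated Xeon 5600 BOPS per core:", round(bops_per_core(10_000_000, new_cores)))
```

On these rounded inputs, the 10 million BOPS estimate works out to a per-core rate a touch below that of the older cluster, so it is not an aggressive guess.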
The 3Leaf Systems machine is a cluster of 16 machines, in this case last year’s Opteron 8400s, that implements the virtual SMP in special ASICs created by 3Leaf instead of in software as ScaleMP does. The machine can hit 5.5 million BOPS, which is impressive performance and just a tiny bit ahead of the 128-socket, 256-core Altix 4700 supercomputer that SGI tested when it was bored a few years back. I am not saying that SGI doesn’t have customers running Java on its supers, but I am sure if they told me who they were, they’d have to kill me.
One more thing: When I compiled the SPECjbb2005 benchmark results, I did not notice that on the Power7 boxes, in some cases the AIX version of a machine had a lot more memory than a machine tested to run i 6.1 or i 7.1. An intrepid reader of The Four Hundred correctly pointed this out and intimated that this might account for the performance difference and said further that this would obviously give the i version of the box a big price/performance advantage. After nearly having a heart attack for being so stupid, I looked at the tests a little more carefully and noticed that the number of JVMs and the heap sizes are the same on the i and AIX machines, and this, it seems to me, is what is more important as far as Java performance goes.
The Power 780 that IBM tested with the SPECjbb2005 benchmark topped out at 512 GB of main memory (a quarter of its theoretical maximum once IBM ships 256 GB memory cards later this year), and was probably the same box that IBM used to do its TPC-C online transaction processing benchmark tests (which I told you about here). I’ve got a call in to IBM to see if this extra memory really does anything on the SPECjbb2005 tests. I have no doubt that it would on the much heavier TPC-C OLTP test, but suspect that on machines in the same architecture with the same core counts and clock speeds, the JVM heaps dedicated to run Java apps are more important than the total main memory once you have enough memory to run the operating system.
The i, AIX, and Linux Power 780s tested by IBM all had 32 JVMs, each with a dedicated 2,560 MB heap, for a total of 80 GB of heap memory. The i box had 256 GB of total main memory, with 176 GB left over for i 7.1. The AIX and Linux machines had the same 80 GB of Java heap memory allocated out of a total of 512 GB of memory, with a huge 432 GB left over for AIX or Linux. As far as Java apps are concerned, these Power 780 machines were identical, excepting one was running JVMs in native AIX and the other was running them in PASE. If the memory were much tighter, say only 96 GB on an i box and 128 GB on an AIX or Linux box, I think this would probably make a difference. But maybe not even a big one.
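The heap arithmetic is easy to verify. What follows is a minimal Python sketch, not anything IBM published; the function name `heap_budget` is hypothetical, and I am assuming 1 GB equals 1,024 MB, which is what makes 32 heaps of 2,560 MB come out to exactly 80 GB.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes, as the 80 GB total implies


def heap_budget(total_memory_gb, jvm_count, heap_mb_per_jvm):
    """Return (total JVM heap in GB, memory left for the OS in GB)."""
    heap_gb = jvm_count * heap_mb_per_jvm / MB_PER_GB
    return heap_gb, total_memory_gb - heap_gb


# Configurations as tested: 32 JVMs with 2,560 MB heaps each.
i_heap, i_left = heap_budget(256, 32, 2560)
aix_heap, aix_left = heap_budget(512, 32, 2560)
print("i box:", i_heap, "GB heap,", i_left, "GB left over")
print("AIX/Linux box:", aix_heap, "GB heap,", aix_left, "GB left over")

# The hypothetical tight-memory case: same heaps, much smaller machines.
_, tight_i_left = heap_budget(96, 32, 2560)
_, tight_aix_left = heap_budget(128, 32, 2560)
print("tight i box:", tight_i_left, "GB left over")
print("tight AIX/Linux box:", tight_aix_left, "GB left over")
```

In the tight case the operating system would get only a sliver of what it had in the tested configurations, which is where I would expect any difference to start showing up.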
I’ll let you know what IBM thinks about all of this.