How IBM Stacks Up Entry Power8 Machines Against X86 Iron
September 29, 2014 Timothy Prickett Morgan
We are on the verge of an expanded Power8 system line from IBM and rival Intel has launched its next-generation “Haswell” Xeon E5-2600 v3 processors. But the performance specs that compare the Power8 iron to X86 and other alternatives in the midrange are only just starting to trickle out of Big Blue. So now is a good time to take a look and then brace ourselves for more comparisons in the coming months.
In past issues of The Four Hundred, I have shown how the scale-out variants of the Power8 processors rate in terms of raw Commercial Processing Workload (CPW) performance compared to prior Power7 and Power7+ generations. The performance figures I have gotten my hands on show that the Power S814 with one Power8 processor offers between 40 and 50 percent more CPW oomph than the Power 720+ using Power7+ chips, and the Power S824 (which has two sockets) offers between 40 and 90 percent more CPW transaction processing. The price/performance story is a bit more complex at list prices for the raw processors, as I showed in my comparison of Power7+ and Power8 entry machines in this story from late May and further analyzed with a look at the cost of base processor prices, IBM i licensing, and Software Maintenance in this follow-on story from July.
IBM’s competitive analysis focuses on most of the usual suspects when it comes to benchmarks, and as far as I know Big Blue has not run TPC-C online transaction processing tests on its machines and it doesn’t look like it will if it hasn’t by now. No one has submitted official TPC-C results since March 2013, in fact, and IBM doesn’t run the TPC-E test on Power Systems. IBM is probably steering away from the TPC tests because they are expensive to run and require list pricing for all the gear used in the tests. But that is just a guess, since IBM does not explain itself on these things.
To get a quick thumbnail of how a two-socket Power S824 using the top-end Power8 chip that is currently shipping–a 12-core processor running at 3.52 GHz–compares to a two-socket server using the top-end Intel Xeon E5-2697 v2–which has twelve cores spinning at 2.7 GHz–IBM offers customers and partners this handy little table:
On the SAP Sales & Distribution (SD) test, the Power S824 system is running AIX 7.1 and DB2 10.5 and has an average transaction response time of just under one second. The Xeon E5 system is a Fujitsu RX300 S8 machine running Windows Server 2012 Standard Edition and SQL Server 2012. Generally speaking, at this level of NUMA scaling and threading, IBM i plus DB2 for i should perform more or less equivalently to the AIX and DB2 combination. The AIX team was always more aggressive about tuning the TPC-C test than the IBM i team, which accounted for the difference in performance of these two Power Systems platforms. AIX scales a lot further on NUMA clustering and thread count than does IBM i, so for larger systems the performance spread is larger.
Just for fun, IBM ran the Oracle E-Business Suite 12.1.3 payroll application benchmark on the Power S824 system with only one processor populated and compared it to Cisco and Hewlett-Packard machines using Xeon E5 processors. Here is what that looked like:
You can see various Oracle benchmarks at this link. This particular test used the Payroll Batch Extra Large benchmark kit, and as you can see, the Power S824 system could cut over 1 million paychecks per hour. (No public company has that many employees, of course.) A Cisco rack server with 24 Xeon cores could do nearly the same amount of work, but if you look at how much work per core is done–which makes some sense because many software providers charge per core for their wares–then the Power box comes out way ahead, being able to process 90,909 checks per hour per core compared to 42,402 for the Cisco machine. The HP BL460c blade server had fewer cores running at a higher clock speed, so it could do 49,345 checks per hour per core, but it could only do a total of 789,515 checks per hour across its two sockets.
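The per-core arithmetic in the payroll comparison is easy to check. Here is a minimal Python sketch that works only from the figures quoted above. The core count of 12 for the single-processor Power S824 is my assumption (one 12-core chip populated), the HP core count is inferred from its total and per-core rates, and the function names are made up for illustration.

```python
# Sanity check of the payroll benchmark arithmetic quoted in the text.
# Assumption: the single-processor Power S824 has 12 active cores.
# The Cisco core count (24) is stated in the text; the HP count is inferred.

def implied_total(per_core_rate: int, cores: int) -> int:
    """Total checks per hour implied by a per-core rate and a core count."""
    return per_core_rate * cores

def implied_cores(total_rate: int, per_core_rate: int) -> int:
    """Core count implied by a total rate and a per-core rate."""
    return round(total_rate / per_core_rate)

power_total = implied_total(90_909, 12)    # Power S824, one socket (assumed 12 cores)
cisco_total = implied_total(42_402, 24)    # Cisco rack server, 24 Xeon cores
hp_cores = implied_cores(789_515, 49_345)  # HP BL460c blade, cores inferred

# How much more work each Power8 core does than each Cisco Xeon core.
per_core_advantage = 90_909 / 42_402

print(f"Power S824 implied total: {power_total:,} checks/hour")
print(f"Cisco implied total:      {cisco_total:,} checks/hour")
print(f"HP BL460c implied cores:  {hp_cores}")
print(f"Power8 vs Cisco per core: {per_core_advantage:.2f}x")
```

Under those assumptions, both the Power and Cisco totals land just over 1 million checks per hour, which squares with the text, and each Power8 core does a bit more than twice the work of a Cisco Xeon core.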
With Oracle being a dominant supplier of application software, IBM also wanted to show off how the Power S824 did against Sparc T4 machinery from Oracle running Oracle’s own Siebel CRM 126.96.36.199 software, and here is how that lined up:
You can see the listing of Siebel CRM benchmark results at this link. The Power S824 has far fewer cores deployed to run this test, and yet it can support 50,000 users whacking against the Siebel CRM benchmark. If you do the math, that is 8,333 users per core for the Power8 box. The Oracle Sparc T4-2 two-socket machine, which has a total of 16 cores, can support 40,000 Siebel CRM users, but it can only do 2,500 per core. And the Cisco UCS B200 M3 two-socket Xeon blade server can only support 10,000 users, and when you divide by the cores deployed, that works out to only a meagre 625 users per core.
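The same users-per-core math can be laid out in a few lines of Python. This is only a sketch built on the numbers above. The six active Power8 cores and the 16 Cisco cores are not stated outright, they are what the quoted per-core figures imply, so treat both as assumptions. The helper name is hypothetical.

```python
# Users per core on the Siebel CRM test, from the figures quoted in the text.
# Assumptions: 6 active Power8 cores (implied by 50,000 / 8,333) and
# 16 Cisco cores (implied by 10,000 / 625). The Sparc T4-2 count is stated.

def users_per_core(users: int, cores: int) -> float:
    """Supported users divided by the cores deployed for the test."""
    return users / cores

systems = {
    "Power S824":     (50_000, 6),   # core count inferred, not stated
    "Sparc T4-2":     (40_000, 16),  # core count stated in the text
    "Cisco B200 M3":  (10_000, 16),  # core count inferred, not stated
}

rates = {name: users_per_core(u, c) for name, (u, c) in systems.items()}
baseline = rates["Power S824"]

for name, rate in rates.items():
    # Ratio shows how many times more users each Power8 core carries.
    print(f"{name:14s} {rate:8.0f} users/core  (Power8 is {baseline / rate:.1f}x)")
```

On those assumptions, each Power8 core carries a bit more than three times the users of a Sparc T4 core and a bit more than thirteen times the users of a Xeon core in the Cisco blade.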
You can guess that IBM is going to be stressing performance per core and the leverage that gives in terms of cutting software license costs. In some cases. Remember, Oracle can set the scaling factors for pricing across various chip architectures at whatever level it wants. And it tends to favor its own X86 and Sparc platforms over Power and Itanium alternatives, which should come as no surprise.
The SPEC integer and floating point tests in the original table are intriguing, since they give a more general feel for relative performance across architectures. But with the Haswell Xeons, Intel has just tweaked its integrated math units to double their performance, so this gap will close considerably. It is harder to say how Haswell chips will fare on the SPECjEnterprise2010 Java benchmark. In this particular comparison above, IBM is running DB2 10.5 on AIX 7.1 on the database server on the Power S824 machine and WebSphere Application Server V188.8.131.52 and AIX on a separate Power S824. The Ivy Bridge Xeon system tested is a pair of Sun X4-2 two-socket servers running Oracle’s clone of Red Hat Enterprise Linux, called Oracle Linux, and the Oracle 12c relational database and WebLogic Server Standard Edition 12.1.2.
Again, it remains to be seen how the Haswell Xeons will ramp up performance, but early inside benchmarks from Intel indicated that the Haswell E5-2699 v3 with 18 cores running at 2.3 GHz would deliver about 50 percent better performance on the SAP SD test compared to the Ivy Bridge outlined above. SPEC integer and floating point numbers are not expected to jump so high, according to Intel’s early internal tests, but the rules of the SPEC tests do not allow Intel to publish its inside benchmarks. We have to wait until others do their tests and publish them on the SPEC site.
The point is that even with a 50 percent increase in core counts, the top-end Xeon E5 chip is not going to be able to match the Power8 configuration, which has a third fewer cores. The IBM cores run at 3.52 GHz and the Intel cores run at 2.3 GHz, so that is part of it. But clocks are obviously different across chip architectures, and IBM has always been able to get more work out of a clock than Intel. For one thing, the Power8 chip has eight threads per core compared to two for the Xeons. Going from one thread to eight roughly doubles performance for workloads like Java application servers and database management systems that like cores and have lots of cache memory. Moving from one thread to two boosts performance by about 40 percent, so that first virtual thread pair is the big change. Moving from two to four threads gives another 30 percent, and from four to eight yields another 7 percent. So you get diminishing–but important–returns as you scale up the thread count. It seems unlikely that IBM will ever go to 16 threads per core, given the decreasing yields.
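To see how the threading gains add up to "roughly double," you can compound the three steps. The Python sketch below takes the 40 percent figure as the step from one thread to two (the first virtual thread pair), followed by 30 percent and 7 percent, and assumes the gains simply multiply, which is my simplification and not something IBM has spelled out. It also checks the core count arithmetic for a 12-core chip against an 18-core chip. All names are made up for illustration.

```python
from math import prod

# Per-step throughput multipliers from the text, read as:
# 1 -> 2 threads: +40 percent, 2 -> 4: +30 percent, 4 -> 8: +7 percent.
# Assumption: the gains compound multiplicatively.
SMT_STEPS = {"1 -> 2": 1.40, "2 -> 4": 1.30, "4 -> 8": 1.07}

def cumulative_speedup(step_gains) -> float:
    """Overall speedup from chaining the per-step multipliers."""
    return prod(step_gains)

def percent_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100

total = cumulative_speedup(SMT_STEPS.values())
print(f"One thread to eight: {total:.2f}x")

# Core counts per chip: 12 for Power8, 18 for the Haswell E5-2699 v3.
print(f"12 -> 18 cores: {percent_change(12, 18):+.1f} percent")
print(f"18 -> 12 cores: {percent_change(18, 12):+.1f} percent")
```

The compounded figure comes out a shade under 2X, and the last step contributes very little of it, which is the diminishing return the text describes.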
A thorough analysis of these benchmarks would mean trying to configure up systems and seeing what bang for the buck they deliver. I will set about doing that, provided I can find all the right configuration and pricing information. Vendors are a bit cagier about this than they used to be in days gone by.