First Pass On Power8 Enterprise Performance
November 10, 2014 Timothy Prickett Morgan
A little over a month ago, IBM launched the enterprise-class Power8 systems, the Power E870 and Power E880. The machines draw on technologies from the prior enterprise-class Power 770 and 780 machines and their predecessors, but also have some advanced scalability and resilience features drawn from the top-end Power 795. Big Blue provided a lot of insight into the architecture of the machines on launch day, October 6, but some important details were missing. One of them was the expected performance of the systems.
As I told you back in October, I would hunt down the performance data on the Power E870 and Power E880, which IBM said would be available closer to the time the company started shipping the initial entry configurations of the machines. The systems will start shipping on November 18. As I have said before, I don’t think that IBM had intended to ship these two systems in 2014 at all; I believe it originally planned to ship them in early 2015, perhaps in April or May, concurrent with big Power Systems events such as the COMMON spring conference in April or the Edge2015 event, which runs in the second week of May.
The roadmaps I had seen did not have anything but systems with dual-chip Power8 modules in them, which I believe are easier for IBM’s (well, now GlobalFoundries’) chip plant in East Fishkill, New York, to make than the single-chip modules used in the Power E870 and Power E880. Each DCM puts two six-core chips onto a single Power8 package. The SCM has a maximum of 12 cores, just like a DCM, but it is one die and is aimed at big shared memory systems, while the DCM has a bunch more I/O coming off it to provide more oomph for the offloaded accelerators that IBM is pushing as it tries to take on Intel’s Xeon E5 processors in the distributed computing arena. The SCM die is a little less than twice the size of each of the two chips in a DCM and is therefore inherently harder to get yield on, particularly on a new process like IBM’s 22 nanometer process. If IBM still had the game console business, it would have ramped up that 22 nanometer process on console chips first and then moved to Power8, and I think it would have been able to do so perhaps six months ago, maybe earlier, thereby getting out ahead of Intel in terms of raw compute. As it is, Intel probably has the raw computing lead again, but it is hard to say until we see a lot more performance data.
So far, the only performance data I can get my hands on for the Power E870 and Power E880 systems is the usual Relative Performance (rPerf) benchmark for AIX workloads and the Commercial Performance Workload (CPW) test for gauging the relative performance of IBM i machines. Both are transaction processing workloads roughly based on the TPC-C test, with some tweaks and changes that allow IBM to use them without actually running audited TPC-C benchmarks on each and every combination of AIX and IBM i machines using all of its possible processor SKUs.
The interesting bit is that the data has some weird twists in it. First, IBM has raised its rPerf ratings for the Power E870 machines in the past two months while holding the CPW ratings the same for the exact same machines. This seems peculiar to me, and it is yet another indication that IBM spends a lot more time tuning up DB2 running on AIX to show larger leaps in performance and does not do the same things (or cannot do the same things) for the DB2 for i database running inside the IBM i operating system. The Power Systems Facts and Features report that IBM put out also shows the Power E880 machine using 4.02 GHz Power8 SCMs with eight cores active on a die, instead of the 4.35 GHz eight-core chips that IBM said were in the box back on October 6 when the Power E880 was launched. So the rPerf ratings actually went down for the 32-core and 64-core variants of the Power E880, with one or two nodes, thus:
The specs in the IBM Power Systems Facts and Features document have the clock speed at 4.35 GHz for all Power E880 models using the eight-core Power8 SCM, so I think the 4.02 GHz is just a typo in the performance data at the bottom of the chart. I have marked these in red to show that I do not believe these numbers are correct, particularly since I verified the performance and the clock speed of the IBM i 7.2 machines in the latest IBM Power Systems Performance Capabilities Reference, which also just came out. That does not, however, explain the drop in the rPerf ratings for the two Power E880 configurations, with one or two nodes, that were tested. A 14.7 percent performance drop is a big deal, and one that presumably applies to AIX and Linux workloads alike on the Power E880 machine.
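A quick back-of-the-envelope check shows why even a real clock change would not account for the whole rPerf decline. This is a sketch under a generous assumption that is not IBM's: that rPerf scales linearly with clock speed. The variable names are illustrative; only the 4.35 GHz, 4.02 GHz, and 14.7 percent figures come from the reports discussed above.

```python
# Figures quoted above; the 4.02 GHz value is suspected to be a typo.
LAUNCH_CLOCK_GHZ = 4.35      # clock IBM cited at the October 6 launch
LISTED_CLOCK_GHZ = 4.02      # clock shown in the performance data of the report
REPORTED_RPERF_DROP = 0.147  # rPerf decline for the one- and two-node Power E880

def pct_drop(old: float, new: float) -> float:
    """Fractional decline going from old to new."""
    return (old - new) / old

clock_drop = pct_drop(LAUNCH_CLOCK_GHZ, LISTED_CLOCK_GHZ)

# Assumption (not IBM's method): rPerf scales linearly with clock, so this is
# the share of the reported rPerf drop the clock change alone could explain.
explained_share = clock_drop / REPORTED_RPERF_DROP

print(f"clock drop: {clock_drop:.1%}")
print(f"share of the 14.7% rPerf drop explained by clock: {explained_share:.0%}")
```

Going from 4.35 GHz to 4.02 GHz is about a 7.6 percent clock reduction, roughly half of the 14.7 percent rPerf drop, so under this assumption something other than clock speed would have to explain the rest.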
The good news for IBM i users is that they are mostly concerned with how Power8 performs running IBM i and its DB2 for i database, and that is gauged by CPW, not rPerf.
Incidentally, IBM has not provided performance specs for the three-node versions of the Power E880, but you can create such a machine. IBM just ran out of room for that column in the chart because it did not want to shrink the font any further. For which, with my nearly 50-year-old eyes, I am in one way grateful.
I am personally dying to know what clock speed IBM will run the full 12-core Power8 SCM at in the fully extended Power E880 with three and four enclosures. The charts I have seen show a 128-core Power E880 being able to deliver 2,865 rPerfs of performance on a fully configured system, which is a tiny bit more than the 2,812 rPerfs for the 256-core Power 795 that it replaces in the product line. (That is different from the 2,978.2 rPerf rating for a Power 795 in the February 2013 performance guide, which is a peculiar reduction in the ratings, to say the least.) And a 192-core Power E880 is rated at around 4,000 rPerfs in the chart. The ratio between rPerfs and CPWs is all over the place and very inconsistent across the Power E870 and Power E880 machines that IBM has given out data for so far, and I would not want to make any precise estimates based on it. IBM i was never stretched across the 256 cores and 1,024 threads in a full Power 795, and the largest machine IBM provided test results for was a Power 795 with two 32-core partitions, which was rated at 399,200 CPWs across those two partitions. That would seem to imply that you could get four times the throughput out of the full machine, or about 1.6 million CPWs, if you carved it up into eight 32-core partitions. (IBM i cannot span all of the cores and threads of a full Power 795 with a single operating system image; it peaks out at 128 threads.)
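The arithmetic in the paragraph above can be sketched in a few lines, using only the ratings it quotes. Two assumptions are worth stating loudly: per-core figures treat rPerf as dividing evenly across cores, and the CPW extrapolation assumes throughput scales linearly from two 32-core partitions to eight, which IBM has not tested. The names are hypothetical.

```python
# Ratings quoted above (rPerf for AIX, CPW for IBM i).
E880_CORES, E880_RPERF = 128, 2_865   # 128-core Power E880 from IBM's charts
P795_CORES, P795_RPERF = 256, 2_812   # 256-core Power 795 it replaces

def per_core(rating: float, cores: int) -> float:
    """Average rating per core, assuming it divides evenly across cores."""
    return rating / cores

e880 = per_core(E880_RPERF, E880_CORES)
p795 = per_core(P795_RPERF, P795_CORES)
print(f"E880: {e880:.2f} rPerf/core, Power 795: {p795:.2f} rPerf/core, "
      f"ratio {e880 / p795:.2f}x")

# Naive linear extrapolation of the largest IBM i result: two 32-core
# partitions rated at 399,200 CPW, scaled up to eight 32-core partitions.
TESTED_CPW, TESTED_PARTITIONS = 399_200, 2
FULL_PARTITIONS = P795_CORES // 32    # eight 32-core partitions fill the machine
full_cpw = TESTED_CPW * FULL_PARTITIONS / TESTED_PARTITIONS
print(f"extrapolated full Power 795: {full_cpw:,.0f} CPW")
```

On these numbers the 128-core Power E880 delivers about 22.4 rPerfs per core against about 11.0 for the Power 795, a bit more than double, and the linear extrapolation lands at 1,596,800 CPWs, the "about 1.6 million" figure.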
The Power E870 and E880 machines are not really aimed at customers with the Power 770+ and Power 780+ machines that were just launched in February 2013. Those who are using Power 770 and Power 780 machines from April 2011 are the natural buyers, because they have been running those machines for more than three and a half years. Depending on the processor speed and count, the Power 770 with Power7 chips running at 3.1 GHz or 3.5 GHz delivers from 73,100 CPWs with two six-core processors all the way up to 292,700 CPWs for a 64-core machine with all eight processor sockets full. Depending on the machine, you can get about twice the performance per core in the Power E870 and Power E880 machines compared to Power7+ machines. That should presumably mean that customers can do more work on each core and therefore cut back on their IBM i license costs and, where appropriate, third party software costs. But I suspect it rarely works out that way. In a cloudy world, maybe it should, though.
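To make the licensing argument concrete, here is a minimal sketch of how per-core gains would translate into per-core license counts. It assumes, purely for illustration, that the roughly two-to-one per-core gain cited above against Power7+ also applies to the Power7-based Power 770 baseline, and that software is licensed strictly per core. As noted above, real deals rarely work out this cleanly. The function name is hypothetical.

```python
import math

# Top Power 770 (Power7) configuration quoted above.
OLD_CPW, OLD_CORES = 292_700, 64
# Illustrative assumption: Power8 delivers about 2x the CPW per core.
PER_CORE_GAIN = 2.0

def cores_needed(target_cpw: float, cpw_per_core: float) -> int:
    """Whole cores (and thus per-core licenses) needed to reach target_cpw."""
    return math.ceil(target_cpw / cpw_per_core)

old_per_core = OLD_CPW / OLD_CORES
new_per_core = old_per_core * PER_CORE_GAIN
new_cores = cores_needed(OLD_CPW, new_per_core)

print(f"{old_per_core:,.0f} CPW/core on Power7; "
      f"{new_cores} Power8 cores to match {OLD_CPW:,} CPW")
print(f"per-core licenses cut from {OLD_CORES} to {new_cores}")
```

Rounding up with `math.ceil` reflects that licenses come in whole cores; under these assumptions, matching the 64-core machine's throughput takes 32 cores, half the per-core licenses.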
I will be pondering that, and how straight CPW-based pricing for software might make things a lot easier for everyone, in a future essay. In the meantime, I will be looking for more details on the Power8 line and doing the normal price/performance analysis.