IBM’s Competitive Analysis For The Power10 Midrange Machine
August 29, 2022 Timothy Prickett Morgan
For as long as there has been a Power architecture, and particularly since the advent of the custom “Cobra” and “Muskie” PowerPC AS chips back in 1995, the rule of thumb for performance is that the Power core did twice the work of an Intel core designed for server workloads. This was true in 1993, when Intel jumped into the server racket with gusto, and it is true 29 years later as the entry and midrange Power10 machines in the same performance class as the top-end X86 iron from Intel and AMD are starting to ramp.
This performance gap on cores is the result of a consistent set of design choices that IBM makes because its core designs do not have to span PCs and servers as Intel’s designs do. This is also one of the reasons why IBM did not aggressively pursue Power-based PCs as a way to expand the market for the Power architecture. Motorola, more than IBM, sold chips into the Apple PC base during the years when Apple was using Power chips, and IBM used more expensive, higher-performance transistors that also generated more heat in the pursuit of more oomph for enterprises running big databases and ERP systems and for HPC centers running simulation and modeling.
The fun bit, from an architectural point of view, is that as each Intel Xeon design has gotten wider and deeper and more complex, IBM has always figured out a way to rearchitect the Power instruction set to add functions not available in the X86 architecture – decimal math, or what I call money math, is one of many examples that springs immediately to mind – or to make each core do more work per clock. All vendors do this, to a greater or lesser degree, but IBM has relied on more than just clock speed as a differentiator since long before Dennard scaling (the ability to use process shrinks to crank CPU clock speeds) stopped around 2005 or so and the Moore’s Law decreases in transistor pricing stopped, which was at the 14 nanometer node four years ago.
It used to be that you could cut the price of a transistor in half every two years as the transistor size was also cut in half, which meant you could add twice as much function or capacity to a CPU for the same money, and you got Dennard scaling as a benefit so you could also get more performance. But IBM stopped being a top-notch foundry somewhere in the late 2000s, and fell behind on process nodes and therefore had to lean more heavily on architectural distinction to differentiate its Power CPUs. And like clockwork, the chip engineers at IBM have done this again with the “Cirrus” Power10 processor.
Nothing demonstrates this better than the competitive analysis that Big Blue has done comparing the midrange Power E1050 system to its peers in the Intel X86 world.
I have made no secret of the fact that I have been annoyed that the midrange Power8, Power9, and now Power10 machines did not support IBM i while the entry and high-end Power8, Power9, and Power10 machines did. I will be figuring out the extra cost that IBM i shops have to pay by being forced into Power E870 and Power E880, Power E980, and now Power E1080 machines when a Power E850 or Power E850C, Power E950, or Power E1050 box would suit them better. But I am not going to do that today. I will say this, though: This competitive analysis demonstrates that the shops with the smaller enterprise-class machines need to justify their purchases of this platform against an X86 platform from Intel running Windows Server or Linux and an Oracle or Microsoft database. And defending a higher-end Power E980 or Power E1080 against those X86 machines is tougher because these machines are considerably more expensive per unit of performance than the midrange Power E950 and Power E1050. IBM is making it easier for big X86 iron to take out the many thousands of reasonably large IBM i and AIX sites, and that is not very smart.
With that, let’s get into some performance comparisons between Intel Xeon SP servers and the new Power E1050.
Let’s start with the SPEC CPU 2017 tests, and look at the SPECrate2017 integer throughput tests, which show the base throughput and the peak, tuned throughput of processors doing integer calculations. This test is the gatekeeper for who can sell CPUs and who can’t, kind of like the ante to play CPU blackjack in the casino that is the modern datacenter.
What you can see from the chart above is that a Power E1050 core running at a base 2.95 GHz has about 2.2X the SPEC integer performance of a modern Xeon SP core. IBM is really cramming eight Power10 chips into this four-socket Power E1050 machine using dual-chip modules, and because the yields are pretty low, only 12 of the 16 cores on each Power10 chip are usable in the most capacious version of the machine. Still, IBM can deliver 96 cores across four processor sockets that drive a SPECrate2017 integer peak rating of 1,580, which is 1.9X better than the four-socket Inspur NF880M6 platform, which has a quad of “Cooper Lake” Xeon SP-8380H processors with 112 cores and is rated at 846 on the SPECrate2017 integer test. You have to move to a Superdome Flex 280 machine from Hewlett Packard Enterprise with eight of these processors (using a custom HPE chipset) to reach the same performance level as the Power E1050, and it will have 224 cores across those eight sockets. If the systems and application software used in a real-world setting are priced on a per-socket or per-core scheme, the IBM platform has around a 2X pricing advantage. And in many cases, the software costs 10X the cost of the hardware for a given amount of throughput or users, so this is a big deal, as we showed in the story last week on the competitive analysis of IBM’s entry Power10 servers.
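The per-core and per-system advantages quoted above fall straight out of the published SPECrate2017 integer peak figures and the core counts; a quick sketch of the arithmetic:

```python
# SPECrate2017 integer peak ratings and core counts as reported in the chart:
# 1,580 for the 96-core Power machine, 846 for the 112-core Inspur NF880M6.
power_peak, power_cores = 1580, 96
xeon_peak, xeon_cores = 846, 112

per_core_power = power_peak / power_cores   # ~16.5 SPECrate points per core
per_core_xeon = xeon_peak / xeon_cores      # ~7.6 SPECrate points per core

print(round(per_core_power / per_core_xeon, 1))  # → 2.2 (per-core advantage)
print(round(power_peak / xeon_peak, 1))          # → 1.9 (system-level advantage)
```

The same per-core ratio is what drives the roughly 2X advantage under per-core software licensing, since licenses scale with cores, not with delivered throughput.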
On the SPECjbb test, which is roughly based on the TPC-C OLTP benchmark but which stresses the CPU and is not I/O bound in any way like the real TPC-C test, the story is much the same:
IBM’s 96 cores on the Power E1050 beat the 112 cores of the Inspur four-socket Cooper Lake Xeon SP machine by a factor of 2.1X per core and 1.8X at the system level. (For some reason, IBM did not offer performance figures for the HPE Superdome Flex 280 machine, probably because there are no test results for this machine on SPECjbb.)
For the SAP Sales and Distribution (SD) benchmark, which has been around since the late 1990s to gauge the relative performance of systems running SAP ERP code, the gap is a little bit wider, and we do have numbers for the four-way Inspur and eight-way HPE Superdome Flex 280.
On the SAP SD test, shown above, the performance of a Power E1050 core running at a base clock speed of 2.95 GHz is about 2.3X that of a core in the four-socket Inspur machine and about 2.6X that of a core in the eight-way HPE machine. Some of the performance of the X86 cores is lost in NUMA overhead when jumping from four to eight sockets due to latency issues across a larger NUMA domain. You are only as good as the rate at which you can feed data across NUMA domains on any multi-core, multi-socket system.
IBM did not do the math in this chart, but at a system level, the Power E1050 machine was 1.93X faster than the Inspur system on the SAP SD test, and had 9.3 percent more oomph than the eight-way HPE Superdome Flex 280 system.
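As a sanity check, the system-level ratios can be derived from the per-core figures and the core counts; the small differences from the 1.93X and 9.3 percent figures come from rounding in the “about 2.3X” and “about 2.6X” per-core numbers, since the underlying SAPS ratings are not given:

```python
# Core counts for the three systems compared on the SAP SD test
power_cores, inspur_cores, hpe_cores = 96, 112, 224

# Per-core advantages quoted in the chart (rounded figures)
per_core_vs_inspur = 2.3
per_core_vs_hpe = 2.6

# Implied system-level ratios: per-core advantage scaled by core counts
sys_vs_inspur = per_core_vs_inspur * power_cores / inspur_cores
sys_vs_hpe = per_core_vs_hpe * power_cores / hpe_cores

print(round(sys_vs_inspur, 2))  # → 1.97, close to the reported 1.93X
print(round(sys_vs_hpe, 2))     # → 1.11, close to the reported 9.3 percent edge
```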
That is a pretty consistent performance advantage. It is very hard to figure out the price/performance of these configurations, but we are digging for the numbers and will report back when we have something.
Suffice it to say, even if the X86 iron is 50 percent cheaper, it will take twice as much of it to do the same work, and the software will cost 2X more, more or less. So it is not a win for the X86 platforms here.
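That back-of-the-envelope math can be made concrete with a hypothetical dollar figure; the hardware cost and the 10X software multiple below are assumptions for illustration, not quoted prices:

```python
# Hypothetical worked example of the total-bill math in the text.
H = 100_000              # assumed Power hardware cost (illustrative, not a quote)
software_multiple = 10   # software runs ~10X the hardware cost, per the text

power_hw = H
power_sw = software_multiple * H
power_total = power_hw + power_sw

x86_hw = 2 * (0.5 * H)               # twice the boxes at half the price: a wash
x86_sw = 2 * software_multiple * H   # roughly 2X the cores to license
x86_total = x86_hw + x86_sw

print(round(x86_total / power_total, 1))  # → 1.9, the X86 bill relative to Power
```

The hardware discount washes out entirely; it is the per-core software licensing, dominating the bill at 10X the hardware cost, that makes the X86 route nearly twice as expensive in this sketch.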
It remains to be seen if IBM can get new companies to see this and move into the Power Systems fold.