Power8 And The Potential Oomph In Midrange And Big Boxes
September 30, 2013 Timothy Prickett Morgan
In last week’s issue of The Four Hundred, I did a little thought experiment about how the Power8 processors from IBM would affect the entry part of the Power Systems market. I also talked about how a 12-core processor spinning in excess of 4 GHz was a lot of machine for an entry IBM i shop running a database and ERP applications. But, the good news is that IBM wants to sell Power-based systems more effectively against X86 machinery, and that means the performance is going up and the price per unit of performance is coming down.
But what about the midrange and the high-end of the Power Systems product line? Is IBM under the same pressure here as it feels it is under in the volume two-socket space that dominates server shipments these days? Is Intel’s desire to position its Xeon E5-4600 and Xeon E7-4800 for four-socket boxes and Xeon E7-8800 for eight-socket machines against Unix and proprietary machines using RISC or Itanium processors enough to spur Big Blue into getting aggressive in the midrange and high-end, not only protecting its territory but maybe, just maybe, in this era of big data and this new fad of in-memory processing, actually expanding the footprint of shared memory machines?
Well, with the Power8 chip, that certainly could happen. It is a question as to how far IBM wants to push processing and memory scalability in IBM i, AIX, and Linux.
Let’s start with the chip itself, and then weave in a few new facts I learned last week. To recap: The Power8 chip is implemented in IBM’s 22 nanometer copper/SOI/high-k metal gate processes, and that will bring it on par with the processes that Intel is using with its latest “Ivy Bridge” generation of Xeon processors. The workhorse Xeon E5-2600 v2s have already shipped and I told you all about them already. The low-end Xeon E5-2400 v2 chips for low-cost two-socket machines have not yet come out, and neither have the Xeon E5-4600 v2 chips for four-socket boxes. Intel has not discussed the schedule for these, but has said that it will ship the Xeon E7 v2 chips to server makers before the end of the year. It is reasonable to expect these big engines to appear in Xeon systems early next year, and very likely before IBM gets its first Power8 systems out the door around the middle of 2014. The fact that IBM has a chip using a similar transistor wire size as Intel at roughly the same time is a good thing, but now it comes down to how the systems using the Power8 distinguish themselves from the Xeon competition.
A 12-core Power8 chip running at a design target speed of 4 GHz is a formidable beast. According to a report published by The Linley Group, called Power8 Muscles Up For Servers, the Power8 chip has over 3 billion transistors and will clock as high as 4.6 GHz. The Power8 has 96 MB of embedded DRAM L3 cache, larger data caches, and a slew of tweaks that will allow a Power8 chip to have on the order of 2.5 times the performance of an eight-core Power7+ processor running at the same 4 GHz clock speed. So, you will be able to get around 154,000 CPWs of oomph in a socket with 12 cores in an entry machine compared to around 61,500 CPWs with a single socket in a Power 720+ or a Power 740+ system using eight cores. (CPW is short for Commercial Performance Workload, the benchmark test that IBM uses to measure the relative speed of different configurations of its Power Systems iron running the IBM i operating system.)
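The per-socket estimate above is plain scaling arithmetic, and a minimal Python sketch makes it explicit. The function name is hypothetical, and treating CPW as scaling linearly with the quoted performance multiplier is a simplifying assumption. The 61,500 CPW baseline and the 2.5X factor are the figures cited above.

```python
def power8_socket_cpw(power7plus_socket_cpw: float, speedup: float = 2.5) -> float:
    """Estimate the CPW of one 12-core Power8 socket.

    Assumes (for illustration only) that CPW scales linearly with the
    performance multiplier quoted for Power8 versus an eight-core Power7+
    socket running at the same clock speed.
    """
    return power7plus_socket_cpw * speedup


# 61,500 CPWs is the single-socket, eight-core Power 720+/740+ figure cited above.
estimate = power8_socket_cpw(61_500)
print(f"{estimate:,.0f} CPWs per Power8 socket")  # 153,750, or roughly 154,000
```

The result, 153,750, rounds to the 154,000 CPWs used in the rest of this story.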
Generally speaking, clock speeds are higher in midrange systems, and get even higher in the very large Power 595 and Power 795 machines, which have multichip module (MCM) packaging that is designed to cram multiple processors into a small space for NUMA shared memory across the chips. The packaging allows for the heat to be removed efficiently from these chips, which can run as hot as 250 watts according to estimates made by The Linley Group. The firm estimates that in smaller systems, the Power8 will be in a 130 watt thermal envelope, which is as hot as the hottest Xeon E5-2600 v2 chip from Intel with a dozen cores and 30 MB of L3 cache.
The other bit of data that IBM has released about the Power8 chip is that single-threaded work will run on the order of 60 percent faster on Power8 than it did on Power7. (Not Power7+, but Power7.) This is due to changes in the instruction units and caches, and substantially higher bandwidth at all levels of the cache memory hierarchy. (I went into details about the chip design in Power8 Processor Packs A Twelve-Core Punch–And Then Some if you want to drill down more.)
Being able to get single-threaded work done 60 percent faster is a very big deal, particularly for those batch jobs that still linger in the data centers of the world (and will never, ever go away). But it is the throughput increase, which comes from having 50 percent more cores, twice as many threads per core, and just a huge amount of memory bandwidth, that is going to make midrange and high-end boxes based on Power8 chips potentially very popular.
Here’s the interesting bit about the Power8 chip: it has one more link for NUMA clustering than Power7 and Power7+ did. To be precise, the Power7 had five links to maintain cache coherency across up to 16 processors without requiring an external chipset. (This is how you make a Power 770 or Power 770+, and it was how you made a Power 570 and Power 570+ using the Power6 and Power6+ chips as well.) With the Power8 having six serial links for lashing processors together, The Linley Group speculates that IBM could create a 48-processor complex without needing an external NUMA chipset; IBM has not confirmed this. But it presents an interesting prospect, with 48 sockets, each with a dozen cores humming away. The Power 595 and Power 795 (there are no “plus” versions) topped out at 32 sockets. So, not only is there a 50 percent increase in the number of cores, there could be a 50 percent increase on top of that in socket count for a Power 770-class, multichassis box. Add in a higher clock speed, which is almost a certainty (let’s call it 4.5 GHz just for fun), and you are talking 173,250 CPWs per socket for multithreaded work (like running databases and Java application servers), which is the 154,000 CPWs at 4 GHz scaled up by the clock speed, and 8.3 million CPWs across all those hypothetical 48 sockets in something I will call a Power 870 for now. Take out some NUMA overhead and maybe it is more like 7 million CPWs. A Power 795 with 4 GHz Power7 processors and carved up into four partitions can push around 1.6 million CPWs.
That might mean, therefore, that IBM does not have to make a Power 895-class machine with the book packaging. Or, it may be able to push even further, say up to 64 sockets or 96 sockets. My mind is just blown thinking about this, but let’s keep playing for fun. At 96 sockets running at maybe 4.5 GHz, the raw math (96 times 173,250) works out to 16.6 million CPWs, and even after taking out some NUMA overhead you are at something crazy like 13.6 million CPWs of aggregate oomph in a box that probably spans two racks.
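For anyone who wants to play along, here is a minimal Python sketch of the back-of-envelope math for these hypothetical big boxes. The function names are made up for illustration, the linear scaling with clock speed and socket count is an idealization, and the 0.84 NUMA efficiency is an assumed value picked only because it lands near the 7 million CPW guess for 48 sockets. None of this comes from IBM.

```python
def socket_cpw(base_cpw: float, base_ghz: float, target_ghz: float) -> float:
    """Scale a per-socket CPW rating linearly with clock speed (an idealization)."""
    return base_cpw * target_ghz / base_ghz


def system_cpw(per_socket_cpw: float, sockets: int, numa_efficiency: float = 1.0) -> float:
    """Aggregate CPW across sockets, discounted by an assumed NUMA efficiency factor."""
    return per_socket_cpw * sockets * numa_efficiency


# 154,000 CPWs at 4 GHz is the per-socket Power8 estimate used in this story.
per_socket = socket_cpw(154_000, base_ghz=4.0, target_ghz=4.5)
print(f"{per_socket:,.0f} CPWs per socket at 4.5 GHz")  # 173,250

# Raw aggregate figures, before any NUMA overhead is taken out.
for sockets in (48, 64, 96):
    raw = system_cpw(per_socket, sockets)
    print(f"{sockets} sockets: {raw / 1e6:.1f} million CPWs raw")  # 8.3, 11.1, 16.6

# Assumed efficiency of 0.84 for the 48-socket case (hypothetical, not measured).
net = system_cpw(per_socket, 48, numa_efficiency=0.84)
print(f"48 sockets after NUMA overhead: {net / 1e6:.1f} million CPWs")  # 7.0
```

The raw numbers are ceilings. Real NUMA overhead tends to grow as sockets are added, so the efficiency factor for a 96-socket box would probably be lower than whatever applies at 48 sockets.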
That’s just the processing power. Let’s think about memory for a second. It seems reasonable that IBM will double up the memory per socket again. Right now, the midrange Power7+ machines are at 256 GB per socket in the Power 750+ system and 512 GB per socket in the Power 760+ system. The Power 770+ and Power 780+ are at 256 GB per socket max and the Power 795 is at 512 GB tops per socket. Double that top-end figure and you get 1 TB per socket, so a top-end Power Systems machine using the Power8 processor could have something on the order of 96 sockets with 96 TB of main memory. By the way, that is the precise design limit of Oracle’s M6 processor and its “Bixby” interconnect.
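The memory math is just as simple. This small Python sketch assumes, as speculated above, that IBM doubles the 512 GB per socket ceiling of the Power 795. The function name and the doubling factor are illustrative assumptions, not anything IBM has announced.

```python
def max_memory_tb(sockets: int, gb_per_socket_today: int = 512, growth: int = 2) -> float:
    """Maximum main memory in TB, assuming per-socket capacity grows by `growth`."""
    return sockets * gb_per_socket_today * growth / 1024


# 96 sockets at a doubled 512 GB (that is, 1 TB) per socket.
print(f"{max_memory_tb(96):.0f} TB")  # 96 TB
```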
Such a hefty Power8 system would make one hell of an in-memory database engine. Or a public cloud running IBM i. Or a massive consolidation engine for AIX, Linux, and IBM i workloads. And for all I know, if equipped with a fast parallel file system and a zippy Java virtual machine, it might even make a Hadoop big data muncher look more like real-time and less like batch. That memory bus and processor interconnect is just a lot faster than any InfiniBand or Ethernet network is ever going to be, and it presents some very interesting possibilities for workloads now running on clusters that are as dependent on communications between computing elements and data storage as they are on compute.
It always comes down to price, though. If IBM wants to be a supplier of shared memory, big data boxes, it can’t charge the traditional SMP and NUMA premium. And ditto for the midrange Power8 machines, which are going to have to best X86 systems running Xeon E5 and Xeon E7 chips in machines with four or eight sockets.
This big box talk is all interesting, but what I want in the midrange is a fast box with lots of integrated I/O that has the oomph of a Power 770-class box, the price of a Power 750, and the software tiering of a Power 720. This, I think, would go a long way toward getting IBM i shops to invest more in IBM i applications. But I also think this is probably not likely. IBM is obsessed with Linux on Power, and rightly so because this is the company’s hope to grow the overall Power Systems business. I doubt very much it will be inclined to cut prices on AIX or IBM i systems one penny more than it thinks it has to. But, I have been pleasantly surprised before, and it could yet happen again.
If you have any ideas on how IBM should shape the future Power8 machines, don’t be afraid to share. Now is the time to tell IBM what you want. I am still noodling this myself, in fact, and may yet have more to say. This time, I just wanted to give you a sense of the tremendous raw performance that IBM has at its disposal.