The Chatter About Future Power9 Servers
September 11, 2017 Timothy Prickett Morgan
In late July, the first of the Power9 systems, the one code-named “Witherspoon” that was designed explicitly to be installed in the “Summit” and “Sierra” clustered supercomputers at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, respectively, started rolling off the production line at Big Blue and into those HPC centers, which are funded by the U.S. Department of Energy. It is the beginning, however quiet, of what we expect will be a rolling thunder rollout of Power9 systems in late 2017 and through early 2018.
Because there have been so many processor announcements in the past few months, we expect that IBM’s own sales reps, as well as those of its channel partners, are getting a bit anxious now that the new Xeons and Epycs are out from Intel and AMD and a bunch of other ARM chips have also come out and are ramping their shipments. IBM’s own z14 processor is out in its updated System z mainframe; that chip, a true compute and I/O beast, was launched in July and highlighted at the Hot Chips conference in Silicon Valley at the end of August. (If there was ever a hot chip, the z14 is it, with a multichip processor complex that burns at 400 watts to 500 watts a pop.) As we have pointed out before, IBM is in a conundrum of sorts with the Power9 systems not being out right now, as many customers had expected based on Big Blue’s own statements over the past couple of years. Shipping a few racks of machines to Oak Ridge and Lawrence Livermore technically meets the goal of getting Power9 machines out in the second half of 2017, as promised, but that is not what was meant or understood when IBM executives said there would be real shipments by then.
To be fair, IBM has been very clear that initial shipments of the 24-core Power9 Scale Out, or Power9 SO, chips used in machines with two and four sockets would come in the second half of 2017. These chips, we have learned, are code-named “Nimbus,” if you want to play chip name Bingo with all your friends. The bigger NUMA boxes with four, eight, or 16 sockets using the Power9 Scale Up, or Power9 SU, chips (code-named “Cumulus”) were always expected sometime in 2018. And more recently, IBM said that IBM i and AIX customers should not expect to see Power9 systems capable of running either of these operating systems (or, we presume, any little endian Linux running atop the PowerVM hypervisor) until early in 2018.
So, basically, if there is an upcoming Power9 launch, perhaps in October or November, the latter being concurrent with the SC17 supercomputing conference in Denver, it seems it will be aimed at the Linux crowd and particularly those in the HPC, machine learning, and data analytics areas.
Having said all of that, and given the desire not to lose accounts to X86 or other iron, IBM’s Power systems peddlers need a bit more precise insight into what the plan is. From the scuttlebutt we are hearing, it looks like Big Blue is letting its partners know a bit more about what the future Cognitive Systems line based on Power9 chips will look like.
From what we can piece together, the Power9 line will be an evolution of the current Power8 line, which started rolling out in April 2014 and which took many months to fill out its full lineup, spanning from one-socket entry servers to 16-socket behemoths. With its “Skylake” Xeon SP launch, Intel has converged its entry and midrange Xeon E5 processors with its midrange and high-end Xeon E7 processors to create a single, unified processor line that uses standard memory, ditching the buffered memory used in the high-end SKUs of the Xeon E7s. Buffered memory offers memory capacity and bandwidth expansion, and IBM has used it in the entire Power8 line and will still use it on the Power9 SU processors for big NUMA boxes that we expect in 2018. The Power9 SO processors all use standard DIMM memory, which is not buffered. Intel is charging a pretty hefty premium on processors that offer fatter and faster memory and NUMA interconnects with four or eight sockets – we have done the math over at The Next Platform – and this presents a competitive opportunity for Power9 machines if IBM does not delay getting into the market for too long.
When we say that Power9 systems are an evolution compared to the Power8 systems, we do not mean that the platforms are socket compatible, which they most assuredly are not because the amount of I/O coming off the Power9 chips, as we explained when talking about their architecture last year, is much larger. The memory capacity on Power9 machines is expected to be anywhere from 2X to 4X that of the Power8s they replace, but that, we think, has more to do with having really fat memory sticks than it does adding memory channels. Intel is offering 768 GB of memory per socket on standard Skylake Xeon parts and 1.5 TB per socket on fatter Skylake Xeon SP parts, and AMD Epyc processors can deliver 2 TB per socket. IBM has already revealed that the architectural limit of the Power8 and Power9 chips is 4 TB per socket (it is about how many pins there are for physical memory, not how wide the 64-bit virtual addressing is). With the Nimbus Power9 SO chips, we expect to see relatively skinny configurations for HPC shops, which have distributed memory software stacks based on the Message Passing Interface (MPI) protocol and that don’t usually have clusters made of fat nodes with lots of memory. But the commercial versions of the Power9 systems should have some pretty beefy memory, pushing up against that 4 TB per socket limit. The Cumulus Power9 SU processors used in the beefier NUMA machines will be able to address up to 8 TB per socket because the buffer chips allow four memory sticks to hang off of each of the eight controllers on the die. If IBM could deliver even fatter memory sticks, say an incredible 512 GB per stick, double that of the 256 GB expected, it could do an even more impressive 16 TB per socket.
But we are not entirely sure how much memory IBM i, AIX, or Linux could address in a single system image, and it could be far lower than the 256 TB that a full-on Power9 machine with 16 sockets could hold using these suggested 512 GB sticks. IBM would do well to push the memory limits and memory bandwidth advantage it has with the Power9 to the hilt. Intel is weak here, and in-memory processing, backed by fast flash, is all the rage. If it has not been done already, it would be good if the operating systems were tweaked to handle such large memories because, again, this would be a very big competitive advantage so long as the Power9 NUMA interconnect can handle the strain.
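The memory math above is easy enough to check. Here is a quick back-of-the-envelope sketch using the figures cited in this story – eight memory controllers per Power9 SU die, with buffer chips fanning out four sticks per controller – where the 256 GB stick is the expected part and the 512 GB stick is purely hypothetical:

```python
# Memory capacity arithmetic for the Cumulus Power9 SU, per the figures above.
# The 512 GB stick is a hypothetical; 256 GB is what is expected.
controllers_per_socket = 8   # memory controllers on the Power9 SU die
sticks_per_controller = 4    # buffer chips hang four sticks off each controller
sockets = 16                 # a full-on 16-socket NUMA box

for stick_gb in (256, 512):
    per_socket_tb = controllers_per_socket * sticks_per_controller * stick_gb / 1024
    system_tb = per_socket_tb * sockets
    print(f"{stick_gb} GB sticks: {per_socket_tb:.0f} TB per socket, "
          f"{system_tb:.0f} TB in a 16-socket box")
```

With 256 GB sticks that works out to 8 TB per socket, as IBM has indicated, and the hypothetical 512 GB sticks would double it to 16 TB per socket and 256 TB across a 16-socket machine.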
We will have to wait a bit for the precise feeds and speeds, but on a per socket basis, the Power9 chip will deliver twice the cores and twice the raw performance of its Power8 predecessor, and if what we are hearing is true, the actual machines will deliver anywhere from 2X to 4X the memory capacity and anywhere from 50 percent to 120 percent more memory bandwidth than an equivalent Xeon SP processor delivers. The aggregate I/O bandwidth, even on smaller one-socket and two-socket machines, will also probably blow machines based on the Xeon SP away. These are real advantages, and they allow IBM to preserve its AIX base and extend its reach in the Linux market. We are still not sure what good this does hard-core IBM i shops, and as we pointed out a few weeks ago, IBM might do better overclocking Power9 machines with relatively few cores than trying to give IBM i shops more cores that they don’t need and can’t afford at the prices Big Blue wants to charge for them.
As for the shape of the future Power9 product line, our guess is that if you took the product names like S814, S822, and S824 in the scale out line and E850, E870, and E880 in the scale up line and globally replaced the 8 with a 9, you would end up with something that looks like the future Power9 line. The basic form factors probably will not change much, as they don’t in workhorse X86 servers, and IBM is not doing blades or modular or disaggregated systems with the Power platform (as it should, so don’t get me started), so I don’t expect too radical of a change here in terms of form factors. This will be a straight-up engineering exercise, packing twice as much stuff into the same form factors on just about every front. And that is not so bad, even over a four-year timeframe between generations.
The issue is, then, what will IBM charge for 2X the everything? Back in 1995, when it switched from proprietary CISC processors to the first generation of PowerPC RISC chips, it charged 1X the money for 2X the everything. It would be amazing if IBM did that again and actually tried to convert its vast installed base, as it did then. It has been more than two decades since IBM did that, and now is the time.