Aiming High, And Low, With Power Chips

December 1, 2014 Timothy Prickett Morgan

Two weeks ago, ahead of the SuperComputing 2014 conference in New Orleans, IBM announced that the U.S. Department of Energy had chosen Big Blue and hardware partner Nvidia to build two massively parallel supercomputers, nicknamed Sierra and Summit. This deal, along with IBM selling off its System x server business to Lenovo Group and its chip manufacturing operations to GlobalFoundries, gives IBM a tighter focus on its Power Systems line and, importantly, gives IBM the resources to focus on Power chip development and systems design.

I am a firm believer in the trickle-down theory of technology adoption, but not every technology comes down from on high in the supercomputer and hyperscale datacenters of the world into the enterprise. In fact, IBM has had spotty results to date in developing advanced technologies and converting them to commercial advantage. The latest effort by IBM, with partner Nvidia, takes a different approach: scaling out what is in essence a hybrid CPU-GPU system that is already available to commercial customers into a massively parallel supercomputer.

Some technologies work their way up from below just fine. Flash took off as a means to accelerate databases in the enterprise first, then became normal in hyperscale setups and is only now being adopted in supercomputers. Small businesses adopted X86-based servers running NetWare, Unix, and then Windows and Linux long before they were fitted into pizza boxes and became the norm in the enterprise datacenter. High-speed interconnects all start at the top, but AS/400 shops will remember that this proprietary system was the first to get many different kinds of technologies, including 3.5-inch disk drives (made in Rochester, Minnesota) back when they were exotic, as well as advanced and dense memory chips back when IBM was still in that racket.

As we reported two weeks ago, the CORAL contract with the Department of Energy for the Sierra and Summit machines is worth $325 million, which is a hefty chunk of change by any standard. And it renews a much-needed practice that IBM has relied upon since the Power architecture was launched in February 1990: having government and academic supercomputing centers fund early development of advanced systems that IBM then attempts to commercialize.

This was certainly the case with the RS/6000 PowerParallel machines from 1993, which made their way into Cornell University and CERN the next year, and in 1996 the Department of Energy lined up for a 3 teraflops RS/6000 SP parallel system to simulate explosions of nuclear weapons. The Deep Blue chess-playing variant of the RS/6000 SP took on and beat Garry Kasparov in 1997. Supercomputer centers kept adopting each release of the RS/6000 SP system, and while IBM did sell parallel DB2 and Oracle databases for this system, it wasn’t until 2004, with the launch of the Power5 processors and the unified iSeries and pSeries products that deployed them, that research and development for supercomputers was directly applied to commercial systems. The ASCI Purple supercomputer at Lawrence Livermore National Laboratory was based on Power5 processors, the same ones used in commercial systems, but the BlueGene line of machines continued to use low-power chips with a homegrown interconnect for applications that could use hundreds of thousands to millions of cores.

It never made much sense to me that IBM didn’t position BlueGene as a Linux-based cloud, and I have to believe that Big Blue could not manufacture it cheaply enough to meet the price points of hyperscale datacenter operators like Facebook, Google, Yahoo, and others. (We may see that BlueGene interconnect again some day.) And with the original Blue Waters machine, which crammed a ridiculous number of Power7 cores and a ridiculous amount of main memory and I/O bandwidth into a very tight space, the engineering was second to none, but at around $1.5 billion, IBM could not afford to manufacture the machine and still make money. So IBM spiked the deal with the National Center for Supercomputing Applications (NCSA) at the University of Illinois, and tried to reposition the machine as a big analytics box without much success. The US government footed the bill for the Blue Waters development, and that Blue Waters project helped cover the cost of the development of the Power7 processors that IBM i shops are still benefiting from.

With the CORAL project and the Sierra machine at LLNL and the Summit machine at Oak Ridge National Laboratory, the Energy Department is committing to buy machines with at least 150 petaflops of aggregate number-crunching capacity, with the Summit machine at Oak Ridge possibly being extended to 300 petaflops.

According to a presentation I saw by Jeff Nichols, associate laboratory director at Oak Ridge, at the SC14 conference, the Summit machine will have around 3,400 server nodes, each with multiple Power9 processors and multiple “Volta” generation Nvidia GPU coprocessors. The Power9 chips will support DDR4 memory and the Nvidia GPUs will support High-Bandwidth Memory (HBM) 3D stacked memory that was developed by Hynix and Advanced Micro Devices. The combined HBM on the GPU and DDR main memory on the CPU will amount to 512 GB per node, and importantly, all of the GPUs and CPUs in the node will be able to address all of this memory as a single entity. (Think of it as single-level store within the node.) Each node will have 800 GB of non-volatile memory (of some kind, as yet undefined) that can be configured as a burst buffer between the node and outside storage or as an extension to main memory from the CPUs. This suggests that the non-volatile memory will link to the Power9 processors through IBM’s Coherent Accelerator Processor Interface (CAPI), which is a virtual memory protocol that rides atop the PCI-Express controller on the Power processor and allows unified memory between peripherals and the processors.
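For a rough sense of scale, the per-node figures above can be multiplied out across the machine. What follows is a back-of-envelope sketch, not anything from the Oak Ridge presentation. It uses decimal units, takes the "around 3,400" node count at face value, and assumes the 150 petaflops figure applies to Summit by itself, which is an assumption for illustration only. The function and constant names are made up for this example.

```python
# Back-of-envelope totals for Summit, using the per-node figures quoted above.
NODES = 3_400              # "around 3,400 server nodes"
MEM_PER_NODE_GB = 512      # combined HBM + DDR4 per node
NVM_PER_NODE_GB = 800      # non-volatile memory per node
PEAK_PFLOPS = 150          # ASSUMPTION: the 150 petaflops figure is Summit's alone

GB_PER_PB = 1_000_000      # decimal units: 1 PB = 1,000,000 GB


def summit_totals(nodes=NODES):
    """Return (memory in PB, non-volatile memory in PB, teraflops per node)."""
    total_mem_pb = nodes * MEM_PER_NODE_GB / GB_PER_PB
    total_nvm_pb = nodes * NVM_PER_NODE_GB / GB_PER_PB
    tflops_per_node = PEAK_PFLOPS * 1_000 / nodes  # 1 petaflops = 1,000 teraflops
    return total_mem_pb, total_nvm_pb, tflops_per_node


if __name__ == "__main__":
    mem_pb, nvm_pb, tflops = summit_totals()
    print(f"Addressable memory:  {mem_pb:.2f} PB")
    print(f"Non-volatile memory: {nvm_pb:.2f} PB")
    print(f"Peak per node:       {tflops:.1f} teraflops")
```

Under those assumptions, the machine works out to roughly 1.7 PB of addressable memory, 2.7 PB of non-volatile memory, and on the order of 44 teraflops of peak performance per node.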

The Volta GPUs will link to each other and to the Power9 processors through Nvidia’s NVLink point-to-point interconnect. The nodes in the Summit and Sierra systems will link to each other using 100 Gb/sec EDR InfiniBand, which is just coming to market now. (I am surprised that faster InfiniBand won’t be available when these machines are installed in 2017. The next stop is presumably 200 Gb/sec, but it could be 400 Gb/sec on the way to 1 Tb/sec.)
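Interconnect speeds are quoted in gigabits per second while memory is quoted in gigabytes, so it helps to put the two on the same footing. The sketch below converts link rates to bytes and estimates how long it would take to push one node's 512 GB of memory over a single link. It assumes one link per node running at full line rate with no protocol overhead, which is optimistic, and the 200 Gb/sec and 400 Gb/sec rates are only the speculative next steps mentioned above. The function names are hypothetical.

```python
NODE_MEMORY_GB = 512  # combined HBM + DDR4 per node, from the Summit figures


def gbits_to_gbytes_per_sec(gbits_per_sec):
    """Convert a link rate in gigabits/sec to gigabytes/sec (8 bits per byte)."""
    return gbits_per_sec / 8


def seconds_to_move(data_gb, link_gbits_per_sec):
    """Time to move data_gb gigabytes over one link at full line rate.

    ASSUMPTION: a single link and no encoding or protocol overhead.
    """
    return data_gb / gbits_to_gbytes_per_sec(link_gbits_per_sec)


if __name__ == "__main__":
    for rate in (100, 200, 400):  # EDR today, then the speculative next steps
        gbytes = gbits_to_gbytes_per_sec(rate)
        secs = seconds_to_move(NODE_MEMORY_GB, rate)
        print(f"{rate} Gb/sec = {gbytes:.1f} GB/sec, "
              f"{NODE_MEMORY_GB} GB in about {secs:.1f} seconds")
```

At the EDR rate, that is 12.5 GB/sec per link, or roughly 41 seconds to move a full node's worth of memory under these idealized conditions.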

The important thing about the CORAL systems being built by the Energy Department is that there is funding to support the Power9 chip, which was not a foregone conclusion. IBM was already working on its 14 nanometer processes when it sold off its chip manufacturing business in October to GlobalFoundries, and it is my guess that this process shrink was aimed at Power8+ chips. IBM has been clear that the 14 nanometer node would have a mix of IBM and GloFo processes. IBM expects to move to the 10 nanometer processes from GlobalFoundries without any of its own tech, and I presume again that this is for the Power9 chips. Beyond that, IBM is not talking process nodes, but if Power9 comes out in late 2017, then we can probably expect a 7 nanometer node, if it can be done, with Power9+ perhaps by 2019 or 2020. That might be around the time that Summit is upgraded to a 300 petaflops machine with whatever GPU accelerators Nvidia has on deck after Volta. (We have had Tesla, Fermi, Kepler, Maxwell, Pascal, and Volta so far. Eventually we will get to Einstein.)

Knowing that there is a Power9 and probably a Power9+ on the roadmap is meant to be a comfort to all Power Systems customers. It is a wonder that IBM has not just come out and laid out a five-year roadmap for its chips now that it is no longer selling System x machines, is no longer running its own fabs, and has a lot at stake in trying to promote Power Systems as an alternative to the hegemony of X86 servers and the up-and-coming ARM servers that will start coming into the market in earnest next year. Now is the time to plot the course and tell customers a story about where the Power chip is going and how IBM will not only embrace the hybrid computing that it helped pioneer but also, with the help of Nvidia, Mellanox Technologies, and others, extend it up into the high-end supercomputing space and down to the low end of the small and medium business space. Imagine what kind of sophisticated, real-time processing you could do with a single-level storage machine that mixes CPUs and GPUs and several layers of main and flash memories, and with fast interconnects that would allow them to be clustered for scalability.

Hopefully, IBM sees a place for IBM i in this future world. Now, we all need to rack our brains to figure out the kinds of advanced analytics that could be run on those GPU coprocessors in a future Power9 system. It is time to plan the applications today so they are ready when the iron gets here. Now is the time to radically extend and enhance IBM i and to embed analytics and other kinds of simulation and modeling in the platform much as the relational database has been embedded since day one of the System/38 and the AS/400.

Rather than complain about how IBM i customers do not need such awesome power, we should focus on trying to create the kinds of workloads that midrange shops need that will absorb this awesome power. We want to benefit from all of this largesse, not just gaze upon it from below.