IBM Opens Up Coherent Protocols For Power Chips
October 24, 2016 Timothy Prickett Morgan
In case you did not get the memo, IBM is dead serious about promoting the Power architecture as a credible alternative to the Xeon architecture from Intel and the various upstarts that want to take a piece of the processing action in the datacenter. The OpenPower Foundation, formed a little more than three years ago, is the center of gravity for the Power push, but Big Blue has aspirations for some of its technology that go well beyond its own chips.
That is why IBM is forming the OpenCAPI Consortium, a group of like-minded IT suppliers and customers who want a better way of linking modern compute, storage, and networking devices to processor complexes than is possible over the generic PCI-Express peripheral bus, which is in its third generation and getting ready to move into its fourth.
But don’t get the wrong idea. IBM is an early and enthusiastic supporter of the PCI-Express 4.0 bus and the peripherals that will hook into it, and from the looks of things, the Power9 scale out processors used in machines with one or two sockets, set to debut in the second half of 2017, will be the first server processors to support PCI-Express 4.0 devices. It is just that, for many of the workloads that IBM is chasing with the Power9 chips, this bus does not offer low enough latency or low enough software overhead to adequately and efficiently accelerate workloads that increasingly depend on a very tight coupling between the processor complex and its main memory and adjunct accelerators based on GPUs and FPGAs, network adapters based on InfiniBand or Ethernet, and various kinds of emerging memory-like non-volatile storage.
The OpenCAPI Consortium is concerned with opening up the Coherent Accelerator Processor Interconnect that IBM originally created on the Power8 chip, which provided a low-latency method of linking the memory on an external device into the main memory complex of a Power8 chip, making one giant address space to simplify the programming and the movement of data between the CPUs and their memory and these external devices. To oversimplify a bit, it makes external devices look like on-chip accelerators as far as the Power9 is concerned, and it makes main memory hanging off the Power9 chips look like local memory as far as those accelerators are concerned. This will radically simplify the programming of hybrid systems that span multiple kinds of devices for compute and storage. The remarkable thing is that IBM is opening this up for other companies, including Intel, to license and adopt. The OpenCAPI Consortium is headed up by IBM, and includes AMD, Dell EMC, Google, Hewlett Packard Enterprise, Mellanox Technologies, Micron Technology, Nvidia, and Xilinx. Some of these companies are already members of the OpenPower Foundation that is promoting the Power architecture, but as IBM explained last week in detail, the company wanted to break OpenCAPI free from OpenPower so others who do not want to be involved with OpenPower can still use the OpenCAPI approach to link devices to each other and to processor complexes.
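To see why a single coherent address space simplifies hybrid programming, here is a toy sketch in Python (not any real CAPI API, and the function names are invented for illustration). Without coherence, the host must stage data into and out of device-local buffers; with coherence, the accelerator operates on host memory in place.

```python
# Toy model of the programming difference, NOT a real CAPI interface.
# The "kernel" stands in for work done on a GPU or FPGA accelerator.

# Non-coherent model: explicit staging copies over the peripheral bus.
def offload_noncoherent(host_data, kernel):
    device_buffer = list(host_data)                      # DMA host -> device
    result_buffer = [kernel(x) for x in device_buffer]   # device computes
    return list(result_buffer)                           # DMA device -> host

# Coherent model: accelerator dereferences host memory directly,
# so no staging buffers or copy-back step are needed.
def offload_coherent(host_data, kernel):
    for i, x in enumerate(host_data):                    # work in place
        host_data[i] = kernel(x)
    return host_data

print(offload_noncoherent([1, 2, 3], lambda x: x * 2))   # [2, 4, 6]
print(offload_coherent([1, 2, 3], lambda x: x * 2))      # [2, 4, 6]
```

Both paths compute the same answer; the point is that the coherent path drops the copies, which is where the latency and software overhead live.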
You can see my technical piece on OpenCAPI over here at The Next Platform to get the nitty-gritty details. But I will give you a short overview of OpenCAPI and how it fits in with Power9 and other chips, and therefore other systems.
CAPI 1.0, as the first generation is now called, implemented a coherent memory protocol over the PCI-Express 3.0 controllers on the Power8 chip, but stripped out the unnecessary overhead in the PCI protocol to allow for streamlined coherent memory addressing between devices. With CAPI 2.0, coming in the Power9 chips, an updated CAPI protocol is designed to run over PCI-Express 4.0 controllers, which will deliver a peak of 32 GB/sec of bandwidth in each direction over the x16 slot commonly used for a beefy accelerator like an Nvidia Tesla GPU card. This is twice the bandwidth that PCI-Express 3.0 controllers can deliver with that many lanes, so this is a huge increase in bandwidth. But we are not certain that PCI-Express 4.0 will show any improvement in latency, and with so-called New CAPI, which is now referred to as CAPI 3.0 and now OpenCAPI but is also known by its code name “BlueLink,” IBM actually went all the way and designed its own 25 Gb/sec ports on the Power9 chip expressly for running an even more highly tuned version of the CAPI protocol. This OpenCAPI port and protocol combination has higher peak theoretical bandwidth than a PCI-Express 4.0 slot with the same number of lanes, and IBM expects the effective bandwidth as delivered to be much higher still.
On the Power9 chips aimed at scale-out, two-socket servers (the kinds that the vast majority of IBM i shops deploy), the processors have 48 lanes of PCI-Express 4.0 peripheral I/O per socket, delivering an aggregate of 192 GB/sec of duplex bandwidth per socket. The Power9 SO chip will also support 48 lanes of 25 Gb/sec BlueLink bandwidth per socket for other connectivity, with an aggregate duplex bandwidth of 300 GB/sec across those lanes. We do not know the latency differences between plain PCI-Express 4.0, PCI-Express 4.0 running CAPI 2.0, and BlueLink running OpenCAPI, but we have to figure the gap is substantial or IBM would not have bothered.
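The per-socket figures above are straightforward lane arithmetic, and a quick back-of-the-envelope check confirms them. This sketch uses peak signaling rates (PCI-Express 4.0 runs at roughly 16 Gb/sec per lane, BlueLink at 25 Gb/sec per lane) and ignores line-coding overhead, so real payload bandwidth runs a bit lower.

```python
# Sanity-check the bandwidth figures quoted in the text.
# Peak signaling rates only; encoding overhead is ignored.

GBIT_TO_GBYTE = 1 / 8  # convert Gb/sec to GB/sec

def duplex_bandwidth_gbytes(lanes, lane_rate_gbits):
    """Aggregate duplex bandwidth in GB/sec: lanes * rate, both directions."""
    return lanes * lane_rate_gbits * GBIT_TO_GBYTE * 2

# An x16 PCI-Express 4.0 slot: 32 GB/sec in each direction.
pcie4_x16_one_way = 16 * 16 * GBIT_TO_GBYTE

# Per-socket aggregates on the Power9 SO chip, as quoted above.
pcie4_socket = duplex_bandwidth_gbytes(48, 16)     # 192 GB/sec duplex
bluelink_socket = duplex_bandwidth_gbytes(48, 25)  # 300 GB/sec duplex

print(pcie4_x16_one_way, pcie4_socket, bluelink_socket)  # 32.0 192.0 300.0
```

The same arithmetic shows why the BlueLink ports win on raw bandwidth: 25 Gb/sec per lane beats the roughly 16 Gb/sec of a PCI-Express 4.0 lane before any protocol efficiency gains are counted.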
Lower latency is the key for hybrid machines, and IBM wants all processors, not just Power9 chips, to have it. Or more precisely, IBM wants a large ecosystem of compute, storage, and network devices to license its OpenCAPI ports and protocols to add memory coherency to their hybrid systems, and that in turn will feed into the OpenPower ecosystem, which by itself has not been able to foster a large number of devices equipped with CAPI receivers, much less CAPI transmitters. For OpenPower to work, IBM has to let go of OpenCAPI, which is basically the family jewels that separate a Power chip from its rivals.
The CPU business is a hard one. But IBM has learned to let go and build actual ecosystems, and that is something. It is also something that Intel is not doing. Hence the enthusiasm that Google is showing for Power9 servers, and the willingness of Dell EMC and HPE to sign up to promote OpenCAPI. This puts pressure on Intel to play along, or show off a better alternative, if it has one. As best as I can figure, Intel is up to something with its future “Skylake” Xeon processors, but the exact details are unknown. It could just be an improved QuickPath Interconnect bus, used for NUMA links between Xeon processors, that allows for devices to hang off it. This is essentially what that BlueLink port is on the Power9 chip. IBM uses 16 Gb/sec ports on the Power9 chip to hook four processors into a single system image, and then uses 25 Gb/sec links to make larger systems that have eight, twelve, or sixteen sockets. By the way, the same 25 Gb/sec ports on the Power9 chip will be able to support the faster NVLink 2.0 protocol for lashing future “Volta” Tesla V100 GPUs to the Power9 complex and allowing for coherent sharing between Power9 DRAM memory and Tesla HBM stacked memory.
All of these developments to open up portions of the Power architecture are of interest to IBM i shops in a few ways. For one thing, if IBM manages to collect a lot of licensing royalties for the BlueLink ports and the OpenCAPI (CAPI 3.0) protocol, then this is money that by all rights should be counted as supporting the funding of the Power Systems platform development within IBM. This money will not necessarily show up in the Power Systems profit and loss statements, but Big Blue knows it is there.
Moreover, as flash and other non-volatile storage-class memories (or memory-class storage, depending on how you want to think of it) get OpenCAPI interfaces, along with GPU and FPGA accelerators, those platforms that support OpenCAPI will have a rich set of options to choose from to augment the processing, memory, and networking of the processor complex. Inasmuch as IBM has the inside track on CAPI and really believes in it, the company can create an ecosystem of partners that bolster OpenCAPI and, in turn, the Power architecture. IBM paradoxically has to give CAPI away so others can use it, but then it also has to be the best implementer of the technology. This is a very delicate maneuver, indeed. But it sure beats sitting by and having Intel whip it in the CPU business.
As I have said before, IBM needs to think outside of the box and start gearing up these technologies so they can be used to accelerate IBM i databases and other functions, like virtual desktop infrastructure, within IBM i shops. (See New OpenPower Servers Present Interesting IBM i Possibilities for more thoughts on that.) Rather than trying to make customers pay a lot for a little bit of processing, IBM needs to take its own architecture to heart and come up with a new AS/400 that reflects the diversity of compute, storage, and networking that a modern datacenter has. There is no reason that something like IBM i can’t be the glue that holds it all together for the 125,000 companies that rely on that platform today. Start here and stop looking so hard elsewhere.