Power Chips To Get A GPU Boost Through Nvidia Partnership
December 2, 2013 Timothy Prickett Morgan
What is old is new again. The AS/400 and its progeny are well acquainted with the concept of coprocessors–adjunct computing elements that hang off the central processor to help the collective do more work. Using graphics processors is all the rage in the high end of supercomputing these days, and now IBM will be working with GPU maker Nvidia to marry its Tesla GPU coprocessors to future Power processors. It is not entirely clear how this will help IBM i shops, but there are some intriguing possibilities.
If there is one rule in the computer business, it is that products that are aimed at one part of the market often ricochet into other unintended roles. This is not always the case, of course. There are plenty of products, like the System/3X minicomputers and the AS/400 follow-on, that were designed for a specific set of tasks and did them brilliantly and affordably. But an equal number of times, something that starts out here ends up there. IBM’s original RISC chips way back when were designed as I/O controllers for mainframes, for instance, and these were dusted off back in the early 1990s to create the Power architecture. The X86 processor that was designed explicitly for personal computers was moved into servers before it was reworked to do those workloads, and the same thing is happening now as the ARM processors designed for smartphones, tablets, and other embedded devices are moving into the system racket.
For those who have embarrassingly parallel workloads–simulations of all kinds and data sorting–the GPU chip has become an offload engine of choice, too. The reason is simple: A GPU can deliver a double-precision floating point calculation at somewhere between one-fifth and one-tenth the cost of the same calculation on an X86 chip. And that X86 flops is, depending on the system, about half the price of a flops from a RISC chip running IBM i, AIX, or Solaris–which puts a GPU flops at roughly one-tenth to one-twentieth the cost of a RISC flops.
At the SC13 supercomputing conference in Denver two weeks ago, IBM said it was teaming up with Nvidia to align its future Power8 processors, due in the middle of next year, with Nvidia’s Tesla GPU accelerators. The easiest and most obvious use for GPU accelerators is in conjunction with simulation applications that model weather, the airflow over a plane wing or a car, the interactions of molecules, and what have you. Nvidia has worked with third-party simulation software providers to get several hundred applications tweaked so they can offload the parallel portions of their applications to GPU-based accelerator cards, which plug into PCI-Express slots just like graphics cards. In fact, the Tesla K10, K20X, and K40 accelerators are essentially graphics cards that have been tweaked to not have graphics ports but to be used as auxiliary motors. The latest Tesla K40 engine, which also debuted at SC13 and which I covered for HPCwire, is a beast of a machine, with 2,880 cores that deliver 4.29 teraflops of floating point oomph at single precision and 1.43 teraflops at double precision. It also has a GPU Boost mode that lets the GPU crank up the clocks if it has enough thermal space and thereby boost the performance of applications by somewhere between 20 and 30 percent.
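Those peak numbers fall straight out of the core count and the clock. Here is a back-of-the-envelope check, assuming the K40's 745 MHz base clock, one fused multiply-add (which counts as two flops) per core per cycle, and double-precision units running at one-third the single-precision rate, as is standard for this Kepler-generation part:

```python
# Back-of-the-envelope check of the Tesla K40's peak flops ratings.
# Assumptions: 745 MHz base clock, one fused multiply-add (two flops)
# per core per cycle, DP units at one-third the SP rate.
CORES_SP = 2880           # single-precision CUDA cores
CORES_DP = CORES_SP // 3  # 960 double-precision units
CLOCK_HZ = 745e6          # base clock; GPU Boost pushes it higher
FLOPS_PER_FMA = 2         # a fused multiply-add counts as two operations

sp_peak = CORES_SP * FLOPS_PER_FMA * CLOCK_HZ
dp_peak = CORES_DP * FLOPS_PER_FMA * CLOCK_HZ

print(f"SP peak: {sp_peak / 1e12:.2f} teraflops")  # 4.29
print(f"DP peak: {dp_peak / 1e12:.2f} teraflops")  # 1.43
```

Note that the GPU Boost figures come from raising CLOCK_HZ when there is thermal headroom, which is where that extra 20 to 30 percent comes from.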
To date, GPU accelerators have been used predominantly to accelerate Fortran, C, and C++ applications, but Brad McCredie, who is vice president of Power Systems development at Big Blue as well as the IBM Fellow in charge of the OpenPower Consortium, said that the partnership between the two companies was focused on enterprise applications, not just scientific ones. IBM has made it clear, as has Oracle, that it wants to see Java get accelerated. Both are working to make Java acceleration more transparent to programmers. It is difficult to say when this work will be done, but suffice it to say that the early results are encouraging. And if the parallel elements of a Java application can be accelerated by a GPU, then it stands to reason that the same thing can be done for RPG apps running on IBM i as well as for the underlying Java application server and IBM i database.
IBM is not making any promises about what will be accelerated at the moment, but is working with Nvidia to get its CUDA parallel programming environment integrated with IBM’s Rational Developer for Power development tools. This work will take some time because CUDA was not designed to work with AIX or Linux on Power. McCredie says that the idea is to make it all as seamless as possible. If supercomputer centers have trouble programming in parallel, enterprises are going to have even more woes because, frankly, human beings are inherently serial and don’t think in parallel very well. IBM is taking its own GPU acceleration medicine first and has set up an internal software evaluation team to figure out where GPU acceleration can be added to its database, analytics, and other software.
On the hardware side, the initial plan is to plug Tesla GPU accelerators into PCI-Express slots across the Power Systems line and let Rational work through CUDA to offload work to them. As I have previously explained, the Power8 chip has two PCI-Express 3.0 controllers that have been pulled onto the die, which among other things replace the InfiniBand-derived GX++ buses that were used to link CPUs to remote I/O drawers in the system and make them look local. IBM has a clever bit of electronics and microcode called the Coherent Accelerator Processor Interface, or CAPI for short, that rides atop that PCI transport. CAPI will allow coprocessors such as GPUs, field programmable gate arrays (FPGAs, popular in the financial services and the oil and gas industries), and digital signal processors (DSPs, also popular in oil and gas as well as intelligence and the telecommunications industries) to access the main memory in a Power System and also allow for the Power8 chip to access anything stored in the accelerator memory.
That shared virtual memory space will simplify programming for hybrid systems that mix CPUs and accelerators by–wait for it–presenting them with a single address space. Right now, if you want to use GPUs, you have to constantly move data into the GPUs to crunch it and then move it back to CPU main memory when you are done. This shared virtual memory for CPUs and accelerators should make programming easier, and allow the system to move data in and out of local accelerator memory as needed and do so under the covers without programmers having to manage it. The effect will be to make GPU and other kinds of accelerators look like they are on the Power die even if they are out on PCI-Express buses.
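The difference between the two programming models can be sketched in miniature. The toy model below is plain Python, not any real GPU API–the class and method names are invented for illustration–but it shows why a single address space removes bookkeeping from the programmer:

```python
# Toy model of the two accelerator programming styles. Nothing here is
# a real GPU API; the names are hypothetical stand-ins for illustration.

class DiscreteAccelerator:
    """Today's model: device memory is separate, copies are explicit."""
    def __init__(self):
        self.device_mem = {}

    def copy_in(self, name, data):
        self.device_mem[name] = list(data)   # host -> device transfer

    def square_all(self, name):
        self.device_mem[name] = [x * x for x in self.device_mem[name]]

    def copy_out(self, name):
        return self.device_mem[name]         # device -> host transfer


class CoherentAccelerator:
    """The CAPI-style model: the accelerator works on host memory directly."""
    def square_all(self, data):
        data[:] = [x * x for x in data]      # no staging copies needed


# Explicit style: three steps, two copies the programmer must manage.
gpu = DiscreteAccelerator()
gpu.copy_in("vec", [1, 2, 3])
gpu.square_all("vec")
result = gpu.copy_out("vec")                 # [1, 4, 9]

# Coherent style: one step; the hardware moves data under the covers.
vec = [1, 2, 3]
CoherentAccelerator().square_all(vec)        # vec is now [1, 4, 9]
```

In the coherent style the data movement still happens physically over PCI-Express, of course; the point is that the system, not the programmer, manages it.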
Over time, given the expense of real estate on a Power chip, IBM could even remove the VSX vector math units from future Power chips, thereby getting the chip size and the cost of each chip down. Each Power7+ core has four VSX vector units, each capable of executing one DP or two SP floating point operations per clock cycle. Running at 4 GHz, an eight-core Power7+ chip delivers 512 gigaflops at single precision or 256 gigaflops at double precision. This is a fair amount for a CPU, but it is not a lot for a GPU, as you can see.
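Those Power7+ peak figures can be checked the same way as the GPU ratings, assuming each vector operation is a fused multiply-add that counts as two flops:

```python
# Back-of-the-envelope check of the Power7+ peak flops figures.
# Assumption: each vector unit issues one fused multiply-add per
# cycle, and an FMA counts as two floating point operations.
CORES = 8
UNITS_PER_CORE = 4
CLOCK_HZ = 4e9
FLOPS_PER_FMA = 2

# One DP FMA per unit per cycle; two SP FMAs per unit per cycle.
dp_peak = CORES * UNITS_PER_CORE * 1 * FLOPS_PER_FMA * CLOCK_HZ
sp_peak = CORES * UNITS_PER_CORE * 2 * FLOPS_PER_FMA * CLOCK_HZ

print(f"DP peak: {dp_peak / 1e9:.0f} gigaflops")  # 256
print(f"SP peak: {sp_peak / 1e9:.0f} gigaflops")  # 512
```

Set that 256 gigaflops against the teraflops-class double-precision rating of a single Tesla card and the appeal of offloading the vector math becomes obvious.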
The IBM-Nvidia partnership seems as much about giving Big Blue an edge in the high-end analytics market as it is about giving the company a new story to tell among supercomputing customers. It will be interesting to see what kinds of Java, C, C++, and even RPG workloads can be accelerated using the GPUs as well as what can be done to make operating systems and databases do more work through the CPU-GPU collective. Hopefully, whatever IBM does will be so transparent that applications can just pull some new libraries and suddenly have certain codes run a lot faster. And hopefully IBM will soon explain how this might benefit IBM i shops in particular. For now, it is still mostly theoretical. But, more importantly, the partnership shows that IBM is investing in the Power chip for the long haul and is not giving up. And that means that IBM i will have a platform to run on in the years ahead.
I have said it before, and I will keep saying it: Anything that makes Power Systems stronger lets IBM i live longer.