Talking Power9 With IBM Fellow Brad McCredie
October 2, 2017 Timothy Prickett Morgan
The Power8 generation is at the very end of its life, and the Power9 generation is just starting to crawl and soon will be walking and then running. IBM is a bit behind its processor rollout cadence, but that delay is not as great as the one that Intel has experienced with its recent “Skylake” Xeon SP processor launch, which was expected last September and then was held off formally until July. We had heard Power9 iron would launch in July, then in October, and now we are hearing maybe in December.
It is unusual for there not to be a firm date at this point for the Power9 launch, but given all the processors and coprocessors coming out as well as faster networking, and IBM’s desire to market Power9 alongside all of them, it is not unreasonable for IBM to hold off. Given the very high prices for memory and flash right now, which are hurting all server sales, it is also perhaps smart to sell Power9 chips and systems to the HPC centers and hyperscalers that can afford to pay the early premium and soak up all of the volume now, and give Globalfoundries, IBM’s chip fab partner, time to perfect the 14 nanometer processes that are used to etch the scale-out “Nimbus” and scale-up “Cumulus” Power9 chip variants that will go into its systems. (We told you all about the Power9 processors back in August 2016, and have been waiting, somewhat impatiently because we like iron, to see them come out since then.)
It is not yet clear what IBM’s plan is, but we have started to hear some chatter in recent weeks, so something is moving, and it is not just the Summit and Sierra machines, which together will have close to 10,000 nodes. I still happen to think that IBM will get an HPC-specific variant of the Power9 machine, code-named “Witherspoon,” out the door sometime around the SC17 supercomputing event in Denver in the middle of November. If IBM announces it then and ships it in December, that fits the latest set of data points I am hearing and also is close enough that the coherency tweaks that couple Power9 closely to Nvidia Tesla GPU coprocessors will be in the Linux kernel. This is important for HPC shops today, but hybrid computing is the wave of the future and could be important, even to IBM i shops, in the not too distant future.
With so much chatter going on and not a lot of hard facts, we reached out to Brad McCredie, who is an IBM Fellow and who is also vice president of Power Systems development and president of the OpenPower Foundation. McCredie is the architect of the opening up of the Power architecture and has also spearheaded its co-development with a number of ODMs and chip makers in Asia as well as with hyperscalers Google and Rackspace, who are working on a shared Power9 design code-named “Zaius.”
McCredie got his BS, MS, and PhD degrees in electrical and computer engineering from the University of Illinois, and joined IBM in 1991, focusing on electromagnetic design and electronic packaging for IBM’s mainframe systems. A few years later, McCredie moved over to the RS/6000 systems organization and worked on developing the Power3 chip, and since that time has worked in a variety of technical and executive leadership roles within the Power Systems group. In 2013, it was McCredie, along with Gordon McKean of Google, who spearheaded the creation of the OpenPower Foundation. McCredie is also a member of IBM’s Technology Team, a senior executive group that sets IBM’s overall technical strategy. Suffice it to say, McCredie knows what is going on at IBM, and he is helping steer a bunch of things.
Early this year, IBM rebranded the Power Systems division the Cognitive Systems division, and it is not precisely clear if all of the machines will have a name change. It could turn out that IBM mimics the bifurcation it did with the Flex Systems converged systems, which launched in April 2012 and which are now owned by Lenovo. With the Flex Systems, the bare iron was known by that name, but systems configured up with systems and application software were branded separately as PureFlex Systems. We may see something similar to this with the future Power9 iron. McCredie was not at liberty to say, but concurred that IBM clearly is trying to move up the stack while still selling complete systems underneath applications.
Timothy Prickett Morgan: The Power9 processors have started shipping in the “Summit” and “Sierra” supercomputers being built by IBM and its partners Nvidia and Mellanox Technologies, who are supplying the GPU accelerators and InfiniBand networking for these systems. It is exciting to see the Power9 chip shipping, and in the second half of 2017 as promised, but I am more interested to see it get more broadly deployed in commercial Power Systems that enterprises can buy and, equally importantly, underneath software stacks that are tuned for specific workloads.
Brad McCredie: The consumption model is moving up the stack, and we are going to blaze a trail integrating our cloud function, app stores, and containers. You take that, make a cluster out of it, and one of the key applications is going to be artificial intelligence.
TPM: So what does the rollout look like for Power9? I have gotten some hints, and we know that AIX and IBM i are not going to be running on Power9 until sometime in early 2018 already. I have a better sense of what the machines look like than I do the timing of them, and that is backwards from the usual case.
Brad McCredie: Here is what I can tell you. The systems that are engaged in the Summit and Sierra machines, we will make them available commercially this year. Which makes perfect sense.
TPM: That is all that I expected at this point, with the rest coming out next year. So what does that kind of schedule mean for IBM now? Is Power8 just kind of tailing off, are people pausing? Are they just consuming the Power8 machines, like the “Minsky” Power S822LC for HPC server that supports Power8+ and NVLink? I don’t have a feel for it.
Brad McCredie: I think there are a couple of trends going on that are interesting, and they tie in. If you look at our Q2 results, they were strong – which were very good results for being late in the product cycle. There are many contributing factors to this. One, there is a shift in value. What do I mean by that? As you start looking at the early versions of our Cognitive Systems, which would be our Minsky systems coupled with our PowerAI machine learning software stack, we are finding that this bundle, this collection of IP, with the highly differentiated accelerator attachment combined with the tuned software stack that is supported by IBM, is driving value and therefore driving business. This is not at all tied to where customers are at in their processor lifecycles, it is tied to giving customers the right software at the right time.
There is another lifecycle timeline that we are seeing that is independent of processor lifecycle, and this is the growth in Linux. We continue to get traction with Linux, and there are very long lead times for proofs of concept. A decision to transition from X86 to Power on a Linux platform takes time. These deals are starting to kick in here, late in the Power8 cycle.
TPM: The “Skylake” Xeon SP processors are out now from Intel, and we know what they are, we have analyzed their architecture, and calculated the huge premium that Intel is charging for them. What is your positioning of Power8 and Power9 systems against those using the Skylake Xeons? You have done a little math, too, so what is your gut instinct on this?
Brad McCredie: There are two things to consider here. One, we have our standard positioning, where we will have more performance than Skylake – we will have our usual benchmark wars and things like that. And two, in addition to that, now that we have seen Skylake and the handwriting is clearly written on the wall, is the memory and I/O subsystem advantage. We have PCI-Express 4.0, we have our OpenCAPI accelerator bus, we have NVLink 2.0, and that is going to generate a lot of value. More and more value is coming from the network, because more and more security is on that network, and of course there is compute offload to GPUs and that trend is obvious to everybody. As we have bumped up the performance on these Power9 processors, with so many cores crammed on there, the value of memory and I/O bandwidth is going up and up and up.
TPM: The good news for Power Systems, then, is that Power9 is going to be ramping into the market when memory and flash prices, which skyrocketed this year, will hopefully be coming back down. The Skylake Xeons and AMD “Naples” Epycs were not so lucky. A lot of people, I would guess, are making a compromise and using lower capacity DIMMs, which cost less per unit of capacity, and also going with less capacity overall, but fully populating their systems to get the full memory bandwidth they offer.
Brad McCredie: This kind of math is starting to drive our future design points. We used to look at dollars per gigabyte of capacity, day in and day out, but we are always looking at dollars per gigabyte per second of bandwidth as well. That bandwidth value is going up and up and contributing more and more to the buying decision. This kind of math is being done by the hyperscalers, and also with SAP HANA, which is by nature a memory hog of a workload.
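To make the two metrics concrete, here is a minimal sketch in Python of the comparison being described. Every figure in it (DIMM prices, DIMM sizes, channel count, and per-channel bandwidth) is a made-up placeholder chosen for easy arithmetic, not a number from IBM or from this interview, and the `MemoryConfig` class is a hypothetical helper. It assumes, for simplicity, one DIMM per memory channel and bandwidth that scales linearly with populated channels.

```python
from dataclasses import dataclass


@dataclass
class MemoryConfig:
    """One way of populating a server's memory slots (illustrative only)."""
    name: str
    dimm_count: int               # DIMMs installed, assumed one per channel
    dimm_capacity_gb: int         # capacity of each DIMM in GB
    dimm_price_usd: float         # hypothetical price per DIMM
    channel_bandwidth_gbs: float  # hypothetical GB/s per populated channel

    @property
    def capacity_gb(self) -> int:
        return self.dimm_count * self.dimm_capacity_gb

    @property
    def cost_usd(self) -> float:
        return self.dimm_count * self.dimm_price_usd

    @property
    def bandwidth_gbs(self) -> float:
        # Simplifying assumption: bandwidth scales with populated channels.
        return self.dimm_count * self.channel_bandwidth_gbs

    def dollars_per_gb(self) -> float:
        """The traditional metric: cost per gigabyte of capacity."""
        return self.cost_usd / self.capacity_gb

    def dollars_per_gbs(self) -> float:
        """The bandwidth metric: cost per gigabyte per second."""
        return self.cost_usd / self.bandwidth_gbs


# Placeholder configurations for a hypothetical eight-channel socket.
few_big = MemoryConfig("4 x 64 GB", 4, 64, 900.0, 20.0)
many_small = MemoryConfig("8 x 16 GB", 8, 16, 150.0, 20.0)

for cfg in (few_big, many_small):
    print(
        f"{cfg.name}: {cfg.capacity_gb} GB, {cfg.bandwidth_gbs:.0f} GB/s, "
        f"${cfg.dollars_per_gb():.2f}/GB, ${cfg.dollars_per_gbs():.2f} per GB/s"
    )
```

With these placeholder inputs, the fully populated configuration of smaller DIMMs gives up capacity but comes out well ahead on dollars per gigabyte per second, which is the tradeoff the conversation above is describing.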
TPM: It is kind of the point, isn’t it?
Speaking of SAP HANA, what is IBM’s Power Systems seeing in that market? I assume the processing, memory capacity, and memory bandwidth advantages are playing out with that workload.
Brad McCredie: I wish I had some precise numbers here, but what I can tell you is that we have seen tremendous growth. But the value that we are driving is also through virtualization of SAP HANA. We are the only platform that is certified to run a hypervisor – in this case, PowerVM, not the OpenKVM variant – underneath SAP HANA. That flexibility of being able to co-locate HANA workloads with other analytics workloads like Hadoop all on the Power Systems is a huge value for us.
TPM: I didn’t know you could do that. So neat. You learn something new every day, and that was my one thing.
I realize that Power9 is not formally out yet, but I have to ask about Power10. Where is it at in the cycle now? Is it still on the drawing boards, or is it off them yet?
Brad McCredie: We are maintaining our cadence. If you start with Power4, every three years or so we do a new cycle. To maintain that cadence, Power10 is well into putting the finishing touches on concept exit. Then we go into high level design from there, where we outline the VHDL.
TPM: What do you expect the Linux portion of Cognitive Systems and Power Systems to be going forward? I expect it to dominate, but maybe we should be expecting it to be one third for each platform, including AIX and IBM i. Certainly in quarters where IBM is shipping supercomputers like Summit and Sierra, and the processor is not yet launched on AIX and IBM i, Linux will dominate. But what is the long term, averaged out trend?
Brad McCredie: I think you have to look at it from a perspective of growth. The various modern Cognitive Systems workloads are on Linux. We expect tremendous things in the cognitive and AI space, of course. The trends for AIX and IBM i are well documented, and they are flat to declining and our intent is to keep them flat. If that happens, then the bulk of our market, over time, should be Linux. Not because we have a particular love of one OS over another OS. It is just that Linux is the platform that is carrying those strategic workloads.
TPM: I understand, and as I have told IBM i and AIX shops many times, anything that makes Power Systems stronger makes IBM i and AIX last longer. The other neat bit, I think, is that drawing lines between so-called legacy and modern workloads is getting harder and harder. You can mix and match them on the same system or cluster of systems. This is happening as AI and HPC converge, for instance, and visualization and database acceleration will be the next convergences, I think. And there will probably be cases where all four come into play on the same platform, with a workflow knitting them all together into an ensemble. This is happening with weather modeling systems today, for instance.
Brad McCredie: We can take the weather example and replay it in banking and finance, even in retail and advertising.
TPM: I don’t know, man. I think people are much less predictable than the weather.
Brad McCredie: There probably is some truth to that. But the larger point you were on there is that our HPC team, which is largely run out of IBM Research, is 100 percent down this path. Look at what we have: The backbone of our largest HPC play, which is Summit and Sierra, is a Power9 with Nvidia GPUs, and that is also the backbone of our cognitive systems.