Talking Shop With Power Systems Chief Stephen Leonard
May 10, 2021 Timothy Prickett Morgan
Whether or not you were aware of it, the Cognitive Systems division of IBM – what we still colloquially call Power Systems here at IT Jungle and what everyone else still calls it for the most part – has had a new general manager for quite some time and we had not yet had a chance to talk to the new chief. But recently we did hop on the Zoom with Stephen Leonard to get some insight about the Power Systems business and how it fits into the new and evolving IBM and the rapidly changing IT sector.
Leonard dates from an earlier era, like many of us, and in fact, he and I both started in the IBM midrange in 1989. Leonard had various sales roles in the 1990s and 2000s, and was vice president of worldwide sales for Power Systems from January 2006 through October 2008 and was vice president of sales for all of Big Blue’s North American operations for a stint in the Great Recession before being appointed the head of IBM United Kingdom for three years. After that, Leonard was general manager of global markets for IBM’s Systems group, where he participated in the spinoff of the System x server business to Lenovo, and followed that up with being general manager of IBM’s North American operations from 2016 through 2019. In his current role, Leonard took over the Power Systems franchise completely, replacing Bob Picciano, who retired from IBM after more than 35 years at the company. (And who was a very good GM for Power Systems, by the way.)
Leonard knows his stuff, too. And knows IBM and its customers well.
TPM: You took over as the executive in charge of the Cognitive Systems business two years ago, and a lot has changed in the systems business in the several years before that and since.
Stephen Leonard: It was almost exactly two years ago when I was asked to come in and help on Power Systems. And interestingly enough, the remit was how are we going to take Power forward here?
To your point, there’s a lot of bifurcation and diversification happening in the microprocessor space. Initiatives like RISC-V, and a lot of activity with Arm and AMD and Intel.
TPM: And a lot of server CPU challengers have left the field, too. We have lost a lot of Arm vendors – AMD, Qualcomm, Marvell/Cavium, Broadcom, Calxeda, the original Applied Micro. Samsung never entered the field, and Nvidia left for a decade. Sun Microsystems is dead, dead, dead. Hewlett Packard, well maybe that Itanium wasn’t a good idea. I’m aware of what the Arm collective is doing and now Nvidia’s desire to do something about that with its acquisition of Arm Holdings. And I’ve talked to Jensen Huang, Nvidia’s co-founder and chief executive officer, repeatedly and at length about his infrastructure dreams, not just servers, and I said he could just build a server chip, he didn’t need to spend $40 billion to acquire Arm. But then Huang gave me his thoughts on everything, every piece of silicon for networking and storage and compute, going through an Arm type distribution and licensing channel, because Nvidia knows it can’t make everything because there’s going to be massive diversification of compute, networking, and storage to tailor for very precise workloads.
Stephen Leonard: The thing is, it’s an interesting move and maybe some of it is defensive, because what was clearly obvious was this: Nvidia was not going to get the easy ride in the accelerator space that it has been having. Intel was starting to make acquisitions and AMD is building stuff. And there is a lot of activity from a bunch of startups all over the place looking at acceleration of different descriptions and not just the acceleration via accelerators, but acceleration in software, acceleration right down in the silicon itself, in the microprocessor.
TPM: It’s funny, I kept asking myself: Why does Arm keep talking about all of the different kinds of processors and all the different kinds of embedded vector math? And the answer is obvious, once you start thinking about it. They realize that 90 percent of acceleration is not going to be done on a GPU, ultimately.
When I talked to Brad McCredie, who ran Power chip development and started OpenPower, a decade ago, it was the other story. IBM was absolutely convinced that acceleration would be something that is done. Now, IBM’s position – to be super clear – was that there would be some acceleration on the CPU, but there would be a lot offloaded to something else in special cases. IBMers never gave percentages on it. What’s obvious to me is that there are going to be a lot of workloads where you have to specifically tune a piece of iron and this idea of general purpose, high volume, cheap as chips processing is done, over, kaput. That you’re going to have to figure out – and this is what’s hard – your workloads pretty well. And it’s still going to be useful to have a device that can do many different jobs. Don’t get me wrong. So that “Ampere” A100 GPU accelerator from Nvidia, it’s a VDI engine. It’s an inference engine. It’s a training engine. It’s an HPC engine. You know, it can do multiple things. And that makes it valuable to put in, and you can spend $10,000 on it, because odds are if you’re a cloud provider, you can get VDI customers who want to do virtual desktops and you can get inference and you can get training workloads and you can decide how to change them over time. That’s useful. So you can’t tune so precisely you end up not using silicon to its max, but you also can’t just throw everything on a generic CPU and pray for the best. That’s not going to work anymore, either.
So with that as our backdrop, what’s IBM’s move with Power10, Power11, and Power12? How do you fit Power into this world? I don’t want this to be just a niche thing that just does AIX and legacy IBM i. There’s nothing wrong with that. But I want it to be more than that.
Stephen Leonard: This does spin off the conversation on GPU acceleration. You know, you start looking at where specifically AI, we get graphics acceleration and these other types of acceleration, but particularly, you know, AI math libraries and algorithms.
When we look at the enterprise use case, which is where the bulk of our client opportunity is, it is in accelerating standard commercial workflows. That can be anything from fraud detection to mortgage approvals and so on. And most of these use cases are not deep computing. It’s inferencing, it’s machine learning, but it is not super advanced math and it’s not scientific. It’s about processing the data really quickly and embedding it in the workflow. And we see there’s a big opportunity there for us both with Power9 and Power10. With Power10, we have put a lot of the libraries in, and that is 20X-plus faster than Power9 if you are doing inferencing on the chip. Therefore you don’t have to offload the thing. So, you know, the efficiency factor is way higher. You don’t have to write all sorts of specialized code necessarily, you know, and build complex infrastructure environments to offload and onload and do all these things. And our estimation is that is 80-plus percent of the addressable opportunity for commercial clients.
At one point, remember, people thought that every company in the world would have a data science division, you know, all deep data scientists who were experts in building algorithms, and everyone would need their own specialized clusters of high-performance accelerated computing environments. And it’s just not panned out that way. We are seeing this in places like pharma and other research-heavy industries, but not generally. And the typical commercial user has a need for AI computing, but not necessarily offloading for acceleration purposes.
TPM: Is it too much grief and too much money for the amount that they need?
Stephen Leonard: Yes, exactly.
TPM: It’s much better to take a vector engine and let it do INT4 and INT8 instructions and speed it up that way. That’s enough, right?
Stephen Leonard: Right.
TPM: So my contention would be that – and I’m going to throw some weird numbers out here – if you are Google, 25 percent of your computing is AI in some fashion or something like that. If you are an enterprise, maybe it’s going to be 3 percent. It’s going to be a very important 3 percent, but a database is a database, a transaction is a transaction, a Web server is a Web server, you know, most of what you’re doing is not going to change. And I’m talking about enterprise only right now. I’m not saying the HPC division of ExxonMobil. I’m not counting that in this discussion at the moment.
But most of what you’re going to do is train a model once and you might train on the cloud and run in your system. That’s what I would do. Power will be their inference engine, but they’re going to be training on GPUs over here in the cloud. Do you have to build a Power farm to help them train their model?
Stephen Leonard: We do that in a way. With Power9 we have accelerated Nvidia products with NVLink, but we haven’t really made a decision yet on Power10 about whether we want to support it. Obviously we will support PCI-Express 5.0 and that could be what our connection to fabric might be. But we haven’t figured out yet how much of a market opportunity there will be for us for NVLink or NVSwitch.
Where we see a big opportunity for our AIX and IBM i clients is with two things: modernization and hybrid cloud. Those are the two big, big opportunities. And they both kind of go hand in hand because they’re kind of similar, but they’re not the same, if you will.
What we’ve seen through the craziness of the last however long it’s been – 14 months now? – is that people are revisiting the idea that they are going to rewrite everything in cloud native code and coming around to the idea that cloud native code is great for extending current capabilities. So I have a set of current capabilities and if I want to extend those, leveraging multiple data sources and writing something up on cloud native can be very useful. But they don’t want to rewrite their core workloads. In fact, they have no interest whatsoever.
TPM: There’s no benefit to it because all you do is introduce a tremendous amount of risk into your business. To what end? If you’ve got code that works, algorithms deeply buried in the code, which encapsulate the very business you run, you’re going to go and change all that? You spent 30 years making sure it works right? And now you’re going to just port it to Python or PHP? Why? It makes zero sense.
Stephen Leonard: If there was some magic code converter where, just like a sausage machine, you could pour your RPG in one end of it and out of the other end came this wonderful cloud native application, then fine. But there isn’t a magic code converter, and there isn’t because those things are complex, complicated, deeply technical – and the cloud native stuff is still in an early phase of life, if you will. So these tools are great for extending capabilities, not for changing and rewriting core business applications.
And we believe that we’ve got a real role to play there because both with hybrid cloud, you can move IBM i and AIX workloads to the clouds. And there are many cloud providers – we happen to believe that the IBM Cloud instantiation of running AIX and IBM i is the most flexible and the most complete solution out there.
TPM: I happen to agree with that proposition, and not because I’m talking to you, but because Google is sitting on its thumbs with IBM i. Skytap is doing a pretty good job, which means Microsoft Azure is, too, I suppose.
Stephen Leonard: Two years ago, one of the key moments for us as a team was when we realized that if we’re going to go cloud, we have to go all in. And if we have to go all in, we have to make it as simple and as easy as possible. And you have to be complete. You have to be end to end. You have to be able to offer the same industrialized operating environment in the cloud as you can get on prem, you can’t sell some poor second cousin or some half solution that you can do only half of the things you need on. So resilience and backup and recovery and all the operational stuff has to remain the same. And that really got us into this.
We are not going to write a cloud native version of AIX or IBM i. We’re going to integrate AIX and IBM i very closely into the IBM Cloud datacenters. That is what we call a frictionless migration opportunity, that you can move workloads from on prem to off prem without having to refactor applications, without having to rewrite operational procedures, because this is about the operational procedures as much as it is about the application code. It’s about what do I do to keep the thing running and what do I do to keep the service accessible? And what do I do when something goes wrong? You dial up capacity in the same way, on premises or in the cloud, and you move applications around. You can build the same way, and you get billed the same way, so the experience is the same. You can’t force customers to have two different experiences.
And I have to tell you – and you have been dealing with IBM a long time – that was a little bit of a religious war within IBM, as you might imagine.
TPM: There are roughly 120,000 IBM i customers. There are tens of thousands of AIX customers. If you put a gun to my head, I’d say it was 48,000 worldwide. I don’t know the precise numbers, and I think some days IBM doesn’t really know how many there are, either. And that’s OK, because nobody else knows about their platform base sizes, either. So that’s not a slam. That’s just an observation.
How many of those companies are just going to move to the cloud? They’re done. How many companies are going to run in actual hybrid mode? How many are going to not be interested in that except for backup and recovery? When you think about the cloud opportunity, what is it shaped like and how is it going to change over time?
Stephen Leonard: It’s funny you ask that because what we’re learning as we go through this journey with clients is that it’s all of the above, right? [Laughter]
TPM: This is why I’m asking you, OK? I can say a third, a third, a third. But I don’t have a better model. And I’ve thought about this a little.
Stephen Leonard: With customers with large ERP, CRM type applications where there’s a data centricity, I think you’ll see more than 50 percent of those clients stay on prem for their production workload – not necessarily for everything, but for production. And what we’re seeing with a lot of them is disaster recovery, resilience, test/dev, all of that stuff can move to a flexible IaaS model, fantastic.
We are seeing some customers who just want to add the cloud because, you know, if you think about our cloud offering and the way I just described the way it’s architected, it is pretty secure because you’re not in the shared-everything architecture from a processor point of view that you are in a traditional native cloud environment. And also you get over a whole bunch of the scaling issues, especially memory scaling and memory bandwidth issues, with large databases – let’s call them single instance databases like SAP HANA – where you are heavily restricted within a native cloud implementation.
And so you look around and you go, well, why is it that with the size of the business that AWS has, the size of the business that Google has, the size of the business Microsoft Azure has, why would they even care about Power? Because there are architectural limits with what they can offer clients and they need to be able to accommodate those types of workloads. Otherwise, clients are going to say, you can’t service all my needs and maybe I need to look at somebody else. So, I think that’s why they all went down that path.
So it’s a hybrid cloud. It is a really important part of our future. And I think it gives us a level of flexibility. And like I said, we focus really heavily on this idea that it’s seamless and frictionless because, you know, if you’re going to start rewriting and refactoring applications, then the argument might even be, well, why don’t you just take it off the platform completely? Why would you even bother keeping it there? Most clients – I mean, nearly all clients – don’t want to move anything off the platform block. And they just want it to be able to run more efficiently and effectively. And they want to be able to integrate it into what they’re doing as they extend their own capabilities. They want to be able to do that from the core out, not from the outside into the core. And I think a lot of people have made that mistake, they’ve started that innovation stuff out on the cloud and they’re working their way back in. And as you work your way back in, you suddenly get all these mismatches and latency issues and data issues. And they’re suddenly realizing that that’s probably not the most effective and efficient way to do it.
So that then brings us to modernizing on the platforms. We actually think of our Power platform as being like a three-legged stool. You’ve got IBM i, you have got AIX, and you’ve got Linux. And Linux is an equal party in this relationship within IBM. And that is important because it opens a whole world of tools and applications and types of environments that we otherwise wouldn’t be able to get economically onto IBM i or AIX. We’re big proponents of openness. We are doing a lot with open packages and tools, bringing them to the platform as fast as we possibly can. We think being able to do that is an essential part of being attractive in the long term.
And specifically with Linux, it’s things like containers, no prizes for guessing which one we like the best –
TPM: It’s not a Hefty bag or a piece of Tupperware. It might wear a flashy fedora. . . .
Stephen Leonard: The decision was made for us when somebody decided to spend $34 billion acquiring one. So you have to really like the one from them.
TPM: But it’s the best one anyway, and it’s the most commercialized and it’s the most heavily adopted in the enterprise. You made the right choice. I totally get the Red Hat thing.
Let me ask a couple of pinpoint things here. How much of the Power Systems business is being driven by Linux today? You know, roughly it was in the 20 percent range there for a while, but I don’t know how much of that was big HPC deals that caused the bubble.
Stephen Leonard: I would say it’s a quarter of the business. We would like it to be a third or possibly more. You know, I can envisage it being even half of the business at some point in the future because the workloads that we are bringing to that platform drive a lot of consumption. For instance, you can’t support the number of SAP users or database sizes on any other platform that you can support on the Linux on Power platform. It’s a physical limitation.
TPM: Let’s draw the Power Systems chart a different way. How much is HPC and how much is AI? Also, what are your thoughts about HPC and do you put HPC and AI together? Because I do in my head. I understand the distinctions quite well actually.
Stephen Leonard: We separate them now, but they were together before I took over. We’ve separated them into what I would call technical computing versus enterprise computing. Enterprise computing has a bunch of AI and that’s all the inferencing stuff that we talked about. It’s all about what happens when you have something at the edge that is collecting all this data where you want to compute against that.
TPM: That is an absolutely reasonable way to look at things, because I don’t think of AI as separate. And you can’t because it’s going to be embedded.
Stephen Leonard: Everywhere.
TPM: But in your platform in particular, 90 percent of your customers are not going to buy GPUs, or 99 percent, or whatever. You know, whatever it is, they’re going to say, no, Power10 has got all the stuff I ever could possibly need to run inference in my banking app or whatever.
So when you look at enterprise computing and technical, when you break that pie, what’s the shape of that pie and how is it changing?
Stephen Leonard: The problem is, HPC is such a lumpy thing. It is probably 10 percent, maybe 15 percent. I don’t see it being more than that.
But here’s the funny bit: In many ways, not getting the CORAL2 exascale supercomputer contracts was a kind of blessing in disguise because it forced us to double down and focus on the enterprise and what we’re going to do with Power10 in the enterprise. And that forced us to put our development efforts and energies around hybrid cloud, and modernization, and leveraging OpenShift and Cloud Paks and Ansible and all these other capabilities, which we probably wouldn’t have had the arms and legs to do had we been busy, nose to the grindstone right now, trying to get CORAL2 out the door.