Power To The Tenth Power
August 17, 2020 Timothy Prickett Morgan
This is one of my favorite times of the year, with the Hot Chips symposium usually underway this week at Stanford University and all the vendors big and small trotting out their, well, hottest chippery. In this case, hot means “extremely interesting” but it often means “burning shedloads of watts” as well. But this is the time that the chip architects show off what they have been working on for four or five years and what has already been in production in recent months or will be in the coming months.
IBM tends to jump the gun a bit with its Power processors, and is doing so a little more than usual with the Power10 processor, which we frankly had hoped would be available later this year rather than later next year. But none of that matters. What does matter is that Power9 is giving customers plenty of headroom in compute at the moment and that Power10 will, thanks to the innovative engineering that Big Blue has come up with, be well worth the wait.
This is the kind of processor complex and system architecture that we have been waiting to see arrive for a long, long time, and we will be getting into the details of that architecture in the coming weeks after IBM’s presentation is done at Hot Chips this week. In the meantime, IBM talked with us about how Power10 extends the lead that the Power architecture has over X86 and Arm alternatives for enterprise systems, and we are going to focus on that ahead of the Power10 preview. We also want to talk to the top brass at Big Blue about how they had better start thinking about systems differently and get people to start thinking about them differently – and then invest in IBM’s own technology and build the best damned public cloud in the world based on it. We are talking about a moonshot-class investment, the likes of which we have not seen out of IBM since it invested $100 million to create the “BlueGene” protein folding supercomputer back in 1999 to break through the petaflops performance barrier.
So without further ado, here is the wafer of Power10 chips that has come back as early silicon from the fabs at Samsung Electronics, IBM’s manufacturing partner:
The research alliance that IBM set up with Samsung, Applied Materials, AMD, GlobalFoundries, and others many years ago has contributed tweaks to the 7 nanometer process that Samsung is using to make the Power10 chips, according to IBM. Big Blue is not just using Samsung’s plain vanilla 7 nanometer etching, which is called V1 and which uses extreme ultraviolet (EUV) lithography techniques. These techniques are similar to the ones that GlobalFoundries, the former AMD fab cut loose several years ago, was working on for Power10 when it decided in August 2018 to spike the whole 7 nanometer effort. Importantly, both flavors of 7 nanometer, the one using regular lithography and the one using EUV, were killed off, thus driving IBM into Samsung’s waiting arms as a foundry partner for the Power10 chips. (Intel and Taiwan Semiconductor Manufacturing Co were not going to get the deals, that is for sure.)
Samsung started building its V1 fab back in February 2018 and invested $6 billion in the effort in the first two years and has probably spent a few billion dollars more this year. Back in April 2019, Samsung said it was going to invest $115 billion between then and 2030 to build up its foundry both for its own use and for others like IBM. And it is about the safest bet that IBM has outside of GlobalFoundries when it comes to picking a fab partner, given its long history of collaboration with Samsung and the latter company’s desire to boost its merchant foundry credentials. Everybody – including Intel – had better hope Samsung gets good at this, because there are not enough deep pockets otherwise to allay all of the risk as we move from 7 nanometers down to 5 nanometers down to 3 nanometers looking ahead in the current decade.
We are not at liberty to say much about Power10 as we go to press for the Monday issue of The Four Hundred, but we will do a series of follow-up stories to drill down into different aspects of the machines, which we have been prebriefed about under embargo for later today. Here is one thing that IBM did allow us to share with you:
I have only seen the core count of the Power10 chip detailed in a few internal roadmaps, and all of them said that Power10 would have 48 cores. This made logical sense, given that Power8 maxed out at 12 cores and Power9 maxed out at 24 skinny cores (or 12 fat ones) across the same 96 threads per die, mostly enabled by the shrink from 22 nanometers with Power8 to 14 nanometers with Power9. It was logical to assume that with the shrink to 7 nanometers the core count could double up again.
What we now know from the roadmap above is that with the shrink to 7 nanometers, IBM gutted the core design and started with a clean slate to maximize the new 7 nanometer process – something that we suspect it was not planning to do with the GlobalFoundries 7 nanometer process – and crammed 16 fat cores or 32 skinny cores on a die. Only 15 fat cores or 30 skinny cores are activated to help improve the yield on the chips, on the expectation from IBM and Samsung that at least 1 in 16 of the cores will be a dud on the new 7 nanometer process. At some point, when the yields on the V1 process improve, IBM could activate that latent 16th core and there is an instant performance upgrade for those using a newer stepping of the Power10 chip. The gutting of the microarchitecture is what has allowed IBM to boost the fat core count from 12 to 16 per chip moving from Power9 to Power10, which is considerably more than expected.
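To make the core and thread arithmetic above concrete, here is a minimal Python sketch that multiplies cores by SMT threads per core for each generation. The core counts and SMT levels are the ones cited in this story; the function and dictionary names are our own invention for illustration, not anything from IBM.

```python
# Hardware threads per die = cores x SMT threads per core.
# Core counts and SMT levels are those cited in the article;
# the helper names here are hypothetical, for illustration only.

def threads_per_die(cores, smt):
    """Return the number of hardware threads on a die."""
    return cores * smt

def percent_more(new, old):
    """Return the percentage increase going from old to new."""
    return round((new - old) / old * 100, 1)

threads = {
    "Power8, 12 cores, SMT8": threads_per_die(12, 8),
    "Power9, 24 skinny cores, SMT4": threads_per_die(24, 4),
    "Power9, 12 fat cores, SMT8": threads_per_die(12, 8),
    "Power10, 15 of 16 fat cores active, SMT8": threads_per_die(15, 8),
    "Power10, 30 of 32 skinny cores active, SMT4": threads_per_die(30, 4),
    "Power10, all 16 fat cores active, SMT8": threads_per_die(16, 8),
}

for name, count in threads.items():
    print(f"{name}: {count} threads")

# Fat core uplift over the 12-core Power9 die.
print("Cores at launch (15 vs 12):", percent_more(15, 12), "percent more")
print("Cores if the 16th is enabled:", percent_more(16, 12), "percent more")
```

With one core held back for yield, a Power10 die carries 120 threads against 96 for Power8 and Power9, and turning on the 16th core would push that to 128.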
With Power10, IBM is cutting down on the number of chips it is making, which will help lower costs, but it also calls into question whether there will be a single-core or even dual-core variant aimed specifically at smaller IBM i shops. (We will fight that battle later.)
Rather than having three different chip implementations – a half skinny chip and a full skinny chip for machines with one or two sockets and a full fat chip for big NUMA iron – as it did with Power8 and Power9, IBM is moving to a single chip with fat cores and putting one or two of them into a socket to get 30 cores or 60 cores into a socket. This is a much more aggressive strategy, and interestingly, either the single-chip module (SCM) or dual-chip module (DCM) variants of the Power10 chip can be run in SMT4 (four threads per core) or SMT8 (eight threads per core) mode. This mode is not switchable by users, but is set by IBM at the time it packages up the processor. In the past, to get 24 cores meant running in SMT4 mode, and not all systems had this capability. This was just a funny way of isolating threads and caches to lower the core count and therefore enterprise software licenses for SMT8 customers, but it also meant raising the per-socket price on software running on the 24-core Power9 variant for software that was priced based on cores and not sockets. It would be useful if IBM could make this SMT level settable at system boot, but it is hard-coded into the processor microcode that customers cannot change because of the software pricing issue mentioned above.
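The socket math above can be sketched in a few lines of Python. This assumes, as described in this story, 15 active fat cores per chip at launch, with each fat SMT8 core presenting as two skinny SMT4 cores; the function name and the constant are hypothetical and are not drawn from any IBM tool or specification.

```python
# Cores and threads per socket for SCM (one chip) and DCM (two chips)
# packages in SMT8 and SMT4 modes. Assumes 15 active fat cores per chip,
# per the article; names are hypothetical, for illustration only.

ACTIVE_FAT_CORES_PER_CHIP = 15

def socket_config(chips, smt):
    """Return core and thread counts for a socket with the given chips and SMT mode."""
    if chips not in (1, 2):
        raise ValueError("a socket holds one chip (SCM) or two chips (DCM)")
    if smt not in (4, 8):
        raise ValueError("SMT mode must be 4 or 8")
    # One fat SMT8 core is carved into two skinny SMT4 cores.
    cores_per_chip = ACTIVE_FAT_CORES_PER_CHIP * (8 // smt)
    cores = chips * cores_per_chip
    return {"cores": cores, "threads": cores * smt}

for label, chips in (("SCM", 1), ("DCM", 2)):
    for smt in (8, 4):
        cfg = socket_config(chips, smt)
        print(f"{label} SMT{smt}: {cfg['cores']} cores, {cfg['threads']} threads")
```

The thread count per socket is the same in either mode; only the core count changes, which is why the SMT setting matters so much for software that is priced per core.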
We strongly suspect that IBM never intended to do a monolithic Power10 die with 48 cores on it, but rather a 7 nanometer shrink of the 24-core “Nimbus” part with some tweaks and then put two of them into a single socket to create a throughput monster. With the Power10 chip as it will be delivered, IBM can, in theory once yields improve, provide customers with 33 percent more cores and, if history is any guide, somewhere around 3X the raw throughput at the 4 GHz design point that IBM has used for Power chips since the Power7 way back in 2010. (The Power6 had a 5 GHz design point, which was quite impressive but not sustainable because Dennard scaling and Moore’s Law scaling were running out of steam.)
We can’t say a lot about it right now, but the memory clustering technology, and indeed the whole memory subsystem of the Power10 chip, is the killer technology with Power10. IBM will be able to do things that other architectures simply cannot, with multi-petabyte memory clustering and sharing across large numbers of Power10 systems.
And that is why IBM has to be the one to invest in building and using these systems, to demonstrate their capabilities, and to make sure Power10 systems are available on the IBM Cloud on Day One of their launch – and in huge numbers, not in prototype and proof-of-concept onesies and twosies here and there around a dozen or so cloud regions. This is not about drinking the Kool-Aid, which is easy enough, but eating your own dog food first, as we say in this IT business. IBM has to move its own apps to its own cloud running on Power10 iron and be the case study that others can learn from and benefit from.
There’s plenty of time between now and the end of 2021 to make that happen, and IBM i customers as well as those running AIX and Linux should all be invited to come along for the ride.