Big Blue Is Still Talking About Future Power Processors, Which Is Good

May 18, 2026 Timothy Prickett Morgan

The details are really thin, as usual, but IBM is still talking about the advantages that the Power10 and Power11 processors have over other server-class CPUs in the market today, as Power chief architect Bill Starke did at the most recent POWERUp conference in New Orleans. And Starke always ends his presentations with a hint or two about what Power Next or Power Future – what we would call Power12 – might look like.

Many of the things that Starke talked about during his two presentations at POWERUp 2026 are things that we have been talking about with Starke for years – the advanced I/O and differential memory approach that is unique to the Power10 and Power11 chips and that brings IBM tremendous benefits in terms of CPU memory capacity and memory bandwidth. Given the built-in memory area network that is part and parcel of both the Power10 and Power11 chips, we have wondered for some time why IBM does not do more to package this up and sell it to help boost analytics and AI workloads. IBM could even use its BlueLink I/O ports to couple Power processors more tightly to various kinds of AI accelerators, with lots of shared memory across the CPUs and accelerators.

Thus far, IBM has played it cool and very slowly rolled out its Spyre accelerators, which were designed by IBM Research several years ago and which are composed of matrix math units well suited to AI workloads. We did a report on Spyre for Power back in October 2025, but the most detailed drill down I did on Spyre was over at The Next Platform in IBM’s AI Accelerator: This Had Better Not Be Just A Science Project back in October 2022 and IBM Shows Off Next-Gen AI Acceleration, On Chip DPU For Big Iron in August 2024. The initial AI Core chip that Spyre is based on came out in 2018, so this has not really been a fast development. If you are sensing some impatience on our part, you are sensing right. Anyway, a portion of the Spyre chip is embedded on the “Telum II” Z17 mainframe processor, as is a homegrown DPU that IBM can now offload I/O functions to rather than burning Z cores to do this work.

Without Starke saying anything about it, we already expected IBM to consider dropping, with the Power12 processor, the MMA matrix math unit that is on the Power10 and Power11 processors today. This Power MMA, as far as we know, is not used by any IBM i and AIX shops. We certainly have not heard of any software that exploits it. To be sure, it is great that every Power10 and Power11 core has an MMA unit as well as vector units to do complex math. But it would be better for IBM if its two homegrown processors – the Z and the Power – had the same matrix math units and could use the same math libraries.

The Z17 processor puts the baby Spyre IP blocks in the Z17 package, but not on every core. And we expect Power12 to do the same thing. This way, there is only one matrix math unit for both chips. It would not be surprising to see IBM unify the vector units in Power12 and Z18, too. Why not? Math libraries can be tweaked so that their functions are still supported on the new hardware without having to rewrite the code that calls them. Frankly, it is a wonder this has not already happened.
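To make the math library point concrete, here is a minimal sketch of how a library can hide the hardware behind a stable function. The backend names ("mma" and "spyre") and the register_backend and select_backend helpers are hypothetical, invented for illustration; they are not IBM APIs, and every backend here simply runs the same pure Python reference kernel.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix], Matrix]


def _reference_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix multiply, standing in for a hardware-specific kernel."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
        for row in a
    ]


# Hypothetical registry mapping a backend name to its kernel.
_BACKENDS: Dict[str, Kernel] = {}
_active = "reference"


def register_backend(name: str, kernel: Kernel) -> None:
    _BACKENDS[name] = kernel


def select_backend(name: str) -> None:
    """Switch kernels; callers of matmul() are not touched."""
    global _active
    if name not in _BACKENDS:
        raise KeyError(f"unknown backend: {name}")
    _active = name


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """The only entry point application code uses; it never names hardware."""
    return _BACKENDS[_active](a, b)


register_backend("reference", _reference_matmul)
# Placeholders only: a real library would bind different kernels here.
register_backend("mma", _reference_matmul)
register_backend("spyre", _reference_matmul)

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
select_backend("mma")
on_mma = matmul(a, b)
select_backend("spyre")
on_spyre = matmul(a, b)
print(on_mma == on_spyre)
```

The application only ever calls matmul(), so swapping the unit underneath is a library change rather than an application rewrite, which is the argument being made above.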

Starke presented a chart for the future Power12 that hinted at this possibility:

You will note something else interesting about this chart. IBM is taking a “cores out” approach with the future Power12 CPU, much as AMD does with its Epyc X86 processors, with the memory controller, NUMA clustering, and I/O die in the center and a galaxy of cores wrapping around it. Current Arm server CPUs take an “I/O out” approach, with one, two, or sometimes four chiplets holding just cores in the center, and with I/O and memory controllers wrapped around those cores and linking everything to the outside world.

As you know, we are big proponents of having as much AI run natively on the CPU as possible, which is inside the security domain of the processor and is therefore the safest place in the world to have AI routines running. It is also the cheapest, since the vector, MMA, or Spyre IP blocks are built into the processor and this incremental math function is “free.” But not only is it free, it is fast.

This chart from Starke’s presentation lays it out:

On a server with a Power10 or Power11 processor that has an external accelerator like a GPU or even a Spyre accelerator, it will take on the order of 700 nanoseconds for applications running on the CPU to reach out to the accelerator and have it process data. The PCI-Express 5.0 x16 link that is used to attach the accelerator to the CPU has roughly 128 GB/sec of bi-directional bandwidth (about 64 GB/sec in each direction), which is not too shabby but is nowhere near the current 900 GB/sec of bi-directional bandwidth of Nvidia’s NVLink coherent memory ports. But, the thing is, a hop from the CPU to the Nvidia GPU will still take somewhere between 500 nanoseconds and 800 nanoseconds, depending on how close the GPU is to the CPU and whether or not the hop is routing through an NVSwitch memory fabric.
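A first-order way to see why latency rather than bandwidth is the issue for small requests is to model one offload as a fixed latency plus the time the payload spends on the wire. The sketch below is a back-of-envelope model, not a measurement: the 4 KB payload is a hypothetical request size, and the 900 GB/sec figure is optimistically treated as if all of it were available in one direction.

```python
def transfer_time_ns(payload_bytes: float, latency_ns: float,
                     bandwidth_gb_s: float) -> float:
    """First-order cost of one offload: fixed latency plus time on the wire."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    # With 1 GB taken as 1e9 bytes, 1 GB/sec is exactly 1 byte per nanosecond.
    return latency_ns + payload_bytes / bandwidth_gb_s


def break_even_bytes(latency_ns: float, bandwidth_gb_s: float) -> float:
    """Payload size at which time on the wire equals the fixed latency."""
    return latency_ns * bandwidth_gb_s


LATENCY_NS = 700.0       # order-of-magnitude hop latency quoted in the text
BANDWIDTH_GB_S = 900.0   # NVLink figure from the text, assumed one-way here
SMALL_PAYLOAD = 4096     # hypothetical 4 KB request

total_ns = transfer_time_ns(SMALL_PAYLOAD, LATENCY_NS, BANDWIDTH_GB_S)
latency_share = LATENCY_NS / total_ns
print(f"4 KB offload: {total_ns:.1f} ns, {latency_share:.1%} of it latency")
print(f"break-even payload: {break_even_bytes(LATENCY_NS, BANDWIDTH_GB_S):,.0f} bytes")
```

Under these assumptions, the payload has to reach roughly 630 KB before the wire time even matches the fixed hop latency, so a fatter pipe does little for small, frequent inference calls. The bandwidth is a parameter, so any other link figure can be dropped in.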

Offloading a calculation from a Power10 or Power11 core to its MMA takes 1 nanosecond, sometimes 2 nanoseconds. That’s it. And if your model can be run on CPUs with MMAs because it is modestly sized, you can keep a hell of a lot more AI model weights in the main memory of a Power10 or Power11 processor, which supports up to 4 TB of memory per CPU socket, compared to the few hundred gigabytes on a GPU accelerator from Nvidia or AMD. (In theory, IBM can support up to 16 TB of main memory per Power10 or Power11 socket, but economics really limit this, especially with DRAM being as expensive as it is these days.)
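The arithmetic behind both claims is simple enough to sketch. The 4 TB per socket and the 1 to 2 nanosecond and 700 nanosecond latencies come from the figures quoted in this article; the 200 GB accelerator capacity is an illustrative stand-in for "a few hundred gigabytes", and 2 bytes per weight (16-bit weights) is an assumption. The estimate ignores memory needed for the operating system, activations, and caches.

```python
TB = 10**12  # decimal units, a simplification
GB = 10**9


def max_parameters(memory_bytes: int, bytes_per_weight: int) -> int:
    """Upper bound on model weights that fit if memory held nothing else."""
    if bytes_per_weight <= 0:
        raise ValueError("bytes_per_weight must be positive")
    return memory_bytes // bytes_per_weight


def latency_ratio(external_ns: float, on_core_ns: float) -> float:
    """How many times slower the external hop is than the on-core handoff."""
    return external_ns / on_core_ns


CPU_SOCKET_BYTES = 4 * TB     # per-socket figure from the text
ACCEL_BYTES = 200 * GB        # assumed stand-in for "a few hundred gigabytes"
BYTES_PER_WEIGHT = 2          # assumed 16-bit weights

cpu_params = max_parameters(CPU_SOCKET_BYTES, BYTES_PER_WEIGHT)
accel_params = max_parameters(ACCEL_BYTES, BYTES_PER_WEIGHT)
print(f"CPU socket: {cpu_params:,} weights")
print(f"Accelerator: {accel_params:,} weights")
print(f"Capacity ratio: {cpu_params / accel_params:.0f}x")
print(f"Latency ratio: {latency_ratio(700, 2):.0f}x to {latency_ratio(700, 1):.0f}x")
```

With these inputs, the socket holds about twenty times as many weights as the assumed accelerator, and the external hop is 350 to 700 times slower than the handoff to the MMA. Changing the assumed accelerator capacity or weight precision shifts the first ratio but not the shape of the argument.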

For those of you who have not drilled down into the Power architecture deeply, Starke’s presentations were no doubt informative. But we have seen this material in various forms for many years, going back to before the Power10 processors were even launched. IBM has been shipping chiplet implementations of the Power line since way back in 2005 with the Power5+ processors. More recently, there have been dual-chip module (DCM) implementations of Power9 and Power10 processors, but these just put two whole chips into a single socket. They do not break the socket into independent compute, memory, and I/O elements and then use them in different ratios to meet different needs.

As we have said before, we think that IBM wants to work with Samsung on an advanced 2.5D interposer and the chiplet design along with what we think is a process shrink to 3 nanometers rather than the 5 nanometers we all expected for 2025 with the Power11 chip. Starke has been clear that a move from 7 nanometers with Power10 down to 5 nanometers for Power11 did not provide enough benefit for IBM to do it. With the 5 nanometer process being a lot more expensive than a refined 7 nanometer process, it is not a hard decision to not move to 5 nanometers if it is not necessary. But, if IBM wants to add a lot more stuff with Power12, a shrink to 3 nanometers seems appropriate.

And if it does that, IBM can easily get to 2030 without having to do another Power processor to suit the needs of IBM i and AIX shops. Depending on what happens in the next five years in the datacenter, there may or may not be a Power13 after that. Who can say?

What I can tell you is that for the vast majority of IBM i and AIX customers, a Power10 or Power11 processor is plenty of machine, and a Power12 will be even more so. But if native CPU GenAI takes off, then Power12 will have to be pretty impressive if IBM wants to keep AI on the CPU or its Spyre accelerators.
