A Bit More Insight Into IBM’s “Spyre” AI Accelerator For Power

    October 20, 2025 Timothy Prickett Morgan

    If anything is clear right now, it is that Nvidia does not need to get any richer in the GenAI revolution, and neither does its foundry partner, Taiwan Semiconductor Manufacturing Co. But everyone else, including IBM and specifically its Power Systems business, desperately needs to do something to catch the GenAI wave and make some money in this once-in-a-millennium opportunity.

    Our thesis, as you well know, is that for IBM i shops, code assistants that can help document, update, modernize, or port RPG and COBOL application code to newer languages and modular programming techniques are the killer app for GenAI, and offer a chance for IBM to double the revenue stream for Power Systems and very likely increase its profits. And the advent of homegrown transformer models with expertise in RPG and Java, which IBM has been working on for at least a year and a half, meant IBM could capture some software revenue, and the addition of matrix math units on Power10 and Power11 processors as well as the creation of the beefier “Spyre” matrix math card meant IBM could capture some hardware revenue rather than just passing that money over to Nvidia for a bunch of its GPU accelerators. Which are hard to get anyway, and expensive. And which are not terribly integrated with the Power Systems platforms: IBM i, AIX, and Linux.

    The Spyre matrix math accelerator is derived from work done by IBM Research on a generic AI machine learning processor, which in turn was picked up by IBM Poughkeepsie to add a different matrix math accelerator to the z16 and z17 mainframe processors than the one used in the Power10 and Power11 chips. (We can debate the wisdom of this at some point in the future.) This AI Acceleration Unit was extracted and replicated massively on a single chip and then plunked down onto a PCI-Express peripheral card that we know as Spyre. Think of it as a special matrix math unit that complements the vector units and matrix units on each Power10 and Power11 core.

    As we reported last week, IBM has signed a partnership with model builder Anthropic to use its Claude Sonnet 4.5 model as the central model in its “Project Bob” code assistant. The Anthropic models are widely regarded as the best for understanding and generating code and have expertise in all kinds of languages. But IBM’s Granite models are probably the best at RPG in particular given all the work that Big Blue has done to train them.

    Our point here is that with Project Bob, which is based on a mix of models, some of which have to be licensed, IBM’s profits are necessarily smaller, but the appeal of the Project Bob IDE and code assistant might be broader and deeper, in which case it generates more revenue more quickly than it would with distinct Watsonx code assistants for RPG on IBM i and COBOL to Java conversion on mainframes. That original Watsonx plan was always too limited, and IBM’s top brass realized this before any damage was done. That said, it would have been better to have code assistants actually generating revenue right now, not promises of them in the future, which is what we got at IBM’s TechXchange 2025 developer conference last week. But it takes time to build out the Red Hat Linux and OpenShift stack to run code assistants and integrate them with IBM’s homegrown Spyre accelerator as well as Power and z processors.

    On October 7, in announcement letter AD25-1422, IBM announced that the Spyre accelerator would be available on z17 mainframes and their LinuxONE variants starting on October 28. The announcement letter doesn’t say much in the way of feeds and speeds or pricing, but IBM tells us that the Spyre cards will be sold in eight-card bundles with PCI-Express enclosures, RHEL subscriptions, and enterprise support. In announcement letter AD25-1365, IBM said that the Spyre bundles and chassis would be available for Power11 servers (not Power10 or Power9) with logical partitions running RHEL 9.6 or higher starting on December 12.

    On Power machinery, the Spyre hardware will include the Spyre Enablement Stack for Power, which has a PyTorch backend as well as precompiled AI models that are set up to offload processing from Power11 chips to Spyre cards, plus the various runtimes, device drivers, and card firmware to make the whole shebang work. IBM has created a RHEL.AI server for inference, which includes all kinds of optimizations for Spyre accelerators, such as retrieval augmented generation from data sources such as relational databases running on Power Systems iron. Starting in the first quarter of next year, IBM will deliver a variant of the OpenShift.AI Kubernetes stack that is optimized for AI processing on Spyre accelerators and that runs in conjunction with RHEL.AI.
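To make the PyTorch backend idea concrete, here is a minimal sketch of the offload pattern such a backend implies: move a model and its inputs onto the accelerator device and run inference there. IBM has not published the backend's actual device name, so the "spyre" string here is purely a hypothetical stand-in, and the sketch falls back to CPU when no such device exists.

```python
import torch

def pick_device(preferred: str = "spyre") -> torch.device:
    """Probe for a hypothetical accelerator backend, falling back to CPU.

    The device name "spyre" is an assumption for illustration only;
    the real Spyre Enablement Stack may register a different name.
    """
    try:
        torch.empty(1, device=preferred)  # probe: fails if backend absent
        return torch.device(preferred)
    except (RuntimeError, ValueError):
        return torch.device("cpu")        # fallback so the sketch still runs

device = pick_device()

# A stand-in for one of IBM's precompiled models; any torch.nn.Module
# would follow the same .to(device) offload pattern.
model = torch.nn.Linear(16, 4).to(device).eval()

with torch.no_grad():  # inference only, no autograd bookkeeping
    out = model(torch.randn(2, 16, device=device))

print(out.shape)  # torch.Size([2, 4])
```

The point of a backend like this is that application code stays ordinary PyTorch; only the device string changes when the accelerator is present.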

    One last thing we learned from IBM. The Spyre accelerator has features to allow for live partition mobility (LPM) migration of partitions running Spyre accelerators on RHEL. You cannot do LPM of AI inference or training workloads running on Nvidia or AMD GPUs attached to Power Systems machines.

    Some of the details of the configurations of the Spyre accelerator were announced in announcement letter AD25-1386. The Spyre device is a PCI-Express 4.0 card, and the feature #ENZ0 Expansion Drawer can hold eight adapters that have a combined 1 TB of memory with 1.6 TB/sec of bandwidth. This is not a huge amount of memory or bandwidth compared to even a single “Blackwell” B100, B200, or B300 GPU from Nvidia or a single “Antares+” MI325X or MI355X GPU from AMD. But that doesn’t matter so much if the inference workload expected on Power Systems applications is fairly light and IBM Spyre cards cost a lot less than Nvidia and AMD GPUs.

    Let’s do some math. The B200, for instance, has 192 GB of HBM3E memory with 8 TB/sec of bandwidth, and a single one costs somewhere between $35,000 and $50,000, depending on the nature of the deal. Call it $40,000 to keep the math simple. Inference is very much a memory bound workload, so price it on memory bandwidth alone: eight Spyre cards have one-fifth the bandwidth of a single Nvidia B200 card, so those eight cards should cost around $8,000, or $1,000 a pop. It is possible that IBM could charge that little for the cards and then charge a lot of dough for the expansion chassis, but we don’t think it will price it that low. It might be $2,000 or $3,000 a card, and the price to value here would be that Spyre is absolutely integrated with IBM i and AIX through RHEL partitions running the Spyre stack. No need to learn the Nvidia AI Enterprise stack and pay $4,500 per GPU to use that software.
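The back-of-envelope arithmetic above can be checked in a few lines. The bandwidth figures are from the announcement letters; the $40,000 B200 price is our own simplifying assumption, not a quoted list price.

```python
# Bandwidth-based value comparison: eight Spyre cards vs. one Nvidia B200.
B200_BANDWIDTH_TBS = 8.0           # HBM3E bandwidth of a single B200
B200_PRICE_USD = 40_000            # assumed street price (our round number)

SPYRE_DRAWER_BANDWIDTH_TBS = 1.6   # eight Spyre cards, combined
SPYRE_CARDS_PER_DRAWER = 8

# If inference value scales with memory bandwidth, an eight-card drawer
# delivers 1.6/8 = one-fifth of a B200's bandwidth, hence one-fifth the value.
drawer_value_usd = round(
    B200_PRICE_USD * SPYRE_DRAWER_BANDWIDTH_TBS / B200_BANDWIDTH_TBS
)
per_card_value_usd = drawer_value_usd // SPYRE_CARDS_PER_DRAWER

print(drawer_value_usd)    # 8000
print(per_card_value_usd)  # 1000
```

That $1,000-per-card figure is the bandwidth-parity price; the article's guess of $2,000 to $3,000 a card simply prices in the integration value on top of it.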

    We hope IBM eventually provides pricing for all of this AI hardware and software that it expects Power Systems customers to invest in. We would also like to see some real performance benchmarks on real-world applications with AI enhancements, not just a chart like this:

    This is not really all that helpful. But this screen shot of the enterprise use cases in the Spyre stack for Power is interesting:

    And so is this screen of digital assistants being integrated with the Spyre stack for Power:

    Where is that Spyre for Power Redbook?

    RELATED STORIES

    Big Blue Converges IBM i RPG And System Z COBOL Code Assistants Into “Project Bob”

    Public Preview For Watson Code Assistant for i Available Soon

    Beta For RPG Coding Assistant On Track for 2Q25

    RPG Code Assist Is The Killer App For AI-Enhanced Power Systems

    IBM Previews New Power Tech At TechXchange Event

    GenAI Interest ‘Exploding’ for Modernization on IBM i and Z, Kyndryl Says

    IBM Shows Off Next-Gen AI Acceleration, On Chip DPU For Big Iron (The Next Platform)

    IBM’s AI Accelerator: This Had Better Not Be Just A Science Project (The Next Platform)

    Some Thoughts On Big Blue’s GenAI Strategy For IBM i

    How To Contribute To IBM’s GenAI Code Assistant For RPG

    IBM Developing AI Coding Assistant for IBM i

    The Time Is Now To Get A GenAI Strategy

    Top Priorities in 2024: Security and AI

    Thoroughly Modern: Proceed With Caution With AI In The Landscape Of Cybersecurity

    IBM i Shops Are Still Getting Their Generative AI Acts Together

    IBM To Add Generative AI To QRadar

    How Long Before Big Blue Brings Code Assist To IBM i?

    Generative AI Is Part Of Application Modernization Now

    Sticking To The Backroads On This Journey

    With Fresche’s New CEO, There Are No Problems, Just Solutions

    Enterprises Are Not Going To Miss The Fourth Wave Of AI (The Next Platform)

    IBM Introduces watsonx For Governed Analytics, AI

    Technology Always Replaces People While Augmenting Others

    Generative AI: Coming to an ERP Near You

    It Is Time To Have A Group Chat About AI


    Tags: AIX, COBOL, IBM i, Java, LinuxONE, PCI-Express, Power Systems, Power10, Power11, Power9, Project Bob, RHEL, RPG, Spyre, watsonx


TFH Volume: 35 Issue: 38

Copyright © 2025 IT Jungle