We Want IBM i On The Future Power E1050
March 8, 2021 Timothy Prickett Morgan
We spend a lot of time at The Four Hundred thinking about the entry and midrange part of the Power Systems line and the many tens of thousands of customers who use these machines as their mission critical, back end, system of record platforms. But with the only Power10 machines coming out this year expected to be at the high end – call them the four-socket Power E1050 and the 16-socket Power E1080, if IBM iterates its current naming scheme – we have little choice but to start thinking about the big iron now and worry about the entry and midrange iron later this year, as we anticipate their delivery in early 2022.
IBM confirmed the staged rollout of the Power10 processor last November at the Common Europe vCEC 2020 conference, which is about right for the big iron machines, considering that such machines have tended to come at the end of the year, generally in August or September, after the entry and midrange machines come out early in the year, generally February or March, sometimes April and May – all depending on how the yields are doing in any given Power chip generation.
We drilled down into the Power10 architecture last August, when IBM was talking about the processor at the Hot Chips 32 conference, which was held virtually, of course, thanks to the coronavirus pandemic. Based on information released at that time, we speculated on what future Power10 machinery at the high end might look like. We think that IBM always intended to use dual-chip modules with Power10, and in fact we were convinced, based on early roadmaps that we saw, that IBM would use the shrink from 14 nanometer to 7 nanometer chip making processes to create essentially the same chips, but much more cheaply, with 24 SMT4 cores or 12 SMT8 cores (this is a setting in microcode, really), and then put two of these chips in a socket to get 48-core variants. We had hoped that Power10 would have a true chiplet design – meaning that the cores would be broken out from the I/O and memory controller portions of the chip, as AMD has done with its existing “Rome” Epyc 7002 and impending “Milan” Epyc 7003 server processors. But for whatever reason, this has not happened. Instead, IBM started from scratch and created a monolithic Power10 chip design that has 16 SMT8 cores or 32 SMT4 cores, and in certain circumstances it is putting two whole Power10 chips into a single socket to double up the performance in terms of threads and cores.
IBM has been doing dual-chip modules, or DCMs, since the Power5 generation in 2005, so we don’t think that this is a big stretch for Big Blue. (And we don’t think it will be – or should be – that remarkable for other chip makers to use a similar approach.) Given that IBM is shifting processes (from 14 nanometers to 7 nanometers), shifting foundries (from GlobalFoundries to Samsung Electronics), and shifting core designs (to a completely rearchitected and revamped Power core), you can see why IBM might not want to take on even more risk by also pulling the monolithic Power chip apart and turning it into a bunch of discrete function chiplets. IBM instead used the 7 nanometer shrink to put 16 cores on the die in SMT8 mode (the one IBM uses) and is activating a maximum of only 15 of these cores so it can artificially increase the yields on the chips. (This is done all the time in the chip racket.)

Normalized for the 4 GHz target clock speed of the Power7, Power8, and Power9 chips, the Power10 chip offers 30 percent higher instructions per clock (IPC), a gauge of single-thread performance, which is pretty impressive and which is due to the core revamp with Power10. We are not saying IBM will be able to deliver all of that per core performance improvement – clock speeds may have to come down to improve yields or lower heat dissipation – but it is inherent in the design, and that is a pretty good jump in IPC by modern standards. IBM is chopping clock speeds from 4 GHz to 3.5 GHz to get two Power10 chips into a single socket, which doubles up the core and thread counts but also cuts the performance per core by 12.5 percent. The net gain, in theory, is 1.75X more performance per socket.
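The DCM trade-off above is simple enough to check with back-of-the-envelope math. This little sketch just makes the arithmetic explicit; the 4 GHz and 3.5 GHz figures come from the article, and the assumption is that performance scales linearly with clock speed:

```python
# Arithmetic behind the dual-chip-module trade-off: two chips per socket,
# but each clocked at 3.5 GHz instead of the 4 GHz single-chip target.
single_chip_clock_ghz = 4.0   # assumed clock for a single-chip module
dcm_clock_ghz = 3.5           # reduced clock when two chips share a socket

clock_ratio = dcm_clock_ghz / single_chip_clock_ghz   # 0.875
per_core_cut = 1 - clock_ratio                        # performance lost per core
per_socket_gain = 2 * clock_ratio                     # two chips, slower clocks

print(f"Per-core performance cut: {per_core_cut:.1%}")      # 12.5%
print(f"Net per-socket gain:      {per_socket_gain:.2f}X")  # 1.75X
```

The 12.5 percent per-core cut and 1.75X per-socket gain cited in the text fall straight out of the clock ratio.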
This is why the Power E1050, as we are calling it, has four tightly coupled Power10 DCMs, and it is probably going to be the biggest IBM i box that any customer will ever need. By our reckoning, based on the data we have, the Power E1050 system will have 120 cores, 960 threads, and 32 TB of maximum memory capacity – 2.5X the cores of the Power E950, which is based on Power9 SMT8 engines and does not support the IBM i operating system. We think customers should start lobbying now to make sure IBM i is supported on the Power E1050, especially considering that at the expected 3.5 GHz target clock speed, the Power E1050 will have about 2.85X the performance of the Power E950, helped along by 2X the memory capacity and perhaps as much as 4X the memory bandwidth if it can use 3.2 GHz DDR4 memory compared to the 1.6 GHz DDR4 memory of the Power E950. It will support PCI-Express 5.0 peripherals, and perhaps both OpenCAPI and CXL accelerator links over PCI-Express 5.0.
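One way to reconcile that roughly 2.85X estimate is to multiply the three factors in play: 2.5X the cores, the 30 percent IPC uplift, and the clock drop from 4 GHz to 3.5 GHz. The 48-core figure for the Power E950 is our assumption (four sockets of 12-core Power9 SMT8 chips), and the linear-scaling assumption is of course a simplification:

```python
# A rough reconstruction of the ~2.85X Power E1050 vs. Power E950 estimate.
e950_cores = 48            # assumed: four sockets x 12-core Power9 chips
e1050_cores = 120          # four DCMs x 2 chips x 15 active SMT8 cores
ipc_gain = 1.30            # Power10 IPC uplift cited in the article
clock_ratio = 3.5 / 4.0    # DCM clock vs. the Power9 4 GHz target

relative_perf = (e1050_cores / e950_cores) * ipc_gain * clock_ratio
print(f"Estimated Power E1050 vs. Power E950: {relative_perf:.2f}X")  # ~2.84X
```

That lands within rounding distance of the 2.85X figure in the text, which suggests the estimate is core count times IPC times clock, with the memory improvements feeding the cores rather than adding to the multiplier.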
As we pointed out last summer, many shops that might have needed a Power E870, Power E880, or Power E980 in the past are going to be able to do just fine with a Power E1050 – provided IBM allows enough I/O to hook into it and provided that IBM i is supported on the machine. Which, as we said, it was not on its predecessor.
So we need to fix that, and sitting here in March 2021, there is still time for us to make sure IBM i is in fact supported on this future Power E1050 machine. This system will be perfect for building out IBM i clouds, either public or private, and we think it would be a very interesting box as the hub in a memory area network of Power10 machines, as we discussed in depth over at The Next Platform last fall. The Power10 chip will support a kind of memory clustering that is akin to a Parallel Sysplex on a System z mainframe cluster, linking machines over their on-chip SERDES and memory protocols rather than through the I/O bus as we know it. IBM calls this capability “memory inception,” and it literally allows multiple machines to share their memory.
Or, if you configure it differently, memory inception allows a whole bunch of machines to have no physical memory at all and to access memory on machines elsewhere in the memory area network. It is akin to a memory SAN, as the name we are giving it implies.
This could enable some very interesting IBM i clouds – and Linux or AIX clouds as well, or a mix of them. And frankly, with latencies across the memory area network on the order of only 50 nanoseconds to 100 nanoseconds, one could build a very powerful cloud indeed, spanning all kinds of different compute and memory configurations. Imagine being able to suddenly have huge memory or huge core counts, but just for a short burst, and then turn them off.
This could be a serious competitive advantage, and one that must be available for IBM i shops – not just the Linux crowd.
So start pestering IBM now. And keep it up.