Power10 Midrange Machine: The Power E1050
August 15, 2022 Timothy Prickett Morgan
This is the fifth part of our in-depth coverage of the entry and midrange Power10 machines that were announced on July 12. This week, we end the hardware deep dives with a look at the Power E1050 midrange machine. This is one of the best and most capable servers that Big Blue has ever designed, and it is a damned shame that it does not run the IBM i operating system.
But, as we have pointed out before, it could at any time if enough of you convince IBM that it should. The lack of IBM i support is a marketing thing, not a technical issue. Given that the Power E1050 can support the Hardware Management Console that many large AIX and IBM i shops seem to prefer to manage their systems, that the Power E1050 almost certainly offers better bang for the buck than the “Denali” Power E1080 system launched last September that has roughly twice the performance and I/O scale, and that this machine would be perfect for clouds in that it scales far and yet is compact and relatively cheap, we think IBM i should be supported on the Power E1050. No customer should be forced into the Power E1080 when it is not needed or desirable.
Moreover, as we have pointed out when IBM did not support IBM i on the Power E850 and the Power E950 before it, support for the Power E1050 is a matter of principle. IBM i is a first-tier operating system alongside AIX and Linux, and it should be supported on every machine that Big Blue sells. Full stop.
With that out of the way, let’s drill down into this excellent machine.
The Power E1050 was revealed in announcement letter LG22-0035. As with the other Power10 entry and midrange machines announced on July 12, the Power E1050 was available on July 22, with a 256 GB memory card feature that gets these machines to their maximum 16 TB of memory capacity shipping on November 14.
Here is how the Power E1050 rack-mounted machine stacks up against its Power E950 predecessor:
Like many, but not all, of the Power10 entry servers, the Power E1050 is based on a dual-chip module (DCM) implementation of the Power10 processor package. With a DCM, two entire Power10 chips are put into a single socket and linked by their OpenCAPI buses using very low-latency links.
In this case, the individual Power10 chips have 6, 9, or 12 cores active, and that means each socket has 12, 18, or 24 cores active across the two chips in the DCM. We don’t have any IBM i performance figures to rate the Power E1050, since IBM i is not officially supported on the machine, but we do have performance specs for AIX workloads, which are gauged using the rPerf relative performance benchmark. Like the Commercial Performance Workload (CPW) benchmark used to rate IBM i performance on specific CPU configurations, rPerf is based loosely on the TPC-C transaction processing benchmark, but it is not the same as CPW. There is a correlation between rPerf and CPW performance, but the two tests handle I/O and data placement slightly differently. rPerf demonstrates better absolute performance than CPW, but we think it is a less realistic absolute gauge. As relative gauges go for CPU capacity comparisons, CPW and rPerf are both fine.
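To make the core math concrete, here is a quick Python sketch of how the chip-level core counts roll up to the socket and to the full system. The constant and function names are ours, made up for illustration, and the system totals assume all four sockets are populated with the same DCM.

```python
# Active cores per Power10 chip, as quoted in the article.
CORES_PER_CHIP = (6, 9, 12)
CHIPS_PER_DCM = 2   # a dual-chip module puts two Power10 chips in one socket
SOCKETS = 4         # assumption: a fully populated four-socket system


def cores_per_socket(cores_per_chip: int) -> int:
    """Cores active in one socket, counting both chips in the DCM."""
    return cores_per_chip * CHIPS_PER_DCM


def cores_per_system(cores_per_chip: int, sockets: int = SOCKETS) -> int:
    """Cores active across the whole machine."""
    return cores_per_socket(cores_per_chip) * sockets


if __name__ == "__main__":
    for cores in CORES_PER_CHIP:
        print(cores, "per chip ->", cores_per_socket(cores), "per socket ->",
              cores_per_system(cores), "per system")
```

Under those assumptions, the three DCM options top out at 48, 72, and 96 cores in a four-socket box.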
In any event, the cores running in the Power E1050 offer up to 39 percent more raw performance than the Power9 cores used in the Power E950. When you add up the cores across the machine, the system performance is anywhere from 1.4X to 3.1X that offered by the Power E950. When you stack that up against the Power E850, Power E850C, and Power E950 servers that came before the Power E1050, there is a lot more capacity in the new machine, as this chart shows:
While this chart doesn’t show it, the performance of the Power E1050 is approaching that of the “Fleetwood” Power E980 based on the Power9 processor from 2018, and beats the Power E870 and Power E880 based on the Power8 processor from 2014. The Power E1050 would make an excellent upgrade for these older big iron boxes, too.
Now to give you the feeds and speeds underneath this performance. The 12-core Power10 DCM is feature #EPEU, and it has a base clock speed of 3.36 GHz and can turbo up to 4 GHz. The 18-core Power10 DCM used in the Power E1050 has a base clock speed of 3.2 GHz and can turbo up to 4 GHz. And the 24-core Power10 DCM used in the machine has a base clock speed of 2.95 GHz and can turbo up to 3.9 GHz. All of these chips, like other Power10 SCMs and DCMs, have an idle clock speed of 2 GHz, which cuts down dramatically on power consumption when they are not working hard.
The other thing you will note is the massive memory bandwidth in the Power E1050 system. This machine has half as many memory slots as the Power E950, but the IBM-designed differential DIMM (DDIMM) memory, which makes use of the OpenCAPI interfaces for memory, delivers 78 percent more bandwidth across those memory controllers. That 1.6 TB/sec of memory bandwidth offered with the Power E1050 just blows the crap out of any other four-socket server on the market, and rivals the memory bandwidth of a GPU accelerator that uses special HBM stacked memory.
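For a rough feel for those bandwidth numbers, here is a small Python sketch that backs out the predecessor's bandwidth implied by the 78 percent uplift and spreads the 1.6 TB/sec across four sockets. This is back-of-the-envelope arithmetic, not IBM spec data: it assumes the 78 percent figure applies to aggregate system bandwidth and that bandwidth is divided evenly across sockets, and the names are ours.

```python
E1050_BANDWIDTH_GBS = 1600  # 1.6 TB/sec, as quoted in the article
UPLIFT = 0.78               # "78 percent more bandwidth"
SOCKETS = 4                 # assumption: fully populated four-socket system


def implied_predecessor_bandwidth(new_gbs: float, uplift: float) -> float:
    """Bandwidth the older machine would need for the uplift to hold."""
    return new_gbs / (1.0 + uplift)


def per_socket_bandwidth(total_gbs: float, sockets: int) -> float:
    """Even split of aggregate bandwidth across sockets (an assumption)."""
    return total_gbs / sockets


if __name__ == "__main__":
    print(round(implied_predecessor_bandwidth(E1050_BANDWIDTH_GBS, UPLIFT)), "GB/sec implied")
    print(per_socket_bandwidth(E1050_BANDWIDTH_GBS, SOCKETS), "GB/sec per socket")
```

That works out to roughly 900 GB/sec implied for the older machine and 400 GB/sec per socket for the new one.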
Again, there are reasons to choose a Power server.
Here is the layout of the system board on the Power E1050:
On the left, there are slots for ten NVM-Express flash drives. Like all other Power10 machines, there is no SAS or SATA storage on the machines and no disk drives. On the right are the PCI-Express slots for peripheral controllers and accelerators. Just like the Power10 entry machines, the Power E1050 can have NVM-Express flash drives in 800 GB, 1.6 TB, 3.2 TB, and 6.4 TB capacities. That means the Power E1050 tops out at 64 TB of local storage capacity.
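The local storage ceiling is simple multiplication of the slot count by the largest drive size listed above. A minimal Python sketch, using integer gigabytes to keep the arithmetic exact (the names are ours, for illustration only):

```python
NVME_SLOTS = 10                            # NVM-Express bays, per the article
DRIVE_SIZES_GB = (800, 1600, 3200, 6400)   # drive capacities, per the article


def max_local_storage_gb(slots: int = NVME_SLOTS,
                         sizes_gb: tuple = DRIVE_SIZES_GB) -> int:
    """Largest capacity if every bay holds the biggest available drive."""
    return slots * max(sizes_gb)


if __name__ == "__main__":
    total_gb = max_local_storage_gb()
    print(total_gb / 1000, "TB maximum local flash")
```

Ten bays of 6.4 TB drives comes to 64 TB.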
IBM has PCI-Express 5.0 controllers on the Power10 chip, and therefore the Power E1050 system, but some of the I/O lanes are exposed as PCI-Express 4.0 slots. (You just use half as many lanes to accomplish this since PCI-Express 5.0 lanes run twice as fast as PCI-Express 4.0 lanes.) The aggregate I/O bandwidth of the Power E1050 is 750 GB/sec, which is only 20 percent higher than the 630 GB/sec offered in the Power E950. IBM would not intentionally constrain I/O bandwidth, which must mean that the Power E950 had more I/O than was probably necessary.
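The lane-halving trick and the generational I/O comparison can both be checked with a few lines of Python. The per-lane signaling rates (16 GT/sec for PCI-Express 4.0 and 32 GT/sec for PCI-Express 5.0) come from the PCI-Express specifications. The function names are ours, and the sketch compares raw signaling rates only, ignoring encoding and protocol overhead.

```python
# Raw signaling rate per lane in GT/sec, by PCI-Express generation.
GT_PER_LANE = {4: 16, 5: 32}


def raw_link_rate(gen: int, lanes: int) -> int:
    """Raw signaling rate of a link in GT/sec (no encoding overhead)."""
    return GT_PER_LANE[gen] * lanes


def percent_increase(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    # Eight PCI-Express 5.0 lanes carry the same raw rate as 16 PCI-Express 4.0 lanes.
    print(raw_link_rate(5, 8), raw_link_rate(4, 16))
    # Aggregate I/O bandwidth, Power E1050 versus Power E950, in GB/sec.
    print(round(percent_increase(750, 630), 1), "percent")
```

The bandwidth gap comes out at about 19 percent, which rounds to the 20 percent cited above.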
We are OK with overengineering things so long as it doesn’t add to costs.
Let’s talk a bit about DCMs and NUMA. The operating system, whether it is IBM i, AIX, or Linux, sees each Power10 DCM as a unique processor. This is not some sort of technical trickery to make two processors look like one, which can be done but is totally unnecessary for the sake of either the operating systems or the applications.
Unlike some topologies of systems that are, deep down, a hierarchy of a quad of two-socket servers that are hooked together with one or two or sometimes three hops between processors, the Power E1050 has a flat topology where every Power10 chip in every DCM has a single hop across the OpenCAPI NUMA link to every other DCM in the compute complex. This allows for very tight coupling of compute and memory and linear scaling of performance from one to four DCMs and across two to eight Power10 chips. Like this:
In many ways, the interconnect for the Power E1050 is better than for the Power E1080, which scales less linearly. The Power E1080, as we talked about last fall, is really a set of four four-way server boards hooked together using Power10 single-chip modules that have 10, 12, or 15 of the 16 maximum cores active on the die. The Power E1080 is about offering the maximum core count possible inherent in the Power10 chip balanced against the potential cores and the actual memory slots, and OpenCAPI links available on the processor for interlinking. The Power E1080 has one-hop NUMA links on each four-way, and then another NUMA hop to a Power10 chip on any of the other three system boards. Depending on contention, accessing data between sockets can take more hops as data requests are routed around the system, but we think the number of times this happens is pretty small.
The Power E1050, by contrast, uses dual-chip modules and has enough OpenCAPI ports to provide the memory for the system and to do an all-to-all interconnect between the processors, such that each of the Power10 chips in the system can communicate directly with the other Power10 chips and their memories. So it is a one-hop, all-to-all interconnect. This is a luxury in any architecture, and something that we historically have seen for only two-socket machines and more recently for four-socket machines. That IBM is doing this for what is in essence an eight-socket machine is remarkable. (How many chips are in a socket is arbitrary, and IBM could have just used eight physical sockets to build the same machine.) It demonstrates once again that there are advantages to the Power architecture that are not available in the Intel Xeon SP, AMD Epyc, and various Arm processors aimed at servers.
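To see why a one-hop, all-to-all interconnect is a luxury, here is a small Python sketch that models eight chips as a graph and measures the worst-case hop count with a breadth-first search. The all-to-all graph follows the description above. The ring is a purely hypothetical contrast that we made up for illustration and is not a model of any IBM system, and all function names are ours.

```python
from collections import deque
from itertools import combinations


def all_to_all(n: int) -> dict:
    """Every chip has a direct link to every other chip."""
    adj = {i: set() for i in range(n)}
    for a, b in combinations(range(n), 2):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def ring(n: int) -> dict:
    """Hypothetical contrast: each chip links only to its two neighbors."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def max_hops(adj: dict) -> int:
    """Worst-case shortest-path hop count between any two chips (BFS)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


def link_count(adj: dict) -> int:
    """Number of distinct chip-to-chip links."""
    return sum(len(peers) for peers in adj.values()) // 2


if __name__ == "__main__":
    for name, topo in (("all-to-all", all_to_all(8)), ("ring", ring(8))):
        print(name, "links:", link_count(topo), "worst-case hops:", max_hops(topo))
```

In this model, connecting eight chips all-to-all takes 28 links to hold the worst case to a single hop, while the eight-link ring needs up to four hops. That trade of many more links for fewer hops is why this kind of topology has mostly been seen on smaller machines.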
Here is what the I/O subsystem looks like in the Power E1050:
All of the eight “Cirrus” Power10 chips in this machine are meant for compute and, unlike the Power S1014 and Power S1022s, the second Power10 chip in the DCM does not have all of its cores turned off and it is not being used as an I/O switch for the first Power10 chip that does have compute. If you needed more I/O and less compute in a Power E1050, we are certain you could file a PRPQ with Big Blue to get this on a special bid. For all we know, you could do this in microcode. And for all we know, IBM itself does this in microcode.
So let’s actually take a look at the Power E1050 machine. Here’s the front view of the system:
Here’s a top view with the lid off:
That is one densely packed machine, right? Big Blue is really good at that. The blue cards in the center are pairs of DDIMM memory, two side by side in each slot. They are 4U tall, and come in 32 GB, 64 GB, and 128 GB capacities today, with 256 GB versions coming in November, as we said above.
And here are the mechanical drawings for the front and back of the system:
I just like how the CAD program drew those out, which is why I included them. I wish we had such software back in the early 1980s when I was doing this stuff by hand in pencil.
Provided we can get pricing information, we will dive into performance and price/performance analysis to help you compare these Power10 machines to prior generations of Power Systems entry and midrange iron. We will probably do an overview of IBM’s competitive analysis for the Power10 entry and midrange machines first, though, since we have that data.