Inside IBM’s Power S924 Power9 Entry System
February 26, 2018 Timothy Prickett Morgan
It is hard to say for sure, but the two most popular Power9 machines that IBM will sell in the current and coming years will probably be the single-socket Power S914 and the two-socket Power S924. A lot depends on performance, which we don’t have metrics for yet, but one of these machines is going to be the volume leader – unless they are neck and neck.
We flipped a coin to decide which machine to cover first. Heads was the Power S914 and tails was the Power S924, and the quarter showed the American eagle and not George Washington, so we are going to share our thoughts about the Power S924 machine first.
In case you need reminding, this is the current family of “ZZ” Power9 entry machines, which IBM debuted on February 13:
The Power H924 is a special variant of the Power S924 that is tuned up to run SAP’s HANA in-memory database, and the Power S914 is basically a single-socket version of the Power S924 that has some limitations, particularly for the four-core variant of the machine in the P05 software tier for the IBM i operating system.
We did our initial coverage on the Power9 entry machines here, where we talked to IBMers Steve Sibley, the product offering manager for the Power Systems iron in the Cognitive Systems group, and Brad McCredie, the IBM Fellow who runs the OpenPower Foundation and heads up Power chip development. We followed it up a week later with a drilldown into the system mechanics and motherboards of the six different flavors of the Power9 ZZ systems. This week, we want to do a little before-and-after comparison between the machines, in this case between the Power S824, based on Power8 chips from 2014, and the new Power S924 machines, based on the Power9 chips that are coming out four years later.
We do not believe that the Power9 processors are socket compatible with the Power8 chips, given the radically different I/O coming off the processors, but they are similar enough in terms of physical size and heat dissipation that the design of the systems did not have to change all that much even though four years have passed.
The Power S824 came in a 4U chassis and let customers have one or two processor cards, each with a maximum of 512 GB of main memory. The Power S824 had three different processor options. The first was a six-core Power8 chip running at 3.89 GHz and the second was an eight-core Power8 chip running at 4.15 GHz; customers could buy one or two of these for the system. The final option was a twelve-core Power8 chip running at 3.52 GHz, and customers were required to buy two of these processor cards at the same time, for a total of 24 cores in the system. The Power S824 had seven PCI-Express 3.0 slots with a single processor card and eleven slots when the second processor card was added. This Power S824 machine was rated at between 72,000 and 230,500 CPWs of online transaction processing relative performance using IBM’s Commercial Processing Workload test.
Here is how the Power S924 compares to the Power S824 across the feeds and speeds:
There are a couple of things to note here. The first is that the top-end processors in the Power S924 are running a little hotter than the top-end Power8 chips did in the same 4U form factor. The Power S924 processor cards come in three flavors: an eight-core version running at a base 3.8 GHz, a ten-core version running at a base 3.5 GHz, and a twelve-core version running at a base 3.4 GHz; these chips can dynamically scale up to 3.9 GHz or 4 GHz, depending on the thermals of the machine and the nature of the workloads. It is hard to compare the processors because the Power8s were running at peak speeds and the Power9s have a base speed and then a dynamic peak. Generally, there are two more cores per processor in the Power S924, which accounts for a big chunk of the performance boost for the machines. Software, unfortunately, is priced per core in a lot of cases; if it were priced per socket, this would be free performance. But that increase in core count carries a software cost.
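To make the per-core versus per-socket licensing point concrete, here is a small Python sketch. The prices are purely hypothetical placeholders, not IBM or ISV list prices, and the comparison assumes a two-socket machine moving from the eight-core Power8 card to the ten-core Power9 card, which is one instance of the "two more cores per processor" step described above.

```python
# Hypothetical prices, for illustration only; not real IBM or ISV list prices.
PRICE_PER_CORE = 1_000
PRICE_PER_SOCKET = 8_000


def license_cost(cores_per_socket: int, sockets: int, model: str) -> int:
    """Software bill for one machine under a per-core or per-socket licensing model."""
    if model == "per_core":
        return cores_per_socket * sockets * PRICE_PER_CORE
    if model == "per_socket":
        return sockets * PRICE_PER_SOCKET
    raise ValueError(f"unknown licensing model: {model}")


# Two-socket machines: eight-core Power8 cards versus ten-core Power9 cards.
for machine, cores in [("Power S824, 8-core", 8), ("Power S924, 10-core", 10)]:
    per_core = license_cost(cores, 2, "per_core")
    per_socket = license_cost(cores, 2, "per_socket")
    print(f"{machine}: per-core ${per_core:,}, per-socket ${per_socket:,}")
```

Under the per-socket model the bill stays flat at $16,000 for both machines, while under the per-core model it rises from $16,000 to $20,000, a 25 percent increase that tracks the core count exactly.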
There are some other important differences in the machines. The aggregate PCI-Express bandwidth has gone up by 67 percent to 80 GB/sec, balancing out the fabric bandwidth on the processing complex and presumably not out of whack with the memory bandwidth feeding into it from the other side. We suspect that the Power9 machines are more balanced, when all is said and done, and are more faithful to the high I/O throughput of Power iron historically. It is not clear how much of a benefit all of the extra memory bandwidth on the Power7, Power7+, and Power8 machines was compared to Xeon iron, or even earlier generations of Power iron, from a practical standpoint. We don’t know a lot of customers who fill up memory slots to get the full bandwidth of the machines, mainly because customers have been taught to leave a little space in the machines for memory capacity expansion. Memory is so expensive, and workloads so varied in their need for main memory capacity and bandwidth, that it is difficult to give a hard and fast rule. The fact is, customers have to know if they are memory capacity or memory bandwidth constrained, and buy accordingly. Making a mistake in this task could be very costly indeed, both in terms of performance and actual money.
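The 67 percent figure also lets us back out the approximate aggregate PCI-Express bandwidth of the older Power S824. This is a minimal Python sketch of that arithmetic; because the percentage is presumably rounded, treat the result as an estimate derived from the numbers above rather than an IBM-published specification.

```python
def prior_value(new_value: float, pct_increase: float) -> float:
    """Recover the earlier figure from a new figure and its percentage increase."""
    return new_value / (1 + pct_increase / 100)


# 80 GB/sec on the Power S924 after a 67 percent increase.
print(f"{prior_value(80, 67):.1f} GB/sec")  # → 47.9 GB/sec
```

In other words, the Power S824 would have had somewhere around 48 GB/sec of aggregate PCI-Express bandwidth.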
The table above, culled from IBM presentations to customers and business partners, also shows that both the 12-core SMT8 chip and the 24-core SMT4 processor will be available in the Power S924 and the related Power S922 (which jams two Power9 processor cards and fewer peripherals into a 2U chassis). These SMT4 chips have not, as far as we know, been announced. But they could come out if enough customers want them; in theory they can run IBM i, and we would argue they might be better suited to those building IBM i clouds, given the modest performance needs of many IBM i shops on legacy iron. The other neat thing in the processor complex is that the fabric bandwidth, which combines the X Bus NUMA interconnect with other ports like CAPI and OpenCAPI, has nearly doubled to 74 GB/sec on the Power S924 machines.
As for memory, the capacity has doubled to a 4 TB maximum across the two sockets, but the shift away from buffered memory means that the memory bandwidth across the sixteen memory slots per socket has dropped. The peak memory bandwidth on the Power S824 was 192 GB/sec, coming in at 173 GB/sec on the STREAM memory bandwidth test; the Power S924 tops out with 20 percent lower bandwidth at 153 GB/sec peak and with 33.5 percent less on the STREAM test at 115 GB/sec. Part of the problem is that IBM is only supporting 2.13 GHz memory in systems that have ten or more of their memory slots loaded up per socket. Intel “Skylake” Xeon SP systems can have memory running at 2.4 GHz and 2.67 GHz in fully loaded machines, although you have to pay a hell of a processing premium to get machines that support 1.5 TB of memory per socket and therefore give capacity as well as bandwidth on the same order as the Power9 machines. Anyway, a two-socket, top-end Xeon machine has somewhere between 200 GB/sec and 225 GB/sec of sustained memory bandwidth on the STREAM test, which is on par with what IBM is offering. Intel has six memory channels per socket running a little faster, IBM has eight running a little slower, and it all kinda washes out. The big point is that IBM is using the same industry-standard memory sticks, so the price differential should disappear and the odds in memory will be even.
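The percentage drops above can be checked, and the "it all kinda washes out" argument made concrete, with a short Python sketch. The peak-bandwidth function uses the textbook formula of channels times transfer rate times bytes per transfer, assuming an 8-byte data path per DDR4 channel and ignoring ECC bits and controller efficiency; it gives a theoretical ceiling per socket, not a STREAM prediction. The memory speeds the article quotes in GHz are treated here as megatransfers per second.

```python
def pct_drop(old: float, new: float) -> float:
    """Percentage decrease going from old to new."""
    return (old - new) / old * 100


def peak_ddr_bandwidth(channels: int, mt_per_sec: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak in GB/sec, assuming a 64-bit (8-byte) data path per channel."""
    return channels * mt_per_sec * bytes_per_transfer / 1000


# Power S824 versus Power S924 figures quoted in the article, in GB/sec.
print(f"peak:   {pct_drop(192, 153):.1f}% lower")  # → 20.3% lower
print(f"STREAM: {pct_drop(173, 115):.1f}% lower")  # → 33.5% lower

# More, slower channels versus fewer, faster ones (per socket).
print(f"8 channels @ 2133: {peak_ddr_bandwidth(8, 2133):.1f} GB/sec")  # → 136.5
print(f"8 channels @ 2400: {peak_ddr_bandwidth(8, 2400):.1f} GB/sec")  # → 153.6
print(f"6 channels @ 2666: {peak_ddr_bandwidth(6, 2666):.1f} GB/sec")  # → 128.0
```

Eight channels at 2,400 MT/s works out to 153.6 GB/sec, in line with the peak figure quoted for the Power S924, and dropping to 2,133 MT/s costs roughly 11 percent of that ceiling. Six channels at 2,666 MT/s land in the same neighborhood, which is the sense in which the two approaches wash out.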
The peripheral expansion with the Power S924 is similar, but a little different. The pair of internal NVM-Express M.2 flash drives will be useful as boot devices, and having one fewer I/O drawer might be a problem for customers who want to hang a lot of storage off of these machines.
We look forward to doing price/performance analysis on the Power S924 machines compared to the prior three generations of Power iron – once the CPW ratings for the machines are out in a few weeks.