New System z15 Mainframe Takes The Heat Off Power Systems
September 16, 2019 Timothy Prickett Morgan
I don’t know if many of you work this way, but sometimes I have to say things out loud and follow that train of thought before I decide it is a good, bad, or neutral idea – or any of the different gradations in there and beyond these from absolutely wonderful on one end to improbable or worse yet impossible on the other end. It is a kind of branch prediction, and like modern processors for the past two decades, it is subject to Meltdown speculative execution vulnerabilities.
(That right there was a nerd joke. I think. Maybe. . . . )
In any event, the last time that IBM launched a new mainframe platform, the System z14 back in September 2017, I came up with the really crazy idea that instead of porting IBM i to an X86 processor or an Arm processor or anything else we can think of such as squirrels running on mouse wheels at high velocity with carbon nanotubes for fur, Big Blue could make a truly screaming data processing platform by porting IBM i to its own System z engines. I made the case that this machine would have higher single threaded and NUMA performance, and very zippy I/O and all kinds of things. I am not going to redo that whole argument here, or review the evolution of the System z product line and compare and contrast it to Power processors and Power Systems. Read that story here for all of that data.
What I am going to do, however, is breathe a sigh of relief that IBM has launched the much-anticipated System z15 mainframes, which in turn will breathe some life into the revenue streams at IBM’s Systems group now that a new mainframe cycle is starting. It also takes the heat off Power Systems a little, which is important as IBM has done a lot of research and is still doing a lot of development on the future Power10 processors and the systems that use them. IBM needs to be making good money on the z15 mainframe to better balance its overall systems business, and we have every reason to believe that this will happen once again, as it has for the more than three decades that I have been watching Big Blue. (There were a few bumps in the transition from bipolar to CMOS chips, to be sure, and we have some bumps coming in the years ahead when process shrinks all but stop. But don’t worry about that now.)
To get a sense of the new System z and how the mainframe is doing, I had a chat with Mike Desens, vice president of System z offering management, and Christian Jacobi, chief engineer of System z processor development. IBM is pretty secretive about the shipments and sales of the System z platform, and we don’t have the same read on it as we have developed for the Power Systems platform, but here is what Desens told us, and it is significant. Over the past decade, from 2008 through 2018 we presume, the aggregate installed base of MIPS computing capacity across all types of processing – z/OS engines, Linux engines, and zIIP and zAAP accelerators for Db2 and Java workloads – has grown by a factor of 3.5X. IBM’s chief financial officer, Jim Kavanaugh, has said in recent quarters that the System z14 line has had the fastest ramp and the largest increase in capacity shipped of any mainframe in history. Desens says that thanks to the z14 ramp in the past two years, overall System z MIPS capacity in the base has grown by 25 percent, and adds that new workload MIPS – zIIPs, zAAPs, and Linux – are growing at twice that rate. Finally, Desens says that over 55 percent of the installed MIPS is running on these new workloads – and others like Blockchain and AI are emerging. By the way, some customers have Linux-only LinuxONE mainframes and others are running z/OS with Linux running in what IBM calls an Integrated Facility for Linux (or IFL) partition.
What Big Blue doesn’t say – and we are not trying to rub IBM’s nose in anything – is that the unit revenue from that capacity keeps dropping because mainframe shops are increasingly deploying Linux and accelerators and because IBM is cutting System z hardware prices to keep companies interested. The mainframe cycles are a little longer, so IBM can make up the slightly downward curve in mainframe revenues over time and still generate what we think is about the same overall profits per generation. (And honestly, how else could you play this game any better than Big Blue is doing?) IBM has to cut prices to keep mainframe customers happy, and it has to keep investing in ever-more-expensive processor development. This one is not too bad because it is an incremental design using a slightly goosed 14 nanometer etching process from GlobalFoundries, which also made the z13 and z14 processors for IBM.
The z14 chip, which again we went into great detail about two years ago, was a 10-core processor – what IBM called a Compute Processor, or CP. The z15 bumps that up to 12 cores on the CP, while keeping the top bin clock speed at 5.2 GHz. Interestingly, according to Jacobi and thanks to substantial architectural work, the z15 core does about 14 percent more single-threaded work than the z14 core did, giving it a rating of 2,055 MIPS. That number is based on IBM’s Large Systems Performance Reference (LSPR) benchmark test, which is a composite commercial benchmark that the company has used to gauge the relative performance of mainframes for as long as I can remember and which, not surprisingly, runs on z/OS but accurately predicts the performance of commercial Linux workloads running on the mainframe. The maximum number of cores in a single system image for the z14 iron was 170, and with the z15 that has been bumped up by 12 percent. Add it up and take out a little NUMA overhead for multiple nodes, and the peak aggregate capacity of the biggest z15 has increased by 25 percent, to 178,000 MIPS.
Let’s give you a little perspective on that and a little context. The first CMOS mainframe engine delivered 6 MIPS. So that is a factor of 342.5X more oomph per core over the decades.
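For anyone who wants to check the arithmetic behind these core and MIPS figures, here is a minimal sketch in Python. Every input is a number quoted in this story, not an independent measurement, and the variable names are made up for illustration. The "naive peak" is a hypothetical construct: it simply multiplies the single-core rating by the maximum core count to see how far the quoted peak rating sits below a perfect linear scale-up.

```python
# All inputs are figures quoted in the article, not independent measurements.
Z15_CORE_MIPS = 2055       # LSPR rating of one z15 core
Z14_MAX_CORES = 170        # largest z14 single system image
Z15_PEAK_MIPS = 178_000    # quoted peak aggregate rating of the biggest z15
FIRST_CMOS_MIPS = 6        # first CMOS mainframe engine

# A 12 percent bump on 170 cores lands on the 190-core maximum.
z15_max_cores = round(Z14_MAX_CORES * 1.12)

# Back out the z14 core rating from the quoted 14 percent uplift.
z14_core_mips = Z15_CORE_MIPS / 1.14

# Per-core gain since the first CMOS engine.
per_core_gain = Z15_CORE_MIPS / FIRST_CMOS_MIPS

# Hypothetical ceiling if every core delivered its full single-core rating.
naive_peak_mips = z15_max_cores * Z15_CORE_MIPS

# Fraction of that naive sum that shows up in the quoted peak rating.
scaling_fraction = Z15_PEAK_MIPS / naive_peak_mips

print(f"z15 maximum cores:       {z15_max_cores}")
print(f"implied z14 core rating: {z14_core_mips:,.0f} MIPS")
print(f"gain over first CMOS:    {per_core_gain}X")
print(f"naive linear peak:       {naive_peak_mips:,} MIPS")
print(f"quoted peak / naive:     {scaling_fraction:.1%}")
```

The implied z14 core rating comes out at roughly 1,800 MIPS, and the quoted 178,000 MIPS peak is a bit under half of the naive linear sum. That gap is a property of the figures as quoted; the story does not break down how much of it is NUMA overhead and how much is the rating methodology.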
To give you a sense of what those MIPS ratings might mean in terms of Power9 performance, IBM’s own performance documents from the Power4 generation say that to calculate the rough equivalent performance on IBM i workloads, take the MIPS and multiply by seven and that will give you an approximate rating on the Commercial Performance Workload (CPW) test that IBM uses to gauge relative performance for OS/400 and IBM i database and transaction processing work. So that z15 core is akin to having 14,385 CPWs of oomph and that 190-core z15 NUMA system is like having 1.25 million CPWs of aggregate performance. A fully loaded Power 795 with 256 cores of Power7 chips running at 4 GHz had an aggregate capacity of 1.6 million CPWs, and a Power E880C with 192 cores of Power8 chips running at 4 GHz delivered 2.07 million CPWs. And a Power E980 with 192 cores of Power9 chips running at a base 3.55 GHz (and clocking up to 3.9 GHz as thermals permit, as they surely did on the CPW test) can hit 2.74 million CPWs. If these MIPS-to-CPW ratios are right – and I am not saying they are, I am saying these are what we have to try to compare the machines – then the Power9 architecture is winning, and largely, we think, thanks to the rich simultaneous multithreading (SMT) that the Power9 chip has.
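The MIPS-to-CPW conversion is easy to replay. The sketch below applies the multiply-by-seven rule of thumb to the z15 figures and compares the result with the Power Systems CPW ratings quoted above. The function name and the dictionary are illustrative only, and the factor of seven is the Power4-era approximation cited in this story, so the output is only as good as that ratio.

```python
# Rule of thumb from Power4-era IBM documents, as cited in the article.
MIPS_TO_CPW = 7


def mips_to_cpw(mips: float) -> float:
    """Rough CPW equivalent of a mainframe MIPS rating (approximation only)."""
    return mips * MIPS_TO_CPW


z15_core_cpw = mips_to_cpw(2055)         # one z15 core
z15_system_cpw = mips_to_cpw(178_000)    # biggest z15, quoted peak MIPS

# CPW ratings quoted in the article for big Power Systems machines.
power_systems_cpw = {
    "Power 795 (Power7)": 1_600_000,
    "Power E880C (Power8)": 2_070_000,
    "Power E980 (Power9)": 2_740_000,
}

print(f"z15 core:   {z15_core_cpw:,} CPW (estimated)")
print(f"z15 system: {z15_system_cpw:,} CPW (estimated)")
for name, cpw in power_systems_cpw.items():
    ratio = cpw / z15_system_cpw
    print(f"{name}: {cpw:,} CPW, {ratio:.2f}x the z15 estimate")
```

On these numbers the Power E980 comes out at about 2.2 times the estimated z15 system figure, which is the basis for saying the Power9 architecture is ahead, with all the caveats about the conversion ratio still attached.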
Just for fun, here is the z15 chip, or CP:
The z15 CP has two centralized L3 cache controllers, and a segmented cache that has eight blocks of L3 cache shared across those twelve cores. (Don’t be fooled by the layout. That is a unified L3 cache.) The two X Bus links – one top right and the other bottom left – cause the core and L3 complexes to be offset from each other a little bit, but doing the offset helps improve the interconnectivity for the CPs in each drawer and links out to the System Processor (or SP) chip that is also in the drawer to handle certain I/O functions for the processing complex. The memory controller runs across the top of the z15 chip, and there are three PCI-Express 4.0 controllers across the bottom. That’s 33 percent more controllers per CP and they also run twice as fast, which is a big goose in overall PCI-Express I/O bandwidth jumping from the z14 to the z15.
As with the z14 architecture, IBM has an L4 cache that is integrated into the NUMA chipset that links multiple banks of CPs to each other. Here is what that z15 SP looks like:
The two X Bus ports come in from a CP and then these are cross coupled through the SP chips using the A Bus ports. There are two X Bus ports and four A Bus ports in the z15, and that is twice as many A Bus ports as in the z14. Here’s why. With the z14 machine, IBM crammed six CPs plus one SP into a drawer, and it required a custom rack that was quite a bit wider than a standard server rack’s 19 inches. With the z15, IBM is moving to 19-inch racks, but it can only cram four CPs and one SP into a server drawer in that smaller space. So it needs more drawers to scale up to that 190 maximum cores. There are from one to five drawers in the z15 system, which have from 34 to 190 customer usable processor units, as IBM’s z people call a core that can run applications. These cores can be configured to run z/OS or Linux or be employed as zIIP or zAAP accelerators.
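The drawer math can be laid out in a few lines. This sketch counts physical cores per drawer from the CP and core counts given in this story and sets them against the customer usable range. The function and constant names are invented for illustration, and the story does not say how the cores that are not customer usable are employed, so the code makes no claim about that.

```python
# Figures from the article.
Z15_CORES_PER_CP = 12
Z15_CPS_PER_DRAWER = 4             # what fits in a 19-inch rack drawer
Z15_MAX_DRAWERS = 5
USABLE_MIN, USABLE_MAX = 34, 190   # customer usable processor units

Z14_CORES_PER_CP = 10
Z14_CPS_PER_DRAWER = 6             # wider custom rack


def z15_physical_cores(drawers: int) -> int:
    """Physical cores on the CP chips in a z15 with the given drawer count."""
    if not 1 <= drawers <= Z15_MAX_DRAWERS:
        raise ValueError("the z15 has from one to five drawers")
    return drawers * Z15_CPS_PER_DRAWER * Z15_CORES_PER_CP


z14_cores_per_drawer = Z14_CPS_PER_DRAWER * Z14_CORES_PER_CP
z15_cores_per_drawer = z15_physical_cores(1)

print(f"z14 physical cores per drawer: {z14_cores_per_drawer}")
print(f"z15 physical cores per drawer: {z15_cores_per_drawer}")
for drawers in range(1, Z15_MAX_DRAWERS + 1):
    print(f"{drawers} drawer(s): {z15_physical_cores(drawers)} physical cores")

usable_share_max = USABLE_MAX / z15_physical_cores(Z15_MAX_DRAWERS)
print(f"usable share at five drawers: {usable_share_max:.0%}")
```

Each z15 drawer carries fewer physical cores than a z14 drawer did, despite the fatter CP, which is why the drawer count has to go up to reach 190 usable cores.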
The z14 system topped out at 32 TB of main memory, and the z15 tops out at 40 TB; that memory has RAID-style data protection across it.
By any stretch of the imagination, this z15 machine is a beast, and one that can hold its own against a cluster of X86 servers running Windows Server or Linux. The z14 brought data compression accelerators onto the cores and with the z15 IBM is adding data encryption. The combination of the two can help improve overall security and actually boost the performance and lower the latency of transactions ripping through the system. Desens says that mainframe shops can get anywhere from 8:1 to 12:1 compression of X86 instances by moving to mainframe partitions (this is presumably a Linux-to-Linux comparison). IBM says it can add 2.4 million Docker containers on a single top-end z15 mainframe and cram about 2.3X as many containers per core into its machines as can be done with current X86 servers.
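To put the container claim on a per-core footing, here is a rough sketch. It assumes that the 2.4 million container figure applies to a fully configured 190-core machine, which IBM’s claim as relayed here does not spell out, and it takes the 2.3X density advantage at face value. The variable names are illustrative.

```python
# IBM claims as relayed in the article.
Z15_CONTAINERS = 2_400_000
DENSITY_ADVANTAGE = 2.3      # containers per core, z15 versus current X86

# Assumption: the claim refers to the 190-core top-end configuration.
Z15_MAX_CORES = 190

z15_containers_per_core = Z15_CONTAINERS / Z15_MAX_CORES
x86_containers_per_core = z15_containers_per_core / DENSITY_ADVANTAGE

# X86 cores needed to host the same container count at that density.
x86_cores_needed = Z15_MAX_CORES * DENSITY_ADVANTAGE

print(f"z15 containers per core: {z15_containers_per_core:,.0f}")
print(f"X86 containers per core: {x86_containers_per_core:,.0f} (implied)")
print(f"X86 cores for the same load: {x86_cores_needed:,.0f} (implied)")
```

Under those assumptions the claim works out to something over 12,000 containers per z15 core, and the same container count would take roughly 437 X86 cores. These are implied figures derived from the two claims, not numbers IBM has published.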
It would be interesting to see how the same numbers work out for Power Systems machines.
One last thing: the z16 processor is currently in development and is expected around two years from now, etched in 7 nanometer processes from IBM’s fab partner, Samsung, and the z17 mainframe is being whiteboarded now and is expected four years from now, probably using 5 nanometer processes. This is by no means the end of the System z line.
Crazy Idea # 542: Port IBM i To The Mainframe
Mad Dog 21/21: Big Blue’s Big Beast Boost
Mad Dog 21/21: Sacred Families
Mad Dog 21/21: The Next (And Maybe Last) Mainframe
Mad Dog 21/21: The Mainframe Was The Message
Is there any word on the actual memory bandwidth of the z15’s implementation of the Centaur buffered memory?
I know that the 256GB and 512GB CDIMMs use 3DS DDR4, which can improve latency and throughput of certain workloads.
Does IBM not disclose the specific memory bandwidth figures? I can’t find it anywhere. The closest I came was something about z13 having 384GB/s drawer level memory bandwidth, but that’s two generations old.