Power Systems Inspire New z13 Mainframe
January 19, 2015 Timothy Prickett Morgan
Back in the old days, the mainframe and midrange divisions of IBM rivalled each other almost as much as they took on competition from outside the walls of Big Blue. But since the mid-1990s, when the company first started converging its system lines and made sure they could all run Java and its application server, the different system units of IBM have been collaborating and converging. Now, after selling off its System x division to Lenovo Group last fall, IBM is down to two system divisions within a single IBM Systems group.
The first machine to come out of the new IBM Systems group, which is being led by Tom Rosamilia, familiar to the IBM i community as a former general manager in charge of the Power Systems division as well as the System z mainframe division, is the System z13 mainframe, which was announced in New York City last Wednesday to much fanfare. The System z13 machine looks to be coming out a little earlier than many had expected, and I think that IBM actually moved the announcement up at some point in recent months. IBM’s System z techies were set to divulge all of the feeds and speeds of the new eight-core z13 processor at the heart of the new mainframe at the International Solid-State Circuits Conference that runs from February 22 through 26. IBM did not provide much in the way of detailed specs for the new z13 chip, but Mike Desens, vice president of System z development in the new IBM Systems group, gave me some insight into the new processor and the systems that wrap around it. And as has been the case in the past, the Power and z processors are designed by a single processing team and are borrowing technologies from each other. This does not, however, mean that IBM is creating a converged processor that can support either Power or z instruction sets. IBM has not done that, to date, and to do so would be a Herculean engineering task. It is far easier to have two different chips that share common elements wherever they can.
The new z13 chips are implemented in the 22 nanometer process at the fab in East Fishkill, New York, that is now owned by GlobalFoundries. The System z13 machine makes use of processors with six, seven, or eight working cores and mixes and matches them to get a varying number of active cores across the product line. There are five different models, which offer scalability from 30 to 141 total cores that are configurable by end users in the system; the largest machine, the System z13-NE1, actually has 168 physical cores in its refrigerator-sized cabinet. (This 22 nanometer process is the same one used to make the Power8 processor, which comes in a variant with six cores that are put two per socket into machines and another with a dozen cores on the same die.)
Like other chip manufacturers, IBM uses the chip manufacturing process shrink to add more transistors, and therefore more features, to the chip. In the case of the z13, IBM had to keep one eye on boosting the single-threaded performance of its core z/OS workloads and the overall scalability of the box to run lots of virtualized workloads while at the same time goosing the performance of the chip for in-line analytics for transaction processing or making generic Linux workloads run faster. The clock speed that IBM chooses for each System z processor generation is set based on the thermals and throughput constraints of the design, and as has been the case with the Power chip family, sometimes clock speeds go up and sometimes they come down as IBM is goosing the performance. With the Power7 chips, IBM was able to double the performance while radically goosing the core count from two with the Power6 to eight with the Power7 while at the same time cutting the clock speed because of a radical redesign of the core. With the z13, IBM is similarly dropping clock speeds and yet boosting single-threaded performance.
To be specific, the z11 chip, which had four cores running at a top speed of 5.2 GHz, was implemented in a 45 nanometer process when it came out in 2010. A single z11 core delivered about 1,200 MIPS of raw computing capacity running at full throttle, as gauged by the mythical measure of mainframe oomph. The z12 chip came out in the summer of 2012, and it had six cores clocking in at 5.5 GHz, with each core delivering about 1,600 MIPS of performance. The z12 chip was etched in a 32 nanometer process, and IBM used the process shrink to goose the clock speed by 6 percent and to boost the core count by 50 percent. The z12 chip had a new out-of-order execution pipeline and much larger on-chip caches to further increase single-threaded performance. The new z13 chip, implemented in a 22 nanometer process, runs at 5 GHz, mainly to cut back on heat, and yet offers about a 10 percent performance bump per core thanks to other tweaks in the core design. These include better branch prediction and better pipelining in the core, just to name two improvements.
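To make the clock-versus-performance point concrete, here is a rough sketch that divides the per-core MIPS figures above by clock speed. The z13 per-core figure of 1,760 MIPS is my own derivation from the "about 10 percent" bump over the z12, not a number IBM has published, and the function and variable names are purely illustrative.

```python
# Per-core MIPS and clock speeds as quoted in the article.
# The z13 entry is an ASSUMPTION: z12's 1,600 MIPS plus "about 10 percent".
Z12_MIPS_PER_CORE = 1600
Z13_MIPS_PER_CORE = Z12_MIPS_PER_CORE * 110 // 100  # 1,760 (derived, not official)

generations = [
    # (name, clock in GHz, approximate MIPS per core)
    ("z11", 5.2, 1200),
    ("z12", 5.5, Z12_MIPS_PER_CORE),
    ("z13", 5.0, Z13_MIPS_PER_CORE),
]


def mips_per_ghz(clock_ghz, mips_per_core):
    """Rough work-per-clock measure: MIPS delivered per GHz of clock speed."""
    return mips_per_core / clock_ghz


def percent_change(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


efficiency = {name: mips_per_ghz(clock, mips) for name, clock, mips in generations}

for name, clock, mips in generations:
    print(f"{name}: {clock} GHz, {mips} MIPS/core, "
          f"{efficiency[name]:.0f} MIPS per GHz")

# Clock speed change from z11 to z12, the "6 percent" cited in the text.
z11_to_z12_clock = percent_change(5.2, 5.5)
print(f"z11 -> z12 clock change: {z11_to_z12_clock:.1f} percent")
```

Under these assumptions, the work done per GHz rises with each generation even as the z13 clock drops, which is the whole point of the core redesign.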
The z13 chip also has much larger caches, which IBM feels is the best way to secure good performance on a wide variety of workloads that are heavy on I/O and processing. Specifically, the z13 core has 96 KB of L1 instruction and 120 KB of L1 data cache. The L2 caches on the most recent generations of mainframe chips are split into data and instruction caches, and in this case have been doubled to 2 MB each. The on-chip L3 cache, which is implemented in embedded DRAM (eDRAM) as on the Power7 and Power8 chips, has been increased by 50 percent to 64 MB shared across the eight cores. And the L4 cache that is parked out on the SMP node controller chip in the System z13 machine has been boosted to 480 MB, a 25 percent increase. The System z13 tops out at 10 TB of main memory, three times that of the predecessor zEnterprise EC12 machine.
All told, says Desens, the changes in the cache hierarchy smooth out the SMP scalability of the system, and a top-end System z13 will have about 40 percent more aggregate MIPS than the largest System zEnterprise EC12 from two and a half years ago. I estimated that zEnterprise EC12 machine at 75,000 MIPS of total capacity, and that puts the new System z13 at 105,000 MIPS.
To give you a sense of what that might mean in terms of Power8 performance, IBM’s own performance documents from the Power4 generation say that to calculate the rough equivalent performance on IBM i workloads, take the MIPS and multiply by seven and that will give you an approximate ranking on the Commercial Performance Workload (CPW) test that IBM uses for OS/400 and IBM i database and transaction processing work. That means a top-end System z13-NE1 model would be rated at about 735,000 CPWs. A 256-core Power 795 using 4 GHz Power7 chips had about 1.6 million CPWs, and a Power E880 with 64 Power8 cores running at 4.35 GHz delivers 755,000 CPWs. Roughly speaking, the Power E880 is delivering 12,000 CPWs per core while the new System z13-NE1 is delivering around 5,200 CPWs per core, at least based on my MIPS estimates and the MIPS-to-CPW ratios. Everything comes down to cases, and the important thing is that both the Power8 and z13 systems offer lots of capacity. (IBM has sophisticated Parallel Sysplex clustering to lash multiple z13 machines into a single compute engine, too, and IBM has not really talked about its DB2 for i Multisystem clustering for about 15 years. But as I have said before, it should.) The other thing to remember is that the performance numbers for the Power 795 and Power E880 have four-way and eight-way SMT turned on, respectively, and this significantly boosts performance on thread-friendly workloads: by something like 50 percent when moving from two to eight virtual threads, according to the internal IBM data that I have seen. IBM will very likely increase the SMT virtual threading on future System z processors, and will probably get to eight-way at some point, perhaps with the z14, perhaps with a z13+ if such a thing is ever announced.
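For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch of the MIPS-to-CPW conversion and the per-core comparison. It assumes my 75,000 MIPS estimate for the top-end zEnterprise EC12, the 40 percent uplift, and the multiply-by-seven rule of thumb described above; the function names are illustrative, and none of this is an official IBM sizing method.

```python
# Rule of thumb from the article: CPW is roughly MIPS times seven.
MIPS_TO_CPW_FACTOR = 7


def mips_to_cpw(mips):
    """Convert a mainframe MIPS rating to an approximate CPW rating."""
    return mips * MIPS_TO_CPW_FACTOR


def cpw_per_core(total_cpw, cores):
    """Average CPW delivered per core for a given system."""
    return total_cpw / cores


# ASSUMPTION: the author's estimate for the largest zEnterprise EC12.
ec12_mips = 75_000
# About 40 percent more aggregate MIPS for the top-end z13.
z13_mips = ec12_mips * 140 // 100

z13_cpw = mips_to_cpw(z13_mips)

systems = {
    # name: (total CPW, core count) as quoted in the article
    "System z13-NE1": (z13_cpw, 141),
    "Power E880": (755_000, 64),
    "Power 795": (1_600_000, 256),
}

per_core = {name: cpw_per_core(cpw, cores) for name, (cpw, cores) in systems.items()}

print(f"z13 estimate: {z13_mips:,} MIPS, roughly {z13_cpw:,} CPW")
for name, (cpw, cores) in systems.items():
    print(f"{name}: {cpw:,} CPW over {cores} cores, "
          f"about {per_core[name]:,.0f} CPW per core")
```

The per-core figures are averages across the whole machine, so they fold in SMP scaling losses and, on the Power side, the benefit of SMT threading.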
Some z13 workloads are going to run a lot faster than these raw performance estimates imply, and that is because some technologies that have been in the Power chips for years are now making their way into mainframe engines. First, IBM has implemented simultaneous multithreading in the z chips for the first time. SMT is a hardware virtualization technique that allows for a single physical pipeline in the processor to be virtualized, allowing for compilers to schedule instructions and data movement into the pipeline more efficiently. The SMT in the z13 chip is two-way, meaning that it presents two virtual pipelines for the physical pipeline in the core; IBM did two-way SMT in the Power6, four-way SMT in the Power7, and has eight-way SMT in the Power8. As is the case with the Power chips, this SMT is automatically and dynamically configurable based on the workloads. For software that likes threads, these virtual threads can really boost performance. IBM has also added SIMD–that’s single instruction, multiple data–vector math units to the z13 chip, also the first time it has done so.
The two-way SMT helps Linux workloads run up to 32 percent faster than on z12 chips, says Desens, and the combination of SMT threading and SIMD units in the z13 can help Java 8 applications get as much as 50 percent more throughput per core. (Those fatter caches and wider pipes into them help a lot here, too.)
The sales pitch for the System z13 machine is interesting in that Big Blue is talking about how a single box can process 2.5 billion transactions per day, and that mobile computing is driving up transaction volumes on an exponential scale. Now that we can look up anything at any time, we do, and this is driving up traffic on back-end databases and transaction processing systems for the companies that are part of our lives such as banks, insurance companies, and such. The ability to use the various kinds of computing to do risk analysis and fraud detection while transactions themselves are being composed and processed is not something that is unique to the System z mainframe. All of the hardware pieces are there to do it on the Power Systems platforms, too. The question is this: Will IBM’s marketing point this out, and will it similarly peddle its Power-based systems?