Power7+ Chips Juiced With Faster Clocks, Memory Compression
August 13, 2012 Timothy Prickett Morgan
I have my ticket booked to head west to the Hot Chips 24 conference hosted by Stanford University, where IBM, Oracle, Advanced Micro Devices, and Intel are expected to talk about just-announced and impending processors. But Big Blue seems unable to contain its enthusiasm for the Power7+ chip that it will talk about alongside its next-generation zNext processors for its System z mainframes.
A little more than a month ago, I told you about some of the details on the forthcoming chip that could be scrounged from poking around the Intertubes. From a die shot of the Power4 through Power7+ families of processors that IBM has shown to customers and partners, I was able to discover that the Power7+ chip had eight cores, just like the Power7 chip that precedes it.
It wasn’t clear from the die shot how much on-chip, shared embedded DRAM L3 cache memory was on that chip, but it was clearly more. And thanks to a performance document published on IBM’s developerWorks site, we know that IBM is boosting the L3 cache from 4 MB for each local core segment on the Power7 chip (32 MB in total) to 10 MB per core on the Power7+ chip (80 MB in total). That is a tremendous amount of cache memory, four times what Intel has put on its latest “Sandy Bridge” Xeon E5 server processors.
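The cache arithmetic above can be checked in a few lines. This is just a back-of-the-envelope sketch; the 20 MB figure for the top-bin Sandy Bridge Xeon E5 is my assumption for the part IBM is being compared against, not something stated in the IBM document.

```python
# Per-core L3 segment size times eight cores, Power7 vs. Power7+,
# and the ratio against an assumed 20 MB of L3 on a top-bin
# Sandy Bridge Xeon E5 (the "four times" comparison above).
power7_l3_mb = 4 * 8        # 4 MB per core segment, eight cores
power7plus_l3_mb = 10 * 8   # 10 MB per core segment, eight cores
xeon_e5_l3_mb = 20          # assumed top-bin Xeon E5 L3 size

print(power7_l3_mb, power7plus_l3_mb, power7plus_l3_mb / xeon_e5_l3_mb)
# 32 80 4.0
```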
All that extra cache memory, which should have a dramatic effect on performance, is enabled because of the shrink from the 45 nanometer processes used to etch the Power7 chips to the 32 nanometer processes used for the Power7+ chips. But there are some other changes to the chip in addition to making the cores smaller (the cores are basically the same) and wrapping more cache around them. IBM’s roadmaps have been talking about accelerators, and if you poke around patches to the Linux kernel, you can see what some of them are. As previously reported:
IBM’s chipheads were talking to the Wall Street Journal about the upcoming Hot Chips conference, and Satya Sharma, an IBM Fellow and CTO of the Power Systems line who leads development of the Power7 and Power7+ processors, let slip that the clock speeds on Power7+ chips would be 10 to 20 percent higher than those on the Power7. Power7 clocks range from a low of 3 GHz on the four-core chip used in the Power 720 entry server, to a high of 3.92 GHz in the Power 780 with all eight cores turned on (or 4.14 GHz with that chip running in turbo boost mode with half the cores turned off), to 4 GHz in the eight-core chip used in the Power 795 and 4.25 GHz in a four-core variant also used in that big box. That puts the possible range of clock speeds for Power7+ chips between 3.3 GHz and 5.1 GHz, though there could be some wiggle there, as IBM might get more clock out of the smaller chips and less out of the larger ones. (Traditionally, IBM revs the processors in its biggest boxes faster to boost single-thread performance, so this would be a departure.)
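The projected 3.3 GHz to 5.1 GHz range above is just Sharma's 10 to 20 percent uplift applied to the slowest and fastest Power7 bins. A quick sketch of that arithmetic:

```python
# Apply the 10-20 percent Power7+ uplift cited by Sharma to the
# known endpoints of the Power7 clock range.
low_bin_ghz = 3.0    # four-core Power7 in the Power 720
top_bin_ghz = 4.25   # four-core Power7 variant in the Power 795

projected_low = low_bin_ghz * 1.10   # +10 percent on the slowest part
projected_high = top_bin_ghz * 1.20  # +20 percent on the fastest part

print(f"Projected Power7+ range: {projected_low:.2f} to {projected_high:.2f} GHz")
# Projected Power7+ range: 3.30 to 5.10 GHz
```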
I was guessing that IBM would boost the clock speed on the Power7+ chips by between 25 and 30 percent, with the top bin parts spinning above 5 GHz, in the same range as the current z11 engines used in the System zEnterprise 114 and 196 machines; the z11 is a quad-core chip that spins at 5.2 GHz. (IBM will also apparently be boosting the clock speed on the zNext processor to 5.5 GHz, up from the 5.2 GHz of the top-end z11 processor used in the current System z line.) We’ll find out about the clock speeds in two weeks from the presentations at Hot Chips.
After I got back from holiday last week, I asked Big Blue for clarification on the statements made about the Power7+ chip in the WSJ, and this is the statement I got from Big Blue:
“Power7+ leverages 32 nanometer technology to provide increased frequency, 2.5X L3 cache, security enhancements, and memory compression with no increased power over previous generation Power7 chips.”
The interesting bit in that statement is the reference to “memory compression.” The AIX 6.1 operating system from 2010 was given a feature called Active Memory Expansion, a data compression algorithm implemented in software and tied to the Power7 processors that could do 2:1 squeezing on main memory. This data compression did two things: it allowed more data to live in main memory, and it allowed CPU utilization to be driven up in the system, pushing more work through it.
On one benchmark test (PDF) running SAP ERP applications on a 12-core Power7 server with 18 GB of physical memory, the memory was maxed out but the CPU was only at 46 percent utilization; the machine handled only 1,000 SAP users and delivered 99 transactions per second. With Active Memory Expansion turned on under AIX 6.1, the box was able to boost effective main memory by 37 percent to 24.7 GB. The SAP workload could then push CPU utilization up to 88 percent (some of that from the memory compression itself), but now the machine supported 1,700 users and did 166 transactions per second. That’s 70 percent more users doing 68 percent more work.
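The before-and-after gains in that benchmark fall straight out of the published numbers. A short sketch recomputing them:

```python
# Recompute the Active Memory Expansion benchmark deltas from the
# published figures: effective memory, user gain, throughput gain.
physical_gb = 18.0
effective_gb = physical_gb * 1.37       # 37 percent expansion

users_before, users_after = 1_000, 1_700
tps_before, tps_after = 99, 166

user_gain = (users_after - users_before) / users_before * 100
tps_gain = (tps_after - tps_before) / tps_before * 100

print(f"{effective_gb:.1f} GB effective, +{user_gain:.0f}% users, +{tps_gain:.0f}% throughput")
# 24.7 GB effective, +70% users, +68% throughput
```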
Active Memory Expansion imposed overhead on the Power7 CPU, but it is possible that IBM has etched the algorithms for crunching memory into the Power7+ chip, thereby eliminating the overhead on the processor cores. Also, if this memory compression is etched onto the chip, then it presumably could be used by the Linux and IBM i operating systems, which do not currently support it. It will also presumably be a free feature instead of a for-fee feature, as it was with the AIX-Power7 combo.
“There should be nothing surprising here, as IBM has always followed a model of mapping processor architectures in the next generation of silicon to improve the value to the customer,” explained Ron Kalla, chief engineer at IBM for both the Power7 and Power7+ processors, in an email exchange. “If you go back all the way to the RS64 processors, we mapped those into multiple technologies, adding a few new features along the way. This time, between Power7 and Power7+, we used the technology slightly differently. We decided to hold the power envelope and die area constant so we can easily plug upgrade existing systems while providing increased frequency.”
So the Power7+ chips will slide into the current Power7 sockets, which is a good thing for customers and IBM alike.
“We also invested the additional transistors provided by 32nm technology in a few ways,” continued Kalla. “We added eDRAM cache, which provides a high performance return on area and added on chip accelerators to offload work from the processor cores so more workload can be done by the existing cores–this has the same effect as adding cores. We also made security enhancements to provide higher levels of protection for our customers’ data.”
IBM doesn’t publish thermal ratings for its various Power processors, which come with four, six, or eight cores with varying clock speeds. (There may also be differences in L3 cache, but IBM has never said so.) I will try to get some sense of where they are at in terms of power consumption and heat dissipation when I am out at Hot Chips in a few weeks.