Power6 Gets Second Silicon, IBM to Crank the Clock
February 13, 2006 Timothy Prickett Morgan
Some of the mystery surrounding the future Power6 processors from IBM, due in 2007 in a new generation of pSeries, iSeries, and OpenPower servers, has been dispelled as Big Blue presented a bunch of papers at the International Solid-State Circuits Conference in San Francisco last week. The Power6 chip is in second silicon, and IBM felt comfortable enough about the state of the chip to do a little bragging.
As I reported last fall, after talking to Vijay Lund, vice president of server and storage development at IBM’s Systems and Technology Group, the Power6 chip will have approximately 750 million transistors, a feat made possible by the transition from chip making processes with wires 90 nanometers in size to those with 65 nanometer widths. Such a radical process shrink allows much smaller transistors to be etched on silicon, which, in general, allows processor clock speeds to be jacked up to boost performance. I actually held a single-chip version of the Power6 chip (presumably what IBM calls a dual-core module, or DCM), and including the packaging, it was about 1.5 inches square and about a quarter inch thick. The entire bottom of the chip was covered with gold knobs that Lund referred to as IBM’s “C4” chip interconnect, a technology that he says is derived from Big Blue’s mainframe technology. Lund said back in October that Power6 was due in 2007 and hinted that many system functions are going to be incorporated into the Power6 chip–things that might have otherwise ended up in custom ASICs on the systems or inside low-level microcode.
What IBM would not say last year is how many processor cores and other goodies would be in the Power6 chip. And it would not talk about the instruction pipeline, and would not say anything about the caching structure of the processor and the related servers. There was rampant speculation in the server market that IBM would boost the clock speeds on the Power6 chips and would do so by lengthening the pipeline–something that all chipmakers have done many times. IBM could, of course, boost performance by moving from two to four cores on a die. There were a lot more questions than answers.
But last week, after giving a presentation at ISSCC, IBM’s Joel Tendler, director of technology assessment for the Systems and Technology Group, and Brad McCready, one of the many engineers working on the Power6 processor, lifted the veil on the Power6 chip a little higher. As it turns out, the Power6 chip is a dual-core processor, and it will look very much like a Power5 and Power5+ chip conceptually. With the Power5 and Power5+ chips having 276 million transistors, clearly there is a lot more going on inside the Power6 chip than just two cores, given that there are nearly three times as many transistors in the package.
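The "nearly three times" comparison can be checked with a quick back-of-the-envelope calculation. This is only a sketch using the two transistor counts quoted above; the variable names are illustrative, and the Power6 figure is itself approximate.

```python
# Transistor counts as quoted in the article (the Power6 figure is approximate).
POWER6_TRANSISTORS = 750_000_000
POWER5_TRANSISTORS = 276_000_000  # Power5 and Power5+

# How many times larger the Power6 transistor budget is.
ratio = POWER6_TRANSISTORS / POWER5_TRANSISTORS

# Transistors beyond what a Power5-sized design would need.
extra = POWER6_TRANSISTORS - POWER5_TRANSISTORS

print(f"Power6 / Power5 transistor ratio: {ratio:.2f}x")
print(f"Additional transistors: {extra // 1_000_000} million")
```

By this rough count, something like 474 million transistors are not explained by simply carrying over a Power5-style dual-core design.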
Exactly what is going on with all of those transistors is unclear, and neither Tendler nor McCready was in any mood to spill all the secrets. But they did say that the clock speed on the Power6 chips would be in the 4 GHz to 5 GHz range, and in a surprising move, the pipeline in the Power6 chip would be about the same length as in the Power5 and Power5+ chips. So how does the chip do approximately twice the work of the Power5+ chip it will replace next year? One of the secret sauces, says McCready, will be larger cache memories. “The caches will scale up, like the clock speed, but the structure will be similar to what you see with Power5,” he said. And as with the Power4 and Power5 chips, the buses between cache and main memory, and the I/O interconnections to the outside world that the Power6 chip uses to get data, will scale up with the clock speed, too.
IBM is also adding electronics that allow it to take a central clock and branch it out to other parts of the chip, which dynamically reduce or increase the cycle time of the electronics as needed by the workloads. By having a distributed clock, the chip can run on a lot less power for a given amount of performance (you have to boost the clock signal a lot as you add more and more circuitry, and this consumes a lot of juice). Tendler said that IBM has also added sophisticated circuits that get each transistor to do more work. Using a measure called a gate delay, Tendler offered this comparison. On one test IBM ran, what used to take 22 stages of logic to accomplish in the Power5 and Power5+ chips (and logic is expressed in transistors) now takes 13 stages in the Power6 chip. This means that IBM can move stuff through the Power6 pipeline and keep it better fed without lengthening it, which in turn means it can jack up the clock frequency and do twice as much work in the same or lower power envelope. And, because IBM hasn’t messed with the pipeline, it can also produce lower clock speed versions of the Power6 chip that will throw off a lot less heat than the current Power5+ chips. IBM has hinted in the past that with the Power6 generation, it would be able to put a full-blown, enterprise-class Power processor into a blade server, and it looks like it is going to accomplish this goal. While the PowerPC 970 chip is a decent processor, it does not have the performance of a Power5+ chip.
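To get a feel for what the 22-to-13 stage comparison could mean for clock speed, here is a rough sketch. It assumes, purely for illustration, that cycle time is proportional to the number of logic stages on the path and that the delay of each stage is unchanged. Neither assumption comes from IBM; the figures are from a single test, and the 65 nanometer process shrink would contribute on top of this. The function name is hypothetical.

```python
def relative_clock(stages_before: int, stages_after: int) -> float:
    """Clock multiplier if cycle time scales linearly with logic stages.

    Assumption (not from IBM): each stage has the same delay before and
    after, so fewer stages means a proportionally shorter cycle.
    """
    if stages_before <= 0 or stages_after <= 0:
        raise ValueError("stage counts must be positive")
    return stages_before / stages_after


# Figures quoted by Tendler for one test: 22 stages on Power5/Power5+, 13 on Power6.
speedup = relative_clock(22, 13)
print(f"Implied clock multiplier from logic depth alone: {speedup:.2f}x")
```

Under these assumptions the reduction in logic depth alone would be worth roughly a 1.7x higher clock without adding pipeline stages, which is in the neighborhood of the doubling in performance IBM is talking about.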
McCready confirmed the speculation that IBM would probably deliver Power6 chips in packaging similar to that of the Power5 and Power5+ chips, too. That means we can expect dual-core modules with either one or two cores activated and plugging into a single socket; a quad-core module (QCM) with two whole chips sharing a single socket (new last October with the Power5+); and multi-chip modules that cram eight cores and four 36 MB L3 caches on a single piece of ceramic. IBM has not said if it will add L4 cache to the Power6 architecture, but it might.
It also seems unlikely that IBM will scale its Power servers beyond 64 cores; SMP scalability is not that useful beyond 64 cores, and operating systems and databases can only handle so many threads anyway. Right now, IBM is offering simultaneous multithreading that delivers 128 threads in its largest system and which yields a performance improvement of approximately 30 to 40 percent on thread-sensitive workloads. IBM could add more threads with the Power6 chip, of course, but there are diminishing marginal returns.
All of this raises the question of what all of those extra transistors in Power6 are doing. I think there is probably a much larger L2 cache–and for all we know, each core has its own cache now instead of a shared cache–and the L3 cache memory controller has been moved onto the chip. I also think that IBM has thrown in a very large number of features that will allow its z/OS operating system to run in some sort of hardware-assisted emulation mode on the Power6 processors. Whether IBM will admit this, or even turn it on in 2007, remains to be seen. But there has been far too much talk about Project ECLipz, the supposed consolidation of IBM’s iSeries, pSeries, and zSeries servers onto the Power architecture that started in 2001, for it to be all hogwash. Remember, vendors may not outright lie, but they don’t exactly always tell the truth, either, about future products. Intel would not talk about 64-bit memory extensions in the Pentium 4 and Xeon chips, and kept denying it was working on them, even though if you did the transistor counts, you knew something was going on starting with the “Prescott” Pentium 4s. IBM has an awful lot of transistors that are unaccounted for in the Power6, especially considering that it is using transistors more efficiently.
The important thing for IBM’s pSeries and iSeries customers is that they will get Power6 machines that can do about twice the work of a Power5+ machine. IBM roughly doubled performance moving from the Power4 to the Power5 generation, so this is consistent. If a lot of those extra circuits are used to emulate zSeries instructions and allow z/OS applications to run in heavily degraded mode, so what? A 5 GHz Power6 chip running at one-third efficiency will deliver about the same MIPS as the current System z9 mainframes. Everybody wins–especially IBM, which can presumably make a more profitable mainframe this way. Consolidating the iSeries and the pSeries line has been a boon for IBM, and this will be too–if this is indeed IBM’s plan.
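The one-third efficiency argument is simple arithmetic, sketched below. The "effective clock" here is a hypothetical proxy introduced only for illustration: it treats emulated throughput as native clock speed times an efficiency factor, and it is not a MIPS rating or an IBM figure.

```python
def effective_clock_ghz(clock_ghz: float, efficiency: float) -> float:
    """Native clock scaled by an emulation efficiency factor (illustrative proxy)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return clock_ghz * efficiency


# The 4 GHz to 5 GHz range IBM cited, at the one-third efficiency assumed above.
for clock in (4.0, 5.0):
    effective = effective_clock_ghz(clock, 1 / 3)
    print(f"{clock:.0f} GHz at one-third efficiency ~ {effective:.2f} GHz effective")
```

On this crude measure, a 5 GHz part running emulated code at one-third efficiency behaves like a native processor clocked at roughly 1.7 GHz, and a 4 GHz part like one at roughly 1.3 GHz.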