IBM's Power6 Gets First Silicon as Power5+ Looms (Continued)
Contrary to what a lot of people have written based on earlier roadmaps and IBM's own statements, the initial Power6 chips will use the same 90 nanometer process that is used for the Power5+ chip. Further down the road--perhaps in the late spring or late summer of 2007--IBM will roll out its Power6+ chips using a future 65 nanometer process.
Earlier this year, in clarifying Power6 clock speeds, IBM sources told me that the leap from Power5 to Power6 will involve a big jump in gigahertz--more than the jump from Power4 to Power5. The fastest initial Power4 clocked at 1.3 GHz, and the fastest Power5 clocks at 1.9 GHz, which is a bump of 46 percent. It seems likely that Power6 chips will start out at 3 GHz and then push up to 4 GHz. If IBM can keep the Power6 in the same thermal envelope as the Power5s, there is no reason not to do this. I think it highly unlikely that IBM will try to push clock speeds to 6 GHz, as the initial Power6 specs suggested a number of years ago. Rather than do this, I think IBM has probably brought more electronics onto the Power6 core to boost performance. If L3 caches are not shrunk and then integrated on the chip with the Power5+, you can bet IBM will do it with the Power6, and then possibly add an external L4 cache to keep those hungry processors fed.
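To make the clock-speed arithmetic above concrete, here is a minimal sketch that reproduces the 46 percent figure and shows what "more than the jump from Power4 to Power5" would imply for Power6. The 1.3 GHz, 1.9 GHz, 3 GHz, and 4 GHz figures come from the text; the function and variable names are only illustrative, and the "floor" is simply the same relative bump applied again, not an IBM specification.

```python
def percent_increase(old_ghz: float, new_ghz: float) -> float:
    """Return the percentage increase going from old_ghz to new_ghz."""
    return (new_ghz - old_ghz) / old_ghz * 100.0


POWER4_GHZ = 1.3  # fastest initial Power4, per the article
POWER5_GHZ = 1.9  # fastest Power5, per the article

# The Power4-to-Power5 bump quoted in the text (about 46 percent).
power4_to_power5 = percent_increase(POWER4_GHZ, POWER5_GHZ)

# Assumption for illustration: applying that same relative bump to Power5
# gives a lower bound for a Power6 that jumps "more" than Power5 did.
power6_floor_ghz = POWER5_GHZ * (POWER5_GHZ / POWER4_GHZ)

# How large the speculated 3 GHz and 4 GHz parts would be relative to Power5.
bump_to_3ghz = percent_increase(POWER5_GHZ, 3.0)
bump_to_4ghz = percent_increase(POWER5_GHZ, 4.0)

print(f"Power4 -> Power5: {power4_to_power5:.1f}%")
print(f"Power6 floor at the same bump: {power6_floor_ghz:.2f} GHz")
print(f"Power5 -> 3 GHz: {bump_to_3ghz:.1f}%")
print(f"Power5 -> 4 GHz: {bump_to_4ghz:.1f}%")
```

By this reckoning, anything above roughly 2.8 GHz would count as a bigger jump than Power4 to Power5, so a 3 GHz starting point fits what the IBM sources said.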
IBM could, of course, add more processor cores to the Power chip with the Power6. Intel is trying to get its "Tukwila" Itanium chip out the door in 2007. Tukwila is expected to have at least four Itanium cores per chip and, like the Power6+, it will use a 65 nanometer manufacturing process. IBM could take this four-core approach with Power6+, keeping the clock speed relatively low on the Power6 core and dialing up the number of cores on the chip from two to four. Could is the key word here.
IBM has plenty of time to change its mind with Power6+, even if Power6 is done. Remember, Intel was going to ship Montecito a year ago, but then, after taking a drubbing from IBM with the Power4 chips, decided to make Montecito a dual-core rather than a single-core chip. IBM could redraw its roadmap for Power6+ in the same way, keeping the clock speeds low and doubling the cores. On multi-threaded jobs, a four-core Power6+ chip would have four physical threads and four virtual threads through SMT and, keeping the chip count the same as in the Squadron boxes, a big Power6+ box would have 256 threads. This would help databases a great deal, but its value to big batch jobs would be limited. What seems clear is that we are going to have to figure out how to thread batch jobs on all computer architectures.
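The 256-thread figure follows from simple multiplication. The sketch below assumes the 32-chip count of the biggest Squadron box (the same 32-chip, 64-core configuration this article refers to elsewhere) and two threads per core with SMT; the function and constant names are only illustrative.

```python
def hardware_threads(chips: int, cores_per_chip: int, threads_per_core: int) -> int:
    """Total hardware threads in a box: chips x cores per chip x threads per core."""
    return chips * cores_per_chip * threads_per_core


SQUADRON_CHIPS = 32        # assumed: largest Squadron box, 32 chips and 64 cores
SMT_THREADS_PER_CORE = 2   # one physical plus one virtual thread per core

# Today's dual-core Power5 box.
power5_threads = hardware_threads(SQUADRON_CHIPS, 2, SMT_THREADS_PER_CORE)

# The hypothetical four-core Power6+ box at the same chip count.
power6_plus_threads = hardware_threads(SQUADRON_CHIPS, 4, SMT_THREADS_PER_CORE)

print(power5_threads, power6_plus_threads)  # prints "128 256"
```

Doubling the cores per chip at a constant chip count doubles the thread count, which is why the gain shows up on heavily threaded work like databases rather than on a single batch job.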
Having said all that, given IBM's whole "system on a chip" philosophy, I think Big Blue might put off four cores until Power6+ in 2007, and maybe even Power7 in 2008.
Take a look at the history (and breathe deeply before you read this): The Power4 chip put what were essentially two S-Star PowerPC cores with their own L1 caches, the L1 cache controllers, a shared L2 cache, and a single L2 cache controller onto the chip and put the L3 cache off the die. With Power5, IBM added simultaneous multithreading (SMT), doubled the speed of the distributed switch interconnection on chips so it ran at full clock speed (it was half speed on the Power4s), boosted the size of the L2 cache, went from two-way to four-way set associativity for the caches, moved the L3 cache controller into the chip, moved the L3 cache into the chip package and, most importantly, hung that L3 cache off the L2 cache with a direct link rather than making it go through the interconnection fabric of the MCM, which it did with the Power4. (This wickedly reduced memory latencies.) I think Power6 will include an on-die L3 cache for each core (or maybe shared by two cores), hung off of individual L2 caches (one per core), plus an integrated L4 controller and an L4 cache that is implemented in the MCM packaging like L3 caches are today on the Power5s. As I speculated a few months ago, I think there is also a possibility that IBM ditches this hierarchical cache structure and creates a whole new scheme above the L2 caches in each core that boosts memory bandwidth beyond what is possible with a staged cache architecture.
Here's another interesting idea: Imagine if IBM used its thermal conduction module (TCM) technology from mainframes to put an entire 32-chip, 64-core machine in four blocks of ceramic, thus shortening many of the wires in a server complex and significantly reducing interprocessor and memory latencies to the very limits of physics? IBM could do this TCM packaging with the Power6, or hold off until the Power6+. IBM seems to have removed the distributed switch with the Power6 design and replaced it with "advanced system features." What is more advanced than a mainframe's TCM?
What seems clear is that the Power6 chip has been a major redesign, according to my sources, and much of this redesign is apparently being driven by the necessities of moving to a 65 nanometer chip-making process. But it may also be done so IBM can do the full-tilt TCM integration like it does in mainframes for its very high-end i5 and p5 boxes, as well as deliver single-chip, dual-core Power6 chips for volume markets where a TCM is overkill. I think IBM is also committed to getting low-power, dual-core Power6s into entry and midrange servers, blade servers, and even embedded devices. IBM is concerned about power management, which is why it is merging simultaneous multithreading and multiple cores in the Power5 design. Both of these technologies make better use of transistors and deliver performance without having to add significantly to clock speed.
IBM has also hinted that the Power6 design will move a lot more self-management functions from the microcode underpinning OS/400 and AIX, and now from its Virtualization Engine hypervisor, into the Power6 chip itself. It would not be surprising for large pieces of the virtualization embodied in the Virtualization Engine to be implemented in chip transistors and in firmware loaded into the processor. Intel and AMD are embedding X86 instruction set virtualization in their chips using their respective VT and Pacifica technologies. IBM could do something similar, providing electronic assist to Virtualization Engine.
The Power6 chip could, being implemented as a TCM, also consolidate the iSeries, pSeries, and zSeries lines in some way to support mainframe as well as i5/OS, AIX, and Linux workloads on the same processor complexes. This is the fabled "Project ECLipz," which IBM has not confirmed and has weakly denied. Exactly how mainframe workloads might be supported is unclear, but there is certainly a prospect of mixing and matching zSeries and Power6 processors within the same complex or TCM. Using mainframe simulation software from Transitive is also an option. That is how Apple is going to be supporting Power-based workloads on Intel's chips in its future machines. QuickTransit, Transitive's emulation software, can already support mainframe workloads on Power, Xeon, Itanium, and Opteron processors. IBM might go so far as to license Transitive's QuickTransit, implement many of its features in silicon, and put that inside a Power6 or Power7 TCM to make a hybrid mainframe-Power box.
For ECLipz, IBM could also implement zSeries processor instructions in "millicode," a kind of on-chip microcode that would create a CISC mainframe instruction from a bunch of RISC instructions. The zSeries processors already do this a little, by the way, and an Itanium chip does something similar when it is running HP-UX workloads, since the Itanium doesn't support PA-RISC instructions. Even the Pentium chip that is probably on your desktop uses similar technology; that Pentium is not using the 80486 instruction set, but has a RISC-like core that assembles these 486 CISC instructions out of smaller RISC instructions. It just tricks the software into thinking it is running 486 instructions.
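For readers who have not seen the idea before, the following toy sketch shows what building one CISC-style instruction out of smaller RISC-style steps looks like. Everything in it is hypothetical: the "move block" instruction, the LOAD and STORE operation names, and the flat byte-array memory are invented for illustration and do not correspond to real zSeries, Power, or Pentium opcodes or to how IBM's millicode is actually written.

```python
from typing import List, Tuple

# Hypothetical micro-operation: (operation name, register, memory address).
MicroOp = Tuple[str, str, int]


def expand_move_block(dst: int, src: int, length: int) -> List[MicroOp]:
    """Expand one CISC-style memory-to-memory move into RISC-style steps."""
    ops: List[MicroOp] = []
    for offset in range(length):
        ops.append(("LOAD", "r1", src + offset))   # r1 <- memory[src + offset]
        ops.append(("STORE", "r1", dst + offset))  # memory[dst + offset] <- r1
    return ops


def run(ops: List[MicroOp], memory: List[int]) -> List[int]:
    """Execute the micro-operations against a flat memory, in order."""
    regs = {}
    for op, reg, addr in ops:
        if op == "LOAD":
            regs[reg] = memory[addr]
        elif op == "STORE":
            memory[addr] = regs[reg]
        else:
            raise ValueError(f"unknown micro-op: {op}")
    return memory


# Memory holds "HELLO" at addresses 0-4 and five empty bytes after it.
memory = list(b"HELLO") + [0] * 5

# The software issues one "move 5 bytes from address 0 to address 5";
# the core actually runs ten simple load/store steps.
micro_ops = expand_move_block(dst=5, src=0, length=5)
run(micro_ops, memory)
moved = bytes(memory[5:10])

print(len(micro_ops), moved)  # prints "10 b'HELLO'"
```

The software sees a single instruction with a single result, while the hardware underneath only ever executes the simple steps, which is the trick described above.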
Whatever IBM has decided, the Power6 chip is now in first silicon, and what the company is going to do in terms of core count and mainframe support can now be found out. It is just a matter of time.