Chip Makers Strut Their Stuff at ISSCC
February 19, 2007 Timothy Prickett Morgan
Chip geeks and semiconductor mavens from around the world converged on San Francisco last week to show off their latest innovations at the IEEE’s annual International Solid-State Circuits Conference. This is one of the two annual events where a lot of advances in chip design are first revealed to the world–the other being the Hot Chips conference hosted in the summer. At last week’s event, Intel, IBM, Advanced Micro Devices, Sun Microsystems, and PA Semi showed off future server microprocessors.
IBM’s dual-core Power6 chip was the first chip to be detailed at ISSCC. The Power6 chip, which is expected to first appear in IBM’s System p AIX and Linux servers this year and in its System i proprietary servers possibly in 2008, will be made using a 10-level 65 nanometer copper/SOI/low-k chip process that is being perfected right now in the company’s East Fishkill, New York, chip fabs. IBM is now confirming that the Power6 chip will have a clock speed in excess of 5 GHz and will consume less than 100 watts. IBM is also confirming that the chip has 700 million transistors and will have a die size of 341 square millimeters.
As previously reported, the Power6 chip includes VMX vector co-processors for each core and a new decimal floating point unit that does “money math” in base 10 instead of the normal base 2 math done by processors. This is the first time anyone has put a decimal FP unit into a production chip, and given the commercial nature of the System p and System i product lines, this is not surprising. (It would also not be surprising to see the decimal FP unit as well as VMX co-processors appear in future processors for IBM’s System z family of mainframes.) Each Power6 core will have its own dedicated 2 MB L2 cache, and the two cores on each chip will share a 32 MB L3 cache.
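To see why “money math” matters for commercial workloads, here is a minimal sketch using Python's standard `decimal` module. This is a software implementation of base 10 arithmetic, not the Power6 hardware unit, and the 10 cent amounts are made up for illustration. It only shows the kind of rounding drift that base 2 floating point introduces and decimal arithmetic avoids.

```python
from decimal import Decimal

# Base 2 floating point cannot represent 0.10 exactly, so sums of
# dime-sized amounts drift away from the expected total.
binary_total = sum([0.10] * 3)

# Base 10 arithmetic represents 0.10 exactly, so the total is exact.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))

print(binary_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))   # True
```

Software decimal libraries like this one are typically much slower than native binary floating point, which is the gap a hardware decimal unit is meant to close.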
Intel provided a few more details on a research project it created to push the limits of number-crunching on a single piece of silicon by putting 80 RISC processor cores onto a chip. The company talked very generally about this project, which it is now referring to as a “network on a chip,” back in September 2006 at Intel Developer Forum. This chip consists of 80 tiles, as Intel is calling the floating point cores, arranged in a 10 by 8 grid. These cores, which only do mathematical calculations and which are not based on either the X64 or Itanium instruction set architectures (but which probably are a subset of the i960 RISC processor Intel created more than a decade ago), operate at 4 GHz. The chip includes routers that link the math units together so they can share the results of calculations with each other, much as nodes in massively parallel supercomputers do today to model weather, chemical processes, physics experiments, and other kinds of complex natural systems.
The Intel RISC chip includes fine-grained clock gating, which is a technique that allows sets of transistors to be turned down to their idle state if they are not required by current transactions. Fine-grained clock gating is something that all chip makers will eventually work into their designs because it radically reduces the power consumption on the chip. The Intel chip can deliver 1.28 teraflops of aggregate peak floating point performance–about what you can cram into a rack of X64 servers these days–and Intel says that running on 1 volt of juice, it can deliver 1 teraflops of performance and only dissipate 98 watts of heat. This is stunning. This chip is 275 square millimeters and has 100 million transistors–that’s pretty big for a chip with relatively few transistors, and particularly so given that the chip was made using Intel’s 65 nanometer chip process. The chip also needs to have a memory controller and memory chips grafted on it–something that the company is working on right now.
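The figures quoted for the 80-tile chip can be cross-checked with a little arithmetic. The sketch below uses only numbers from the article (80 tiles, 4 GHz, 1.28 teraflops peak, and 1 teraflops at 98 watts). The per-tile, per-cycle rate it derives is implied by those numbers rather than something stated here, so treat it as an inference.

```python
TILES = 80                 # floating point tiles on the chip
CLOCK_HZ = 4e9             # quoted clock speed of each tile
PEAK_FLOPS = 1.28e12       # quoted aggregate peak performance

# Peak = tiles x clock x operations per tile per cycle, so solve for the last term.
flops_per_tile_per_cycle = PEAK_FLOPS / (TILES * CLOCK_HZ)

# Efficiency at the quoted 1 volt operating point: 1 teraflops at 98 watts.
gflops_per_watt = (1e12 / 1e9) / 98

print(f"Implied operations per tile per cycle: {flops_per_tile_per_cycle:.1f}")
print(f"Efficiency at 1 volt: {gflops_per_watt:.1f} gigaflops per watt")
```

This works out to four floating point operations per tile per clock cycle, and a bit more than 10 gigaflops per watt at the 1 volt operating point.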
Intel also talked a bit about its dual-core “Merom” Core 2 Duo chip for laptops, which is due to be tweaked this year with faster front side buses and higher clock speeds. The Merom chip Intel showed off at ISSCC has 291 million transistors and has an area of 143 square millimeters. It has a shared 4 MB L2 cache for the two cores, with clock speeds that range from 1 GHz to 3 GHz and a bus speed that ranges from 666 MHz to 1333 MHz.
AMD was sandwiched in between the Intel RISC and Merom chip presentations, and revealed some more details on its future quad-core “Barcelona” Rev F Opteron processors. AMD put out some of the specs on the Barcelona chips a few weeks ago, boasting about the true quad-core nature of its design, which includes a revamped Opteron core with faster math processing and what it says will be better support for virtualized environments.
But last week at ISSCC, AMD wanted to talk about the power-saving features of the Opteron chips. The initial Opteron processors–the so-called C0 stepping in the industry lingo–did not have AMD’s PowerNow power management features, which were originally created for the laptop variants of its Athlon processors. But in February 2004, PowerNow was added to the CG stepping of the Opterons, which plug into the 940-pin sockets AMD created for the Opterons. The PowerNow features allow the voltage and clock speeds of the older Rev E and the current dual-core Rev F Opteron processors to be stepped down in 200 MHz increments in five stages (for a total reduction of 1 GHz off the top-end clock speed), and then drop down to a base idle speed of 1 GHz. The average Opteron processor running a heavy load might be rated at 95 watts, but it typically burns only about 70 watts of juice, and with PowerNow features, that number can be dropped to around 32 watts in the idle state. That is a power reduction of about 66 percent from the 95 watt rating, or a bit more than half from the typical 70 watt draw, to the idle load in the Opteron chip. This is a tremendous reduction in power consumption and heat dissipation for systems that have uneven workloads–which is true of most computers.
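The stepping scheme described above can be sketched as a simple frequency ladder. The function name `powernow_ladder_mhz` and the 2.8 GHz top speed in the example are hypothetical, chosen only for illustration. The step size, the number of stages, and the 1 GHz idle floor come from the description above. This is not AMD's actual implementation, which also steps voltage along with clock speed.

```python
def powernow_ladder_mhz(top_mhz, step_mhz=200, stages=5, idle_mhz=1000):
    """List clock speeds from the top speed down through each step, ending at idle."""
    # Top speed plus `stages` successive reductions of `step_mhz` each.
    ladder = [top_mhz - i * step_mhz for i in range(stages + 1)]
    # Discard any step that would land at or below the idle floor,
    # then finish with the drop to the base idle speed.
    ladder = [mhz for mhz in ladder if mhz > idle_mhz]
    ladder.append(idle_mhz)
    return ladder

# Hypothetical part with a 2.8 GHz top-end clock speed.
print(powernow_ladder_mhz(2800))
# [2800, 2600, 2400, 2200, 2000, 1800, 1000]
```

Passing a smaller `step_mhz` models a finer-grained ladder, though the number of stages for any other step size would be a guess.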
While the Opteron processors were designed to have two and four cores from the get-go, the PowerNow features were not designed to gate the power consumption of each individual core on the chip, but rather that of the chip as a whole. With the future quad-core Rev F Opterons, AMD is going to be able to step down in 100 MHz increments with PowerNow–providing more granularity in the performance and power consumption reduction of a chip–and will also allow each core on the chip to be individually gated using PowerNow features.
The new Rev F design also splits up the other connectivity circuits on the chip (called a northbridge) that link processor cores to the on-chip main memory controller. So if a chip has a workload that requires a lot of CPU but little memory, the memory subsystem can be stepped down to an idle state. Or, conversely, if a workload is memory or I/O intensive but is not doing a lot of calculation, then the CPU cores can be idled. The memory controller accounts for around 10 watts of the 95 watts in a standard Opteron part.
By adding these technologies to the Barcelona chip, AMD is confident that it can increase performance (probably around 60 percent or more on generic workloads) and still keep within the same thermal envelope. That includes 2 MB of on-chip shared L3 cache and 1 MB of L2 cache per core, by the way.
Sun was on hand at ISSCC to talk about its second generation of “Niagara” Sparc T1 processors. Sun went over its Sparc processor roadmap at its securities analyst meeting two weeks ago, and said that it was adding a two-way variant of the Niagara-2 chip called “Victoria Falls.” The Niagara-2 chip has eight cores, each with eight threads and each with a floating point unit. The current first generation Niagara chips have eight cores with four threads each and a shared floating point unit for the entire chip. The Niagara-2 chip that Sun is detailing at the show has 4 MB of L2 cache, one x8 PCI-Express slot, two 10 Gigabit Ethernet ports, and 8 FB-DIMM memory slots driven by an on-chip memory controller. The Niagara-2 chip has 500 million transistors and a 342 square millimeter die size; it is implemented in an 11-layer, 65 nanometer process by Sun’s fab partner, Texas Instruments. Niagara-2 chips are expected to be in servers in the second half of 2007.
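With transistor counts and die sizes quoted for four of the chips shown at the conference, a rough density comparison is easy to work out. The sketch below uses only the figures given in this article. The calculation ignores how much of each die is dense cache rather than logic, so it is a back-of-the-envelope comparison and nothing more.

```python
# (millions of transistors, die area in square millimeters), as quoted in the article
chips = {
    "IBM Power6": (700, 341),
    "Intel 80-tile research chip": (100, 275),
    "Intel Merom": (291, 143),
    "Sun Niagara-2": (500, 342),
}

# Millions of transistors per square millimeter for each chip.
density = {name: count / area for name, (count, area) in chips.items()}

# Rank from densest to sparsest.
ranked = sorted(density.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.2f} million transistors per square millimeter")
```

By this measure the Power6 and Merom chips both come in at a little over 2 million transistors per square millimeter and Niagara-2 at roughly 1.5 million, while the 80-tile research chip lands well under half a million, which is consistent with the observation that it is big for its transistor count.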
Finally, upstart PowerPC chip cloner PA Semi showed off its dual-core PA-6T processor, which runs at 2 GHz, has its own crossbar interconnect implemented on chip for scaling out single system images, and has 2 MB L2 caches on chip for the cores. The initial PA-6T processor has a tiny 115 square millimeter die size and only consumes 25 watts on peak workloads; it runs at between 5 watts and 13 watts on normal workloads. The chip uses clock gating techniques to offer considerable performance within that power budget, and will be in production in the fourth quarter of this year.