Intel Talks Up X64, Itanium Roadmaps Ahead of IDF
Published: March 18, 2008
by Timothy Prickett Morgan
It's a few weeks before the Intel Developer Forum will capture a lot of the headlines in the IT trade press, but the company decided yesterday that it just can't wait to talk about its future processors and their roadmaps. Especially with rival Advanced Micro Devices still not shipping bug-free quad-core Opterons and Athlons, other rival Sun Microsystems talking about its "Rock" chips slipping a year into 2009, and final rival IBM doing a ramp of its dual-core Power6 chips in slow motion, Intel can rub it in a little.
But only a little. Pat Gelsinger, senior vice president and general manager of Intel's Digital Enterprise Group, the unit that makes PC, laptop, and server processors and platforms, said yesterday that the quad-core "Tukwila" kickers to the dual-core "Montvale" Itaniums would be demonstrated at IDF Shanghai in early April. As has been previously divulged by Intel, Tukwila will cram four Itanium cores with a second generation of simultaneous multithreading onto a single die, along with the QuickPath Interconnect (QPI for short, Intel's riff on AMD's HyperTransport point-to-point interconnection scheme for processors, memory, and I/O), integrated memory controllers, and 30 MB of on-chip L3 cache memory. It is expected to deliver more than twice the performance of the current 1.66 GHz Montvale Itaniums, which top out at 24 MB of L3 cache.
The newsy bit that Gelsinger let out yesterday is that the Tukwila ramp is "proceeding quite smoothly" and the chip is now due for shipment at the end of 2008 and will begin to appear in systems in early 2009. Montecito was originally expected in 2005, but shipped late in 2006, and Tukwila was expected in 2006 or maybe early 2007 and is now slipping into systems in 2009. Gelsinger, like other chip stewards, never said that the chip slipped, of course. This is ever the way in the chip business. (Sorry about that, Hewlett-Packard, Fujitsu-Siemens, NEC, Bull, Unisys, and Hitachi. You are going to have to wait for your Itanium kickers.)
No worries, though, because Intel has been able to get its act together in the X64 processor space, and is getting ready to ship a kicker to its Xeon 7300 "Tigerton" processors for high-end multiprocessor systems. With the shrink to 45 nanometers, Gelsinger said that Intel looked at adding eight cores and relatively small caches to a die for the future "Dunnington" Xeons, four cores and lots of cache, or six cores and a moderate amount of cache with perhaps a little extra clock speed. And, as it turns out, Intel has gone with what might look like an unconventional choice with the Dunnington Xeons and picked the six-core design. "A six-core, big cache part was the right solution," said Gelsinger in a conference call with the press yesterday.
The Dunnington chip will have the first L3 cache in the Xeon architecture, a total of 16 MB that can be shared by all six cores. The chip will have a total of 1.9 billion transistors, and will be socket compatible with the "Caneland" platforms that the Tigerton Xeons plug into; the Tigerton Xeons are the first multiprocessor Xeons that sport the Core microarchitecture from Intel, and they were launched last September. Dunnington will be available sometime in the second half of this year. It is not clear if Dunnington chips are based on the new "Penryn" cores implemented in the new "Harpertown" Xeon 5400 processors, also using a 45 nanometer process. But it is a fair guess that Dunnington is just a shrink of Tigerton with some L3 cache woven in.
Gelsinger also gave a little more detail on the future "Nehalem" kickers to Penryn, which are the Xeon processor companions to the Tukwila Itaniums and which also implement on-chip memory and graphics controllers and the QuickPath Interconnect. The Nehalem chip is designed in a modular fashion so Intel can take an eight-core design with memory controllers, graphics controllers, on-chip L1, L2, and L3 caches, and QuickPath Interconnect buses and create a line of chips that spans from two to eight cores, with and without on-chip graphics as need be. (Servers will not get on-chip graphics, but some desktop and laptop variants will; high-end workstations will presumably still use even more powerful discrete graphics processors.)
The first Nehalem chips will go into production in the fourth quarter of 2008 for two-socket boxes using the 45 nanometer Hi-K processes Intel is ramping now. These chips will have four cores and a shared 8 MB L3 cache. Each core will have its own 32 KB L1 instruction cache and 32 KB L1 data cache, as well as a 256 KB L2 cache. Intel is adding second-level branch predictors and translation lookaside buffers with the Nehalem architecture. The Nehalem chips should provide more than four times the memory bandwidth of the current Harpertown two-socket boxes. Each Nehalem socket has two QPI links, each offering up to 25.6 GB/sec of bandwidth. "The bandwidth coming out of the system is truly stunning," says Gelsinger.
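As a back-of-the-envelope check on those figures, the short Python sketch below adds up the on-chip cache and the aggregate QPI bandwidth for one four-core Nehalem socket. The numbers are the ones Gelsinger disclosed; the constant and function names are made up for illustration, KB and MB are assumed to be binary units, and the QPI total simply multiplies the per-link peak by the link count, which says nothing about sustained throughput.

```python
# Figures as quoted in the article for the first four-core Nehalem parts.
CORES = 4
L1_INSTRUCTION_KB = 32   # per core
L1_DATA_KB = 32          # per core
L2_KB = 256              # per core
SHARED_L3_MB = 8         # shared by all cores
QPI_LINKS_PER_SOCKET = 2
QPI_GB_PER_SEC_PER_LINK = 25.6  # peak, per link


def total_cache_kb(cores=CORES):
    """Sum private L1/L2 caches across cores plus the shared L3, in KB.

    Assumes 1 MB = 1024 KB (an assumption, not stated in the article).
    """
    private_per_core_kb = L1_INSTRUCTION_KB + L1_DATA_KB + L2_KB
    return cores * private_per_core_kb + SHARED_L3_MB * 1024


def aggregate_qpi_gb_per_sec():
    """Peak QPI bandwidth per socket: link count times per-link peak."""
    return QPI_LINKS_PER_SOCKET * QPI_GB_PER_SEC_PER_LINK


if __name__ == "__main__":
    print("Total on-chip cache (KB):", total_cache_kb())
    print("Peak QPI bandwidth per socket (GB/sec):", aggregate_qpi_gb_per_sec())
```

By this arithmetic, the private caches add 1.25 MB on top of the 8 MB L3, and the two links together peak at 51.2 GB/sec per socket.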
The Nehalem chip will also get a substantially improved virtual thread architecture that Intel is currently calling Simultaneous Multi-Threading, or SMT, which is the industry name for virtual threading. The larger caches and higher memory bandwidth of the Nehalem chips will mean that SMT offers much better performance increases than HyperThreading, the implementation of SMT that Intel created for the NetBurst architecture. Just how much better, Gelsinger did not say. The Nehalem chip will also have an integrated, three-channel DDR3 main memory controller on the chip, and it will support up to three DIMM memory modules per channel per socket using 800 MHz, 1.07 GHz, and 1.33 GHz DDR3 memory. The memory controller will also support memory mirroring. The related I/O southbridge chip for the Nehalem chip will be called Tylersburg, and because memory and graphics functions are being pulled onto the chip, there is no need for a northbridge chip.
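To put the memory controller specs in perspective, here is a rough Python sketch of the peak theoretical bandwidth and DIMM count per socket. It assumes each DDR3 channel is the standard 64 bits (8 bytes) wide and reads the quoted speeds as effective transfer rates in millions of transfers per second; neither assumption comes from Gelsinger, and the names are illustrative only.

```python
# Memory controller figures as quoted in the article.
CHANNELS_PER_SOCKET = 3
MAX_DIMMS_PER_CHANNEL = 3
DDR3_SPEEDS_MHZ = (800, 1070, 1330)  # 800 MHz, 1.07 GHz, 1.33 GHz

# Assumption: a standard 64-bit DDR3 channel moves 8 bytes per transfer.
BYTES_PER_TRANSFER = 8


def max_dimms_per_socket():
    """Maximum DIMM count per socket: channels times DIMMs per channel."""
    return CHANNELS_PER_SOCKET * MAX_DIMMS_PER_CHANNEL


def peak_bandwidth_gb_per_sec(speed_mhz):
    """Peak theoretical bandwidth per socket in GB/sec (decimal GB).

    Treats speed_mhz as millions of transfers per second (an assumption).
    """
    mb_per_sec = CHANNELS_PER_SOCKET * BYTES_PER_TRANSFER * speed_mhz
    return mb_per_sec / 1000


if __name__ == "__main__":
    print("Max DIMMs per socket:", max_dimms_per_socket())
    for speed in DDR3_SPEEDS_MHZ:
        print(speed, "MHz ->", round(peak_bandwidth_gb_per_sec(speed), 2), "GB/sec")
```

Under those assumptions, a socket tops out at nine DIMMs and somewhere between roughly 19 GB/sec and 32 GB/sec of peak memory bandwidth, depending on the memory speed. Real-world throughput will be lower.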
If you think Intel's going to rest after this, forget it. The "Westmere" 32 nanometer kickers to Nehalem are slated for 2009, with the "Sandy Bridge" follow-ons, also using 32 nanometer processes, due in 2010. Sandy Bridge chips will sport new 256-bit vector math electronics called Advanced Vector Extensions (AVX), a kicker to the SSE 4.2 instructions in the current Xeons. The new wider vector units will double floating point performance, and other instructions will boost performance for supercomputing and media manipulation workloads.
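The doubling claim follows from simple lane counting, which the Python sketch below illustrates. It assumes the existing SSE vector registers are 128 bits wide and that the element sizes are the usual 32-bit single-precision and 64-bit double-precision floats; the function name is hypothetical, and the ratio is a peak figure that assumes the same clock speed and the same number of vector operations per cycle.

```python
SSE_VECTOR_BITS = 128  # assumption: width of the existing SSE registers
AVX_VECTOR_BITS = 256  # width quoted in the article for AVX

SINGLE_PRECISION_BITS = 32
DOUBLE_PRECISION_BITS = 64


def lanes(vector_bits, element_bits):
    """Number of elements of a given size that fit in one vector register."""
    return vector_bits // element_bits


if __name__ == "__main__":
    for name, bits in (("single", SINGLE_PRECISION_BITS),
                       ("double", DOUBLE_PRECISION_BITS)):
        sse = lanes(SSE_VECTOR_BITS, bits)
        avx = lanes(AVX_VECTOR_BITS, bits)
        print(name, "precision:", sse, "lanes with SSE,", avx, "with AVX,",
              "ratio", avx // sse)
```

Each 256-bit operation handles eight single-precision or four double-precision values, twice what a 128-bit SSE operation can, which is where the two-fold peak figure comes from.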
Gelsinger also talked quite a bit about discrete graphics and the "Larrabee" graphics chip Intel has been working on for two years or so. While people asked a lot of questions on the conference call that Intel hosted, Gelsinger did not provide a lot of details about this chip, which Intel wants to pit against IBM's Cell broadband chip and AMD's ATI graphics chips. The Larrabee chip will support the IA instruction set used in the Core architecture, and will have many cores (how many, Intel will not say), a large NUMA-style coherent cache for all the cores to share, and a new vector instruction set that does not appear to be the AVX functions in Sandy Bridge. (But it could and probably will turn out to be a superset of AVX.) Larrabee will come out in 2009 or 2010.