Intel Previews Dual-Core Montecito Itanium Performance
by Timothy Prickett Morgan
Intel doesn't ever want to preannounce a forthcoming chip, but the company is smart enough to leak out information from time to time to keep IT buyers thinking about the future technology it is promising to deliver as they mull over their current product purchases. To that end, Intel last week lifted the curtain a bit on its dual-core "Montecito" 64-bit Itanium processors, releasing Linpack Fortran benchmark test results for machines using the chip.
After a number of roadmap rejiggerings and some delays, the Montecito processor is expected to start rolling out of the Intel factories sometime in the second half of 2005, which began last week. Depending on how fast Intel can ramp up production and how quickly vendors qualify their systems for the chip, it could end up in systems at the end of 2005 or in the beginning of 2006. Most people expect Montecito processors to appear in servers early in 2006.
The Montecito chip is really two of the current "Madison" Itanium 2 cores implemented on a chip. Each core has 16 KB of L1 data cache, 256 KB of L2 data cache and 1 MB of L2 instruction cache, plus 12 MB of L3 cache. Intel is not unifying the L2 or L3 caches, as it could have done. Each core will have two virtual threads, enabled by HyperThreading, yielding four virtual threads per chip socket. Montecito will be made using a 90 nanometer process and will pack some 1.72 billion transistors on a die--with most of those transistors being used to make the 24 MB of L3 cache. The Montecito chip is expected to have a 667 MHz front side bus, just as the 1.6 GHz/9 MB single-core Madisons do today. The dual-core Montecito Itaniums will be the first chips Intel ships that sport the "Vanderpool" VT virtualization features, the "Pellston" error correction technology, and the "Foxton" technology, which boosts the clock speed on the Itanium chip when the workload demands it and the server can take the heat. They will also include DBS power management features. Because of these power management features and the move from 130 nanometer to 90 nanometer processes, a top-end Montecito chip is expected to consume 100 watts of juice compared to 120 watts for the biggest Madison chip.
While Intel demonstrated a "Tiger4" Montecito system with four processors and eight threads at Intel Developer Forum in the spring of this year, the company has been mum about clock speeds and performance. The word on the street is that various incarnations of the Montecito chip will run at 1.2 GHz to 2.2 GHz, offering a wide span of performance and heat profiles to IT vendors and customers who are frustrated by the relatively high heat output in the Itanium 2 chips.
On the Linpack Fortran benchmark test, which is used to gauge the sustained and peak theoretical performance of supercomputers, a four-way Tiger4 box from Intel was able to deliver 45.8 gigaflops of sustained number-crunching power, according to the company. If you look at past ratios of sustained to peak performance on the Linpack test for Itanium iron, which run a little over 70 percent, then peak performance for this four-socket, eight-core box is probably 64 gigaflops. This works out to 8 gigaflops per core, which probably means the chip was running at 1.6 GHz. Each chip probably only pumped out around 50 watts, or 200 watts for four processors, which is 320 megaflops per watt (peak). Jacking up the clock speed to 2.2 GHz would increase the peak performance proportionally to 88 gigaflops, but the wattage for just the processors shoots up to 440 watts, reducing the ratio to 200 megaflops per watt. That's not a good tradeoff, which explains why Intel is moving to a multi-core strategy for all of its chips.
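For readers who want to check the back-of-envelope math, here is a minimal Python sketch of the arithmetic. Every input is a figure quoted in this story. The 0.71 sustained-to-peak ratio is an assumption inferred from the McKinley and Madison results cited in this article, the per-chip wattages are the guesses made above, and the function names are purely illustrative.

```python
# Back-of-envelope Linpack arithmetic using figures quoted in the article.

def estimate_peak_gflops(sustained_gflops, sustained_to_peak_ratio):
    """Estimate theoretical peak from a sustained result and an assumed ratio."""
    return sustained_gflops / sustained_to_peak_ratio


def mflops_per_watt(peak_gflops, sockets, watts_per_socket):
    """Peak megaflops per watt, counting processor power only."""
    return peak_gflops * 1000.0 / (sockets * watts_per_socket)


# ASSUMPTION: roughly 0.71 sustained-to-peak, inferred from the McKinley
# (11.43 / 16) and Madison (22.7 / 32) numbers quoted in this article.
ASSUMED_RATIO = 0.71

# The text rounds this estimate to 64 gigaflops.
estimated_peak = estimate_peak_gflops(45.8, ASSUMED_RATIO)

# Eight cores across four sockets.
per_core = 64.0 / 8

# 1.6 GHz case: about 50 watts per chip (a guess, not an Intel figure).
low_clock = mflops_per_watt(64.0, sockets=4, watts_per_socket=50.0)

# 2.2 GHz case: peak scales with clock speed, and 440 watts in total
# implies 110 watts per chip.
scaled_peak = 64.0 * 2.2 / 1.6
high_clock = mflops_per_watt(88.0, sockets=4, watts_per_socket=110.0)

print(f"estimated peak:   {estimated_peak:.1f} gigaflops")
print(f"per core:         {per_core:.1f} gigaflops")
print(f"1.6 GHz, 200 W:   {low_clock:.0f} megaflops per watt")
print(f"2.2 GHz, 440 W:   {high_clock:.0f} megaflops per watt")
```

The point of the sketch is the last two lines: a 37.5 percent bump in clock speed and peak performance costs more than twice the power under these guesses.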
So how does the Montecito compare to prior Itaniums? Well, a first-generation "Merced" Itanium server with two chips running at 800 MHz had one tenth the peak performance of the four-socket Montecito box--that's 6.4 gigaflops--and delivered only 3.9 gigaflops of sustained performance on the Linpack test. While Intel did offer a four-way Merced box, no one ever got around to testing it, mostly because it was too hot, too expensive, and too depressing given all the promises of future Itanium performance from the late 1990s. Three years ago, when the 1 GHz "McKinley" chips saved the Itanium from oblivion, a four-way McKinley box delivered 16 gigaflops of peak Linpack performance and 11.43 gigaflops of sustained performance. With the first iteration of the Madison Itaniums, Intel boosted L3 cache size to 6 MB and cranked up the clock to 1.5 GHz, and the peak Linpack performance of a four-socket system rose to 32 gigaflops and the sustained performance hit 22.7 gigaflops. The interesting bit with Montecito is that through sophisticated caching and increased bus bandwidth, a chip with a tiny bit more clock speed (1.6 GHz versus 1.5 GHz) that has twice the number of Itanium cores is delivering twice the sustained and peak performance.
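The generational comparison can be tabulated with a short Python sketch. The peak and sustained figures are the ones quoted in this story, with two caveats: the Montecito peak is the 64 gigaflops estimate rather than an Intel number, and the Merced peak is taken as one tenth of that estimate, or 6.4 gigaflops. The dictionary labels and the function name are illustrative only.

```python
# (peak gigaflops, sustained gigaflops) as quoted or estimated in the article.
systems = {
    "Merced, 2 sockets, 800 MHz": (6.4, 3.9),
    "McKinley, 4 sockets, 1 GHz": (16.0, 11.43),
    "Madison, 4 sockets, 1.5 GHz": (32.0, 22.7),
    "Montecito, 4 sockets (estimated peak)": (64.0, 45.8),
}


def linpack_efficiency(peak_gflops, sustained_gflops):
    """Fraction of theoretical peak actually sustained on Linpack."""
    return sustained_gflops / peak_gflops


efficiency = {
    name: linpack_efficiency(peak, sustained)
    for name, (peak, sustained) in systems.items()
}

madison_peak, madison_sustained = systems["Madison, 4 sockets, 1.5 GHz"]
montecito_peak, montecito_sustained = systems["Montecito, 4 sockets (estimated peak)"]

# Generation-over-generation gains for a four-socket box.
peak_gain = montecito_peak / madison_peak
sustained_gain = montecito_sustained / madison_sustained

for name, ratio in efficiency.items():
    print(f"{name}: {ratio:.1%} of peak sustained")
print(f"Montecito over Madison: {peak_gain:.2f}x peak, {sustained_gain:.2f}x sustained")
```

Under these figures, the three most recent generations all sustain a little over 70 percent of peak, which is why the 64 gigaflops estimate for Montecito is plausible.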
This is, of course, what Intel has been promising with the Montecito. Officially, systems based on Montecito chips will deliver up to twice the performance and up to three times the system bandwidth of the Madison generation of machines.
What Intel is really excited about, of course, is that a four-socket Montecito system running at 1.6 GHz can beat a four-socket Power5 system from IBM with the same number of processor cores. The p5 575 server with four dual-core 1.9 GHz Power5 chips delivered 34.57 gigaflops of sustained performance on the Linpack test, with a theoretical peak performance of 60.8 gigaflops. Chip for chip and core for core, a slower Montecito seems to be beating a faster Power5 on the Linpack test. Looks like it is time for Power5+ to come out, eh?
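To put the chip-for-chip claim in numbers, here is a rough Python sketch that normalizes the two Linpack results by core count and clock speed. The Power5 figures are as quoted above. The Montecito clock speed (1.6 GHz) and peak (64 gigaflops) are this article's estimates, not Intel's figures, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinpackResult:
    name: str
    cores: int
    clock_ghz: float
    sustained_gflops: float
    peak_gflops: float

    def efficiency(self):
        """Sustained performance as a fraction of theoretical peak."""
        return self.sustained_gflops / self.peak_gflops

    def sustained_per_core(self):
        """Sustained gigaflops delivered by each core."""
        return self.sustained_gflops / self.cores

    def sustained_per_core_per_ghz(self):
        """Sustained gigaflops per core, normalized by clock speed."""
        return self.sustained_per_core() / self.clock_ghz


# ASSUMPTION: 1.6 GHz and 64 gigaflops peak are estimates from this article.
montecito = LinpackResult("Montecito Tiger4", 8, 1.6, 45.8, 64.0)
# Figures for the p5 575 as quoted in the article.
power5 = LinpackResult("Power5 p5 575", 8, 1.9, 34.57, 60.8)

advantage = montecito.sustained_gflops / power5.sustained_gflops

for box in (montecito, power5):
    print(
        f"{box.name}: {box.efficiency():.1%} of peak, "
        f"{box.sustained_per_core():.2f} gigaflops per core, "
        f"{box.sustained_per_core_per_ghz():.2f} gigaflops per core per GHz"
    )
print(f"Montecito sustained advantage: {advantage:.2f}x")
```

If the estimates hold, most of the gap comes from how much of its theoretical peak each box sustains, not from the peak numbers, which are within about 5 percent of each other.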