Itanium 2, With Up To Twice the Oomph of Itanium, Due in Early July
by Timothy Prickett Morgan
Intel raised the curtain on the forthcoming "McKinley" Itanium 2 processor last week, a chip that more or less fulfills the promises Intel made back in 1996 for the "Merced" first generation Itanium chip. The company says that the 1 GHz Itanium 2 will offer 1.5 to 2 times the performance of the 800 MHz Merceds, which shipped last year, and that Itanium 2 will ship in mid-year. The scuttlebutt is that Itanium 2 will debut on July 8.
Unlike many chips from the remaining processor makers (Intel, Hewlett-Packard, IBM, Sun Microsystems, Fujitsu, and AMD), the Itanium 2 is not just a shrunken version of the prior generation built on a more advanced manufacturing process that lets the vendor crank up the clock speed. The Itanium 2 has substantial architectural changes, and these yield the 50 percent to 100 percent performance improvement over the same compiled Itanium applications running on the first generation Merced chips. At 1 GHz, the McKinley's clock speed is only 25 percent higher than the 800 MHz clock speed of the Merced, which was also available at a lower 733 MHz. Nor does this higher clock speed come from an improved chip making process; 1 GHz is in fact the top clock speed originally targeted for the Merced chips. Both the Merced and McKinley chips are made using a 0.18 micron process, and Intel is not expected to move to the leading edge 0.13 micron process used in its low-power laptop and server processors until the "Madison" and "Deerfield" generation of Itaniums, presumably to be called the Itanium 3, in 2004.
Here are the big differences between Itanium and Itanium 2. The Merced chips could, in theory, process six instructions per clock cycle, but in practice this probably didn't happen. Let's go from the outside into the guts of Merced. The chip had 4 MB of off-chip L3 cache memory and connected to main memory through a 64-bit, 266 MHz system bus that yielded 2.1 GB/sec of bandwidth. The Merced had 10 pipeline stages and nine instruction issue ports that in turn fed into 328 registers. These registers fed four integer units, three branch units, two floating point units, two SIMD units, and two load/store units, all running at 800 MHz.
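The 2.1 GB/sec figure follows directly from the bus width and speed: a 64-bit bus moves 8 bytes per transfer. Below is a minimal Python sketch of that arithmetic, assuming the quoted 266 MHz is the effective transfer rate (one full-width transfer per quoted cycle) and that 1 GB means 10^9 bytes. It also works out the theoretical ceiling of six instructions per cycle at 800 MHz. The function names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope figures for the Merced (Itanium) system bus and issue rate.
# Assumption: the quoted bus speed is treated as the effective transfer rate,
# with one full-width transfer per quoted cycle; 1 GB = 10**9 bytes.

def bus_bandwidth_gb_per_sec(width_bits: int, rate_mhz: float) -> float:
    """Peak bus bandwidth in GB/sec: bytes per transfer times transfers per second."""
    bytes_per_transfer = width_bits / 8
    transfers_per_sec = rate_mhz * 1_000_000
    return bytes_per_transfer * transfers_per_sec / 1_000_000_000


def peak_instructions_per_sec(issue_width: int, clock_mhz: float) -> float:
    """Theoretical ceiling: every issue slot filled on every clock cycle."""
    return issue_width * clock_mhz * 1_000_000


merced_bus = bus_bandwidth_gb_per_sec(64, 266)    # 8 bytes x 266 million transfers/sec
merced_peak = peak_instructions_per_sec(6, 800)   # six instructions per cycle at 800 MHz

print(f"Merced system bus: {merced_bus:.1f} GB/sec")
print(f"Merced peak issue rate: {merced_peak / 1e9:.1f} billion instructions/sec")
```

Running it prints 2.1 GB/sec for the bus and 4.8 billion instructions per second for a peak that, as noted, the chip probably never reached in practice.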
With McKinley, Intel has tripled the system bus bandwidth, moved a smaller (but still quite large) L3 cache onto the chip, removed two pipeline stages, added issue ports, and tweaked the various computing units inside the chip so a 1 GHz processor comes closer to actually processing those six theoretical instructions per clock cycle. Specifically, the McKinley chip has 3 MB of L3 cache on the chip and is linked to a 128-bit, 400 MHz system bus with 6.4 GB/sec of aggregate bandwidth. The McKinley has eight pipeline stages feeding into eleven issue ports, which in turn feed into the same 328 registers. These pass instructions and data to six integer units, three branch units, two floating point units, one SIMD unit, two load units, and two store units. The increase in integer units and clock speed alone accounts for close to 90 percent more throughput on integer workloads. Having the L3 cache on the die, rather than on separate chips inside the Itanium package, cuts L3 latency (the time it takes to move data between the L3 cache and the processor core) in half.
The increased system bus bandwidth is what has allowed the dual floating point units in the Itanium chip to do the work they could have been doing all along. The Merced was bandwidth-crippled from the get-go, and McKinley proves it. Why this was the case is unclear, but my guess is that Intel's i870 server chipset was late and buggy, so Intel had to graft Merced onto the i460GX workstation chipset just to get it out the door last June. That chipset was fine for two-way and four-way workstations, but it was not designed to handle server workloads the way the i870 chipset, now called the E8870, was. According to initial benchmark results provided by Intel, Java and in-memory database applications benefited most from the jump from Merced to McKinley, with almost twice the performance.
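The "tripled" bus bandwidth and the "close to 90 percent" integer figure can both be reproduced with simple arithmetic. The Python sketch below is a rough illustration under stated assumptions: bus speeds are treated as effective transfer rates, and integer throughput is assumed to scale linearly with the number of integer units and the clock speed, which real workloads will not fully achieve. The helper name is hypothetical.

```python
# Rough Merced-vs-McKinley comparison using the figures quoted in the article.
# Assumptions: quoted bus speeds are effective transfer rates (one full-width
# transfer per quoted cycle), and integer throughput scales linearly with the
# number of integer units and the clock speed (no stalls, cache misses, or
# scheduling limits), so this is an optimistic, upper-bound style estimate.

def bus_bandwidth_gb_per_sec(width_bits: int, rate_mhz: float) -> float:
    """Peak bus bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    return (width_bits / 8) * rate_mhz * 1_000_000 / 1_000_000_000


merced_bus = bus_bandwidth_gb_per_sec(64, 266)      # about 2.13 GB/sec
mckinley_bus = bus_bandwidth_gb_per_sec(128, 400)   # 6.4 GB/sec
bus_ratio = mckinley_bus / merced_bus               # roughly 3x

# Integer units: 4 on Merced, 6 on McKinley; clocks: 800 MHz vs. 1 GHz.
unit_ratio = 6 / 4
clock_ratio = 1000 / 800
integer_speedup = unit_ratio * clock_ratio          # 1.5 * 1.25

print(f"Bus bandwidth: {mckinley_bus:.1f} vs {merced_bus:.1f} GB/sec ({bus_ratio:.1f}x)")
print(f"Naive integer throughput gain: {(integer_speedup - 1) * 100:.1f} percent")
```

It reports about 3.0x the bus bandwidth and an 87.5 percent naive integer gain (1.5 times the units at 1.25 times the clock), in line with the figures in the text; actual application gains depend on how well the compiler keeps those units busy.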
On security programs like SSL encryption, the McKinley offered about 50 percent more performance, with the move from 800 MHz to 1 GHz accounting for roughly half of that and the rest coming from increased bandwidth and tweaks in the guts of the chip. Performance on the Linpack Fortran benchmark was a little more than 50 percent better on McKinley than on Merced, and performance on the SPECint2000 integer and SPECfp2000 floating point benchmarks was up about 70 percent and 75 percent, respectively. Computer-aided engineering and ERP applications that were compiled for Merced will see around 75 percent more performance. Oddly enough, online transaction processing performance (by which we presume Intel means the TPC-C benchmark) will only increase by 50 percent. Even so, this could yield a big improvement in scalability for Itanium-based enterprise servers. If these per-chip OLTP improvements can be passed through to the full systems, which were presumably designed with enough memory and system bandwidth to feed McKinley in the first place, the performance of a 16-way Itanium server could go from around 140,000 transactions per minute (TPM) with Merced chips to around 210,000 TPM with McKinleys. That would put a 16-way Itanium server on par with all but the largest RISC/Unix enterprise servers available today.
The Merced Itanium processor had 25 million transistors, with the off-chip cache accounting for another 300 million. The McKinley chip has a total of 221 million transistors; the drop from Merced's combined 325 million comes mostly from the reduction in L3 cache size from 4 MB to 3 MB. The Merced chip had 32 KB of L1 instruction and data cache, 96 KB of L2 cache, and either 2 MB (733 MHz) or 4 MB (800 MHz) of L3 cache. The McKinley chip has the same 32 KB of L1 cache, a bigger 256 KB L2 cache, and a 3 MB L3 cache.
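The 16-way OLTP projection is a straight proportional scaling of a measured result. A minimal Python sketch, using a hypothetical `project_tpm` helper, that makes the pass-through assumption explicit:

```python
# Projecting 16-way OLTP throughput from a per-chip gain, as the article does.
# Assumption (the article's own "if"): the 50 percent per-chip OLTP improvement
# passes straight through to the full system, with enough memory and system
# bandwidth to feed the faster processors.

def project_tpm(baseline_tpm: float, per_chip_gain: float) -> float:
    """Scale a measured TPM result by a fractional per-chip improvement."""
    return baseline_tpm * (1 + per_chip_gain)


merced_16way_tpm = 140_000
mckinley_16way_tpm = project_tpm(merced_16way_tpm, 0.50)

print(f"Projected 16-way McKinley: {mckinley_16way_tpm:,.0f} TPM")
```

This prints 210,000 TPM. Any shortfall in system bandwidth or chipset scalability would shrink that number, so the projection is only as good as the assumption that per-chip gains carry over to the full system.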
Last Updated: 6/5/02 Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.