Intel Goes After AMD with New Chip Architecture
Published: March 14, 2006
by Timothy Prickett Morgan
As the semi-annual Intel Developer Forum kicked off last Tuesday, the head techies at the chip maker opened the show by walking through the new 'Core Microarchitecture' that will be at the heart of the company's impending line of mobile, desktop, and server processors. And while Intel was loath to invoke Advanced Micro Devices by name, its main rival was clearly in attendance and seen as the major driver for the changes Intel is building into its chips.
The new Core Microarchitecture, which is a derivative of the Pentium M processor for laptops, is the foundation of the future Intel chips, which put as much emphasis on energy efficiency as they do on raw performance. The press always sits down in front on the left side at these events, close to the stage, and as we waited for Justin Rattner, Intel's chief technology officer, to take the stage and talk about the Core Microarchitecture, a half rack of servers with 10 1U machines and two 4U machines was one of the loudest things in the room. While Intel never talked about these machines that were whirring away on the stage, no better demonstration could have been found for what is wrong with the current generation of RISC and X64 processors. They take too much energy to do their work, and they take even more energy to be kept cool, and they cannot be packed as densely as server makers and their customers would like. And the Core Microarchitecture is Intel's answer to these problems.
In many ways, the Core Microarchitecture represents Intel's third big shift in the microprocessor racket. The X86 generations of machines from the 8088 to the 80486 were really targeted at the PC market, even though they were used in servers starting in the late 1980s. With the Pentium generation that began in 1993, and especially the Pentium Pro that followed it in 1995, Intel put a RISC-like core at the center of the chip and implemented out-of-order instruction execution on a deeper instruction pipeline, among many other technologies, and created a processor that was truly suitable for a server as far as performance is concerned. And the high-volume manufacturing engine that is Intel went on to capture well over 90 percent of annual server shipments with Pentium and then Xeon processors within a decade. But the Pentium chips were big, and boosting performance required Intel to keep cranking the clock, deepening the pipeline, and performing other unnatural acts. Each performance boost significantly increased the power used to run and cool the chip, to the point where a 100-watt chip became normal. A few years ago, server vendors, pushed into a corner by their customers, told Intel they would accept no more heat, and Intel had to scramble to find performance in a new way.
As Rattner pointed out in the opening keynote, the company had fortuitously created a chip design team in Israel that developed a low-powered Pentium for laptops, the "Banias" Pentium M chip. To illustrate how good this chip was in terms of performance and performance per watt, Rattner displayed a chart that showed the energy consumed per instruction, factoring out the benefits Intel gained from process technology as it moved from 800 nanometer processes for the 80486 to the 65 nanometer process for the new "Yonah" Core Duo laptop chips. The chart showed the performance and power consumption each of these chips would have if it had been implemented in the current 65 nanometer process Intel will be using to make the Core Architecture processors, with the voltage of every chip set to 1.33 volts. Holding process and voltage constant in this way isolates the contribution of each chip's architecture.
On that curve, if you normalize to the 486 (setting its performance and power consumption to 1), then the 486 has an energy per instruction (EPI) of 10. The Pentium chip from 1993 had about twice the performance but consumed 2.7 times the power, which raised the EPI to 14. The move to the Pentium Pro boosted performance relative to the 486 by a factor of 3.6, but power use went up by a factor of 9, driving the EPI up to 24, and the "Willamette" Pentium 4 chip had six times the performance but consumed 23 times as much power, yielding an EPI of 38. The "Cedar Mill" Pentium 4 chip had 7.9 times the performance but consumed 38 times as much power, for an EPI of 48. The Pentium M may not have had blazing performance, but it did have pretty good power efficiency and decent enough performance for a laptop, where buyers will willingly sacrifice performance for longer battery life. The second-generation "Dothan" Pentium M chip had about 5.4 times the performance of the 486 but consumed only 7 times as much power, giving an EPI of 15. And the new dual-core "Yonah" Core Duo chip, which actually is made using a 65 nanometer process, has 7.7 times the performance of the 486 but consumes only 8 times the power, which gives it an EPI rating of 11, only 10 percent higher than the 486's.
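Because energy per instruction is simply power divided by the rate at which instructions are executed, the chart's EPI figures can be roughly sanity-checked from the normalized performance and power numbers alone. The sketch below assumes EPI scales as power divided by performance, anchored to the 486's EPI of 10; the `CHIPS` table and the `estimated_epi` helper are illustrative names, not anything taken from Intel's presentation.

```python
# Normalized figures as quoted in the article (486 = 1.0 for both performance and power).
CHIPS = {
    "486": (1.0, 1.0),
    "Pentium": (2.0, 2.7),
    "Pentium Pro": (3.6, 9.0),
    "Willamette P4": (6.0, 23.0),
    "Cedar Mill P4": (7.9, 38.0),
    "Dothan Pentium M": (5.4, 7.0),
    "Yonah Core Duo": (7.7, 8.0),
}

BASE_EPI = 10.0  # EPI the chart assigns to the 486


def estimated_epi(perf: float, power: float, base_epi: float = BASE_EPI) -> float:
    """Energy per instruction = power / instruction rate, so EPI scales as power / perf."""
    return base_epi * power / perf


for name, (perf, power) in CHIPS.items():
    print(f"{name:<18} EPI ~ {estimated_epi(perf, power):5.1f}")
```

The recomputed values land close to most of the quoted ones (13.5 versus 14 for the Pentium, 25 versus 24 for the Pentium Pro, 38.3 versus 38 for Willamette, 48.1 versus 48 for Cedar Mill, 10.4 versus 11 for Yonah). Dothan is the outlier, working out to about 13 rather than the quoted 15, so this is a rough consistency check on the article's figures, not a reproduction of Intel's chart.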
That, in a nutshell, is why Intel is excited about the new Core Architecture. It has been able to turn back the clock on power consumption, and because of the 65 nanometer process, it can cram more cores onto a chip, keep the clock speed low, and still boost performance enough to make its server customers happy.
"Even though we have been under tremendous competitive pressure, you might think that we have lost some enthusiasm," Rattner said as he rolled out details on the new architecture. "That's far from the truth." He then went on to explain why Intel was moving from single to multiple cores, looking again at the energy budget of the processors.
Rattner showed another chart that explained what happens when you crank the clock speed on a 1 GHz chip to 1.2 GHz. While the clock speed goes up 20 percent, performance only rises by 13 percent, and power consumption rises by 73 percent. "This is not a very good tradeoff," explained Rattner. Then he showed the effect of underclocking the processor, dropping it from 1 GHz to 800 MHz. Performance drops by 13 percent, but power consumption falls to 51 percent of the original, roughly cutting it in half. That frees up enough thermal room to add a second core to the chip: the two underclocked cores together draw only 2 percent more power than the original single core at 1 GHz, but deliver around 73 percent more performance. This is why Intel has found multicore religion. "You cannot get these kinds of performance increases in a power budget any other way," he said.
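Rattner's power figures line up with a common rule of thumb: dynamic power scales with capacitance times voltage squared times frequency, and if voltage is raised or lowered roughly in step with frequency, power ends up scaling with about the cube of clock speed. The article does not state the model behind the chart, so the sketch below treats that cube law as an assumption, takes the 13 percent performance changes directly from the quoted figures, and assumes a workload that splits perfectly across two cores. The function and constant names are hypothetical.

```python
def relative_power(freq_scale: float) -> float:
    """Assumed model: P ~ C * V^2 * f with V scaled in proportion to f, so P ~ f^3."""
    return freq_scale ** 3


# Performance multipliers as quoted in the article (not derived from the model).
OVERCLOCK_PERF = 1.13   # 1.0 GHz -> 1.2 GHz
UNDERCLOCK_PERF = 0.87  # 1.0 GHz -> 0.8 GHz

over_power = relative_power(1.2)      # about 1.73x
under_power = relative_power(0.8)     # about 0.51x
dual_core_power = 2 * under_power     # two underclocked cores, about 1.02x
# Assumes the workload splits perfectly across both cores.
dual_core_perf = 2 * UNDERCLOCK_PERF  # about 1.74x

print(f"overclocked:  perf {OVERCLOCK_PERF:.2f}x, power {over_power:.2f}x")
print(f"underclocked: perf {UNDERCLOCK_PERF:.2f}x, power {under_power:.2f}x")
print(f"dual core:    perf {dual_core_perf:.2f}x, power {dual_core_power:.2f}x")
```

The cube law reproduces the quoted power multipliers (1.73x, 0.51x, and 1.02x). The perfect-scaling assumption gives 1.74x for two cores, a hair above the quoted 73 percent gain; real software rarely splits that cleanly.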
Intel demonstrated quad-core processors running its presentations at the show, using a desktop chip code-named "Kentsfield," which is expected to hit the market in 2007. Rattner was noncommittal on whether Intel would try to put more cores onto its chips. "You won't see us deliver mediocre core performance just so we can add cores," he said, adding that Intel will determine how many cores a chip should have based on the state of multithreading in the software on the market at the time.
Rattner said that the 65 nanometer process Intel has in production at its four fabs is what is going to allow it to compete with AMD, which is still using 90 nanometer processes in its Athlon and Opteron processors but which is expected to move to a 65 nanometer process in mid-year with the "Rev F" Opterons. AMD has been selling dual-core chips using 90 nanometer processes for a year, while Intel has been selling two chips packaged side-by-side as dual-core modules with the Paxville DP and MP processors. The new "Sossaman" Xeon LV processor, due this week, is a true dual-core chip, but it is only a 32-bit processor, so it has limited appeal even though it has very low power consumption.
This 65 nanometer process, explained Rattner, allows transistors to switch with about 30 percent less energy than they did with 90 nanometer technology, and to do so 20 percent faster. "We believe that we are over a year ahead of the competition on process," he said, referring obliquely to AMD yet again. Intel is already demonstrating its 45 nanometer process, which is expected to yield the same improvements in transistor speed and power consumption and to go into production on quad-core chips in the second half of 2007.
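Taking the quoted 65 nanometer gains at face value, and assuming (as the article says of the 45 nanometer process) that the next shrink repeats them, the two steps down from 90 nanometers would compound multiplicatively. The constants and helper below are illustrative; actual process generations need not repeat the same figures.

```python
# Per-shrink gains quoted for the 90 nm -> 65 nm transition.
ENERGY_SCALE_PER_SHRINK = 0.70  # about 30 percent less switching energy
SPEED_SCALE_PER_SHRINK = 1.20   # about 20 percent faster switching


def after_shrinks(n: int) -> tuple[float, float]:
    """Assumes every shrink repeats the same gains and that they compound."""
    return ENERGY_SCALE_PER_SHRINK ** n, SPEED_SCALE_PER_SHRINK ** n


energy, speed = after_shrinks(2)  # 90 nm -> 65 nm -> 45 nm
print(f"vs. 90 nm: switching energy {energy:.2f}x, switching speed {speed:.2f}x")
```

Under that assumption, a 45 nanometer transistor would switch with roughly half the energy and about 1.44 times the speed of its 90 nanometer counterpart.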
Aside from the benefits of the 65 nanometer process and the multicore designs it allows, the Core Architecture that Intel unveiled last week includes many other microarchitecture changes, five of which Rattner outlined. The first innovation is called Wide Dynamic Execution, which is a wider 14-stage instruction pipeline; Rattner called it a four-wide pipeline, meaning it can process four instructions per cycle. Like the Pentium M before it, the Core Architecture has a microfusion feature, which allows two internal, RISC-oid microinstructions to be packaged up and processed in one clock cycle. But the Core Architecture now adds 'macrofusion,' which means two higher-level X86 instructions, typically a compare followed by a conditional jump, can be fused and executed as one. Another new feature is that 128-bit SSE instructions, which are used for graphics and multimedia applications, now run on 128-bit-wide execution units and can complete in one cycle; in the past, these instructions were processed as two 64-bit halves, which made them slower. The Core Architecture also implements a shared on-die L2 cache for the cores, and if one core goes inactive, the other core can take over the whole cache, thereby boosting its performance. The architecture also has a feature called Smart Memory Access, which pairs a new set of prefetch algorithms that improve how data is moved into cache memory with the ability to move loads ahead of stores without corrupting data. Both of these capabilities improve performance without requiring more energy. Finally, the Core Architecture has a feature called Intelligent Power Capability, which refines the power gating technologies that Intel has already deployed in its chips so that smaller blocks of components on a chip can be shut down when they are not in use, lowering overall power consumption.
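Macrofusion and the four-wide pipeline interact in a simple way: fusing a compare with the conditional jump that follows it reduces the number of operations competing for the four slots available each cycle. The toy model below illustrates this under heavy simplifying assumptions: instructions are bare mnemonics, any `cmp` or `test` followed by a conditional jump is treated as fusible (real hardware imposes further restrictions), and the pipeline is assumed to fill all four slots every cycle. All names are hypothetical.

```python
from math import ceil

FUSIBLE_HEADS = {"cmp", "test"}  # toy rule: only compare/test can start a fused pair
ISSUE_WIDTH = 4                  # the four-wide pipeline described above


def is_conditional_jump(op: str) -> bool:
    # Toy rule: jne, jz, jl, ... are conditional; the unconditional jmp is not.
    return op.startswith("j") and op != "jmp"


def macrofuse(stream: list[str]) -> list[str]:
    """Merge each cmp/test immediately followed by a conditional jump into one op."""
    fused, i = [], 0
    while i < len(stream):
        op = stream[i]
        if op in FUSIBLE_HEADS and i + 1 < len(stream) and is_conditional_jump(stream[i + 1]):
            fused.append(f"{op}+{stream[i + 1]}")
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused


def issue_cycles(ops: list[str], width: int = ISSUE_WIDTH) -> int:
    """Idealized: assumes a full `width` operations issue every cycle."""
    return ceil(len(ops) / width)


code = ["mov", "add", "cmp", "jne", "mov", "sub", "test", "jz", "inc", "dec"]
fused = macrofuse(code)
print(fused)
print(issue_cycles(code), "cycles unfused vs.", issue_cycles(fused), "fused")
```

In this idealized run, ten instructions shrink to eight operations, which fit in two four-wide cycles instead of three.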
When you add it all up, the changes that the Core Architecture is enabling for Intel's processors are impressive. The future "Merom" laptop chip will have a 20 percent improvement in performance relative to the Core Duo T2600 chip, but use the same amount of power. The "Conroe" dual-core chip for desktop machines will have 40 percent better performance than the Pentium D 950 processor, but use 40 percent less power. And the "Woodcrest" Xeon DP chip for servers will have an 80 percent performance increase compared to the "Paxville" Xeon DP (which is also a pseudo-dual-core chip) and 35 percent lower power consumption.
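The three claims are easier to compare as performance-per-watt ratios. The sketch below simply divides each quoted performance multiplier by the quoted power multiplier, treating "the same amount of power" for Merom as a power multiplier of 1.0; the dictionary and helper names are illustrative.

```python
# (performance multiplier, power multiplier) versus each chip's predecessor, as quoted.
CLAIMS = {
    "Merom vs. Core Duo T2600": (1.20, 1.00),
    "Conroe vs. Pentium D 950": (1.40, 0.60),
    "Woodcrest vs. Paxville Xeon DP": (1.80, 0.65),
}


def perf_per_watt_gain(perf: float, power: float) -> float:
    """Ratio of new to old performance per watt."""
    return perf / power


for name, (perf, power) in CLAIMS.items():
    print(f"{name:<32} {perf_per_watt_gain(perf, power):.2f}x performance per watt")
```

By this measure the server part shows the biggest claimed jump, about 2.8 times, against roughly 2.3 times for Conroe and 1.2 times for Merom.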