Volume 1, Number 4 -- March 17, 2004

Brace Yourself: Major Intel Architectural Shifts Ahead


by Timothy Prickett Morgan

It is the job of the chief technology officers at the major IT companies to be enthusiastic about technology for its own sake (they are nerds, after all) and for how it can shape our daily lives. But behind the scenes, CTOs also have to be practical about the limits of various technologies and then pick those that their companies will use to build the future products that we buy as consumers or businesses. If they choose wrongly, their companies will suffer.

In one sense, a CTO who announces the beginning of a major architectural shift, as Pat Gelsinger, CTO at Intel, did at the most recent Intel Developer Forum, is like a kid in a candy store. Such architectural shifts are disruptive to the existing technologies and business models that IT suppliers have. With each new wave of technology, it is only the companies that adapt that can continue, grow, or thrive. In today's keynote, Gelsinger explained how he and Intel see the coming "era of tera" that will not only transform how we use digital technology, but will require a rethinking of the way that systems--be they cell phones, PDAs, laptops, desktops, or servers--are built.

One of the prevailing themes in technology over the past 25 years that Intel has been a player in the IT market is that at a certain point, enough is enough. IT history predating Intel is littered with comments from prominent IT innovators who incorrectly called the eventual pervasiveness of a given technology. IBM founder Tom Watson, who was making a killing selling punch card devices thanks to a monopoly, thought that IBM might sell five or six electronic computers. Ken Olsen, the founder of Digital--the company that essentially created the minicomputer market and which has been subsumed into Hewlett-Packard--thought that the PC was rubbish. The technology companies that only see as far as their initial successes and/or their founders will eventually hit a wall. If Gelsinger's job at Intel means anything, it is a political one in that he must always get the company to practice the art of the possible to avoid hitting walls.

Gelsinger, like other IT component and platform makers, has come to the conclusion that in this so-called era of tera--where Intel foresees end users having access to teraflops of computing power, terabits of communications bandwidth, and terabytes of data storage--systems are going to have to go multicore, multithreaded, and consume a lot less power. While there are not, in theory, any limits to Moore's Law, which allows chip makers to double the performance of processors every 18 months or so as transistors shrink and run cooler, there are practical limits to boosting the clock frequency on processors that have been discussed at length in recent years in the computer business. They bear repeating, since Gelsinger talked about how Intel would be circumventing these problems with new architectural approaches.

First, memory and I/O subsystems have not kept pace with the higher clocks on central processors and their L1, L2, and L3 caches. In the past 25 years, Intel has taken the X86 architecture from 5 MHz to 4 GHz, the clock speed that the "Prescott" Pentium 4 chip is expected to reach by year's end. That clock speed improvement has been enabled by ever-shrinking chip processes, but making sure performance scales along with Moore's Law (or does better than it might suggest is possible) has also required Intel to add features to the X86 architecture such as pipelining in the 486, superscalar execution in the Pentium, out-of-order and speculative execution in the Pentium Pro, MMX media processing in the Pentium MMX and Pentium II, and hyperthreading with the Pentium 4. Were it not for these advances, performance probably would not have scaled along with transistor count. The architectural efficiencies that these technologies yield are what have kept the X86 architecture moving ahead. Moore's Law is only part of the equation.

We are very likely hitting the practical limits of clock speed increases. Each time Intel has cranked up the clock, it has also lengthened the instruction pipeline, for architectural reasons relating to how efficiently that pipeline is used. The net effect is that for many workloads, the move from Pentium III to "Willamette" Pentium 4 to "Prescott" Pentium 4 cores has not provided the boost that you'd expect from moving from 1.4 GHz to 4 GHz.

So can Intel continue to push the gigahertz? No, it can't, and it knows that. In a humorous graph that Gelsinger has shown at previous IDFs, the Pentium processors from the mid-1990s had a power density (measured in watts per square centimeter) that was effectively the same as that of an electric hot plate. If present gigahertz and power profiles continued unabated, Pentium chips would have the power density of a nuclear reactor core within a year, and would reach that of a rocket engine nozzle by 2008 and the surface of the Sun by 2010.

Even if it were possible to run chips that hot, the gap in speed between processor logic and memory logic is widening, which means that the extra clock cycles end up being wasted on waiting for instructions and data to process. In the 486 generation, Gelsinger said that it took around 10 clock cycles to fetch data from main memory, but with the Pentium 4 chips, this delay has ballooned to 220 clock cycles. As chips get faster, this gap will widen unless there is a dramatic improvement in memory speeds. (Putting main memory on the processor chip is something Intel would love to do, but the amount of memory that processors would require would make such a chip very, very large and the odds that it would work flawlessly would be very, very small because of imperfections in the manufacturing process.) There are other delays that undermine the reliance on higher clock speeds to improve performance. The resistance-capacitance (RC) interconnect delay is going up as well. In Pentium 4 chips today, it takes 15 clock cycles just to have a circuit on one end of the chip talk to the other end.
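
To get a feel for what the jump from 10 to 220 cycles means, here is a rough back-of-the-envelope sketch in Python. Only the two latencies come from Gelsinger's talk. The base rate of one cycle per instruction and the rate of one main-memory access per 100 instructions are illustrative assumptions, not measurements, and the function names are made up for this example.

```python
def effective_cpi(base_cpi, misses_per_instruction, miss_penalty_cycles):
    """Average cycles per instruction once main-memory stalls are included."""
    return base_cpi + misses_per_instruction * miss_penalty_cycles


def stall_fraction(base_cpi, misses_per_instruction, miss_penalty_cycles):
    """Share of all cycles that are spent waiting on main memory."""
    total = effective_cpi(base_cpi, misses_per_instruction, miss_penalty_cycles)
    return (total - base_cpi) / total


BASE_CPI = 1.0                 # assumption: one cycle per instruction with no stalls
MISSES_PER_INSTRUCTION = 0.01  # assumption: one trip to main memory per 100 instructions

# Latencies in clock cycles as cited in the keynote.
for chip, penalty in (("486", 10), ("Pentium 4", 220)):
    cpi = effective_cpi(BASE_CPI, MISSES_PER_INSTRUCTION, penalty)
    waiting = stall_fraction(BASE_CPI, MISSES_PER_INSTRUCTION, penalty)
    print(f"{chip}: {cpi:.2f} cycles per instruction, {waiting:.0%} of cycles stalled")
```

Under these assumed rates, the same program goes from spending a small share of its cycles waiting on memory to spending most of them, even though nothing about the code has changed.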

"There needs to be a major architectural paradigm shift," explained Gelsinger. "We need a fresh approach." Just as the past decade has seen Intel borrow ideas and technologies from the IBM mainframe and RISC/Unix midrange markets to improve X86 processors, the next era of computing envisioned by Gelsinger will see Intel borrow ideas from the high performance computing (HPC) and supercomputer industries. The future will be about putting many processors running at more modest clock speeds on a single chip, and perhaps massive numbers of cores as Intel moves from 65 nanometer to 45 nanometer to 25 nanometer processes to even smaller processes.

Exactly how many cores and their associated L1/L2/L3 caches Intel can cram onto a single chip is unknown, and Gelsinger did not offer any suggestions in his keynote today. But given Moore's Law, the company ought to be able to double the number of cores on a chip every 18 months and thereby be able to keep clock speeds constant and still essentially double the performance of a multicore chip. More importantly, if Intel gets really good at using advanced chip making processes, it could quadruple the core count every 18 months and even lower the clock speed. Roughly speaking, a four-core processor running at 3 GHz should have about twice the performance of a two-core processor running at the same speed. But a six-core chip running at 2 GHz or an eight-core chip running at 1.5 GHz would yield the same doubling of performance in terms of raw clock speeds while at the same time decreasing the latency, measured in clock cycles, between processors and main memory by a huge factor. The core count that vendors like Intel decide on will be determined by and large by the heat profiles they want in the resulting machines. All computer makers are going to shift to these minimalist, multicore approaches. IBM's Blue Gene/L and Sun Microsystems' "Niagara" and "Rock" multithreaded machines all have a huge number of simple cores. It would not be surprising at all if the kickers to Xeon and Itanium feature radically simplified cores, too.
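
The trade between cores and clocks can be checked with simple arithmetic. The sketch below uses the article's crude yardstick of raw clock speed multiplied by core count, which ignores memory bandwidth, cache sharing, and how well software spreads across cores. The function names are hypothetical, chosen for this illustration.

```python
def aggregate_ghz(cores, clock_ghz):
    """Raw 'clock times cores' capacity, a very rough proxy for throughput."""
    return cores * clock_ghz


def clock_for_target(cores, target_aggregate_ghz):
    """Clock speed each core needs for the chip to reach a given aggregate."""
    return target_aggregate_ghz / cores


baseline = aggregate_ghz(cores=2, clock_ghz=3.0)  # the two-core, 3 GHz reference chip
target = 2 * baseline                             # doubling its raw capacity

for cores in (4, 6, 8):
    needed = clock_for_target(cores, target)
    print(f"{cores} cores need {needed:.1f} GHz each to reach {target:.0f} GHz aggregate")
```

The point of the exercise is that each added pair of cores lets the per-core clock fall while the aggregate stays fixed, which is where the heat savings come from.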

Intel has other tricks up its sleeve, too, and Gelsinger showed off one of them, called helper threads, in his keynote. Right now, memory latency on the high-end server versions of the Pentium 4 Xeons eats up half the processor cycles. Half of the time, the CPU is tapping its hot little foot, waiting for data from memory. This is wickedly inefficient. With helper threading, Intel's compilers sift through application code, looking for places where they can prefetch data for the application and get it into cache memory before the application actually calls for it--this is called warming the cache. In a benchmark test on a Xeon server cluster using IBM's DB2 database simulating a transaction processing workload for 7.5 million customers, the helper threads reduced cache misses by 23 percent and boosted performance by 8.9 percent. "In the TPC-C race, 10 percent is to die for," Gelsinger said, putting this improvement into perspective. Another future technology called active body biasing reduces power leakage in circuits by 20 to 40 percent and allows a 90 nanometer circuit to perform about as well as one implemented in 65 nanometer processes. "In effect, we make it less leaky than the current generation and make it more efficient than the next generation."
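
The idea of warming the cache can be illustrated with a toy simulation. This is not Intel's implementation: it is a minimal sketch that assumes a small least-recently-used cache, a made-up random access trace, a helper that runs a fixed number of accesses ahead of the main thread, and prefetches that complete instantly. The `coverage` parameter (the fraction of upcoming accesses the helper manages to prefetch) and all names here are hypothetical.

```python
import random
from collections import OrderedDict


class LRUCache:
    """A tiny least-recently-used cache that tracks which addresses are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, address):
        """Bring `address` into the cache; return True if it was already there."""
        hit = address in self.lines
        if hit:
            self.lines.move_to_end(address)
        else:
            self.lines[address] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used line
        return hit


def make_trace(length, address_space, seed):
    """Hypothetical workload: uniformly random addresses, so most accesses miss."""
    rng = random.Random(seed)
    return [rng.randrange(address_space) for _ in range(length)]


def count_demand_misses(trace, capacity, lookahead=0, coverage=0.0, seed=0):
    """Count cache misses seen by the main thread.

    A helper runs `lookahead` accesses ahead of the main thread and prefetches
    a `coverage` fraction of the upcoming addresses. Prefetches complete
    instantly, which is an idealization.
    """
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    misses = 0
    for i, address in enumerate(trace):
        ahead = i + lookahead
        if lookahead and ahead < len(trace) and rng.random() < coverage:
            cache.touch(trace[ahead])  # warm the cache before the demand access
        if not cache.touch(address):
            misses += 1
    return misses


trace = make_trace(length=20_000, address_space=100_000, seed=1)
baseline = count_demand_misses(trace, capacity=512)
helped = count_demand_misses(trace, capacity=512, lookahead=8, coverage=0.25)
print(f"demand misses without helper: {baseline}")
print(f"demand misses with helper:    {helped}")
```

In this model the reduction in misses tracks the helper's coverage almost directly. Real helper threads compete with the main thread for execution resources and memory bandwidth, which is one reason a 23 percent drop in cache misses translated into a smaller 8.9 percent performance gain in the benchmark Gelsinger cited.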

Another trick that Intel has up its sleeve might be more tightly integrating networking protocols with processors. Right now, the TCP/IP stack running on Intel boxes has to communicate through main memory to cache and then to the processor. A network interface card accounts for only about 22 percent of the latency in the TCP/IP stack. Moving data through the main and cache memories accounts for about 23 percent of latency, while system overhead inside the system itself accounts for the remaining 55 percent. In a multicore system (in this case a prototype multicore Itanium chip), Intel dedicates one core to running the TCP/IP stack and lets the NIC talk directly to the processor. By adding helper threads for the TCP/IP stack to this direct cache access and dedicated TCP/IP core, Intel has been able to cut the number of processor clocks necessary to move a 1 KB packet of data from 21,000 clock cycles down to 2,100 clocks. That is a factor of ten improvement, and it also means that a current-generation Intel processor, which has trouble filling a Gigabit Ethernet pipe, will be able to keep a 10 Gigabit Ethernet pipe full.
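
The link between cycles per packet and wire speed is straightforward to work out. The sketch below takes the 21,000 and 2,100 cycle figures from the keynote and assumes a 3 GHz core that does nothing but move 1 KB (1,024-byte) packets. The clock speed and the function name are assumptions for illustration, and the model ignores protocol framing and memory bandwidth limits.

```python
PACKET_BYTES = 1024  # a 1 KB packet, as in the keynote example


def sustainable_gbps(clock_hz, cycles_per_packet, packet_bytes=PACKET_BYTES):
    """Throughput one core can sustain if all of its cycles go to packet handling."""
    packets_per_second = clock_hz / cycles_per_packet
    return packets_per_second * packet_bytes * 8 / 1e9


ASSUMED_CLOCK_HZ = 3.0e9  # assumption: a 3 GHz processor core

for label, cycles in (("today", 21_000), ("with the prototype", 2_100)):
    gbps = sustainable_gbps(ASSUMED_CLOCK_HZ, cycles)
    print(f"{label}: about {gbps:.2f} Gbit/sec per core")
```

At the assumed clock speed, 21,000 cycles per packet works out to a little over 1 Gbit/sec with the entire core consumed by networking, which fits the observation that current chips struggle to fill a Gigabit Ethernet pipe. Cutting the cost to 2,100 cycles raises the ceiling tenfold, to a bit more than 10 Gbit/sec.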

But getting to this teraflops-terabit-terabyte world that Gelsinger envisions is going to take some different approaches to software, too--and luckily for Intel, it is going to take software that eats an incredible amount of computing power. In the mega era of the mid-1990s, a PC could store and manipulate a 32-bit color image, and in the current giga era, machines using "true color" can analyze a sequence of high-resolution images. In the tera era, says Gelsinger, we will want computers that can generate 3D, high-resolution models of objects. These might be virtual images through time of the systems in our bodies, kept as medical records, or virtualized representations of our bodies that we recognize as ourselves on the screen, which help us get a better fit for our jeans by showing how the Levis will look on us.

To demonstrate the kinds of future graphics capabilities Intel foresees in the marketplace in the next few years, Gelsinger brought out Philipp Slusallek, a professor of graphics at Saarland University in Germany and a founder of a company called InTrace GmbH that specializes in ray tracing software for doing 3D modeling. Slusallek says that the rasterized graphics that we all know from video games, and that some of us know from high-end CAD systems, do not generate correct images, but rather ones that seem good enough. On closer inspection, raster images do not show reflections and shadows properly, and they take a huge amount of graphics card and processor computing capacity to render. (This is why computer animated films take so long to produce.)

Slusallek's company has developed a suite of programs that runs on X86 processors and employs ray tracing techniques that have been around for years but which are very compute intensive and therefore of limited appeal to business and of no appeal to consumer end users. However, ray tracing techniques create reliable and credible reflections and shadows and, once you have the right software and enough compute power, can more simply create a visual image that looks correct to the human eye because the image is created by modeling how light really bounces off objects in space. Perhaps most significantly, the ray tracing technique is scalable whereas rasterization is most definitely not. With an image composed of half a million polygons, rasterized images can be strung together to deliver images that move at 60 frames per second. But as finer detail is required and tens or hundreds of millions of polygons are needed to more accurately simulate an object, rasterized image techniques quickly drop through 40, 20, and 10 frames per second (with about 2.5 million polygons per frame). With the software ray tracing techniques developed by InTrace, a cluster of 23 two-way Xeon servers using 2.2 GHz "Prestonia" Xeon DP processors can deliver four frames per second on an image composed of one billion polygons per frame. (This collection of machines is rated at about 400 gigaflops.) By doubling or quadrupling the number of processors in the cluster, the number of frames per second can be doubled or quadrupled. This image was created with simple light and shadow ray tracing; adding more processors to the complex could allow indirect illumination and other features that would further improve the fidelity of the images shown. As you can see, it won't take long before such an application has burned through many teraflops of computing power and is storing many terabytes of data. Nothing could make Intel happier.
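
The cluster figures can be sanity-checked, and the linear scaling claim projected forward, with a few lines of Python. The server count, clock speed, and four frames per second are from the demonstration as described. The figure of four floating point operations per clock cycle per processor is an assumption used here to reproduce the roughly 400 gigaflops rating, and perfectly linear scaling is an idealization of the claim that doubling the processors doubles the frame rate. The function names are invented for this sketch.

```python
SERVERS = 23
CPUS_PER_SERVER = 2
CLOCK_GHZ = 2.2
MEASURED_FPS = 4.0                      # at one billion polygons per frame
BASELINE_CPUS = SERVERS * CPUS_PER_SERVER


def peak_gflops(cpus, clock_ghz, flops_per_cycle):
    """Theoretical peak: processors x clock (GHz) x operations per cycle."""
    return cpus * clock_ghz * flops_per_cycle


def projected_fps(cpus, baseline_cpus=BASELINE_CPUS, baseline_fps=MEASURED_FPS):
    """Frame rate if performance scales linearly with processor count."""
    return baseline_fps * cpus / baseline_cpus


def cpus_for_fps(target_fps, baseline_cpus=BASELINE_CPUS, baseline_fps=MEASURED_FPS):
    """Processors needed for a target frame rate under the same linear model."""
    return baseline_cpus * target_fps / baseline_fps


ASSUMED_FLOPS_PER_CYCLE = 4  # assumption, not stated in the keynote

print(f"peak rating: {peak_gflops(BASELINE_CPUS, CLOCK_GHZ, ASSUMED_FLOPS_PER_CYCLE):.1f} gigaflops")
for factor in (1, 2, 4):
    cpus = BASELINE_CPUS * factor
    print(f"{cpus} processors: {projected_fps(cpus):.0f} frames per second")
print(f"processors for 30 frames per second: {cpus_for_fps(30.0):.0f}")
```

Under this model, reaching a smooth interactive frame rate on a billion-polygon scene takes several hundred of these processors, which is the kind of appetite for teraflops that the article describes.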

"My personal vision and goal is that our technologies touch every person--all 6.5 billion people in the world--in all modalities of life," Gelsinger declared at the beginning of his presentation. As if we need another acronym, he called the future software technologies that would drive the need for this much computing power RMS, short for recognition, mining, and synthesis. These workloads are associated with capturing massive amounts of multimedia information and using tools to not only sift through it and make it instantly accessible, but also to finally give us a virtual reality that our eyes tell us looks real. Whether or not people are ready to embrace full digitization of their lives and a virtual reality that is indistinguishable from the one we see is--and will continue to be--the subject of much debate. What is clear is that Intel is convinced that it can make money from creating the servers, desktops, and other devices that will enable such a world, and it will build these technologies whether or not the world is ready for them.

Editor: Timothy Prickett Morgan
Managing Editor: Shannon Pastore
Contributing Editors: Dan Burger, Joe Hertvik, Kevin Vandever,
Shannon O'Donnell, Victor Rozek, Hesh Wiener, Alex Woodie
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.

Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, 50 Park Terrace East, Suite 8F, New York, NY 10034