AMD-ATI Merger Bears Stream Processor Fruit
Published: November 14, 2006
by Timothy Prickett Morgan
While the central processing unit, or CPU, gets all the glory because it runs the operating system and is the thing that end users most identify with as they think about hardware, for more than a decade the most powerful computing engines inside some computers have been, by some measures, their graphics processing units, or GPUs. Because of this, people have been trying to use GPUs as adjuncts to CPUs to boost performance on real workloads, not just to paint pictures on screens.
Today, at the SuperComputing 2006 show in Tampa, Florida, where all kinds of vendors are showing off their gigaflops and teraflops, Advanced Micro Devices launched the first product in the wake of its acquisition of ATI Technologies, a specialist in graphics processors that has been experimenting with this idea and has created what it thinks is a new kind of product called a stream processor. This product, which turns a GPU into a math co-processor with immense bandwidth and floating point processing capability, was one of the reasons why AMD was so keen on acquiring ATI in the first place.
According to Dinesh Sharma, director of enterprise stream computing at AMD, there is more to creating a stream processor than rebadging a graphics card. Tricking a graphics card's rendering engine into running matrix math takes quite a bit of programming expertise, and this has been one of the factors that has limited the uptake of the idea of a GPU as a co-processor, despite the obvious benefits. But with its "Close to Metal" or CTM approach, which gives applications running on servers the ability to natively address a modified GPU--that's the stream processor--ATI has been able to radically simplify the programming model. Instead of having to hack cluster or workstation applications to talk to a graphics card to load data and do math on it--which is not exactly natural--now developers creating homegrown code as well as those working for commercial supercomputer application providers can use tools that allow them to talk to the stream processor as easily as they do a CPU today.
"People who a few years ago doubled up in laughter at the suggestion that this was not just possible, but preferable, are now calling me back into their offices to talk about it," says Sharma.
The reason is simple. The stream processor that AMD announced today costs $2,599, and it delivers approximately 360 gigaflops of computing power in a single card that plugs into a PCI-Express x16 graphics slot. (And that is any graphics slot, whether it is an Athlon or Opteron motherboard, or an Intel motherboard. Presumably, any server of any architecture that has PCI-Express slots will be able to use the stream processor eventually.) The GPU has 48 cores and 1 GB of GDDR3 memory.
To put that 360 gigaflops into perspective, at the "Clovertown" launch today, Intel was bragging that 44 of these quad-core Xeon 5300 processors could deliver about 1.8 teraflops of computing power, which suggests that a single chip (with four cores) can do around 41 gigaflops. At that rate, you need roughly nine Clovertowns--or 36 cores--to do the same work that the AMD stream processor can do. Even six top-end Clovertown chips would cost over $7,000 at list price, and would take up at least three pizza boxes in a rack. The stream processor is leaner, and cooler, and cheaper. By AMD's own math, it takes 84 rack servers to deliver 2.4 teraflops of computing power, and a dozen stream processors can do the same floating point work.
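The comparison above can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check that uses the vendor figures quoted in this article (44 chips for 1.8 teraflops, 360 gigaflops and $2,599 for the stream processor). The variable names are illustrative, and nothing here is a measured benchmark.

```python
import math

# Vendor-quoted figures from the article (peak numbers, not measurements).
XEON_CHIPS = 44               # quad-core Xeon 5300 chips in Intel's example
XEON_TOTAL_GFLOPS = 1800.0    # "about 1.8 teraflops"
STREAM_GFLOPS = 360.0         # "approximately 360 gigaflops" per card
STREAM_PRICE_USD = 2599.0     # list price of the stream processor

# Implied throughput of one quad-core chip.
gflops_per_xeon = XEON_TOTAL_GFLOPS / XEON_CHIPS

# How many such chips it takes to match one stream processor.
xeons_to_match = STREAM_GFLOPS / gflops_per_xeon
whole_xeons = math.ceil(xeons_to_match)

# Price per peak gigaflop for the stream processor.
usd_per_gflop = STREAM_PRICE_USD / STREAM_GFLOPS

print(f"One Xeon 5300: about {gflops_per_xeon:.1f} gigaflops")
print(f"Chips to match one stream processor: {xeons_to_match:.1f} "
      f"(round up to {whole_xeons})")
print(f"Stream processor: about ${usd_per_gflop:.2f} per gigaflop")
```

Taken at face value, the quoted figures put the break-even at a little under nine chips. Peak ratings like these say nothing about sustained performance on a particular application, so the result is best read as a rough sizing guide.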
Of course, the stream processor doesn't have many applications that can run on it yet. That's what the partnerships AMD is forging with independent software vendors are all about--creating the tools that allow popular applications to be ported to this hybrid architecture. "We now have direct access to the GPU through CTM," explains Sharma. And that makes programming easier.
Which is why AMD has been able to get 60 companies signed up to help create the tools and the applications for Linux or Windows workstations and servers that will allow them to reach out and tap into the stream processor to do certain floating point algorithms at lightning speed. "What we really want to do is foster a robust ecosystem of tools and applications for the stream processor," Sharma says. That will give customers the confidence that this hybrid approach is not just a fluke. Helping in this regard is the fact that AMD is now a respected CPU vendor and that it has acquired ATI and pledged to not only push the envelope on graphics processing, but to integrate graphics and other co-processor functions into the compute complexes it creates.
There is more to a stream processor than gigaflops. Memory bandwidth is perhaps more important, and nothing has more memory bandwidth in a computer than a graphics card.
"The gigaflops are what seduces people when it comes to stream processors, but the memory bandwidth is what is really exciting," says Sharma.
Graphics cards have tweaks in their electronics that are optimized for graphics functions--including special memory paths that can move data quickly to paint screens. To do computing, these access paths have to be tweaked, and the graphics card has to get considerably more memory. But the payoff is huge. The stream processor can move data into that 1 GB GDDR3 memory at 45 GB/sec, while an L1 cache memory on a CPU can do it at around 4 GB/sec. That is about an order of magnitude faster. "The memory in the streaming processor looks like one giant, very fast L1 cache."
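To get a feel for what that bandwidth means, the short sketch below works out two things: how the 45 GB/sec figure compares with a CPU-side figure, and how long it would take to fill the card's 1 GB of memory at that rate. It takes the CPU-side figure as roughly 4 GB/sec, which is an assumption about the intended unit rather than a measured value, and the function names are made up for this example.

```python
import math

def bandwidth_ratio(fast_gb_per_s: float, slow_gb_per_s: float) -> float:
    """How many times faster the first data path is than the second."""
    return fast_gb_per_s / slow_gb_per_s

def fill_time_ms(capacity_gb: float, gb_per_s: float) -> float:
    """Milliseconds needed to write capacity_gb at a steady gb_per_s."""
    return capacity_gb / gb_per_s * 1000.0

STREAM_BW_GB_S = 45.0   # stream processor memory bandwidth quoted above
CPU_L1_BW_GB_S = 4.0    # ASSUMPTION: CPU-side figure taken as ~4 GB/sec
CARD_MEMORY_GB = 1.0    # GDDR3 capacity on the card

ratio = bandwidth_ratio(STREAM_BW_GB_S, CPU_L1_BW_GB_S)
orders = math.log10(ratio)   # 1.0 would be exactly one order of magnitude
fill_ms = fill_time_ms(CARD_MEMORY_GB, STREAM_BW_GB_S)

print(f"Bandwidth ratio: {ratio:.2f}x (about {orders:.1f} orders of magnitude)")
print(f"Time to fill {CARD_MEMORY_GB:.0f} GB at {STREAM_BW_GB_S:.0f} GB/sec: "
      f"{fill_ms:.1f} ms")
```

Changing `CPU_L1_BW_GB_S` shows how sensitive the "orders of magnitude" claim is to the unit chosen for the slower path.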
Another benefit of the stream processor, of course, is that it can still do graphics. With this first stream processor, a company could have a cluster do calculations on the data set using the modified GPU, and then turn around and use the GPU to render a simulation based on that number crunching. Moreover, multiple stream processors can be ganged up inside a single machine, and they can be used in cluster configurations, too.
Right now, Sharma says that most customers who have been looking at the stream processor are interested in running Linux, not Windows. But that could change as this market evolves and Microsoft gets traction in supercomputer clusters and in scientific workstations. Today, the stream processor supports Windows XP (both 32-bit and 64-bit) as well as the 64-bit version of Windows Server 2003. Red Hat Enterprise Linux 4 has been certified on the stream processor as well, and Novell's SUSE Linux Enterprise Server 10 is due to get support in early 2007.