Cray Lands $200 Million Linux-Opteron Super Deal with DOE
Published: June 16, 2006
by Timothy Prickett Morgan
A few weeks ago, supercomputer maker Cray said that the next quarter or two might be a little bumpy, but that it was pretty confident that it would make its numbers for the year. And, apparently, with good reason. Yesterday, the company announced that it had landed a whopping $200 million deal to build an Opteron-based parallel supercomputer that runs a hyped-up version of Linux for the Oak Ridge National Laboratory.
Whether this is the most powerful supercomputer in the world when it goes operational in 2008 remains to be seen, but it is the first commitment that any research institution or government has made so far to pay for a petaflops computer. There will, undoubtedly, be others, since the race to petaflops-scale computing is on. A petaflops is equivalent to 1,000 teraflops of number-crunching power, which is about three times the performance of the current champion of the supercomputer set, the 367-teraflops BlueGene/L super at Lawrence Livermore National Laboratory.
LLNL and ORNL, along with Sandia National Laboratories and Los Alamos National Laboratory, are the foundations of supercomputer research and usage in the United States. They build machines that are largely funded by the U.S. Department of Energy and steered by the Defense Advanced Research Projects Agency's High Productivity Computing Systems initiative, which was launched a few years ago to define the computing architectures for 2010 and beyond. IBM, Sun Microsystems, and Cray have all received grants from DARPA to help create designs, with Sun working on programming tools and IBM and Cray working on system designs. The petaflops-class machine that Cray proposed to ORNL is based on its "Cascades" project for DARPA, which essentially entails converging Cray's various computing lines into a single line that uses its SeaStar 3D interconnect to support various kinds of processing elements.
The "Red Storm" supercomputer that Cray created for Sandia out of Opteron processors and a tweaked Linux operating system had a $90 million price tag and delivers about 43.5 teraflops of peak performance. It was a bumpy project for Cray, but one that resulted in a new product line for the company, the XT3. As part of its DARPA work, Cray has been converging the XT3s with its XD1 product line, which is based on another Linux-Opteron design. The "Baker" machine destined for ORNL will combine elements of these two machines and will also be able to support processing nodes from Cray's X1 vector supercomputers. This convergence is probably important, considering that ORNL has existing XT3 and X1 supers and is, in many ways, Cray's biggest and most important customer. And, thankfully, ORNL is near the electricity-generating capacity of the Tennessee Valley Authority, and moreover, the DOE has the cheapest electricity in the world. Its machines sure do use a lot of it.
Cray was the first company to break the gigaflops barrier, which it did in 1989, and to hit the teraflops barrier, which it did in 1998.
The exact configuration of the Baker machine that ORNL will get by 2008 is unclear, mostly because a lot depends on what the core count of the then-current production Opteron processors is at the time and what their thermal characteristics are. "This contract is about delivering a petaflops machine," explains Jan Silverman, senior vice president of corporate strategy and business development at Cray. "We don't yet know what the frequency of the Opteron chips will be then, but we are guaranteeing that we can deliver a petaflops machine." The best guess is that the machine will have about 24,000 processor sockets. The design calls for each node to be built from a four-socket SMP server, and each socket to have dual-core Opterons in it. In the Opteron design, each core has a dedicated HyperTransport bus, and the SeaStar 3D interconnect is what allows each core to talk directly to the network at the heart of the parallel supercomputer.
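For a rough sense of what that best guess implies, the back-of-the-envelope arithmetic can be sketched in a few lines of Python. This is only an illustration built on the figures quoted above (about 24,000 sockets, four sockets per node, two cores per socket, and a 1 petaflops target); the variable names are made up for the example, and the real machine may well end up with a different socket or core count.

```python
# Back-of-the-envelope sizing from the article's "best guess" figures.
# All inputs are the approximate numbers quoted in the text, not a
# confirmed Cray or ORNL configuration.
TARGET_TERAFLOPS = 1_000      # 1 petaflops = 1,000 teraflops
sockets = 24_000              # "about 24,000 processor sockets"
sockets_per_node = 4          # four-socket SMP server per node
cores_per_socket = 2          # dual-core Opterons (assumed unchanged by 2008)

nodes = sockets // sockets_per_node
cores = sockets * cores_per_socket

# Peak performance each socket and core would need to contribute
# for the whole machine to reach the target (1 teraflops = 1,000 gigaflops).
gflops_per_socket = TARGET_TERAFLOPS * 1_000 / sockets
gflops_per_core = gflops_per_socket / cores_per_socket

print(f"nodes: {nodes:,}")
print(f"cores: {cores:,}")
print(f"required per socket: {gflops_per_socket:.1f} gigaflops")
print(f"required per core:   {gflops_per_core:.1f} gigaflops")
```

Under these assumptions the machine would have 6,000 nodes and 48,000 cores, and each socket would need to supply a little under 42 gigaflops of peak performance. That per-socket number is why the final configuration hinges on the core count and clock speed of the Opterons that are shipping in 2008.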
The Cascades project is focused on building scalable and low-latency interconnects that take advantage of HyperTransport as well as tweaking the Linux kernel so it is streamlined and puts as little chatter out on that SeaStar interconnect as possible. When tens of thousands of cores are all trying to talk to each other and to storage, Linux can't be making a lot of noise. The Baker machine that ORNL gets will be running Fortran and C applications, and will be used in a variety of research areas, including nanotech, genomics, and high-energy particle physics.
Cray said that it expects that ORNL will upgrade its XT3 system to 50 teraflops later this year, doubling the performance it has installed today on this Linux-Opteron machine. At the end of 2006, this box will be upgraded to 100 teraflops, and then in late 2007, it will be pushed to 250 teraflops. In 2008, the new Baker design will be installed, pushing to 1 petaflops. Cray said that it expects to recognize some of the revenue from this $200 million deal in the second half of this year, with one-third of the total recognized in the next two years and the remainder after the Baker machine is accepted by ORNL in late 2008.
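The size of each step in that upgrade path is easier to see when worked out. The short Python sketch below uses only the teraflops figures quoted above; the 25 teraflops starting point is inferred from the statement that the 50 teraflops upgrade doubles today's installed performance, and the labels are illustrative.

```python
# Upgrade roadmap for ORNL as described in the article, in teraflops.
# The first entry is inferred: 50 teraflops is said to double what is
# installed today, which implies roughly 25 teraflops.
roadmap = [
    ("installed today (inferred)", 25),
    ("later in 2006", 50),
    ("end of 2006", 100),
    ("late 2007", 250),
    ("2008, Baker", 1_000),
]

# Growth factor of each step relative to the step before it.
factors = [later / earlier
           for (_, earlier), (_, later) in zip(roadmap, roadmap[1:])]
overall = roadmap[-1][1] / roadmap[0][1]

for (label, teraflops), factor in zip(roadmap[1:], factors):
    print(f"{label}: {teraflops} teraflops ({factor:.1f}x the previous step)")
print(f"overall growth: {overall:.0f}x")
```

On these figures the first two upgrades each double the system, the third multiplies it by 2.5, and the move to Baker is a fourfold jump, for roughly a 40-fold increase over what is installed today.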