Cray to Attack HPC Space with Red Storm, X1 Kickers
by Timothy Prickett Morgan
The top brass at Cray were in New York last week to present the company's position in the high performance computing market to Wall Street investors and bankers at the sixth annual growth conference put together by investment banker Needham & Company. Cray, which has big plans for 2004, gave an overview of how it intends to differentiate its products--bandwidth has always been what makes Cray machines worth the money--and why this will allow it to grow and profit in an increasingly commoditized HPC server arena.
Last year was all about getting the Cray X1 hybrid parallel vector supercomputer, formerly known as the SV2, out the door and in the hands of eager government agencies. And Cray did that, said Jim Rottsolk, chairman and CEO of the company. In a sense, 2003 was the first "real" year of business for the new Cray, which was created when Tera Computer acquired the former Cray from supercomputer rival SGI Inc in March 2000. Since that time, Cray (as the company was renamed) has been getting its ducks in a row, getting the X1 out the door and winning a massive contract with the U.S. Department of Energy and Sandia National Laboratories to build a massively parallel Linux cluster that can scale to 100 teraflops.
Plenty of people were skeptical about the X1, especially Cray's competitors (mainly IBM, Hewlett-Packard, and Sun Microsystems), which peddle massively parallel Unix boxes that are linked with proprietary interconnection fabrics. (NEC, which has built a massively parallel vector super for the Japanese government called Earth Simulator that is the most powerful machine in the world on many benchmarks, is probably less skeptical of the X1.) The X1 design, like the SV1s before it, relies on creating powerful virtual vector processors from four smaller and less powerful processors. These hybrid processors, called multistreaming processors or MSPs by Cray, can be ganged up with a high-bandwidth interconnect that allows them to resemble the old-fashioned vector processors that made Cray famous the world over. A fully loaded X1 machine has 64 cabinets with 1,024 MSPs (that's 4,096 actual processors) and shared main memory that ranges from 16 TB to 64 TB. A top-end Cray X1 would be rated at about 52.4 teraflops in terms of theoretical peak performance, and it would cost anywhere from $200 million to $300 million in a usable, balanced configuration.
This, as Rottsolk explained at the Needham conference, is considerably more expensive than clustered Unix servers of similar peak power, but the X1 is capable of delivering sustained performance in the teraflops range with only 256 processors and has been shown to have real-world efficiency of anywhere from 20 percent to 50 percent. Parallel clusters based on RISC or Intel microprocessors, says Rottsolk, are lucky to have efficiencies of 5 percent to 10 percent of theoretical peak performance on real jobs. That means Cray can charge more money for an X1 than peak performance ratings might suggest because the machines are more efficient. He explained that one of the reasons why Cray X1 machines were more efficient is that they simply gave applications a lot more bandwidth to play with. The communication rates between nodes in an X1 are running at a sustained 11.5 GB/sec, whereas Gigabit Ethernet, Myrinet 2000, Quadrics and IBM's "Colony" SP2 Switch are lucky to yield a little more than 200 MB/sec to 300 MB/sec. Even IBM's next-generation "Federation" High Performance Switch is only rated at 707 MB/sec according to Cray's figures, and InfiniBand 4X (which is technically a 10 Gb/sec link) is only delivering 850 MB/sec of bandwidth. Sun's "Wildfire" Sun Fire Link is very high bandwidth at 857 MB/sec, but it only scales to four nodes, so its appeal is limited. Cray's X1 architecture is delivering at least ten times the bandwidth, and it makes a real difference. When you look at how many bytes of information have to be moved to do a calculation (bytes per flop), the Cray X1 and Red Storm architectures really shine compared to parallel Unix and Linux clusters. On that measure, the advantage is closer to 30 times.
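The arithmetic behind those comparisons is simple enough to check. The sketch below is only an illustration using the figures quoted above (which are Cray's own numbers as presented by Rottsolk); the function and variable names are hypothetical, and it assumes 1 GB equals 1,000 MB and a 52.4 teraflops peak for both the X1 and a notional cluster of equal peak rating.

```python
# Illustrative arithmetic only; all inputs are the figures quoted in the article.
# Assumption: 1 GB/sec = 1,000 MB/sec.
X1_MB_PER_SEC = 11_500  # sustained node-to-node rate cited for the X1

# Sustained interconnect rates cited for the alternatives, in MB/sec.
OTHER_LINKS_MB_PER_SEC = {
    "commodity cluster link (low end)": 200,
    "commodity cluster link (high end)": 300,
    "IBM Federation": 707,
    "InfiniBand 4X": 850,
    "Sun Fire Link": 857,
}


def bandwidth_ratio(x1_mb_per_sec: float, other_mb_per_sec: float) -> float:
    """How many times more sustained bandwidth the X1 figure represents."""
    return x1_mb_per_sec / other_mb_per_sec


def sustained_teraflops(peak_teraflops: float, efficiency: float) -> float:
    """Sustained performance implied by a peak rating and an efficiency fraction."""
    return peak_teraflops * efficiency


ratios = {
    name: bandwidth_ratio(X1_MB_PER_SEC, rate)
    for name, rate in OTHER_LINKS_MB_PER_SEC.items()
}

PEAK_TERAFLOPS = 52.4  # assumed equal peak rating for both machines
x1_range = (sustained_teraflops(PEAK_TERAFLOPS, 0.20),
            sustained_teraflops(PEAK_TERAFLOPS, 0.50))
cluster_range = (sustained_teraflops(PEAK_TERAFLOPS, 0.05),
                 sustained_teraflops(PEAK_TERAFLOPS, 0.10))

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"X1 vs {name}: {ratio:.1f}x")
    print(f"X1 sustained: {x1_range[0]:.1f} to {x1_range[1]:.1f} teraflops")
    print(f"Cluster sustained: {cluster_range[0]:.1f} to {cluster_range[1]:.1f} teraflops")
```

On these quoted figures, the bandwidth ratios run from roughly 13 times (against the fastest competing links) to more than 50 times (against the slowest), and a cluster with the same peak rating would sustain a small fraction of what the X1 does if the claimed efficiencies hold.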
This is probably why Cray has been able to book $180 million in X1 sales worldwide in 2003, despite the weakness in the world economies. So far, classified and defense organizations in Western governments have been the most eager buyers of the X1 machines, particularly because these organizations were among the earliest adopters of the Cray machines in the 1970s and later and they have lots of vector applications that do not lend themselves to parallelization. A lot of them also have black budgets, which means no one knows how much they are spending on supers or on what. Rottsolk said that Cray shipped a bunch of 32-processor machines in Q1, a few 64-CPU machines in Q2, a few 128-CPU machines in Q3, and had booked three 256-CPU sales in Q4. He fully expects to sell X1 machines with 512 CPUs or more in 2004, too. He singled out Oak Ridge National Laboratory, which has just ponied up $30 million to buy a half-populated eight-frame X1 machine with 256 CPUs, and which is seeing some of its applications run at 25 times the speed of the prior Cray machines the X1 replaced.
According to the Cray roadmap, the X1 will be improved with a mid-life kicker called the X1E in the second half of 2004. This machine will deliver 2.5 times the oomph of the current X1. Further out in 2006, Cray is working on a follow-on in the vector super line code-named "Black Widow" that will employ new MSPs that are created using IBM Microelectronics' 90 nanometer copper/SOI/low-k chip making processes. In 2008 or so, the "Black Widow-II" kicker will come out, and beyond that in 2010, Cray will have a newly architected system dubbed "Cascades" that aims to bring sustained petaflops performance to high-end supers.
Cray is not just about the X1, however, which allows it to target an approximately $1bn high-end HPC server market. (Cray has about a quarter of this market, more or less.) With the advent of the Red Storm product line, which marries the 64-bit Opteron processor and HyperTransport processor interconnect with Cray's expertise in parallel clustering and high-speed interconnect, Cray reckons it has an additional $1bn opportunity that it can chase. Red Storm, which will be operational in mid-2004 (a bit earlier than expected), is being developed under a $90 million contract with the DOE's ASCI nuclear weapons design and monitoring program, and will result in a first phase machine that should hit either 20 teraflops or 40 teraflops of peak performance. Last October, when Cray hired Peter Ungaro away from his post as head of IBM's HPC sales efforts to take the helm as top salesman at Cray, the company said it would productize the Red Storm clusters for the masses. Red Storm, by the way, will replace the aging (yet still peppy) ASCI Red massively parallel supercomputer built by Intel for Sandia. That machine currently has 9,632 Pentium II processors running at 333 MHz and is rated at 2.4 teraflops.
Rottsolk was a bit more specific last week in the New York briefing about how the Linux-based Red Storm would go commercial, saying that the machine is being productized under the code-name of "Strider." (Cray has chosen SuSE for its Linux for Red Storm rather than create its own distribution of the open source operating system.) He said that a key custom component of the architecture (undoubtedly relating to system and memory interconnects) had been taped out and delivered to IBM Microelectronics for manufacturing in late December. He said that as AMD delivers faster speed bumps for the Opteron processors (moving from 2 GHz to 2.6 GHz to 3 GHz and higher), Cray would launch kickers to the Striders that make use of these faster processors. He added that in general Cray was "processor agnostic," but that obviously with the Red Storm architecture, it was heavily dependent on the AMD Opteron and HyperTransport interconnect. That might mean Cray could build Red Storm-like machines using other chips. The PowerPC 970/G5 immediately comes to mind, since it is a very efficient 64-bit processor with great flops ratings. But Rottsolk did not give any indication as to whether or not Cray might do this. He was merely pointing out that Cray had options.
As for the future Cascades architecture, Cray is shooting for a machine that can scale up to 1,000 teraflops, but which could also scale down to much smaller machines for the masses. The Defense Advanced Research Projects Agency is fully funding the second phase of the Cascades project, and Cray is in the running with IBM and Sun to get a contract in 2006 to build an operational machine by 2010. Only two of the three vendors will be chosen. Rottsolk didn't just throw down the gauntlet--he also picked it up and slapped either IBM or Sun with it.
"We fully intend to be picked for phase three," he stated emphatically. "We intend for Cascades to be our product in 2010." Exactly what that product might be--an amalgam of technologies derived from the X1 and Red Storm is the most likely scenario--is as yet unclear.
Cray chief financial officer Scott Poteracki said that Cray would probably book $230 million to $235 million in sales for all of 2003, and that its profits would be at the high end of the 5 percent to 10 percent of revenue range it had told Wall Street it would hit. He also said that the company had no debt, and that he expected Cray's declining annual maintenance revenues on its older kit, which hit $80 million in 2000, to bottom out at about $50 million in 2004. As Cray sells more new gear, new maintenance contracts will start picking up the slack.