Cray, IBM Win Final Round of Giant DARPA HPCS Program
Published: November 28, 2006
by Timothy Prickett Morgan
The Defense Advanced Research Projects Agency, the research and development arm of the U.S. Department of Defense, has, somewhat belatedly, awarded the third and final phase of the High Productivity Computing Systems (HPCS) supercomputer development program. The total budget for the final phase is substantial, weighing in at $494 million, and the funds are being distributed to the winners to develop the computing systems and programming tools for petaflops-scale supercomputers.
The initial HPCS program got rolling in the summer of 2002, and back then five vendors--IBM, Cray, Silicon Graphics, Hewlett-Packard, and Sun Microsystems--were each awarded a few million dollars for a chance to compete for the petaflops grand challenge. A year later, in July 2003, Cray, IBM, and Sun won the phase two part of the HPCS contract from DARPA, which totaled $146 million.
The phase two part of the HPCS contract was managed by the legendary Lincoln Labs at MIT. Under that program, Cray started development on a petaflops-scale machine called "Cascade," which, as we know, is the product line that Cray has been working toward for the past several years. Only a few weeks ago, Cray announced its Opteron-based XT4 and MTA-based XMT supercomputers on a converged SeaStar2 interconnect, and these boxes are the foundation for the future Cascade machines. Cray was awarded $43.1 million for its phase two HPCS contract. With Cascade, Cray is creating a hybrid architecture that will bring Opteron, MTA, vector, and FPGA computing elements into a shared, global memory system that can scale to 10 petaflops of performance and--most importantly--have a single Linux operating system and program development environment that knows how to deploy each kind of application on the right kind of processing element.
IBM was awarded a $53.3 million contract to fund research into a future Power-based supercomputer called PERCS, which is short for Productive, Easy-to-use, Reliable Computing System. Back in 2003, IBM didn't say much about what it was up to, except that PERCS would be a Power-based system, just like the ASCI Purple supercomputer. Purple was built under the ASCI program, which the Department of Energy created in the 1990s to drive the delivery of indigenous American multi-teraflops-scale supercomputers. IBM's Microelectronics Division, Systems Group, Software Group, and IBM Research have all been in on the PERCS design, which is being developed in collaboration with Los Alamos National Lab and a dozen academic institutions with HPC expertise.
Even though HP (thanks to its Compaq acquisition) and SGI had substantially more experience in supercomputing, Sun made it into the phase two portion of the HPCS program, getting $49.7 million to work on its "Hero" project. The goal of Hero was to create development tools that could mask the underlying parallelism of a machine with thousands or tens of thousands of processors and make it look--at least as far as applications are concerned--like one big processor. That would take a heroic effort; such a capability is, in fact, the holy grail of computing. DARPA seems to have concluded that the capabilities of the Hero system would not come to market, at least not in the 2010 timeframe that HPCS calls for.
This conclusion, along with the U.S. government's long-standing goal of always having at least two indigenous suppliers of high-end supercomputers, explains why IBM and Cray were awarded the final HPCS contracts. Simulating nuclear weapons, which is what the ASCI and HPCS machines spend much of their time doing at America's government-sponsored supercomputing labs, has turned out to be far more difficult, and to require far more computing power, than many had predicted a decade ago. That is why there will undoubtedly be a follow-on program to HPCS that pushes well beyond petaflops computing, and it may even pursue some of the goals that Sun laid out in its Hero project. But ease of programming is not nearly as important to Uncle Sam as raw computing power for nuclear weapons simulations, and the consensus seems to be that all the computing power we can muster today is not sufficient for the task.
Cray's $250 million phase three HPCS award is obviously a much-needed lifeline for the company. The phase one and phase two HPCS awards, plus its $105 million "Red Storm" supercomputer prototype for Sandia National Laboratories, provided the foundation for the Cascade machine, but it is not clear that Cray made any money on the effort, even after Red Storm was commercialized as the XT3. (There is no dishonor in breaking even on such a massive engineering project, but Wall Street is not enlightened about such matters.) The $250 million award basically pays Cray to do what it has already been promising to do: unify its products on a single interconnect supporting four different types of processing elements in a single Linux environment. Cray and its partners have also committed additional funds, beyond the phase three award, to making a 10 petaflops Cascade machine and unified programming environment a reality.
While Linux has taken over a substantial portion of the supercomputing market in recent years, Unix and RISC architectures are still in use, and the largest RISC/Unix box in the world, the $190 million, 75.8 teraflops ASCI Purple machine at Lawrence Livermore National Laboratory, needs a kicker. IBM is being vague about how it will spend the $244 million it was awarded as its share of phase three of HPCS, but it has said that the machine it builds will be delivered initially in 2010 and upgraded in subsequent years. The box will be based on IBM's Power7 processors and will run its AIX Unix variant as well as IBM's General Parallel File System and its Parallel Environment for Fortran and C programs. It will also use as-yet-unspecified interconnect and storage subsystems that will blaze a path for IBM's future commercial servers.
IBM has not made any specific commitments in terms of performance, but says it needs to go beyond the 2X performance boost every 18 months that is driven largely by processor technology. In fact, IBM is committed to delivering a 100X improvement in performance in four years. The National Center for Supercomputing Applications (NCSA) at the University of Illinois will be IBM's development partner for the final HPCS system, whatever it ends up being called. Other laboratories and universities will also be involved with the project, of course, and there will be a lot of work on compilers, middleware, and databases, since the HPCS project is about making these behemoths easier to program.
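To put the gap between the two growth rates in perspective, here is a back-of-the-envelope calculation. It is our own illustration, not an IBM figure, and it assumes simple compound doubling; the function names are hypothetical. Doubling every 18 months yields only about 6.35X over four years, while hitting 100X in the same span would require performance to double roughly every 7.2 months.

```python
import math

def growth_over(years: float, doubling_months: float) -> float:
    """Performance multiple after `years`, assuming speed doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

def doubling_period_for(target: float, years: float) -> float:
    """Doubling period in months needed to reach `target`X within `years` (compound growth)."""
    return years * 12 * math.log(2) / math.log(target)

if __name__ == "__main__":
    trend = growth_over(4, 18)             # the 18-month doubling trend over four years
    needed = doubling_period_for(100, 4)   # what a 100X goal over four years implies
    print(f"18-month doubling over 4 years: {trend:.2f}X")
    print(f"Doubling period needed for 100X in 4 years: {needed:.1f} months")
```

In other words, the 100X target is roughly 16 times what the processor-driven trend alone would deliver over the same four years, which is why gains from interconnects, software, and system design would have to make up the difference.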