Lawrence Livermore Installs Fourth Appro Linux Cluster
Published: June 12, 2007
by Timothy Prickett Morgan
With the consolidation in the server business over the past decade, it is easy to think that the top five vendors get all the deals. But they don't. Here's a case in point: Appro, a specialist in building high performance supercomputer clusters that is based in Milpitas, California, was chosen to build a bunch of clusters at Lawrence Livermore National Laboratory, which has been known to buy a lot of gear from the dominant server makers.
LLNL is, with Sandia National Laboratories, in the process of re-designing nuclear warheads for the United States government under the guidance of the Department of Energy, which is probably the largest consumer of supercomputing power on the planet. Because of the Nuclear Test Ban Treaty, the U.S. cannot design and actually blow up a real nuclear bomb, so LLNL and Sandia have been asked to work together to improve the design of warheads and to simulate the designs and their explosions. This job, as well as many other workloads supported by the national computing labs in the States, requires an immense amount of computing power. But labs like LLNL are picky about whom they choose to build and support their clusters, and as you might imagine, Appro is thrilled to have closed its fourth deal with LLNL and to have beaten out much larger competitors to get the deals it has.
According to Mark Seager, manager of the integrated computing and communications department at LLNL, which is also home to the monster Blue Gene/L Linux supercomputer built by IBM as well as myriad other systems, the lab started a new procurement process for Linux clusters at the end of 2006 that involved getting vendors to put together what it calls "scalable units" as part of their bids. LLNL's idea is that it wants a basic cluster building block--in this case, a cluster of 144 server nodes with two InfiniBand ports for each node. One InfiniBand link is used to link the server nodes together and the other one is used to manage the servers. By creating the scalable unit, LLNL and its vendor can think about clusters, which can and will be interconnected, in terms that are larger than a single server. They don't have to reinvent the wheel each time a cluster is built or built out, and moreover, the server vendor can standardize and streamline the process of building a cluster. Appro won the deal, says Seager, because it could demonstrate that it could get its supply chain together and build fairly large clusters in a predictable manner. While the technology in the server is important, it turns out not to have been as much of a differentiator as one might think--particularly when you know that all of the tier one server makers and a bunch of tier two players with HPC specialties did not beat out Appro for the LLNL contracts.
The cluster that Appro is just now delivering to LLNL is called Minos, and it comprises six scalable units based on Appro's 1U Quad XtremeServer Opteron machines, which are four-socket boxes that use the Rev F dual-core Opteron 8000 series processors. The InfiniBand fabric consists of Mellanox dual-port InfiniBand host channel adapters and Voltaire InfiniBand switches. The Minos cluster has 6,912 cores and 13.5 TB of aggregate main memory, and is rated at about 33.2 teraflops.
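For readers who want to check the math, the sizing of Minos follows directly from the scalable unit definition: six units of 144 four-socket, dual-core nodes work out to 864 nodes and 6,912 cores. The short Python sketch below walks through that arithmetic. The names are made up for illustration, and the conversion of 1 TB to 1,024 GB is an assumption, since the article does not say which convention the memory figure uses.

```python
# Back-of-the-envelope sizing for Minos, using only figures quoted in the article.
NODES_PER_SU = 144       # nodes in one scalable unit
SOCKETS_PER_NODE = 4     # four-socket Quad XtremeServer boxes
CORES_PER_SOCKET = 2     # dual-core Opteron 8000 series


def cluster_size(scalable_units):
    """Return (nodes, cores) for a cluster built from whole scalable units."""
    nodes = scalable_units * NODES_PER_SU
    cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
    return nodes, cores


minos_nodes, minos_cores = cluster_size(6)

# Derived figures: these are implied by the article's totals, not quoted in it.
gflops_per_core = 33.2e12 / minos_cores / 1e9   # from the 33.2 teraflops rating
gb_per_node = 13.5 * 1024 / minos_nodes         # ASSUMPTION: 1 TB = 1,024 GB

print(minos_nodes, minos_cores)
print(round(gflops_per_core, 2), gb_per_node)
```

Under those assumptions, the rating implies roughly 4.8 gigaflops per core and 16 GB of memory per node.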
Minos, like the other Linux clusters that LLNL runs, is based on a homegrown variant of Red Hat Linux called Clustered High Availability Operating System, or Chaos, which is deployed across all Linux servers at LLNL. Chaos Linux is based on RHEL 4 and has extensions for HPC clustering and high availability clustering as well as security and kernel tweaks that LLNL researchers have made for the specific hardware they use.
A few months ago, LLNL installed a cluster of four scalable units named Rhea, which has 576 nodes and is rated at 22.1 teraflops, a smaller cluster called Zeus that is rated at 11 teraflops, and a larger cluster called Atlas rated at 44 teraflops. Counting Minos, that comes to 20 scalable units of power, and according to Seager, the theoretical limit of the cluster design is 144 scalable units. This would be a whopping 20,736 nodes, which would probably generate too much heat to be useful given current processors. Unless, of course, we put dams across the Mississippi River for hydro power. . . .
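The tally of 20 scalable units can be reconstructed from the teraflops ratings alone. The Python sketch below is a rough estimate, not LLNL's own accounting: it assumes every scalable unit in all four clusters carries the same rating as those in Rhea (22.1 teraflops across four units), which the article does not state outright, and the function name is hypothetical.

```python
# Estimate scalable-unit (SU) counts from the ratings quoted in the article.
NODES_PER_SU = 144
MAX_SUS = 144                   # theoretical design limit, per Seager
TFLOPS_PER_SU = 22.1 / 4        # ASSUMPTION: Rhea's per-SU rating applies to all four clusters


def estimate_sus(tflops):
    """Round a teraflops rating to the nearest whole number of scalable units."""
    return round(tflops / TFLOPS_PER_SU)


ratings = {"Minos": 33.2, "Rhea": 22.1, "Zeus": 11.0, "Atlas": 44.0}
su_counts = {name: estimate_sus(tf) for name, tf in ratings.items()}

total_sus = sum(su_counts.values())
max_nodes = MAX_SUS * NODES_PER_SU

print(su_counts)
print(total_sus, max_nodes)
```

On that assumption, Zeus comes out at two units and Atlas at eight, which together with Rhea's four and Minos' six gives the 20 units cited, and the 144-unit ceiling yields the 20,736 nodes quoted.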
Financial details of the four cluster purchases by LLNL were not disclosed.