SGI Partners to Launch 'Bright' Linux Clusters
by Timothy Prickett Morgan
The idea of having a preconfigured server shipped to you is old hat: Every server vendor worth its salt lets you configure a machine online before you buy it, and you get a finished system shipped right to your data center door. Customers in the high performance computing (HPC) market, where parallel supercomputers have been hand-assembled for decades, have had about enough of that hand assembly. They want clusters that behave, are preconfigured, and are sold like monolithic systems. These are so-called "bright clusters," and supercomputer maker Silicon Graphics is partnering to target this fast-growing market with its Linux-based Altix machines.
SGI this week will announce two new clusters that go after this bright cluster market. Bright clusters account for about a third of the HPC server market, and this portion of the market doubled to $1 billion in aggregate sales in 2004. It is reasonable to assume this portion of the market will continue to grow, since no one wants to spend time integrating an HPC setup unless the workload is so unusual that it demands custom integration.
That, says Ron Renwick, cluster product manager at SGI, is what the new Altix 1350 and Altix Hybrid Cluster are all about. SGI has partnered with Norwegian cluster management specialist Scali AS to have its Linux-Itanium Altix cluster nodes managed by Scali Manage, one of the most popular such programs in the HPC market. SGI has also expanded its partnership with Voltaire to provide InfiniBand interconnection between nodes in SGI's new Altix 1350 clusters. Voltaire is also providing the InfiniBand switching in the "Columbia" massively parallel supercomputer that SGI recently sold to NASA, which has 10,240 Itanium 2 processors. (Columbia is really an InfiniBand cluster of Altix nodes that are themselves NUMAflex clusters.)
The Altix 350s, which were announced a year ago, are HPC cluster nodes based on a two-way Itanium server. These nodes are linked together into HPC clusters using SGI's NUMAflex links, which allow main memory in the nodes to be pooled into a single memory space for applications. SGI's high-end 64-way Altix 3700s, which were announced two years ago, can now be extended to 256-way processing with up to 24 TB of shared memory. While this is a lot of computer, it is not necessarily the right architecture for every HPC workload. And neither is a cluster of smaller Altix 350s necessarily the right machine for the job. That is why SGI is rolling out the Altix 1350, which is a new node that scales up to 16 Itanium processors with up to 192 GB of single shared memory space. With a special router (derived from technology used in the Altix 3700s), customers can glue two of these Altix 1350s together to get a 32-processor node with 384 GB of shared memory. This router has a 6.4 GB/sec, bi-directional, eight-port NUMAlink 4 link.
While many organizations install clusters with hundreds of processors, what often happens is that the workload managers on the clusters and the nature of the jobs themselves cause organizations to carve up their clusters into pieces with 16 or 32 processors. If they are using two-way server nodes, as most organizations do, 8 GB of main memory is put on each node. However, it is often tough to cram the data sets that are being worked on into 8 GB, and that means the machines are swapping data in and out of the nodes to run simulations. If this is going on, it makes much more sense to have a 16- or 32-processor node with a single shared memory, which will run the same workload faster. Moreover, for applications that are priced based on the node, scaling up a node (rather than adding more smaller nodes) lowers software costs, which is why SGI is rolling out the Altix 1350s.
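The arithmetic behind that argument can be sketched in a few lines. The node memory sizes below come from the article; the 40 GB working set is a hypothetical figure chosen for illustration:

```python
# Back-of-envelope sketch of the shared-memory argument.
# Memory figures (8 GB per two-way node, 192 GB per Altix 1350 node)
# are from the article; the 40 GB working set is a hypothetical example.

def fits_in_memory(working_set_gb, node_memory_gb):
    """True if a job's working set fits entirely in one node's memory."""
    return working_set_gb <= node_memory_gb

TWO_WAY_NODE_MEM_GB = 8     # typical two-way cluster node
ALTIX_1350_MEM_GB = 192     # single shared-memory Altix 1350 node

working_set_gb = 40         # hypothetical simulation data set

# On a cluster of two-way nodes, each node sees only its local 8 GB,
# so the data set must be shuttled in and out of the nodes piecewise.
print(fits_in_memory(working_set_gb, TWO_WAY_NODE_MEM_GB))   # False

# On the shared-memory node, the whole data set sits in one address space.
print(fits_in_memory(working_set_gb, ALTIX_1350_MEM_GB))     # True
```

The same logic drives the licensing point: a 16-processor partition built from eight two-way nodes counts as eight nodes under per-node software pricing, while one shared-memory node of the same size counts as one.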
The Altix 1350 will run SGI's own Advanced Linux Environment with the SGI ProPack, which is a variant of the open source Linux kernel with tweaks to take full advantage of the Altix shared memory architecture, which is a variant of the NUMAflex architecture in SGI's MIPS-Irix Origin parallel servers. SGI will also be supporting Novell SUSE Linux Enterprise Server 9, and when Red Hat makes it available some time in March, the Altix 1350s will also support Enterprise Linux AS 4. The entry Altix 1350 cluster will sell for $200,000 in a base configuration; the router to take a 16-processor node up to 32 processors will cost $20,000. The Voltaire InfiniBand interconnect is the default fabric for clusters made of Altix 1350 nodes, and Scali Manage is the software that SGI is recommending to manage these clusters.
As part of the focus on bright clusters, SGI is also announcing the Altix Hybrid Cluster, an HPC solution that extends the Altix architecture, the SGI tools and Linux environment it embodies, the Scali Manage system management tools, and the Voltaire InfiniBand interconnect so that other 32-bit and 64-bit X86 servers can be brought under the umbrella of the Altix systems. While SGI is not reselling another vendor's X86 servers and is focused on its own Itanium boxes, the company understands that some workloads are better suited to 32-bit and 64-bit X86 architectures than to the Itanium architecture, at least for now. There is a lot more code written for the X86 than for the Itanium right now, and SGI has to deal with this fact, as does every server maker.