The Unix Guardian
Volume 4, Number 24 -- June 28, 2007

Sun Gets Serious (Finally) About Supercomputing

Published: June 28, 2007

by Timothy Prickett Morgan

When Sun's top brass brought Andy Bechtolsheim, its former chief technology officer and the first employee hired by the company's founders, back to the company in February 2004, they got a lot more than a techie who knows about chips, servers, and operating systems. They also got one of the smartest people in the world when it comes to networking technology, and someone who was probably going to give the big server makers a run for their money in the media and high performance computing space with his little company, Kealia. This week at the International Supercomputing Conference 2007 show in Dresden, Germany, we got a peek at exactly what Bechtolsheim was working on at Kealia.

Between the time Sun was founded back in the early 1980s and when Bechtolsheim came back to Sun three years ago, Sun acquired a large amount of supercomputing experience. It bought the carcasses of massively parallel supercomputer makers Thinking Machines and Kendall Square after they went bust, as well as the Cray CS6400 server line from Silicon Graphics--which may just go down as the best acquisition in the server space, given the popularity of the rebranded "Starfire" Enterprise 10000 servers that fueled the dot-com boom. It is important to remember that Sun had planned for dual-core UltraSparc-V processors to be available in servers with more than 1,000 processors in a single system image, delivering about 6 teraflops of number-crunching power--back in 2002. This didn't happen for a lot of reasons--chip delays, the dot-com bust, and the rise of X64 servers running Linux that offered much better bang for the buck than Sparc boxes ever could.

In the past several years Sun has embraced X64 processors and re-ported its Solaris operating system to this processor architecture, and it was well on its way down this path when Bechtolsheim was brought back in. But what is clear from the "Galaxy" server designs and now the "Constellation" blade server and related parallel supercomputer designs is that, for whatever reason, Sun is back in the game in the server racket in general and is determined not to be at the bottom of the Top 500 list of supercomputers any more. That is because Bechtolsheim understands, of course, that the network is the computer and that designing the interconnect in any cluster is more important than the elements of the compute nodes themselves.

At ISC this week, Sun took the wraps off the Constellation System, which brings various pieces of the Galaxy server line as well as the new "Niagara" Sparc T1-based blade servers together with a high-speed, massive InfiniBand switch that creates a giant supercomputer cluster, one that utterly dwarfs whatever Sun was planning with its UltraSparc designs of years gone by with their "WildFire" interconnect. While the Sparc designs from a decade ago--which Bechtolsheim didn't really participate in--had giant server nodes and a very fast interconnect to lash the machines together, the Constellation System goes in the direction that the HPC market has gone, which is toward commodity rack or blade servers and lots of connectivity between server nodes. But the Constellation System that Sun announced this week has a few interesting tweaks that make it different from normal X64-InfiniBand clusters.

The InfiniBand switch, code-named "Magnum," that is literally at the heart of the Constellation System has 3,456 double data rate (DDR) InfiniBand ports. Bechtolsheim, looking at the way people connect servers and storage together for media and HPC applications, took a simple approach with the X4500 "Thumper" data servers, putting 48 SATA ports on a motherboard and turning a two-socket Opteron server into a massive, dense data server. Similarly, he looked at the clustering of InfiniBand core and leaf switches, which are necessary to lash servers together with InfiniBand these days, and thought that the best thing to do was to get rid of all of these layers of switches. Servers plug right into the Magnum switch, and there is no hierarchy of InfiniBand gear to buy. (Which may not make Bechtolsheim's prior employer, Cisco Systems, very happy.) To do what the Magnum switch does would take 12 core InfiniBand switches and 288 leaf switches, and by moving to this simplified arrangement, Sun can cut down the number of cables in the cluster by a factor of six and cram a 3,456-node cluster into 20 percent less space. The Magnum is a box that is twice as wide and half as tall as a standard rack, and it has a bisection bandwidth of 110 Tbps. Sun is using a 12X InfiniBand cable coming out of the Magnum switch, which splits down to three 4X InfiniBand links as the wire gets closer to the server nodes.
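For readers who want to check the port math, here is a rough sketch in Python. The 3,456-port count and the 12-plus-288 switch comparison come from the figures above. The 16 Gbps of usable data per DDR 4X link (20 Gbps signaling less 8b/10b encoding overhead) and the idea of counting both directions on every port are our assumptions about how a number near 110 Tbps could be reached, not a derivation Sun has published.

```python
# Back-of-the-envelope check of the Magnum figures quoted in the article.
PORTS_4X = 3456              # DDR 4X ports on one Magnum switch (from the article)
LANES_PER_12X_CABLE = 12     # a 12X cable carries twelve lanes
LANES_PER_4X_LINK = 4        # a 4X link uses four lanes
DDR_4X_DATA_GBPS = 16        # assumption: 20 Gbps signaling, 8b/10b encoding

# How many 4X links fit in one 12X cable, and how many cables leave the switch.
links_per_cable = LANES_PER_12X_CABLE // LANES_PER_4X_LINK
cables_at_switch = PORTS_4X // links_per_cable

# Assumption: count every port at full data rate in both directions.
both_directions_tbps = PORTS_4X * DDR_4X_DATA_GBPS * 2 / 1000

# Conventional design the article says Magnum replaces.
replaced_switches = 12 + 288

print(f"4X links per 12X cable:   {links_per_cable}")
print(f"12X cables at the switch: {cables_at_switch}")
print(f"Aggregate bandwidth:      {both_directions_tbps:.1f} Tbps")
print(f"Switches replaced:        {replaced_switches}")
```

Under those assumptions the aggregate lands within a rounding error of the quoted 110 Tbps, which suggests the figure counts traffic in both directions across all ports.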

The server nodes in the Constellation System are, of course, the new Constellation class blade servers, which plug into the Sun Blade 6000 chassis and which use dual-core Opteron, dual- or quad-core Xeon, or Sparc T1 processors. (The last of these is not much good at number-crunching.) With quad-core "Clovertown" Xeon chips, Sun can deliver 6 teraflops of computing (768 cores) per chassis and that works out to 24 teraflops per rack. The way Sun is pitching the Constellation System, the nodes run Solaris, but obviously the X64 nodes can run Windows or Linux should customers opt for that. The HPC Cluster Tools and Studio 12 compilers are tweaked for Solaris and Linux, and Sun's Grid Engine grid computing middleware is also in the Constellation System if customers want it. Other workload and cluster management systems, such as Rocks and Ganglia, are also supported.

As for storage, the Constellation System uses X4500 storage servers, and using 1 TB disks (which are just becoming available), Sun can cram 1 petabyte of storage into two racks. These storage servers hook into the same InfiniBand switching structure as the server nodes, which was, after all, the whole point of InfiniBand. The storage servers run Sun's Solaris 10 Zettabyte File System, which has a fault tolerant data protection algorithm Sun calls RAID Z, and layers the open source Lustre object file system on top of that.

When you add it all up, Sun can today deliver a 1.7 petaflops supercomputer with up to 10 petabytes of disk capacity. Such a configuration would include four of the Magnum switches daisy-chained together and would have 13,824 blade server nodes and over 110,592 processor cores. There is absolutely nothing embarrassing about such scalability, and the real question is whether Sun can deliver this at a competitive price.
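The node and core counts in that maximum configuration fall straight out of the switch port count, as this small Python sketch shows. The eight cores per node (two quad-core sockets per blade) and the one-blade-per-port mapping are our assumptions, chosen because they reproduce the quoted totals. The per-core figure at the end is simply what the 1.7 petaflops rating implies under them.

```python
# Rough scaling arithmetic for the largest Constellation configuration.
PORTS_PER_MAGNUM = 3456   # from the article
MAGNUM_SWITCHES = 4       # from the article
CORES_PER_NODE = 8        # assumption: two quad-core sockets per blade
PEAK_PFLOPS = 1.7         # from the article

# Assumption: every switch port feeds exactly one blade server node.
nodes = PORTS_PER_MAGNUM * MAGNUM_SWITCHES
cores = nodes * CORES_PER_NODE

# Implied peak per core, converting petaflops to gigaflops.
gflops_per_core = PEAK_PFLOPS * 1e6 / cores

print(f"Blade server nodes: {nodes:,}")
print(f"Processor cores:    {cores:,}")
print(f"Implied peak:       {gflops_per_core:.1f} gigaflops per core")
```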

Last October, Sun announced that the Texas Advanced Computing Center (TACC) at the University of Texas at Austin had commissioned Sun to build a Solaris supercomputer rated at 400 teraflops using Galaxy servers. As it turns out, TACC is actually buying a Constellation System, and nicknaming it "Ranger." Since last October's announcement, the Ranger cluster machine has been upgraded to over 500 teraflops. TACC is waiting, like many customers, for Advanced Micro Devices to deliver the quad-core "Barcelona" Opteron processors for its machine. TACC plans to use 15,700 of these processors, 125 terabytes of main memory, and 72 of the Thumper arrays, which will have a total of 1.7 petabytes of disk capacity. Ranger is being built through a $59 million grant from the National Science Foundation. About $30 million of that is going to the university for hardware acquisition (it is unclear what Sun's take is) and the remaining $29 million is for ongoing support costs for Ranger, which is expected to be operational on December 1.





The Unix Guardian

Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
