Volume 1, Number 7 -- February 26, 2004

Sun Aspires to Have a Bigger HPC Business

by Timothy Prickett Morgan

Sun Microsystems has always been popular with the techies, having created the scientific workstation business that, for more than two decades, has allowed them to do a lot of their work from their desktops rather than having to share server resources. Sun didn't really get into the server racket until the mid-1990s, and then, when it bought the Starfire server business from Cray and ate the carcasses of Kendall Square and Thinking Machines, the company was suddenly flush with very smart people who understood high performance computing.

It is a bit of a wonder, to be honest, why Sun hasn't taken over a much larger piece of the high performance computing market as it relates to servers. That was clearly a part of the plan. If there is one thing that the UltraSparc-III processor was supposed to be about, it was big iron. If you go back to the original announcements for the UltraSparc-III processors from October 1997, Sun committed to delivering "Serengeti" servers that would have more than 1,000 processors in a single system image using 600 MHz UltraSparc-IIIs in late 1999, 1 GHz UltraSparc-IVs with two cores in early 2000, and 1.5 GHz dual-core UltraSparc-Vs in early 2002. If Sun had done this, the Sparc architecture would be well ahead of any other platform right now. Do the math. A 1,000-processor Sun Fire machine using dual-core UltraSparc-Vs would have had an aggregate peak performance rating of 6 teraflops on the Linpack benchmark that is used to rank the top supercomputer systems in the world. This would have been one smoking machine. But, alas, for a lot of complicated reasons, this did not happen.
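For readers who want to do the math, here is a minimal sketch of the peak calculation in Python. It assumes each core retires two floating point operations per clock cycle (one add and one multiply), which is the assumption that reproduces the 6 teraflops figure above; the function name and parameters are illustrative only and do not come from Sun.

```python
def peak_teraflops(processors, cores_per_processor, clock_ghz, flops_per_cycle=2):
    """Theoretical peak in teraflops.

    flops_per_cycle=2 is an assumption (one add plus one multiply per clock),
    not a figure quoted by Sun.
    """
    gigaflops = processors * cores_per_processor * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0


# The machine Sun promised for 2002: 1,000 dual-core 1.5 GHz UltraSparc-Vs.
serengeti_peak = peak_teraflops(processors=1000, cores_per_processor=2, clock_ghz=1.5)
print(f"Peak: {serengeti_peak:.1f} teraflops")
```

Peak is a theoretical ceiling. What a machine actually sustains on Linpack is always lower and depends heavily on the interconnect.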

Hope springs eternal, though, and nearly a year ago, Sun tapped Shahin Khan, who hails from that deep supercomputing background at Cray and who was chief competitive officer for the company for many years, to be vice president of a new High Performance Technical Computing unit that will span many different lines of business at Sun. Specifically, the HPTC group that Khan runs has some control over Sun's sales pitch and product roadmaps for workstations, X86 and Sparc servers, and various software lines. But given the nature of the HPC market, the HPTC unit is focused mainly on Sun's Grid Engine grid middleware, its Sun Blade workstations, the "WildCat" Sun Fire Link switched fabric interconnect (which is a variant of Fibre Channel), and the related MaxCat HPC variants of the UltraSparc-III Sun Fire commercial server line. The HPTC unit is also pushing entry-level V60x and V65x Linux machines (based on Intel's Xeon DP chips) as well as the new Sun Fire V880z visualization servers based on the V Series of pseudo-commodity Sun Fire Sparc machines. And it will undoubtedly embrace the new two-way V20z Opteron-based servers and their kickers as they are announced later this year because of the superior floating point performance of the Opteron chip compared to the Xeons.

When you add up all the stuff that Sun's customers buy, the HPC market accounts for about $2 billion of Sun's sales each year. A few years ago, when the dot-com boom was allowing Sun to make money hand over fist, HPC sales were not a big priority for Sun since they were a much smaller piece of the pie. But now HPC is a big deal for Sun, since it has a very large and generally happy base of HPC customers who can spend more money than they did a few years ago.

Sun has quite a few Linux clusters in the Top 500 supercomputer list that was just announced this week. Many of the Starfire 64-way servers that were clustered together with fast interconnect to make Sunplex HPC machines have been pushed off the list by the growing number of bigger Linux clusters from lots of vendors, including Sun. The largest pure Sun box on the recent Top 500 comes in at number 151: a cluster of Sun Fire 6800 and 15000 servers at the University of Aachen with 672 processors that has a peak performance of 1.2 teraflops (but a maximum performance of 891 gigaflops running the Linpack Fortran benchmark). Pennsylvania State University, meanwhile, has a cluster of 160 Sun Fire V60x servers, each with two Intel 2.8 GHz Xeon DP processors, that is rated at a peak of 1.8 teraflops but only 872 gigaflops on Linpack. This machine uses Gigabit Ethernet interconnect, which is less efficient than other fast interconnects. Penn State also has a Dell PowerEdge cluster made from 128 servers using slower 2.4 GHz Xeon DPs but with Quadrics interconnect; it has a peak rating of only 1.2 teraflops, yet it yields 878 gigaflops on Linpack.
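The gap between peak and Linpack is easier to see as a ratio. The sketch below divides the Linpack result by the peak rating for the three clusters just mentioned. The figures are the rounded ones quoted in this article, so the percentages are approximate, and the function and variable names are illustrative only.

```python
def linpack_efficiency(rmax_gigaflops, rpeak_gigaflops):
    """Fraction of theoretical peak that the machine sustains on Linpack."""
    return rmax_gigaflops / rpeak_gigaflops


# (label, Linpack gigaflops, peak gigaflops), as quoted in the article.
clusters = [
    ("Aachen Sun Fire 6800/15000", 891, 1200),
    ("Penn State Sun Fire V60x, Gigabit Ethernet", 872, 1800),
    ("Penn State Dell PowerEdge, Quadrics", 878, 1200),
]

efficiencies = {
    name: linpack_efficiency(rmax, rpeak) for name, rmax, rpeak in clusters
}

for name, efficiency in efficiencies.items():
    print(f"{name}: {efficiency:.0%} of peak")
```

On these numbers, the two machines with fast interconnect sustain roughly three quarters of peak, while the Gigabit Ethernet cluster sustains a bit less than half.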

As you can see, two things matter in HPC: a fast processor and fast interconnect. And that is why the alliance between Sun and Advanced Micro Devices is going to be very interesting for Khan's HPTC organization at Sun. If Sun and AMD collaborate to extend the WildCat interconnect, which is already pretty impressive when it comes to bandwidth and low latency, with AMD's HyperTransport technology, there are some interesting possibilities. But the advent of an Opteron line of two-way and four-way servers that run Linux and Solaris plus off-the-shelf interconnect from Myrinet, Quadrics, and others, will do something that is very important: Get Sun in the game on the big X86 clusters that now dominate the Top 500 list and increasingly dominate the HPC landscape.

While WildCat interconnect is impressive, the Sun Fire servers are expensive and the WildCat switch has scalability limits. The current Sun Fire line has an eight node clustering limit, so with 106-way Sun Fire 15000 servers, that means the biggest super Sun can build is only about 2 teraflops peak (about 1.5 teraflops on Linpack) using 1.2 GHz UltraSparc-III processors. Even doubling the processor core count using UltraSparc-IV processors, which were just announced and will ship starting in March, would only get Sun to 3 teraflops on the Linpack test. But using Opteron servers, Sun could easily build clusters with tens of teraflops, and it could probably sell them competitively, too.
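The ceiling imposed by the eight node limit can be worked out the same way. This sketch makes three assumptions that are not stated by Sun: two floating point operations per core per clock, UltraSparc-IV chips running at the same 1.2 GHz clock with twice the cores, and a Linpack efficiency of 75 percent, taken from the ratio of the 1.5 and 2 teraflops figures above. All names are illustrative.

```python
NODE_LIMIT = 8              # clustering limit of the current Sun Fire line
PROCESSORS_PER_NODE = 106   # fully loaded Sun Fire 15000
FLOPS_PER_CYCLE = 2         # assumption: one add plus one multiply per clock
LINPACK_EFFICIENCY = 0.75   # assumption: 1.5 teraflops Linpack / 2 teraflops peak


def cluster_peak_teraflops(cores_per_processor, clock_ghz):
    """Theoretical peak of a maxed-out eight node Sun Fire 15000 cluster."""
    cores = NODE_LIMIT * PROCESSORS_PER_NODE * cores_per_processor
    return cores * clock_ghz * FLOPS_PER_CYCLE / 1000.0


us3_peak = cluster_peak_teraflops(cores_per_processor=1, clock_ghz=1.2)
us4_peak = cluster_peak_teraflops(cores_per_processor=2, clock_ghz=1.2)
us4_linpack = us4_peak * LINPACK_EFFICIENCY

print(f"UltraSparc-III peak: {us3_peak:.2f} teraflops")
print(f"UltraSparc-IV peak: {us4_peak:.2f} teraflops")
print(f"UltraSparc-IV Linpack estimate: {us4_linpack:.2f} teraflops")
```

Under those assumptions, the node count caps the machine no matter how fast the interconnect is; only more cores or more clock speed per node moves the number.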

Of course, to Sun, HPC is about more than the box. "HPTC is not just technical computing," says Khan. "It is the market that tries out new things first and sees how they work." The banner that Khan waves at Sun every day has the word "grid" on it, but there are other elements that his team sees as converging in the HPC market: Java, peer-to-peer computing, and Web services, specifically. Khan says that these technologies will converge first in HPC, much as other technologies (like the Internet) were adopted and transformed by the academic and government research institutions of the world before they went mainstream in commercial computing. In the end, Sun and its partners envision a future where companies and institutions create "automated exchanges" for computing power. "Grids are trade exchanges for computing capacity and data access," he says.

But the grid technology is not fully cooked yet, and Sun knows it. That is why Sun is only now extending a grid practice that it started in Europe six months ago to a worldwide operation. Right now, about half of Sun's HPC revenues come from non-profit government and academic agencies, with the other half coming from commercial institutions. When that balance between for-profit and non-profit HPC customers more accurately reflects the split in the computing market at large, then we will know that grids have gone mainstream. "There are many more experiments to be conducted," says Khan. "You have to start small, and you have to go from A to B to C. You have to figure out what works and what does not." Getting a kicker to WildCat out and delivering very powerful, very inexpensive Opteron machines will give Sun a decent chance of taking down more big HPC deals.

