Where Are Sun's Big Galaxies and Opteron Blades?
Published: April 6, 2006
by Timothy Prickett Morgan
Unless you have designed and made your own servers and built manufacturing facilities or the supply chain of partners necessary to mass produce them, you probably do not have a good idea of how difficult it is to get a new server to market. I've built a few machines in my time--and am in the process of designing and building some more--so I have some sense of why there can be delays. Even still, I wonder sometimes what is taking Sun Microsystems so long to flesh out its "Galaxy" line of Opteron-based Sun Fire X series servers.
It is a little more difficult to pin Sun down on a server product launch date these days, since it quietly killed off the whole Network Computing quarterly announcement schedule that its then-new top management team created two years ago when they took over the company. Under this marketing scheme, Sun tried to cut costs and boost its press profile by putting the vast majority of its announcements into a single quarterly extravaganza. As 2006 began and after Sun figured out that some product schedules move around and do not fit neatly into preordained time slots four times a year, the company decided to stop this whole NC thing. From a PR and marketing standpoint, the NC approach made some sense in terms of bang for the buck, but in other respects it didn't. Sun was trying to tell too many stories on NC day, or didn't have product launches that hung together in a theme and had to force one. Now, Sun is back to making server, storage, operating system, middleware, and other announcements as products are ready for market.
While Sun is not in any mood to pre-announce the high-end Galaxy machines, the company does want to let people know they exist and that they are indeed coming--and apparently soon. "These Galaxies are not PowerPoint slides," said John Fowler. "These products physically exist, and they are in metal." He says that hundreds of the expected big Galaxy boxes, which will range up to a 16-core, eight-socket Sun Fire machine and will probably include a smaller eight-core, four-socket machine, have been built.
In fact, although Sun did not say it explicitly last year, the big 50 teraflops computing cluster deal that Sun did in partnership with NEC and AMD for the Tokyo Institute of Technology comprises 655 of these 16-core Sun Fire Galaxy machines plus some 360 floating-point accelerator cards from ClearSpeed. Tokyo Tech is using dual-core Opteron processors, and presumably they are either 68-watt or 55-watt 800 series variants of the Opteron chips because with 5,240 processor cores, heat is a big issue.
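A back-of-the-envelope check on those cluster figures, as a sketch only: it assumes every one of the 655 machines is a fully populated eight-socket box with dual-core chips, which Sun has not confirmed, and the variable names are purely illustrative. Under that assumption, 5,240 works out to the socket count, and the core count comes to 10,480.

```python
# Figures quoted in the article; full population of every box is an assumption.
SERVERS = 655
SOCKETS_PER_SERVER = 8   # eight-socket Sun Fire machine
CORES_PER_SOCKET = 2     # dual-core Opteron

sockets = SERVERS * SOCKETS_PER_SERVER
cores = sockets * CORES_PER_SOCKET

print(f"sockets: {sockets:,}")
print(f"cores:   {cores:,}")
```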
If Sun's naming conventions for the Sun Fire Galaxy boxes mean anything, then the X stands for an Opteron-based machine, the next number is the maximum number of Opteron cores that can be put into the box, and the number after that is the number of industry-standard rack units that the box takes up. That makes a single-socket Galaxy the "X2100," and the two-socket boxes the "X4100" and the "X4200." All three of these machines were launched last September and have been ramping up sales since then. A four-socket, eight-core box will presumably fit into a 2U and/or 3U form factor, making it a Sun Fire X8200 or X8300, but Sun could do a 4U box, which would be the Sun Fire X8400. With the 16-core box, it is hard to imagine Sun getting it much smaller than a 6U or 7U form factor, which would make it the Sun Fire X16600 or X16700.
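To make the guessed-at naming scheme concrete, here is a minimal sketch. The function name `galaxy_model_name` is hypothetical, and the rule it encodes (X, then maximum cores, then rack units, then "00") is only my reading of the three shipping models, not anything Sun has published.

```python
def galaxy_model_name(max_cores: int, rack_units: int) -> str:
    """Build a model name under the naming convention inferred above.

    This is a guess at the pattern, not an official Sun rule.
    """
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    if not 1 <= rack_units <= 9:
        raise ValueError("rack_units must be a single digit from 1 to 9")
    return f"X{max_cores}{rack_units}00"


# The three shipping boxes, then two of the speculated big ones.
for cores, units in [(2, 1), (4, 1), (4, 2), (8, 4), (16, 6)]:
    print(galaxy_model_name(cores, units))
```

If the pattern holds, the loop reproduces the X2100, X4100, and X4200 names already on the price list, followed by the speculative X8400 and X16600.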
In addition to these big SMP boxes, Fowler confirmed that Sun is working on blade server designs in the Galaxy product family, but again, he didn't want to say too much about them. "We have a whole different concept called modular computing," he explained. "We consider the current crop of blade servers to be 'wambly' Web servers," he joked, riffing on an old National Lampoon routine that few people probably know--but I did, to his surprise. He said that these blade machines also exist, and that there have been beta units and now there are early access units of these machines out there in the field.
When pressed about the feeds and speeds of the machines, Fowler said that Sun was looking at leveraging InfiniBand interconnect, and was not restricting the bigger Galaxy boxes to the future "Santa Rosa" Rev F Opteron processors, which are expected in July or so.
One of the reasons why Sun is so big on InfiniBand now is that the company believes it can storm the high performance computing server market (which is growing at many times the rate of the server market in general) with a low-power server stack with large main memories and fast interconnect. Fowler estimates that about 30 to 40 percent of Sun's Opteron server sales to date have been in the HPC market, with the rest being for Web and other infrastructure workloads. "Regardless of all the talk about database clusters, I do not think Opteron is used today for Oracle databases," he said.
Again, the Tokyo Tech deal gives a sneak peek at the future big Galaxy boxes. On the InfiniBand front, that Galaxy cluster at Tokyo Tech will be doubled to 10,480 Opteron cores and will have over 100 teraflops of computing capacity (thanks in part to the ClearSpeed boards), and one of the reasons why this cluster can be extended so far is that it is based on an InfiniBand switched fabric called the "Grid Backbone" from Voltaire. The Tokyo Tech cluster will have 64 GB of main memory per server and dual 4x InfiniBand ports as well as integrated Gigabit Ethernet ports. (The current low-end Galaxies have four Gigabit ports, and it seems likely that the bigger boxes will have at least that many.) In aggregate, the 655 servers in the Tokyo Tech cluster will have well over 1 petabyte of disk storage.
Fowler said he is excited about deploying big Galaxy machines in the HPC space because academic and research institutions have come to the conclusion that having a large number of fatter nodes fits a lot more supercomputing applications than a large number of skinny nodes. The issue has always been cost, and with the advent of dual-core--and soon quad-core--processors, companies like Sun can now deliver a reasonably large SMP machine with maybe eight or 16 cores and a fairly large shared memory of 64 GB or 128 GB for a lot less than what a similar RISC/Unix box cost only a few years ago. "We think HPC is going back to fatter nodes, and our upcoming Galaxy machines will cater to this," said Fowler. "You want as much local memory as possible, and when you do the price/performance, this scenario wins over skinny boxes."
Fowler is also promising that Sun will have "an interesting I/O architecture" with the future Galaxy boxes. "The ratio of memory to CPU to I/O has been very carefully thought out on these boxes so we can go after a wide range of workloads." Sun plans to support Gigabit Ethernet, 10 Gigabit Ethernet, and InfiniBand ports on these machines. This way, companies can choose the appropriate latencies and bandwidth to match their workloads. Fowler said that Gigabit Ethernet has latencies of around 40 to 100 microseconds, and 10 GigE reduces that to around 10 to 20 microseconds. But InfiniBand 4x ports drop that down to 1 to 2 microseconds, which is very low, and will enable bandwidths in the range of 40 to 60 gigabits per second. With latencies that low and bandwidths that high, this is the point where a cluster starts acting like an SMP machine (in some ways) and it is appropriate to run back-office applications like ERP suites.
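To illustrate how a shop might match an interconnect to a workload using the latency ranges Fowler quoted, here is a minimal sketch. The function name `pick_interconnect` and the idea of walking the options from the most commonplace to the most exotic are my assumptions, not anything Sun described.

```python
from typing import Optional

# Latency ranges in microseconds, as quoted by Fowler.
# Listing them from most to least commonplace is an assumption for illustration.
INTERCONNECT_LATENCY_US = {
    "Gigabit Ethernet": (40, 100),
    "10 Gigabit Ethernet": (10, 20),
    "InfiniBand 4x": (1, 2),
}


def pick_interconnect(max_latency_us: float) -> Optional[str]:
    """Return the first listed interconnect whose worst-case latency fits the budget."""
    for name, (_best, worst) in INTERCONNECT_LATENCY_US.items():
        if worst <= max_latency_us:
            return name
    return None  # nothing in the table is fast enough


for budget in (150, 25, 5, 0.5):
    print(budget, "->", pick_interconnect(budget))
```

A workload that can tolerate 150 microseconds gets plain Gigabit Ethernet, while one that needs single-digit microseconds is pushed to InfiniBand, and a budget below 2 microseconds finds nothing in the table.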
The one thing that is not holding up the Galaxy launch, by the way, is AMD's move to the Rev F Opterons. These Opteron processors will be based on a 90 nanometer chip process and will include an integrated DDR2 main memory controller, which is great because DDR2 main memory is faster and consumes less power. However, Sun cannot wait until its next fiscal year to start selling bigger Galaxy boxes, and so these machines are being architected to use both the current Rev E Opterons, which support DDR1 main memory, and the future Rev Fs. "We designed our mechanicals so we could handle both processors," said Fowler. "And besides, there are a whole range of customers who will not want or be able to support Rev F Opterons immediately because they need to qualify the chips."