Volume 3, Number 7 -- February 18, 2004

OctigaBay Takes Opteron-Linux to New HPC Heights


by Timothy Prickett Morgan

A relatively new upstart is entering the fiercely competitive high performance computing (HPC) server market. OctigaBay Systems, a two-year-old startup founded by VCs and experts in HPC and telecoms systems, is based in Vancouver, British Columbia, and its new OctigaBay 12K machine, which runs Linux and can scale to 12,000 64-bit Opteron processors in a cluster, presents an interesting challenge to IBM, Hewlett-Packard, Cray, Sun Microsystems, and SGI as they chase the HPC dollars.

OctigaBay is not a real place in Canada, by the way, but a made-up name that reflects the fact that the company has eight major investors. The company is the third startup for Paul Terry, its chief technology officer; he was co-founder and CTO of Abatis Systems, an IP networking specialist that was eventually bought by Redback Networks. OctigaBay's CEO, John Seminerio, was a partner at venture capital firm Magellan Angel Partners and has two decades of telecom experience, including stints with Nortel Networks and DSC Communications. With the launch of its 12K system, OctigaBay is opening a sales office in Richardson, Texas, a suburb of Dallas. This is near one of the hottest hotbeds of supercomputing in the world, thanks to the never-ending processing needs of the oil and gas exploration industries. OctigaBay has 60 employees, and the 12K is its first product.

"It is pretty clear that X86 processors and a handful of RISC processors have won the war for HPC," says Terry. He says that industry after industry is adopting parallel clusters of commodity servers based on X86 or cheap RISC processors because these machines offer an order of magnitude better price/performance than parallel machines based on big, wide SMP servers. "The only thing that is really scalable about wide SMP architectures is the price," quips Seminerio, adding that people were saying the same things about vector supercomputers in the 1990s, when RISC-based parallel machines first came into vogue. OctigaBay is big on using commodity components for its 12K machine (in this case, Opteron 246 processors running at 1.8 GHz or 2.4 GHz, clustered together and running a modified Linux). But the company knows that the real problems with such clusters are that they are difficult to use efficiently and tough to manage. So the OctigaBay 12K product includes not only sophisticated hardware to make it more efficient than a regular Linux cluster, but also systems management programs that the company has created itself.

The OctigaBay architecture is a bit funky compared to a plain-Jane Linux cluster using a Gigabit Ethernet or Myrinet interconnect, and it reflects the telecom heritage of its founders. The base OctigaBay component is a shelf, which is a 3.5U rack-mounted chassis that has a total of 31 processors crammed inside. A dozen of the processors are AMD Opterons, which are organized as six two-way servers that can deliver 58 gigaflops of aggregate, raw computing power. Another dozen are communications processors that link to the HyperTransport buses on the Opterons to provide a high-speed link to the switching fabric that connects all of the shelves in a massively parallel machine to each other. The shelf also has six FPGA processors, each of which can be configured on the fly as either a compute co-processor or a switch fabric processor to accelerate the running of jobs on the system, including vector math if necessary. The final processor is an AMD AV1000 embedded processor, which is used to run the Active Management System programs that control the machine. A fully loaded shelf has 96 GB of main memory and 8 GB/sec of system I/O bandwidth.
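The shelf figures above can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: the processor counts and the 58 gigaflops peak come from the article, while the function names and the per-chip and per-clock breakdowns are derived here and are not numbers OctigaBay supplied.

```python
# Figures quoted in the article for one OctigaBay 12K shelf.
SHELF_PROCESSORS = {
    "opteron": 12,         # six two-way servers
    "communications": 12,  # one per Opteron, attached via HyperTransport
    "fpga": 6,             # compute co-processor or switch fabric processor
    "management": 1,       # embedded processor running the management software
}
SHELF_PEAK_GFLOPS = 58.0


def total_processors(parts):
    """Sum the processor counts of every type in the shelf."""
    return sum(parts.values())


def gflops_per_opteron(peak_gflops, opterons):
    """Peak gigaflops attributable to each Opteron (derived, not quoted)."""
    return peak_gflops / opterons


def implied_flops_per_cycle(gflops_per_chip, clock_ghz):
    """Floating-point operations per clock implied by a peak rating."""
    return gflops_per_chip / clock_ghz


per_chip = gflops_per_opteron(SHELF_PEAK_GFLOPS, SHELF_PROCESSORS["opteron"])
print(total_processors(SHELF_PROCESSORS))            # processors per shelf
print(round(per_chip, 2))                            # gigaflops per Opteron
print(round(implied_flops_per_cycle(per_chip, 2.4), 2))
```

The processor types add up to the 31 quoted for the shelf, and the 58 gigaflops peak works out to a little under 5 gigaflops per Opteron, or roughly two floating-point operations per clock at the 2.4 GHz speed.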

The Rapid Array Interconnect switch fabric is one of the things that is going to set tongues wagging about OctigaBay. The switch fabric can provide 1 Tbit/sec of bandwidth. This is the heart of the machine, really. In many conventional parallel machines, the bandwidth of system components actually decreases as you move farther away from the processor. The effect is that processors are often left waiting for data to come from cache memory, main memory, or, heaven help you, disk storage. The OctigaBay design has an interconnect that runs faster than the main memory in modern SMP servers, and the bandwidth increases as you move away from the chip. This should make the machine more efficient, as the latencies incurred in passing data from one compute node to another are driven way down. Terry says that the state of the art for parallel supercomputer latencies is somewhere around 5 to 8 microseconds. The OctigaBay 12K has latencies of around 1 microsecond in its first iteration.
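To see why the latency figures matter, consider a simple and widely used cost model for message passing: the time to deliver a message is the fixed latency plus the message size divided by the link bandwidth. The sketch below is a back-of-the-envelope illustration only. The 1 and 5 microsecond latencies are the figures Terry cites, but the 1 GB/sec per-link bandwidth and the 1,000-byte message size are assumptions made for the example, not OctigaBay specifications.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gb_per_s):
    """Latency-plus-bandwidth estimate of one message transfer, in microseconds.

    1 GB/sec is treated as 1,000 bytes per microsecond.
    """
    bytes_per_us = bandwidth_gb_per_s * 1000.0
    return latency_us + message_bytes / bytes_per_us


# Assumed values for illustration: same link speed on both machines,
# so that only the latency differs.
ASSUMED_LINK_GB_PER_S = 1.0
MESSAGE_BYTES = 1000

low_latency = transfer_time_us(MESSAGE_BYTES, 1.0, ASSUMED_LINK_GB_PER_S)
typical = transfer_time_us(MESSAGE_BYTES, 5.0, ASSUMED_LINK_GB_PER_S)
print(low_latency, typical, typical / low_latency)
```

Under these assumptions, a small message takes 2 microseconds at 1 microsecond of latency and 6 microseconds at 5 microseconds of latency. For the short messages that dominate many tightly coupled parallel codes, latency rather than raw bandwidth sets the pace.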

Those low latencies are one reason why the machine can scale to a ridiculously large 12,000 processors in a single machine. The system management processors and control software allow such a large machine to be administered from a single console. Right now, OctigaBay is running the SuSE Linux Enterprise Server 8.0 release of Linux on a modified version of the Linux 2.4.19 kernel, which has had its CPU scheduler changed to provide a 100 nanosecond heartbeat to keep all of the Opterons in sync with each other. This scheduler, in essence, helps make the cluster behave more like a big SMP box than a bunch of servers clustered together. The combination of the new CPU scheduler for Linux and the low latencies makes the box easier to manage and lets it perform work more efficiently. Just how much more efficiently, OctigaBay is not yet prepared to say.

A single OctigaBay 12K shelf, with 58 gigaflops of power, will be an interesting entry machine for a lot of customers, says Terry. The base shelf comes with 6 GB of main memory and 200 GB of disk for storing Linux instances. The sweet spot in the market, he believes, will be sales of anywhere from one shelf to two racks full of these systems. A three-shelf system with 36 Opterons will sell for $600,000, while a single-rack machine with 12 shelves will sell for under $1 million, including the cost of the proprietary switches that OctigaBay has created for the system and that deliver 12 Tbit/sec of I/O bandwidth in that configuration. The full-blown system would have 1,000 shelves, over 1 Pbit/sec of aggregate bandwidth, over 58 teraflops of computing power, and 96 TB of memory. The company did not say what such a large machine could cost, but it is probably on the order of $50 million to $75 million, depending on how deeply OctigaBay wants to discount.
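The configurations above all follow from multiplying out the per-shelf numbers. The Python sketch below shows that arithmetic. The per-shelf values (12 Opterons, 58 gigaflops, 96 GB of memory, 1 Tbit/sec of fabric bandwidth) are taken from the article, and the sketch assumes every shelf is fully loaded and that capacity scales linearly with shelf count. The function name and dictionary layout are invented for the example.

```python
# Per-shelf figures quoted in the article (memory is the maximum per shelf).
OPTERONS_PER_SHELF = 12
GFLOPS_PER_SHELF = 58.0
MEMORY_GB_PER_SHELF = 96
FABRIC_TBIT_PER_SHELF = 1.0


def scale(shelves):
    """Aggregate capacity for a given shelf count, assuming linear scaling."""
    return {
        "opterons": shelves * OPTERONS_PER_SHELF,
        "peak_tflops": shelves * GFLOPS_PER_SHELF / 1000,
        "memory_tb": shelves * MEMORY_GB_PER_SHELF / 1000,
        "fabric_tbit_s": shelves * FABRIC_TBIT_PER_SHELF,
    }


for shelves in (3, 12, 1000):
    print(shelves, scale(shelves))
```

Three shelves give the 36 Opterons quoted for the $600,000 system, 12 shelves give the 12 Tbit/sec of the single-rack machine, and 1,000 shelves give 12,000 Opterons, 58 teraflops, 96 TB, and 1,000 Tbit/sec (1 Pbit/sec), which lines up with the full-blown configuration.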

The OctigaBay machines will be installed for early field trials at Sandia National Labs and Lawrence Livermore National Lab, and will be generally available sometime in early 2004.


Editor: Timothy Prickett Morgan
Managing Editor: Shannon Pastore
Contributing Editors: Dan Burger, Joe Hertvik, Kevin Vandever,
Shannon O'Donnell, Victor Rozek, Hesh Wiener, Alex Woodie
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed


Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, 50 Park Terrace East, Suite 8F, New York, NY 10034