Newisys Readies Chipset for Big Opteron Iron
by Timothy Prickett Morgan
While formerly independent Opteron server maker Newisys became a unit of contract hardware manufacturer Sanmina-SCI in July 2003, it has not stopped innovating and is making good on its promise to deliver more scalable Opteron servers than its initial two-way and four-way Opteron machines. At the Hot Chips show last week at Stanford University, Newisys showed off a new chipset code-named "Horus" that will allow it to create the big iron systems that it was hinting it could deliver when it burst onto the scene in January 2003.
Newisys may not be a household name, but Sun Microsystems is reselling its machines as the Sun Fire V20z and V40z and Verari Systems (formerly known as RackSaver) is also peddling its designs. Hewlett-Packard and IBM have opted for their own Opteron server designs--so far, at least. The latter is somewhat mysterious given that the founders of Newisys back in August 2000 were Phil Hester and Clay Cipione, who were key designers of IBM's RISC and Intel workstation products, and that Rich Oehler, the chief technology officer at Newisys, was the lead architect for IBM's RISC and then PowerPC processors in the 1980s and 1990s. Oehler was also one of the designers of IBM's "Summit" family of server chipsets for Xeon and Itanium processors. Why IBM has not just rebadged Newisys machines (and moved beyond the two-way eServer 325 it currently sells) is just plain odd, especially considering that the xSeries line, with the exception of the BladeCenter blade servers and xSeries Summit boxes, is actually manufactured by Sanmina-SCI.
You might be thinking that the advent of truly powerful Opteron servers from Newisys that scale from 4 to 32 processors might encourage IBM and HP to more fully embrace Opteron and just adopt the future Horus designs from Newisys. But with their own Power-Squadron and Itanium-Integrity server lines to protect, IBM and HP will put off adopting and endorsing these Horus machines. (It is truly funny that Horus is the Egyptian god of the rising sun, so maybe that portends that Sun Microsystems will stop trying to build its own big Opteron servers and just keep on using Newisys designs. Maybe Newisys has gallows humor, or Sun had a change in plans once it bought startup Kealia to get founder Andy Bechtolsheim back in the business of creating servers, in this case based on Opteron processors instead of Sparcs.)
According to Oehler, who gave the presentation at Hot Chips last week, Newisys has been working on the ASIC that comprises the Horus chipset for almost three years. He says that while Advanced Micro Devices's Opteron architecture, with its integrated memory controller and HyperTransport interconnect, can "gluelessly" scale (meaning it doesn't need a sophisticated chipset, like the Horus chipset Newisys created) from 2 to 8 processors in a single system image, the ring architecture of the resulting systems does not scale linearly as processors are added to the machine. The main problem is keeping cache memories coherent, which means ensuring that any data in cache has not been updated by any of the local processors on the cell board to which it is physically attached or remote cell boards adjacent to it in the system (and which have access to that cache). This cache coherency is what makes many processors look like one virtual processor to the operating system, and as you add processors gluelessly using HyperTransport, the overhead from managing the extra caches stresses out the system and it does not scale as linearly as many server vendors would like. (Which is probably why no major server vendors are making eight-way Opteron machines, why eight-way Pentium and Xeon machines were difficult to sell back in the 1990s, and why it takes a clever architecture like IBM's Summit or Unisys' ES7000 to scale well above four X86 processors.) Moreover, while HyperTransport can scale to eight processors, Oehler says that HyperTransport was created for only short links, which means vendors have to pack all the main memory (which is dedicated in blocks to each CPU on the cell board) and processors for an eight-way machine in a very tight space, which is then tough to cool.
The Horus chipset takes a different approach. Instead of creating one giant communications ring structure on which all of the processors and their cache memories are linked to each other, Newisys has decided to adopt a four-way cell board architecture based on standard Opteron chips and then use the Horus chipset as an intermediary. Each cell board uses HyperTransport to cluster four Opterons and keep their caches coherent and also uses the Horus chipset to keep track of the state of caches on remote cell boards in the system. Conceptually, this is very similar to the architectures IBM is using in its Summit xSeries and various Power machines, and bears some resemblance to the means Unisys uses in its ES7000s and indeed in most modern Unix architectures. Horus is a ring that can support up to 32 sockets, which means up to eight four-socket cell boards. Today, AMD only sells single-core Opterons, but when AMD jumps to dual-core Opterons in 2005, the Horus servers will scale to 64 cores. This will be as big a box as any other server vendor can put on the market.
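The scaling arithmetic in that description is simple enough to check in a few lines. The sketch below only restates the figures quoted above (four sockets per cell board, a 32-socket ring, one or two cores per socket); the function and constant names are made up for illustration and are not anything Newisys has published.

```python
# Figures taken from the article; names are illustrative only.
SOCKETS_PER_BOARD = 4   # four-way cell boards
MAX_SOCKETS = 32        # the Horus ring supports up to 32 sockets


def max_cell_boards(max_sockets=MAX_SOCKETS, sockets_per_board=SOCKETS_PER_BOARD):
    """Number of full cell boards that fit on the ring."""
    return max_sockets // sockets_per_board


def total_cores(boards, cores_per_socket, sockets_per_board=SOCKETS_PER_BOARD):
    """Total cores for a given number of cell boards."""
    return boards * sockets_per_board * cores_per_socket


boards = max_cell_boards()
print(boards)                   # cell boards on a full ring
print(total_cores(boards, 1))   # single-core Opterons
print(total_cores(boards, 2))   # dual-core Opterons
```

A full ring works out to eight cell boards, 32 cores with today's single-core chips, and 64 cores once dual-core Opterons arrive.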
The trick to any NUMA-like architecture, says Oehler, is keeping the latency between the cache memory on the cell boards down. To keep from having to rewrite an operating system and its applications, Oehler says a server design has to have a 3:1 ratio or less between the time it takes for a CPU to reach into the cache of a cell board on the other side of the server compared to the time it takes for that CPU to reach into the local cache memory on its own cell board. He won't say how low the Horus designs will go, but says that Newisys has added a 64 MB L3 cache to the Opteron architecture--Opterons include a main memory controller as well as L1 and L2 caches on chip--that it uses as a remote data cache to keep track of which CPU is using which cache lines. The ring of Horus ASICs is basically reading this cache very quickly and allowing the cell boards to work through it to reach the cache in adjacent cell boards. The Horus ASIC also has a remote directory, which keeps track of what cache lines are being accessed and controlled by cells outside of a given cell board. He also adds that AMD helped Newisys minimize the cache coherency traffic, which further reduced latencies. Since the Horus chipset creates another cache hierarchy above that built into the Opteron chips, Newisys will be calling it the Extended Scale Architecture when it becomes a product next year.
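To make the remote directory idea a little more concrete, here is a toy model of a generic directory-based coherence scheme. It is not the Horus design, whose internals Newisys has not detailed; the class name, method names, and the simple invalidate-on-write policy are all assumptions made for illustration. The point it shows is why a directory cuts coherency traffic: a write only has to notify the boards recorded as holding the line, not every board on the ring.

```python
from collections import defaultdict


class RemoteDirectory:
    """Toy directory for one cell board (hypothetical, not the Horus ASIC).

    Tracks which remote boards hold copies of cache lines owned by this board.
    """

    def __init__(self, home_board):
        self.home_board = home_board
        self.sharers = defaultdict(set)  # line address -> remote board ids

    def record_remote_read(self, line, board):
        """Note that a remote board has fetched a copy of the line."""
        if board != self.home_board:
            self.sharers[line].add(board)

    def boards_to_invalidate(self, line, writer):
        """Return the boards that must drop their copy before `writer` writes."""
        targets = self.sharers.get(line, set()) - {writer}
        # After the write, only the writer (if it is remote) holds the line.
        self.sharers[line] = {writer} if writer != self.home_board else set()
        return targets


directory = RemoteDirectory(home_board=0)
directory.record_remote_read(0x40, board=3)
directory.record_remote_read(0x40, board=5)
print(directory.boards_to_invalidate(0x40, writer=5))  # only board 3 is notified
```

With eight cell boards on the ring, a broadcast scheme would have to probe seven other boards on every write; in this sketch only the one recorded sharer is contacted.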
Oehler says that the Horus chipset has taped out, and that Newisys expects to get ASICs back from the foundry in early 2005 and into systems for OEM customers to examine by the middle of 2005. If all goes well, server makers could OEM the product and have it for sale by the end of next year. "My first job is to get a server built," says Oehler. "It will be straightforward for us to get to 8, 12, and even 16 processors, but getting to 32 processors will be more of a challenge because of software. The hardware always leads the software in the right direction," he adds, and he knows a thing or two about that trend. "This has the potential to change the game bigtime against RISC/Unix systems," says Oehler. That is something that Sanmina-SCI is clearly counting on. It will be interesting to see what IBM, Dell, Sun, and HP do.
The Horus servers will certainly support Linux and Windows, very likely Solaris, and the open source variants of the BSD Unix platform. What The SCO Group does to support the Horus machines is also unclear, but UnixWare can scale up to 16-way and 32-way processing in SMP configurations today and would probably only need moderate changes to create a UnixWare that could compete, in terms of hardware, with big RISC/Unix iron. It is interesting to contemplate support for AIX and HP-UX, but these are very likely not going to be supported on Horus machines. Sun, which will have its Unix ported in 64-bit mode to Opteron processors when Solaris 10 is released in a month or so, presents the most interesting marriage in the Unix market with the Horus servers. But again, Sun may want to focus on its own designs, which would leave Newisys out in the cold.