Chip Makers Embrace Co-Processors, Again
October 2, 2006 Timothy Prickett Morgan
If chip makers could put enough transistors on a chip, you could bet that by now they would have hard coded operating systems, middleware, and applications onto their circuits. Perhaps someday we will see such a feat, but in the meantime, the current crop of chip makers is looking to move beyond the familiar central processors with their cache hierarchies and memory and I/O buses to do what might be called asymmetric processing.
The idea of using a co-processor, which does specific tasks and coordinates work with a central processor, is far from new. The most famous early co-processor was probably the Intel 8087, a floating point math unit that plugged into a socket beside a general purpose 8088 processor, like those used in early PCs. The 8088 could emulate floating point math in software, and did in the absence of the 8087 unit. But when the 8088 saw that it had an 8087 math unit, it gladly offloaded that work and did other tasks. The result was a machine that was substantially faster.
In time, of course, math units were integrated into general purpose processors, much as L1 cache and L2 cache have been and, in some cases, as have main memory controllers, peripheral interfaces, and other external circuits that used to be on a motherboard or in a chip package. With the 80486, Intel moved the math unit inside all of its processors; ditto with Motorola and the 68040. By the early 1990s, hardware math was no longer optional, with so many calculations being performed to create graphical user interfaces.
High speed interconnection schemes, high volume chip manufacturing, and high density chip making processes are all making it possible for a new breed of co-processors to come into being. It might seem that with billions of transistors on a chip, the need for a co-processor would be greatly diminished. Anything on an external circuit can be, in theory, moved onto the processor chip, including TCP/IP, database, and other kinds of acceleration as well as encryption/decryption units, and so forth. But throwing in the kitchen sink on a general purpose processor, covering all possible scenarios, is a foolish waste of chip factory yields and chip real estate. In many ways, having a few different types of general purpose processor complexes–with their caches and memory controllers–that tightly couple to external co-processors is a better idea. This way, customers add the functionality to their machines as they need it.
Rather than try to fight this, chip makers such as Intel and Advanced Micro Devices are working to make it easier for companies that engineer new kinds of co-processors to hook into the systems that are based on their respective Xeon, Itanium, and Opteron processors.
For AMD, everything comes back to the HyperTransport interconnect, which is used to link Opteron processors to their main memory and I/O buses. AMD’s Torrenza initiative, created earlier this year, is more fully opening up the HyperTransport specification (which has been licensed by most of the big server makers). This will foster a whole new breed of co-processors for the Opteron architecture–or so AMD is hoping.
Last week at the Intel Developer Forum, Intel launched a similar initiative. But because Intel has not yet announced its Common System Interface (CSI) scheme (due next year for a unified Xeon-Itanium server architecture, but likely to be pushed out for a few years), it has to allow co-processors to plug into Xeon and Itanium machines in a different way. Xeon and Itanium machines have different architectures, too, which means that if Intel wants to create co-processors that can work with both machines, it has to create a means of doing so a little farther away from the processor than a current processor front side bus or a future CSI interconnect link would allow. And that means Intel has to make do with the PCI bus. And so, Intel launched the “Geneseo” effort, in conjunction with IBM, to extend the PCI Express bus in such a way as to enable fast links to co-processors to speed up graphics, encryption, and other kinds of applications.
The fact that IBM and Intel created the InfiniBand switched fabric architecture, which is technically sound but economically not exactly taking IT by storm, should probably give many in the server industry, as well as their customers, pause as they consider what Geneseo might offer. Then again, all of the major server makers have joined the Geneseo effort, and they seem as enthusiastic to use very fast PCI Express links to connect co-processors to central processors as they were two years ago when they first used PCI Express in desktop PCs and workstations to put one or two very fast video cards into a box. (A video card is just a special kind of co-processor, if you think about it. It just happens to be one that paints pictures instead of crunching numbers. And PCI Express is far superior to the AGP graphics port Intel created for its chipsets prior to this.)
Peripherals adhering to the PCI Express 2.0 specification are expected to start appearing next year, and this spec will double the per-lane signaling rate to 5 Gigabit/sec. Intel, IBM, and the other Geneseo partners–which do not include AMD or its soon-to-be-captive graphics unit, ATI Technologies–will be proposing that the specifications cooked up under the Geneseo banner be chosen, perhaps two years from now, to be the follow-on technology to PCI Express 2.0, and a means of linking co-processors of all kinds to servers of all kinds–not just those that have HyperTransport.
In essence, AMD wants HyperTransport to make PCI and its kicker peripheral schemes less important, while Intel wants to leave the memory and system bus alone for as long as possible and beef up and extend the PCI bus. The irony, of course, is that many companies (including IBM, Sun Microsystems, Hewlett-Packard, and others) are playing both sides of the street.
Even a startup in San Jose called XtremeData, which offers a field programmable gate array (FPGA) co-processor that is pin-compatible with the Opteron processor (meaning it plugs into a regular Opteron socket), has also joined Geneseo. (FPGAs can be programmed to run very complex algorithms, rather than just doing floating point math very fast.)
AMD itself expects to employ co-processors on the HyperTransport and says that multiple levels of co-processing–probably including devices that plug into Geneseo slots–are going to be necessary.
Still, even with all of this openness, it is hard to imagine that Intel and AMD won’t eventually want to pull the functionality of myriad co-processors onto their general purpose chips, much as operating system vendors are integrating up the stack and down into the virtual machine hypervisor.