Volume 2, Number 18 -- May 10, 2005

Battle of the X64 Platforms


by Timothy Prickett Morgan


The X86 platform has long dominated both the server and workstation markets in terms of shipments, but in terms of engineering and features, the X86 platform has continued to lag RISC/Unix and proprietary alternatives for years. While the popularity of X86 platforms and the intense competition they have brought to the market have sucked a lot of the revenue and, more importantly, a lot of the profits from the server business, creators of non-X86 platforms have, to their credit, run to higher ground, adding features and functions to their systems that the X86 could not deliver.

With the advent of a rapidly maturing X86 market as embodied in the new 64-bit X64 alternatives from Intel and Advanced Micro Devices, the competition looks to be getting even more intense. The few remaining RISC/Unix and proprietary platforms that are economically viable are going to start feeling even more pain now. That does not mean there is no longer room for alternative platforms; there most certainly is. But it is going to be very hard to bring them to market and make money.

The X64 platforms, embodied in the "Nocona" Xeon DP and the "Potomac" and "Cranford" Xeon MP processors from Intel and the "SledgeHammer" Opteron processors from AMD, are going to have volume economics on their side, and that means intense price advantages for similar performance against alternative Power, Sparc, PA-RISC, and Itanium platforms. Of all of these platforms, the one that could feel the most pressure is the Intel/Hewlett-Packard Itanium processor, which is at its heart a mix of proprietary, RISC, and X86 features all woven into one architecture.

The Itanium architecture, you will remember, was supposed to take over the server and workstation markets. It was supposed to out-RISC the RISC platforms, which existed precisely because the Unix vendors could count on reasonably high sales of workstations--about 600,000 units a year in the late 1990s--to cover the engineering and manufacturing costs of their RISC processors, which were sold in the tens or hundreds of thousands of units inside their RISC/Unix server platforms. With Itanium, Intel had originally expected that the transition from 32-bit X86 processors on the desktop would jump the gap to 64-bit Itanium processors, leaving the market with a single platform spanning from desktops through workstations and on up to the largest servers.

But, as any new architecture would have to be, Itanium ended up being incompatible with all of the platforms it was trying to span, which would not have necessarily been punishing if Itanium had come to market on time and performed well. The IT community of vendors, users, and investors does not have a lot of patience for delays or poor performance, either technically or economically. So rather than unifying the markets, Itanium has been a divider of the market, not a uniter. This is all the more shocking when you remember that Itanium had the support of Sun Microsystems, IBM, Compaq, NEC, Unisys, and other key system providers back in 1996 and 1997--they were all excitedly behind the project. There is no question that the current "Madison" generation of Itaniums is fulfilling the performance promises that Intel made years ago. But it is an uphill battle to get Itanium into mainstream products, and the advent of X64 alternatives for running X86 applications in a 64-bit mode only made the jump to Itanium somewhat harder for many customers--but certainly not all--to justify. There are many good things about the Itanium chip, despite its high prices and high wattage, including the scalability and reliability of the chip and the platforms that use it, and its competitive bang for the buck, which is better--that's right, better--than 32-bit X86 platforms for most OLTP, data warehousing, and some scientific workloads. But, to put it bluntly, software comes slowly to new architectures, and even though Itanium processors can run 32-bit X86 code at roughly a 50 percent performance penalty, a 64-bit X64 platform can run that same 32-bit code on a 64-bit operating system with a 10 to 15 percent performance gain--without changing any code.

Think about that for a second. What would you do? This is why AMD jumped at the chance and created the X86-64 memory extensions once Intel had made it clear that Itanium was its 64-bit path and that the clock was ticking on the 32-bit Xeons. AMD shot the gap between those two products--allowing 64-bit memory extensions with a true 32-bit compatibility mode--and eventually, with the "Prescott" Pentium 4 core (which is at the heart of the Xeons) launched in February 2004, also known as Project Yamhill, Intel blinked and did the 64-bit extensions AMD's way in the Pentium and Xeon chips.

For those vendors and customers who have made a long-term commitment to Itanium, it seems clear that Intel and HP are in it for the long haul. It's a safe bet between now and 2009 or so, unless something really crazy happens in the IT market. Software vendors will slowly roll out support for Itanium platforms, and new platform vendors will bring new Itanium-based servers to market. But there will be no volume Itanium workstations (excepting Silicon Graphics' latest pass at the idea), and Itanium will not be the 64-bit volume platform. The X64 platform will be, and it will be the volume player for both servers and workstations--just like Itanium was supposed to be. Intel and HP can wait forever and a day, but as long as AMD is in business and is driving innovation with its Opteron processors, thereby forcing Intel to counter that innovation with its own 64-bit Pentium Xeon processors, the X64 platform, in one incarnation or the other, will be the volume chip for servers and workstations.

Intel knows this, and nothing made it more clear than the fact that Intel sold more than 2 million of its 64-bit Xeon DP processors in about nine months ending in March 2005. AMD shipped 65,0000 Opterons in 2003 (mostly in two-way servers) and probably around 300,000 to 400,000 in 2004 (if growth rates from early 2004 persisted). During those months, probably somewhere on the order of 9 to 10 million X86 and X64 processors were sold, giving the 64-bit X64 architecture roughly 20 to 25 percent market share among X86 and X64 platforms as a group. (Keep in mind, those are rough estimates.) Intel shipped 100,000 Itanium processors in 2003, and initially hoped it would double that to 200,000 shipments in 2004, but Intel has stopped talking about Itanium shipments and that probably means it sold a lot less than 200,000. By the way, 100,000 processors is a lot of processors for a midrange and high-end server line. So don't get the wrong idea.
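The share estimate above is simple division, and it is easy to see how sensitive it is to the inputs. The following is a rough sketch in Python using only the figures quoted in the text. The function name and the optional `opteron_units` argument are illustrative assumptions; the text gives no precise Opteron count for the same nine-month window, so the 300,000 used in the second call is a placeholder rather than a reported number.

```python
def x64_share_range(xeon_64bit_units, total_low, total_high, opteron_units=0):
    """Return (low, high) share of 64-bit X64 chips among all X86/X64 chips.

    The low end divides by the larger market estimate, the high end by the
    smaller one. opteron_units is an assumed add-on, not a figure from the text.
    """
    x64_units = xeon_64bit_units + opteron_units
    return x64_units / total_high, x64_units / total_low


# Figures quoted in the article; all of them are rough estimates.
XEON_64BIT = 2_000_000                          # 64-bit Xeon DPs in about nine months
TOTAL_LOW, TOTAL_HIGH = 9_000_000, 10_000_000   # all X86 and X64 chips, same period

intel_only = x64_share_range(XEON_64BIT, TOTAL_LOW, TOTAL_HIGH)
with_opteron = x64_share_range(XEON_64BIT, TOTAL_LOW, TOTAL_HIGH,
                               opteron_units=300_000)  # hypothetical value

print(f"Intel 64-bit Xeons only: {intel_only[0]:.0%} to {intel_only[1]:.0%}")
print(f"With assumed Opterons:   {with_opteron[0]:.0%} to {with_opteron[1]:.0%}")
```

Counting Intel's chips alone lands at the bottom of the 20 to 25 percent range; any reasonable Opteron volume pushes the estimate toward the top of it.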

That said, the ramp for Intel's 64-bit server and workstation Xeon chips is very impressive. Intel shipped its first million 64-bit Noconas in the six months between June and December 2004, and then shipped another million in three months between January and March 2005. By the end of this quarter, Intel was expecting that 80 percent of its Xeon shipments would be for 64-bit capable chips, and now that the 64-bit "Potomac" and "Cranford" Xeon MP processors are starting their ramp, it will not be long until 64-bit chips approach 90 percent of overall shipments and very close to 100 percent of shipments in new systems. Getting to 100 percent of shipments is tough, since there are millions of 32-bit Xeon servers that are only partially populated out there, and customers will still buy processor upgrades for these boxes.
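To put a number on that ramp, here is a small sketch that converts the two shipment milestones into average monthly rates. It assumes the first period is six months long, as the text describes it, and the function name is illustrative.

```python
def monthly_rate(units, months):
    """Average units shipped per month over a period."""
    return units / months


first_million = monthly_rate(1_000_000, 6)    # June to December 2004
second_million = monthly_rate(1_000_000, 3)   # January to March 2005
speedup = second_million / first_million

print(f"First million:  about {first_million:,.0f} chips per month")
print(f"Second million: about {second_million:,.0f} chips per month")
print(f"Monthly rate multiplied by {speedup:.1f}")
```

On these figures the average monthly shipment rate doubled from one period to the next.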

X64 Features: Similar and Different

The X64 architecture is not one, but two different architectures that can run the same instruction set and therefore support the same code base. There are gross similarities in the architectures--there have to be because of the nature of chip process technology and what economic and technical forces make you do--but there are a number of really different things that Intel and AMD are putting into their X64 platforms.

The main features that define the evolving X64 platforms are 64-bit memory extensions, the use of multiple cores and simultaneous multithreading on chips, integrated instruction set virtualization, power management, chipsets, and raw performance.

Memory: The sudden and immediate compatibility of Intel's EM64T memory extensions, which arrived in the Prescott core in early 2004, with the X86-64 instruction set at the heart of the AMD Opteron processors, which AMD introduced as a product in early 2003 after four years of development, has tongues wagging. Some conspiracy theorists in the IT industry believe that Intel created X64 extensions to the X86 architecture perhaps in the mid-1990s, decided to do something more radical with the 64-bit "Merced" project, then got involved with Hewlett-Packard (which needed something even more radical to support PA-RISC workloads and which wanted out of the chip making business), created the Itanium EPIC architecture with HP and threw away the idea of extending X86 instructions. Thanks to the settling of lawsuits, AMD and Intel have broad and deep cross-licensing agreements, so it could be that AMD based X86-64 on existing Intel technology that it had access to and then Intel, returning the favor, had access to AMD's X86-64 and quickly ported it back to the Prescott cores. It is possible that these two companies created black-box variants of compatible 64-bit memory extensions, of course. It is also possible to win a lottery. Neither AMD nor Intel is saying exactly how and why the X86-64 and EM64T memory extensions are precisely compatible. It is clear that they have to be for the X64 platform to work.

Suffice it to say, both Athlon/Opteron and Pentium/Xeon chip architectures have 64-bit memory extensions that, at least as far as applications and operating systems are concerned, are identical.
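On a Linux system, this sameness is visible to software: the kernel reports the 64-bit capability of either vendor's chips with the same `lm` (long mode) flag in `/proc/cpuinfo`. The sketch below shows one way to check for it. The function names are illustrative, and the parsing assumes the usual `flags : ...` line layout of the x86 `/proc/cpuinfo` file.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags found on 'flags' lines of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_64bit(cpuinfo_text):
    """True if the 'lm' (long mode) flag is present, on AMD and Intel alike."""
    return "lm" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("64-bit capable:", supports_64bit(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

The check compares whole flag names rather than substrings, so a flag that merely contains the letters "lm" does not produce a false positive.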

Multiple Cores and Simultaneous Multithreading: With AMD just launching the dual-core variants of the Opteron 800 Series in April for shipments in early May for four-socket and larger servers, AMD has the jump on Intel when it comes to dual-core server chips--just as it had the jump on Intel when it came to 64-bit memory extensions. The dual-core Opteron 200 Series processors (for two-socket servers and workstations) will start shipping in early May from AMD and will be in systems in about late May or so. The 100 Series, for single-socket servers and workstations, will begin volume shipments in June and is expected to be in systems by July. These dual-core Opterons come in 1.8 GHz, 2 GHz, and 2.2 GHz clock speeds. In terms of performance, the top-end 2.2 GHz dual-core Opteron chips will have anywhere from 30 to 75 percent better performance compared to the 2.6 GHz single-core Opteron chip. The 1.8 GHz dual-core Opteron chip offers between 20 and 30 percent better performance than that 2.6 GHz single-core Opteron--and AMD will charge the same price for the 1.8 GHz dual-core chip, yielding a 20 to 30 percent price/performance improvement for customers at the chip level.
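The price/performance claim for the 1.8 GHz part follows directly from holding price constant while performance rises. A short sketch makes the two common ways of expressing it explicit: performance per dollar goes up by the full 20 to 30 percent, while the cost of a fixed amount of work goes down by a somewhat smaller percentage. The function names are illustrative, and the ratios are the ones quoted in the text.

```python
def perf_per_dollar_gain(perf_ratio, price_ratio=1.0):
    """Fractional change in performance per dollar, new chip versus old."""
    return perf_ratio / price_ratio - 1.0


def cost_per_work_change(perf_ratio, price_ratio=1.0):
    """Fractional change in dollars per unit of work, new chip versus old."""
    return price_ratio / perf_ratio - 1.0


# 1.8 GHz dual-core versus 2.6 GHz single-core Opteron at the same price.
for perf_ratio in (1.20, 1.30):
    gain = perf_per_dollar_gain(perf_ratio)
    cost = cost_per_work_change(perf_ratio)
    print(f"{perf_ratio:.2f}x performance: "
          f"{gain:+.0%} performance per dollar, {cost:+.1%} cost per unit of work")
```

So a buyer with a fixed workload sees chip costs fall by roughly 17 to 23 percent, which is the same improvement viewed from the other side.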

If you have been watching the Intel Xeon and Itanium roadmaps very carefully for the past several years, what seems obvious looking back is that Intel never planned to do dual-core Xeons in the current timeframe--if at all. When Intel did talk about dual-core processors, it talked about Itanium processors getting such technology, mainly because the Itanium core is smaller and because Intel wanted to push Itanium into servers and workstations. Then IBM's Power4 processors took off in the marketing arena in 1999 and 2000 and hit the actual market in 2001, and Intel started to think about it. By the end of 2002, when IBM was demonstrating a roadmap that would put its Power platforms well ahead of Itanium and other vendors--including Sun, HP, and Fujitsu--were working on dual-core RISC designs, Intel took a step back and started thinking about re-arranging its roadmaps. The fourth-generation "Montecito" Itanium chip, due at the end of 2005 for shipments in early 2006, was originally a single-core Itanium that was supposed to ship in 2004 if you look at the 2002 Intel roadmaps. Intel pushed Montecito out a year (more like 18 months by the time it gets here) and designated it as a dual-core chip. But we only started hearing about dual-core Xeons recently, and for a while Intel was saying publicly that Itanium would always have a 2 to 1 performance advantage over Xeon chips. That wasn't exclusively because of memory bit-ness, but rather, it seems, because Intel intended to always have twice the number of cores on an Itanium as it offered on Xeons. Xeons were to get simultaneous multithreading, which Intel calls HyperThreading and which provides two virtual threads on a single core. Itanium was to get dual cores and no HyperThreading.

With the "Dempsey" Xeons due later this year alongside the Montecito Itaniums, Intel has stopped messing around. Montecito has been retrofitted with HyperThreading alongside dual cores. Intel expects that by the time the company exits 2006, 70 percent of its mobile processor shipments will be for chips with at least two cores per socket, with a similar penetration on desktop processors. And in the server space, Intel expects that more than 85 percent of machines will have more than one core per socket as 2006 comes to a close.

Interestingly, AMD says that back in 1999, the Opteron architecture was created with multiple cores in mind and that it has always been part of its plans. The company has never considered adding simultaneous multithreading to the Opteron processors, and does not seem to be inclined to start doing it now. But, stranger things have happened--such as Intel launching 64-bit Xeons, for instance.

How far Intel and AMD will push the multiple core concept remains to be seen. IBM says that getting two cores on a chip to work well is difficult, but doable, and that four cores on a chip present some pretty big problems. Chip makers that are cramming many cores on a single chip--Sun's "Niagara" and "Rock" future Sparc processors, which have eight four-threaded simplified Sparc cores, or Azul Systems' Java co-processors, which have 24 home-grown "Vega" cores designed specifically and only to run virtual machines--are doing so with very precise uses for these multicore designs. Doing more than four cores may be very difficult for general purpose processors, especially the large and complex Xeon and Itanium cores with many cache levels and large caches.


For all the talk about how Moore's Law has been saved by changing the focus from increasing clock speed with each chip process shrink to adding multiple cores on a chip at the same thermal envelope, there are limits to how many cores can be put on a die. There are limits to how many threads need to be active in multitasking environments, too. Four cores plus four virtual cores from HyperThreading in a single socket is a lot of threads for a desktop PC, and it is four times as many threads as a two-socket, Xeon DP workstation or entry server has today. Put such eight-threaded chips in a two-way or four-way SMP configuration, and you get 16 or 32 threads in a single system. This is a lot of threads for a so-called entry system.
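The thread counts in that paragraph are a product of three numbers: sockets, cores per socket, and hardware threads per core. The sketch below tabulates the configurations mentioned. The function name is illustrative, and the reading of "today's" two-socket Xeon DP box as single-core chips is an assumption based on the surrounding text.

```python
def hardware_threads(sockets, cores_per_socket, threads_per_core):
    """Logical processors an operating system would see in one system."""
    return sockets * cores_per_socket * threads_per_core


# A hypothetical four-core chip with two-way HyperThreading.
one_socket = hardware_threads(1, 4, 2)    # 8
two_way = hardware_threads(2, 4, 2)       # 16
four_way = hardware_threads(4, 4, 2)      # 32

# Today's two-socket, single-core Xeon DP box, counted two ways.
xeon_dp_cores_only = hardware_threads(2, 1, 1)    # 2, ignoring HyperThreading
xeon_dp_with_ht = hardware_threads(2, 1, 2)       # 4, with HyperThreading on

print(one_socket, two_way, four_way)
print(one_socket / xeon_dp_cores_only, one_socket / xeon_dp_with_ht)
```

The "four times" comparison holds when today's two-socket machine is counted as two threads; with HyperThreading switched on it already presents four logical processors, which would make the single eight-threaded socket a doubling instead.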

Integrated Instruction Set Virtualization: For the past five years or so, RISC/Unix platforms have included some form of hardware-assisted virtualization, using either virtual or logical partitions riding on top of a hypervisor layer that abstracts the processor instruction set such that virtual machine partitions equipped with their own operating systems think they are running a whole machine even though they are getting only a slice of it.

With future Xeon and Opteron processors, Intel and AMD are introducing hardware-assisted instruction set virtualization to make virtualization run more smoothly and without consuming as many resources as it does today.

There are limits to what Intel and AMD can do with virtualization on the chip, however, with current chip process technologies. The virtualization features that come with Intel's Virtualization Technology or AMD's "Pacifica" technology, due respectively in the "Montecito" Itaniums and future Xeons from Intel and in future Opteron processors from AMD, are only implementing instruction set virtualization in the chip rather than in VMware's ESX Server hypervisor, Microsoft's Virtual Server 2005 hypervisor, or the open source Xen hypervisor. However, to make a virtualized workstation or server environment, you have to virtualize memory--carving up a gob of main memory into pieces for each virtual machine and making sure that virtualized servers share memory for common functions so they use memory efficiently. Similarly, the virtualization software also has to do I/O virtualization, providing disk and network I/O access for each partition. These last two features are not going to be embedded in processors for a long time--perhaps years. They will be embedded in systems eventually, however, in some form. It is the nature of the IT industry to do this wherever possible. It is a question of transistor counts and standardization.

Power Management: Intel's latest Nocona Xeon DP variants, dubbed the "Irwindale" chips, and the new Potomac and Cranford Xeon MPs include the new Demand Based Switching (DBS) and SpeedStep power management features that Intel perfected in its laptop processors and is now moving into its server and workstation chips. The DBS and SpeedStep features of the Xeons sense how much load is on the processors, and if applications are not consuming a lot of CPU cycles, they scale back the clock speed and voltage of the processors to meet the processing needs of the workload. By cutting back cycles and volts, a server can consume 24 percent less power as workloads decrease. In racks or blade chassis full of servers, this is a big savings in power consumption.
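Scaling back both clock speed and voltage pays off because the dynamic (switching) power of a CMOS chip is roughly proportional to capacitance times voltage squared times frequency. The sketch below applies that textbook relationship to a made-up operating point. The 20 percent clock cut and 10 percent voltage cut are hypothetical values chosen for illustration, not Intel specifications, and the model ignores leakage power and the rest of the server, so it is not meant to reproduce the 24 percent figure.

```python
def relative_dynamic_power(freq_scale, voltage_scale):
    """Dynamic power relative to full speed, using P proportional to C * V^2 * f.

    freq_scale and voltage_scale are fractions of the full-speed values;
    capacitance is unchanged and cancels out of the ratio.
    """
    return (voltage_scale ** 2) * freq_scale


# Hypothetical operating point: clock cut by 20 percent, voltage by 10 percent.
scaled = relative_dynamic_power(freq_scale=0.80, voltage_scale=0.90)
freq_only = relative_dynamic_power(freq_scale=0.80, voltage_scale=1.00)

print(f"Clock and voltage scaled: {1 - scaled:.1%} less dynamic power")
print(f"Clock scaled alone:       {1 - freq_only:.1%} less dynamic power")
```

Because voltage enters as a square, dropping it alongside the clock saves noticeably more than slowing the clock alone.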

Montecito Itaniums will include another power management technology called "Foxton," which boosts the clock speed on the Itanium chip when the workload demands it and the server can take the heat; Montecito will also include DBS power management features. Presumably, Foxton will make its way to Xeons as well.

Since the summer of 2004, all Opterons have had AMD's PowerNow power management features, so AMD was already ahead of Intel on this front and did not need to add these features to the new dual-core Opterons.

Chipsets: Intel has just launched the "Truland" platform, which is the name of the platform that will be comprised of Intel's E8500 chipsets and the single-core Potomac 8 MB cache and Cranford 1 MB cache Xeon MP processors. The Truland platform was originally expected by the end of 2004, and it completes the rollout of the EM64T memory extensions to the Xeon MP server line. The Truland platform, like the platform based on the "Irwindale" Xeon DPs that were announced earlier this year, includes 64-bit extensions, PCI Express I/O, DDR2 main memory, Execute Disable (XD) security, error correction on the system bus, memory, RAID, and PCI Express buses, and Demand-Based Switching (DBS) power management features. The Irwindale Xeon DPs have 2 MB of on-chip cache. The Truland platform has a 667 MHz double-pumped front side bus and will support DDR2-400 main memory.

At the end of 2005, Intel will roll out a kicker to the Truland platform that will include the "Tulsa" large cache and "Paxville" small cache dual-core 64-bit Xeon MPs, which will have two cores per processor. The company expects that systems based on these dual-core Xeon MPs will be available in the first quarter of 2006. On about the same schedule--release from Intel at the end of 2005, with initial product shipments in the first quarter of 2006--Intel expects to roll out the "Bensley" Xeon DP platform. The Bensley platform will include the "Dempsey" dual-core Xeon DP--which is really two "Nocona" Xeon 1MB chips implemented in 65 nanometer chip technologies put into a single MCM that plugs into a single Xeon socket. The Bensley platform will use a chipset codenamed "Blackford," which will come in a flavor for regular two-socket servers as well as a version called Blackford-VS for so-called "value" servers.

The Bensley platform includes support for fully buffered DIMM memory--which increases memory bandwidth and overall system performance, particularly for multicore designs--as well as a new I/O feature called I/O Acceleration Technology (I/OAT), which speeds up the performance of TCP/IP and other CPU-related messaging technologies by making tweaks in the CPU, in the chipset, in software compilers, and in operating systems. Intel is advancing I/OAT as a better alternative than having dedicated TCP/IP engines in a system. The Bensley platform will also be the first server to feature Virtualization Technology, the hardware-assisted virtual machine partitioning discussed above.

By the way, Intel is not doing a lot of work with its Itanium chipsets, and the future Itaniums will use the same E8870 chipset it currently sells. Hewlett-Packard, Fujitsu-Siemens, Hitachi, NEC, and Unisys all sell high-end Itanium chipsets, however.

On the Opteron front, AMD's 8000 series of chipsets has enabled scalability from two sockets to eight sockets from the get-go. Newisys has delivered two-way and four-way SMP for Opterons with its own chipset (which is resold by Sun Microsystems in its V20z and V40z and by Verari Systems as well), and is still working on its 32-way "Horus" chipset. HP is selling the 8000 Series chipsets in its ProLiant servers, as are a number of smaller server makers. A number of other motherboard makers, such as Tyan and VIA, have created Opteron chipsets as well.

Performance: For the moment, the performance figures for the new 64-bit Xeon MPs and the dual-core Opteron 800 Series are a little thin. A four-way IBM xSeries 366 server using the "Cranford" Xeon MP and IBM's own "Hurricane" X3 chipset will be able to push about 150,704 transactions per minute (TPM) on the TPC-C online transaction processing benchmark test at a cost of $6 per TPM. The server was running the 64-bit version of Windows Server 2003 and DB2 8.2, which also supports 64-bits.

IBM is not using the large-cache 64-bit Potomac Xeon MPs in this machine, but has rather opted to create a chipset that can make use of the much less costly 64-bit Cranford low-cache Xeon MPs. In this case, IBM is using the 3.66 GHz Cranfords. Last year, IBM posted a TPC-C online transaction processing benchmark result of 102,667 transactions per minute on a four-way xSeries 365 using the Summit-II chipset and the 3 GHz "Gallatin" Xeon MPs. With the Hurricane chipset, IBM has designed out the need for the L4 cache, and presumably the Cranford boxes will not even include L3 cache. There is support for something IBM is calling a virtual L4 cache, which is probably a chunk of main memory carved out to act like a cache, just as we used to do back in the old days of the PC. Going from 102,667 TPM to 150,704 TPM is a pretty big performance bump.

But not quite as impressive as the move to dual-core Opterons in a four-way server. HP has just run its four-socket DL585 system through the paces, and a box using the dual-core 2.2 GHz Opteron 875 processors was able to handle an incredible 187,296 TPM at a cost of just over $2 per TPM. The HP box was running the 64-bit versions of Windows Server 2003 and SQL Server 2000.

To put it bluntly, the dual-core Opterons just smoked the single core Xeon MPs in terms of performance and bang for the buck. Intel will come back swinging as hard as it can in early 2006 with its own dual-core Xeon MPs--you can count on that.
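The size of these gaps is easy to quantify from the TPC-C figures quoted above. The sketch below computes the throughput gains and a rough implied system cost for each four-way machine. The dollars-per-TPM values are the rounded figures from the text ($6, and "just over $2" taken as $2), so the implied costs are approximations only; the dictionary keys and function name are illustrative.

```python
# TPC-C throughput (transactions per minute) as quoted in the article.
tpm = {
    "xSeries 365, Gallatin Xeon MP": 102_667,
    "xSeries 366, Cranford Xeon MP": 150_704,
    "DL585, dual-core Opteron 875": 187_296,
}

# Rounded cost per TPM from the text; "just over $2" is treated as 2.0 here.
dollars_per_tpm = {
    "xSeries 366, Cranford Xeon MP": 6.0,
    "DL585, dual-core Opteron 875": 2.0,
}


def pct_gain(new, old):
    """Percentage increase of new over old."""
    return (new / old - 1.0) * 100.0


xeon_gain = pct_gain(tpm["xSeries 366, Cranford Xeon MP"],
                     tpm["xSeries 365, Gallatin Xeon MP"])
opteron_gain = pct_gain(tpm["DL585, dual-core Opteron 875"],
                        tpm["xSeries 366, Cranford Xeon MP"])

print(f"Cranford over Gallatin: {xeon_gain:.1f} percent more TPM")
print(f"Opteron 875 over Cranford: {opteron_gain:.1f} percent more TPM")
for name, cost in dollars_per_tpm.items():
    print(f"{name}: roughly ${tpm[name] * cost:,.0f} implied system cost")
```

On these numbers the Opteron box delivers about a quarter more throughput at roughly a third of the cost per transaction, which is where the "smoked" verdict comes from.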

The Net Effect of the X64 Platform

In 1997, before the dot-com boom got roaring, the whole world consumed about 2.5 million servers in an entire year, about a third the number of servers that will be sold in 2005. Intel takes a lot of hit points for Itanium and for letting AMD get the upper hand with Opteron, but Intel is mostly to thank for the widespread adoption of 32-bit Pentium and Xeon servers, which have expanded the server market and have made servers an affordable tool for many more businesses than the RISC/Unix crowd ever did with more than a decade of intense engineering and cut-throat marketing. Intel deserves a lot of credit, and it has profited handsomely from its strategy.

The good news is that the evolving X64 platform being driven by both AMD and Intel is going to engender a more sophisticated kind of computing for the masses than they have seen before, one that used to be only available in mainframes or high-end RISC/Unix and proprietary servers. It will be interesting to see how the interplay of multiple cores, virtualization, and server scalability plays out. The market could expand in terms of the chips shipped, with more companies being brought into more modern platforms, or it could contract as companies consolidate workloads onto smaller boxes, essentially keeping their thread counts the same and using virtualization to get by with fewer chips to do the same work.


Editor: Timothy Prickett Morgan
Contributing Editors: Dan Burger, Joe Hertvik, Kevin Vandever,
Shannon O'Donnell, Victor Rozek, Hesh Wiener, Alex Woodie
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.


The Linux Beacon

Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc. (formerly Midrange Server), 50 Park Terrace East, Suite 8F, New York, NY 10034