Power vs. Nehalem: Scalability Is So 1995, Cash is So 2009
April 6, 2009 Timothy Prickett Morgan
The entry and midrange server racket has a new processor in town, its name is Nehalem, and it comes from Intel. And despite the poor global economy, and despite the naysayers who think that people don’t think about processors and their features anymore, the Nehalem chip is going to get traction this year, and it is most certainly going to shake up the entry and midrange spaces where the Power Systems i platform plays.
Let’s play some Intel code-name bingo first, just because it is fun. Technically, the chip that Intel announced last week has Nehalem as the name of its chip microarchitecture (what Intel means when it describes instruction sets implemented in the chip). Its family code-name is “Nehalem EP,” with the EP being short for Enhanced Platform and the Nehalem part coming from a river on the Oregon coast that is actually the word in Hebrew for “river” and from a town on that river in that state. (Yes, I know, it is really silly to name a city “River,” but I can understand “Riverside.”) The particular sub-family of chips that Intel announced last week was for two-socket servers and was code-named “Gainestown,” and no one is sure what the reference is, but it probably is not the Gainestown in Alabama. Although they didn’t get much press last week, Intel also announced a set of Nehalem UP chips for uniprocessor machines, code-named “Bloomfield,” which are also going to give entry server makers a run for their money.
Here’s the thing about the Nehalem chips that you need to remember: These are the X64 chips that Intel should have put into the field in 2001 or 2002, and if it had been truly forward thinking (as IBM has been with its Power chip designs), the server business and the way many of us do infrastructure would be very different today. Advanced Micro Devices would not have had years of buildup to its “SledgeHammer” Opteron processor launch, and then several more years of market share gains in the server space, while Intel figured out what system builders (and therefore their customers) wanted: an X64 chip that was compatible with their 32-bit X86 processors but had 64-bit memory extensions, as well as a chip architecture that enables multicore designs while at the same time delivering great bandwidth–as AMD’s HyperTransport interconnect for CPU cores, memory, and I/O does for Opterons, and as the QuickPath Interconnect does for Nehalems. It is a pity that QPI is not a clone of HyperTransport and isn’t socket compatible with Opterons, but it looks to be as close as Intel could stomach making it without being accused by AMD’s lawyers of copying off the SledgeHammer homework.
That’s all water under the bridge now, of course. Intel took its lumps, and because IBM had pretty respectable single-socket and two-socket Power servers, and Hewlett-Packard could put out two-socket Itanium servers that didn’t look too pricey, the proprietary and Unix midrange could continue to compete against Intel-based iron quite well. Even Sun Microsystems has been able to make a fairly strong case on some workloads for its relatively pricey “Niagara” servers, which use the multicore, multithreaded Sparc T series processors.
Last week, Pat Gelsinger, general manager of Intel’s Digital Enterprise Group and probably the lead contender for the position of chief executive officer several years hence, said that this announcement was as significant as the launch of the Pentium Pro processor back in 1995. The Pentium Pro was the first chip from Intel that had symmetric multiprocessing native to the chip, and it laid the groundwork for Intel creating what is now called “industry standard servers” or “the volume server market” by people who like or sell X86 and X64 servers. There is nothing standard about X86 or X64 processors, except that there have been multiple suppliers of clones for many years. Intel sets the standard, not the industry. Anyway, the Pentium Pro eventually became the Xeon DP, and eventually Intel and its partners (remember ServerWorks?) created chipsets that could glue four Xeon MP chips into a single system image. And companies like IBM, Unisys, and Compaq made machines that could take multiple Xeon MP system boards and create even larger machines, with as many as 16 or 32 processor sockets.
You will get a chuckle out of how long this innovation has taken Intel. IBM announced the two-way AS/400 D80 back in April 1991, and quickly followed up with three-way E90s and four-way E95s in early February 1992. By 1995, when Intel was getting ready for the Pentium Pro, IBM had ported the AS/400 from 48-bit CISC processors (I think a variant of the Motorola 68K, but I could never prove it) to 64-bit PowerPC AS RISC processors. And by 1997, when Intel was not even close to getting 32-bit, four-socket Xeon MPs out the door, IBM had skipped past eight-way, 64-bit AS/400s and RS/6000s and moved straight up to 12-way machines. Intel didn’t get a decent four-way out until 2002, and you could argue it wasn’t until 2003.
And on and on. IBM has always beat Intel at this scalability game. Hooray!
The trouble is that scalability has not been an issue for midrange customers with modest numbers of end users for maybe a decade. You know it, and I know it. A decade ago, IBM sold a mix of small, medium, and large Power-based machines bearing the AS/400 and iSeries brands, but now the vast majority of Power Systems i shipments–I am talking about 95 percent of shipments–are for smaller 520-class machines with one or maybe two cores activated, a couple of gigabytes of memory, and a handful of disk drives. To be sure, IBM makes plenty of i-related revenues from customers who use Power 570 or Power 595 boxes, and this is good business. But IBM has to engineer the Power Systems line to compete with X64 machines at SMB shops, and it has to compete in terms of iron, software, and pricing.
Now that Intel has ditched the front side bus (FSB) architecture of the Xeons and replaced it with QPI, the Xeon doesn’t have the bandwidth constraints it used to, and it delivers much more balanced performance. And because Intel has woven a lot of goodies into the Nehalem chips to manage power consumption, boost performance, and support virtualization, the gap between X64 and Power chips and their resulting entry and midrange systems has closed considerably.
The Nehalem EP processor is a four-core chip that employs Intel’s implementation of simultaneous multithreading (SMT), called HyperThreading, which lets each processor core present two virtual execution threads to the operating system. Intel didn’t invent SMT–supercomputer makers did–but IBM used an early version of SMT back in the “Northstar” family of PowerPC chips for those 12-way AS/400 and RS/6000 machines from 1997, dropped it for the Power4 chips when it went with a dual-core design in 2001, and then added it back in with Power5 and goosed it further with Power6. Intel did HyperThreading at first precisely because it did not have multicore Xeon designs, while IBM went the other way with Power4, using multiple cores with real threads instead of a single core with virtual threads. But as chip-making processes shrink every two years, both companies can now put enough transistors on a die to do SMT and multiple cores at the same time.
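To make the SMT idea concrete: with HyperThreading on, the operating system sees twice as many logical processors as there are physical cores, and on Linux you can work that out from the `physical id` and `core id` fields in /proc/cpuinfo. Here is a minimal sketch that does the counting on a made-up cpuinfo sample for a single four-core chip with two threads per core–the sample data is illustrative, not captured from a real Nehalem box.

```python
# Sketch: inferring the SMT factor (logical threads per physical core)
# from /proc/cpuinfo-style data. SAMPLE_CPUINFO below is a simplified,
# made-up sample for one 4-core chip with HyperThreading; on a real
# Linux machine you would read the /proc/cpuinfo file itself.

SAMPLE_CPUINFO = """\
processor : 0
physical id : 0
core id : 0
processor : 1
physical id : 0
core id : 1
processor : 2
physical id : 0
core id : 2
processor : 3
physical id : 0
core id : 3
processor : 4
physical id : 0
core id : 0
processor : 5
physical id : 0
core id : 1
processor : 6
physical id : 0
core id : 2
processor : 7
physical id : 0
core id : 3
"""

def smt_factor(cpuinfo_text):
    """Return (logical_cpus, physical_cores, threads_per_core)."""
    logical = 0
    cores = set()          # unique (socket, core) pairs
    phys_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1   # each "processor" entry is one logical CPU
        elif key == "physical id":
            phys_id = value
        elif key == "core id":
            cores.add((phys_id, value))
    physical = len(cores)
    return logical, physical, logical // physical

print(smt_factor(SAMPLE_CPUINFO))  # (8, 4, 2): HyperThreading doubles the thread count
```

With HyperThreading fused off, as on the low-end Nehalem EP parts described below, the same counting would come back with a factor of 1, which is why core count equals thread count on those chips.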
According to Gelsinger, the Nehalem chip has 731 million transistors and is made using Intel’s 45 nanometer Hi-K process, the top-of-the-line process Intel has in production right now. The Nehalem EP chip has four cores, each with 32 KB of L1 data cache, 32 KB of L1 instruction cache, and 256 KB of L2 cache. The chip has 8 MB of L3 cache on the die, plus on-chip main memory controllers that support three DDR3 memory channels. As IBM has done with its Power processors, Intel is taking parts where some of the cores or cache don’t work right and gearing them down for low-end workloads. There is a dual-core Xeon E5502 that runs at only 1.86 GHz and has just 4 MB of L3 cache, and the E5506 and E5504 chips are quad-core variants that run at 2.13 GHz and 2 GHz, respectively, but also have only 4 MB of L3 cache. None of these chips have HyperThreading, either, so the core count is the thread count and the performance per GHz is lower–and hence, so is the price. The full-bore Nehalem EP chips have HyperThreading enabled as well as a new feature called Turbo Boost, which allows a core to rev up a few hundred megahertz if some of the cores on the chip are not doing heavy work or are shut down completely. These fully capable Nehalem EPs run at between 2.26 GHz and 2.5 GHz in an 80 watt power envelope, and from 2.66 GHz up to 2.93 GHz at 95 watts. The fastest Nehalem EP, the W5580, runs at 3.2 GHz and has a ridiculous 130 watt power rating. Prices for the chips, in 1,000-unit quantities, run from $188 to $1,600. There are also low-power and embedded versions of the chips, but these will not be used in the mainstream rack or blade servers that are most comparable to the Power Systems i platform.
In his briefing with analysts and journalists last week, Gelsinger took pot shots at Sun’s Sparc T2 and IBM’s Power 570 platforms as the ones that Nehalem EP machines would compete with. The Sparc T2 comparison was fair enough, considering there are two-socket and four-socket versions of these most recent Niagara platforms, but the comparison to the Power 570 seemed a bit disingenuous. The Power 570 will more correctly be compared to the future “Nehalem EX” four-, six-, and eight-core processors and the systems built on them, due later this year or early next. Anyway, Gelsinger said that on a suite of four benchmarks, a Nehalem machine was half the price of a Sparc T2 box and delivered 1.71 times the performance, on average, across those benchmarks. And for the Power6-based Power 570, Gelsinger said it would cost 10 times as much to buy a Power 570 and that the Nehalem machine would deliver 2.45 times the performance on those four tests, on average.
“Comparing to the IBM Power environment, it is almost humorous,” Gelsinger quipped.
Not if you are a Power Systems reseller, particularly if you are pushing i 6.1 at midrange shops.
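Still, it is worth turning Gelsinger’s relative numbers into a single bang-for-the-buck figure, since that is the metric SMB buyers actually care about. The little sketch below normalizes the Nehalem box to 1.0 on both price and performance and works out what the rival boxes deliver per dollar, relative to it, using only the ratios quoted above–no absolute prices were given, so none are invented here.

```python
# Back-of-the-envelope price/performance math for Gelsinger's claims.
# The Nehalem EP box is the baseline (performance = 1.0, price = 1.0);
# the rivals are expressed relative to it, per the ratios in the article.

def relative_bang_per_buck(rel_perf, rel_price):
    """Performance per dollar of a rival box, relative to the baseline."""
    return rel_perf / rel_price

# Sparc T2: twice the price of the Nehalem box, 1/1.71 of its performance.
sparc_t2 = relative_bang_per_buck(1 / 1.71, 2.0)

# Power 570: ten times the price, 1/2.45 of the performance.
power_570 = relative_bang_per_buck(1 / 2.45, 10.0)

print(round(sparc_t2, 3))   # 0.292: Sparc T2 delivers ~29% of Nehalem's bang per buck
print(round(power_570, 3))  # 0.041: Power 570 delivers ~4% of Nehalem's bang per buck
```

Even if you discount Gelsinger’s benchmark suite heavily, a 24X gap in price/performance against the Power 570 is the kind of number that sticks in a CFO’s head.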
Vendors are always talking like this, of course. But over the next few weeks, I am going to be trying to get my hands on enough pricing information for the Power 520 and Power 550 machines and the JS12 and JS22 blades to make fair and honest comparisons to rack, tower, and blade servers based on the Nehalem EPs. I suspect that Intel has closed the performance gap substantially, and that the doubling up of processor cores on Power Systems last October was IBM preparing to battle Nehalem EP and EX systems in 2009.
If IBM would put out a quad-core Power6+ chip–either by putting two dual-core Power6 chips, running at, say, 3 GHz instead of 4 GHz or 5 GHz, into a single package, or by actually putting four cores on a single die–the company would be better able to compete against the claims Intel is making. The true quad-core die is obviously the more costly and therefore the less attractive option, but IBM is trying to get to eight cores per die with Power7 next year, so it would be a good interim step. IBM has been as quiet as a mouse in a field with a dozen hawks overhead about the existence of Power6+ processors. But Power6+ chips were on the roadmap, as I told you last July and then again with more roadmaps in January of this year.
If IBM doubled up the cores in the Power 520 and Power 550 servers with Power6+ chips, and then cut the price of i 6.1 per core in half (thereby keeping the price per system the same), it would go a long way toward making the Power Systems i platform competitive with Nehalem EP platforms.
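The arithmetic behind that proposal is simple enough to sketch. The dollar figure below is a placeholder chosen for illustration, not IBM’s actual i 6.1 per-core pricing, but the shape of the math holds at any price point: double the cores, halve the per-core rate, and the software bill per box stays flat while the hardware throughput per box roughly doubles.

```python
# Sketch of the pricing proposal above: doubled cores at half the
# per-core software price leaves the per-system bill unchanged.
# The $30,000 per-core figure is hypothetical, for illustration only.

def system_software_bill(cores, price_per_core):
    """Total software charge for one box, priced per activated core."""
    return cores * price_per_core

today    = system_software_bill(cores=2, price_per_core=30_000)  # dual-core Power6
proposed = system_software_bill(cores=4, price_per_core=15_000)  # quad-core Power6+, halved rate

print(today, proposed)  # 60000 60000: same bill per system, twice the cores
```

In other words, IBM could double the effective oomph per software dollar without touching its per-system revenue line, which is exactly the kind of defensive move Nehalem EP invites.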
Now would be a good time, IBM. But, of course, you’re too busy trying to buy Sun, or not, these days. And hopefully you will figure out, Big Blue, that you need to compete down there in the same SMB space that Intel is going after with Nehalem systems, and in this economy, the way to win that deal is with the best bang for the buck, not with lots of scalability headroom. SMBs are buying for today, and are spending as little money as possible.