X Marks The Spot
April 4, 2016 Timothy Prickett Morgan
Just because IBM sold off its System x X86 server business to Lenovo Group does not magically mean that X86 servers are no longer relevant to Power Systems customers. As far as we can tell, Big Blue’s exit from this business has had no measurable impact on the use of X86 iron at IBM i shops, although the brand of machines that IBM i customers choose in their next X86 upgrade cycle might change if they had not already been using gear from Hewlett-Packard Enterprise or Dell, the dominant suppliers of X86 gear in the datacenter.
Last week, Intel launched its “Broadwell” Xeon E5 v4 processors, which are used in the two-socket servers that are the main compute element in the datacenters and data closets of the world and which are the primary target for IBM’s efforts as it tries to make its Power8 and follow-on chips more pervasive. The Xeon family of chips accounts for about 99 percent of server shipments worldwide and about 80 percent of revenue, and the Xeon E5 devices are by far the most popular of Intel’s processors, probably accounting for 85 percent of its shipments and maybe about the same slice of its CPU revenue in the datacenter.
With the Broadwell family of Xeon chips, which already includes low-end Xeon D processors for single-socket microservers favored in some cases by hyperscalers like Facebook and which will this year also include high-end Xeon E7 processors that are used as big database and analytics engines in machines with four sockets or more, Intel is moving the product line to its 14 nanometer wafer baking processes. As with every process shrink, Intel has taken the opportunity to make some tweaks to the processor microarchitecture to goose performance a little while at the same time adding more processing cores and L3 cache to the chip to boost its throughput.
IBM will upgrade to its Power8+ chips sometime this year, and historically, with the Power4+, Power5+, Power6+, and Power7+ chips, IBM has done a process shrink on the transistors, so logically it makes sense that the Power8+ would do so as well. But we are uncertain if Power8+ includes a process shrink, so the performance gains from the jump from Power8 to Power8+ could be relatively small. We do expect microarchitecture enhancements with the Power8+, as well as the integration of Nvidia’s NVLink proprietary interconnect, which links GPUs to each other and to processors, into the Power8+ complex. The Power8 chips were made using IBM’s homegrown 22 nanometer processes, which are now controlled by Globalfoundries, IBM’s chip fab partner after it sold off the Microelectronics division to the former Advanced Micro Devices foundry operation in late 2014. We do not know for sure if Globalfoundries has its 14 nanometer processes up to speed, but if by hard work and good luck the fab can make Power8+ chips using 14 nanometer transistors, now would be a good time to do so. Intel is keeping the heat on in the datacenter, and if IBM wants the Power chip to compete, it has to stand toe-to-toe with Intel on chip manufacturing and has to excel at chip design. IBM and its OpenPower partners have to show the same kind of performance and throughput increases that Intel is offering with the Xeon E5s and Xeon E7s. It is that simple.
As soon as we know what the plan is for Power8+, we will surely tell you. Right now the roadmaps are vague and the chatter is convoluted.
What I can tell you is that despite its difficulties in ramping up its own 14 nanometer processes, Intel is getting sufficiently good yield on them now that it can move the process from the relatively small Core processors used in PCs, laptops, and tablets to the heftier processors used in servers. The Broadwell Xeon E5s are actually not one chip, but three distinct chips, and they offer a range of 4 to 22 cores on a single die and, generally speaking, somewhere between 20 percent and 30 percent more throughput than their predecessors, the “Haswell” Xeon E5 v3 chips that launched in September 2014 and that have been Intel’s dominant sellers over the past year or so.
If you want to get all of the feeds and speeds on the new Broadwell Xeon E5s, I have covered them in detail over at The Next Platform. In this story, I will make some general observations about the Broadwell Xeons as they relate to the market for entry and midrange iron and also talk a bit about how Intel sees the competitive landscape with regard to Power Systems now that the Broadwell Xeon E5 processors are in the field.
The cores in the Broadwell chips are essentially the same as in the Haswell chips, but there are a slew of nips and tucks that make the systems that use the Broadwells better able to support server virtualization, do complex mathematical calculations more efficiently (both in terms of clock cycles used and heat produced), and provide security and isolation for multiple workloads running on the machines. Perhaps more significantly, according to the executives that I have talked to at Intel, Hewlett-Packard Enterprise, and Dell, it looks like the resulting systems will carry a nominal price increase of a few percent–if any at all–while delivering that extra performance. So the bang for the buck on X86 systems running Windows and Linux is going to improve a bit, putting even more pressure on existing Power8 machines and future Power8+ ones.
The thing that is interesting about the latest decade of Xeon processor development is the level of sophistication that the chips have. These processors are every bit as sophisticated as anything IBM has ever put out, and they demonstrate more than anything the significant prowess that Intel has in leveraging its manufacturing edge to keep the Xeons on a growth curve for both single-threaded and multicore throughput. But, generally speaking, the price per core has been flattening out in recent years for the belly of the Xeon E5 line, and that means there is a chance for IBM to come in and be very aggressive with Power chips if it wants to win deals. Intel is also relaxing the pace of its chip process ramps a bit. Under the two-step “tick-tock” rhythm, it first advances a chip process with a tweaked microarchitecture and then, in the next step, rolls out a new microarchitecture on the established process, thereby mitigating some risk; it is now moving to a “tick-tock-tock” cadence, with three generations of chips on a process and two microarchitecture tweaks. That gives IBM a chance to play catch up and actually stay there.
I said there was a chance. IBM has to want it, though, and Globalfoundries has to deliver it.
The transformation of Intel’s server chips since the Great Recession–which coincided with the implosion of AMD’s Opteron line from self-inflicted wounds as much as from Intel basically cloning some of the best ideas in the Opterons with the “Nehalem” Xeons launched in March 2009–is remarkable, and it is no wonder that its Data Center Group, which sells chips, chipsets, motherboards, and other components for servers, switches, and storage, had $16 billion in sales in 2015 and an operating profit of $7.8 billion. That is the kind of revenue that IBM was generating just from the System/390 mainframe business in the late 1980s, just to give you some perspective.
Intel has leveraged its tick-tock method to keep the raw throughput of the Xeon core going up for both integer and floating point workloads as well as expanding out the number of cores on the die to get the overall throughput of a chip up very high. It would be far better for everyone–particularly software engineers–if we could have suspended some of the laws of thermodynamics and put out 10 GHz and 20 GHz processors by now, but we have had to go parallel to make use of the ever-increasing number of transistors that Moore’s Law has thus far enabled. IBM has done much the same between 2001, when the dual-core Power4 chip launched, and 2014, when the 12-core Power8s were rolled out completely across the Power Systems line. Intel got its Xeon architecture act together in 2009 with the Nehalem Xeon 5500s, putting four cores on the die and a sophisticated interconnect for linking multiple CPUs together as well as a new L3 caching structure that made the chips much more powerful and, more importantly, gave Intel an architecture that could scale.
Using the Xeon 5400 processors and their “Penryn” core designs as a baseline, Intel has increased the single-threaded performance of its cores by around 45 percent since 2009 through myriad tweaks to the instruction stream and caching structure of the processors, up through the Broadwell cores that came out last week. The Nehalem bump was the largest, at 12 percent, followed by the “Sandy Bridge” and “Haswell” bumps, at 10 percent and 10.5 percent, respectively. The tweaks in the Broadwell cores yield about 5.5 percent more oomph on workloads, and it is expected that the “Skylake” Xeons due in 2017 will have another big bump in instructions per clock, or IPC, in the chip lingo.
The core counts have also kept pace, with the Nehalem Xeon 5500 at four cores, the “Westmere” Xeon 5600 at six cores, the Sandy Bridge Xeon E5 v1 at eight cores, the Ivy Bridge Xeon E5 v2 at 12 cores, the Haswell Xeon E5 v3 at 18 cores, and the Broadwell Xeon E5 v4 at 22 cores. (The biggest Broadwell die actually has 24 cores on it, but to improve yields Intel only counted on 22 of them being bug free. If by chance there are some 23-core or 24-core parts that work, hyperscalers snap them up like a trout on mayflies.) So call it a factor of 1.5X improvement in IPC over the past seven years and a factor of 5.5X improvement in parallel throughput. Clock speeds have come down a bit, so the performance of the Xeon line has probably only gone up by a factor of 6X or so.
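As a rough sanity check on those figures–this is my back-of-the-envelope math, not anything from Intel–the per-generation IPC bumps cited above compound to about the 45 percent single-thread gain, and folding in the core count growth and the dip in clock speed lands in the same ballpark as the roughly 6X overall figure:

```python
# Per-generation IPC gains cited in the article (the "tick" generations,
# Westmere and Ivy Bridge, contributed little and are omitted here).
ipc_bumps = {
    "Nehalem": 1.12,
    "Sandy Bridge": 1.10,
    "Haswell": 1.105,
    "Broadwell": 1.055,
}

single_thread_gain = 1.0
for gen, bump in ipc_bumps.items():
    single_thread_gain *= bump
print(f"Cumulative IPC gain: {single_thread_gain:.2f}X")  # ~1.44X, i.e. ~45 percent

# Core count scaling: Nehalem Xeon 5500 topped out at 4 cores,
# the Broadwell Xeon E5 v4 at 22 cores.
core_scaling = 22 / 4  # 5.5X

# Raw throughput gain before adjusting for clock speed.
raw_throughput = single_thread_gain * core_scaling

# Clocks dipped from 2.53 GHz (Xeon E5540) to 2.2 GHz (top-bin Broadwell).
clock_ratio = 2.2 / 2.53
print(f"Clock-adjusted throughput gain: {raw_throughput * clock_ratio:.1f}X")
```

The clock-adjusted figure comes out a bit under 7X, so the article's "factor of 6X or so" is a conservative rounding of the same arithmetic.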
By my calculations, the performance of the top-bin 22 core Broadwell Xeon E5 v4 chip running at 2.2 GHz with 55 MB of L3 cache is 6.34X that of a baseline Nehalem Xeon E5540 with four cores running at 2.53 GHz and 8 MB of L3 cache. That Nehalem E5540 cost $744. Intel did not provide pricing on this top-bin part, but we estimate it is around $4,100 when purchased in 1,000-unit trays at list price, yielding a price of around $650 per unit of relative performance normalized to that Xeon E5540. The less capacious chips in the Broadwell Xeon E5 line offer somewhere between 3.5X and 5X the performance of the baseline Nehalem E5540 I selected as a benchmark, and they provide that capacity at somewhere between 30 percent and 50 percent off the cost per unit of capacity. (I have not adjusted those dollars for inflation.)
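That price-per-performance arithmetic is easy to reproduce. Keep in mind that the $4,100 figure for the top-bin Broadwell part is this article's estimate, not an official Intel list price:

```python
# Baseline: Nehalem Xeon E5540 (4 cores, 2.53 GHz), relative performance = 1.0.
e5540_price = 744.0          # Intel list price, per the text
e5540_relative_perf = 1.0

# Top-bin Broadwell Xeon E5 v4 (22 cores, 2.2 GHz), per my estimates above.
broadwell_price = 4100.0     # estimated 1,000-unit tray price, not official
broadwell_relative_perf = 6.34

baseline_cost = e5540_price / e5540_relative_perf
broadwell_cost = broadwell_price / broadwell_relative_perf
print(f"Baseline:  ${baseline_cost:.0f} per unit of relative performance")
print(f"Broadwell: ${broadwell_cost:.0f} per unit of relative performance")

# Broadwell comes in around $647 per unit of performance, roughly 13 percent
# cheaper than the 2009 baseline (not adjusted for inflation).
savings = 1 - broadwell_cost / baseline_cost
print(f"Savings: {savings:.0%}")
```

So the top-bin part only modestly beats the seven-year-old baseline on cost per unit of performance; as noted above, it is the less capacious Broadwell parts, at 30 to 50 percent off, where the real bang for the buck sits.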
We do not have pricing for bare, merchant Power processors to do a similar comparison for IBM’s Power6 through Power8 generations, which cover roughly the same timeframe as the most recent Xeons.
Intel’s rise is what happens when engineering meets volume economics. And frankly, the price/performance would be even more aggressive if AMD had been able to keep the heat on Intel more in the past seven years. Sparc is not much of a threat to the Xeon, and so far neither are the various ARM server chips that are emerging. And while Power chips are in a sense something of a threat, IBM will have to work very hard and win some very big deals to get the 10 percent to 20 percent share of the server market that is its stated goal. It is very hard to see how that might happen, given the hegemony of Intel in the datacenter these days. But, stranger things have happened. Like Intel jumping from the desktop to take over the datacenter, for instance.
Now, it is time for IBM and its OpenPower partners to bring the fight to Intel, and the datacenter has a big red X on it showing who and what the target is. And this will be a discussion I will be having with IBM and those partners at next week’s OpenPower Summit.