Chips Sliding Away
May 26, 2009 Timothy Prickett Morgan
When you look at a photograph of a microprocessor die, doesn’t it always look like a city of some kind where electrons live and work? The L1 and L2 cache memories are perhaps where the electrons live, and they work in the central processing units and floating point units, commuting back and forth as they form the signals that become zeros and ones, like a human wave at a football stadium. Main memory is where the electrons go for weekends, and disk storage is where they go for long vacations.
It is perhaps not a coincidence, then, that when people refer to the design of buildings and their cities and the design of processors and their systems that both are referred to as architecture. The links between towns and between families of processors are called roadmaps, and as many of you know, the move from one processor and its system to another can be as traumatic as picking up and moving to a new town.
Perhaps because I live in New York City, and because I have flown over it and looked down on it, in a kind of quiet reverie that is the same as I feel at the top of a mountain or inside a great cathedral, I have a sense of the complexity that makes up cities, something I really didn’t have an appreciation of when I grew up in the country. The lines might be curvier in the country, but the human landscape is in many ways less cluttered and less complex. It is easier to understand, easier to model. This is an illusion, I am sure, but I am attracted to the complexity I can see in a city because it compels human interaction. You can’t avoid people, but ironically, in a city you are more anonymous than you ever are in a small town.
Both cities and chips are beautiful and static and orderly when looked at from on high. But let me assure you, cities are messy creations and, as you can tell from the tumultuous history of computing, microprocessors are messy as well, particularly when you are talking about the process of defining a chip and bringing it to market.
Server chip delays seem to be all the rage these days, although one chip maker has recently pulled its latest chip forward to deliver it early. And the reason for the delays–one that people don’t have a lot of sympathy for–is that chip makers are always pushing the envelope on features and architectures, which they need to do to compete, and in doing so they add so much complexity to their designs that the chips don’t end up working as planned. This forces tweaks, if not outright redesigns, of processors, which causes minor or major delays in the rollout of incremental processor performance. Basically, they get greedy and they build too fast, and they do so because features make money and time also costs money. There is a tremendous amount of pressure on chip makers, and I think it is fair to say that Moore’s Law has been stretched out as each successive generation of chips gets more complex, more and more functions are integrated onto the silicon, and each new chip-making process gets increasingly expensive to bring to market. (We’re talking about a $1 billion investment to get a 90 nanometer chip and the process to manufacture it to market, and something like $3 billion to get to 65 nanometer processes.) This is not exactly a breather, mind you, even if Moore’s Law is stretching out the doubling in processor performance to between 24 and 36 months across various architectures instead of the 18 to 24 months of days gone by.
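To get a feel for what that stretched cadence means over time, here is a quick back-of-the-envelope sketch in Python; the doubling periods are the ones cited above, but the six-year window and the arithmetic are my own illustration, not vendor data.

```python
# Back-of-the-envelope: cumulative performance growth under different
# Moore's Law doubling periods. Figures are purely illustrative.

def perf_multiple(months_elapsed, doubling_period_months):
    """Performance multiple after a span of time, assuming performance
    doubles every `doubling_period_months` months."""
    return 2 ** (months_elapsed / doubling_period_months)

# Over a six-year span (72 months):
old_cadence = perf_multiple(72, 18)   # the old 18-month doubling
new_cadence = perf_multiple(72, 36)   # a stretched 36-month doubling

print(f"18-month doubling over 6 years: {old_cadence:.0f}X")
print(f"36-month doubling over 6 years: {new_cadence:.0f}X")
```

Under the old cadence, six years compounds to 16X the performance; under the stretched cadence, only 4X. That gap is the customer's view of the slowdown.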
It was Intel’s turn to sit in the chip delay hot seat last week just before the Memorial Day holiday got rolling, when it admitted that it was once again pushing out the delivery of its quad-core “Tukwila” Itanium processors. Intel gave a half-hearted and non-technical explanation of its latest delay.
“As you know, end users choose Itanium-based servers for their most mission-critical environments, where application scalability is paramount,” an Intel spokesperson told me in an email last Thursday. “During final system-level testing, we identified an opportunity to further enhance application scalability. As a result, the Tukwila processor will now ship to OEMs in Q1 2010. In addition to better meeting the needs of our current Itanium customers, this change will allow Tukwila systems a greater opportunity to gain share versus proprietary RISC solutions including SPARC and IBM POWER. Tukwila is tracking to 2X performance versus its predecessor chip. This change is about delivering even further application scalability for mission-critical workloads. IDC recently reported that Itanium continues to be the fastest-growing processor in the RISC/Mainframe market segment.”
That makes Tukwila about three years late coming to market, and I suspect it also makes Hewlett-Packard, which accounts for the vast majority of the approximately $5 billion in Itanium system sales and 400,000 processor shipments per year, hopping mad. But HP has tied its HP-UX, NonStop, and OpenVMS operating systems very tightly to the Itanium platform, and its software ecosystem is in no mood to do application conversions. And with IBM owning Transitive’s QuickTransit emulation environment, HP and Intel have limited options unless they create an emulation environment of their own that can take code off RISC and mainframe systems and run it on Itanium or Xeon iron. Gordon Haff, an analyst at Illuminata, told me last week he figures that Intel generates somewhere between $1.5 billion and $2.5 billion in revenues from Itanium chip sales, and given that Intel needs to make investments in chip fabs for its Xeon server and Core desktop processors, throwing Itanium onto older and mature fab technologies doesn’t cost all that much. So Itanium is–and will continue to be–a viable chip and a saleable server platform for years to come, even if every other vendor drops it. (And many have over the years.)
IBM is by no means immune to collisions on the Power processor roadmap. While the dual-core Power4 processors came out right on time in October 2001 with the 180 nanometer processes, the shrink to 130 nanometer processes in 2002 and 2003 didn’t boost performance all that much because IBM ran into the same thermal wall that all chip makers hit. IBM ramped up those 130 nanometer processes to get Power5 chips out, but the performance increases were modest. The move to 90 nanometer chip making with the Power5+ chips in late 2005 and early 2006 didn’t boost clock speeds as much as Big Blue hoped, and to better compete with dual-core X64 processors coming out at the time from Advanced Micro Devices, the company went so far as to use the same trick HP invented in desperation with Itanium chips, which was to slow down the clock speeds on chips and put two chips into one socket. In HP’s case, the company was trying to get a quasi dual-core Itanium into the field to compete against IBM’s true dual-core chips, and IBM was putting two dual-core chips into a single socket–what it called a dual chip module or DCM at the time–to compete with low-end dual-core Opterons.
Now, Power6 was originally scheduled to appear in 2006, and had it come out on time, IBM would not have had to resort to such measures. And while I can’t prove it, I think Power5 was supposed to be on the 90 nanometer processes but something went wrong, pushing it out to Power5+. In any event, Power6 came out in July 2007, and only in two 570-class machines, and the converged Power Systems line didn’t get it until April 2008. The plan for Power6 was to have clock speeds between 4 GHz and 5 GHz and deliver a processor that had twice the oomph of the Power5 chip, which in turn had twice the oomph of the original Power4 chips. The Power6+ chips were supposed to have some kind of speed bump, something that would give them about twice the performance of the Power6, according to roadmaps I have been over a bunch of times.
Well, that didn’t happen. As I told you back at the end of April, IBM actually slipped Power6+ chips into machines back in October 2008 (those Power 560 and 570 machines) and didn’t tell anyone. And the reason why, despite what IBM tells us, is that there was no significant performance boost moving from Power6 to Power6+, even though such a boost was, in fact, the plan. My gut says that Power6+ was supposed to be a four-core chip using the 45 nanometer processes now slated for Power7, since it seems crazy that IBM would have been able to get clock speeds up to 7 GHz or 8 GHz to double performance. Power7 was supposed to be out in late 2008 or so, and now we are looking at early 2010 or perhaps later for this chip, which is expected to be an eight-core processor with all kinds of adjunct co-processors added to the silicon.
These are the kinds of delays that plagued the RS/6000 line throughout the 1990s and that gave HP and Sun Microsystems free rein in the Unix space because they hit their chip roadmaps regularly, like IBM did in the early 2000s. In the late 1990s, when Sun’s UltraSparc-II processors were the dot in the dot-com boom, it was bragging about all the wonderful things that its kicker UltraSparc-III processors would do. I recall Sun talking about being able to deliver servers with 1,000 or more processors and a supercomputer-class, memory-speed interconnect called Wildfire. Well, the UltraSparc-III chips were 18 to 24 months late coming into full ramp, and that delay and a bad economy pushed out the UltraSparc-IV chips and killed off the “Millennium” UltraSparc-V processors. Then Sun put all of its eggs in its “Niagara” many-core, many-thread chip designs and tapped Fujitsu‘s dual-core Sparc64-VI chips as a stopgap until Niagara’s older brother, dubbed “Rock” and sporting 16 cores and 32 threads, came into the field in late 2007 or early 2008. Rock was pushed out to the second half of 2008 and Fujitsu brought out that dual-core Sparc64-VI chip late, which meant the Sun and Fujitsu installed bases were wide open to an all-out assault from Big Blue.
In February 2008, Sun pushed out Rock to the second half of 2009, and now that Oracle is buying Sun for $5.6 billion and there are rumors that Rock will never see the light of day, it is hard to say what is going to happen to Rock systems. Fujitsu and Sun are selling quad-core Sparc64-VII iron now, and Fujitsu is right now showing off the prototype of an eight-core “Venus” Sparc64-VIII processor. But that chip is not expected to be ready until 2010 or maybe 2011. Sun has bigger problems than Larry Ellison as it tries to negotiate all of its chip delays and options. I would say that it could be worse and that Sun could have tied itself to Itanium, but the funny thing is, I am not sure if that would, in fact, be worse.
To AMD’s credit, it is expected to deliver its six-core “Istanbul” Opteron processors several months early, with shipments to server makers already starting now. (Of course, its quad-core “Barcelona” chips were delayed by about six months and then were buggy, which really hurt AMD’s business.)
I could go on, and on, and on.
Here’s the real stickler: new processor generations usually mean big improvements in bang for the buck, and when chips are delayed, it is the same thing as holding back the price/performance curve, and that means keeping prices higher than they might otherwise be. I think that performance increases that far outstrip workload growth are, therefore, something that all vendors want to avoid like the plague, particularly in a down economy where end users are being fired and business is slower than it is in boom times. While chips always have some technical issues–many of which can be worked around with microcode and operating system tweaks–I sometimes get the feeling that delays are a little too convenient for vendors’ bottom lines, extracting profits from legacy customer bases. Of course, delays also compel people to jump architectures if the price/performance differentials get to be too high. That was the case when IBM got serious about AIX with the Power4 machines at about the time that the X64 market got Linux religion, a double whammy that Sun has never really recovered from.
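As a rough illustration of how a slip holds back the price/performance curve, consider this small sketch; the 18-month halving cadence, the base cost, and the one-year slip are hypothetical numbers of my own, not any vendor's pricing.

```python
# Hypothetical sketch of what a chip delay does to the price/performance
# curve; all figures here are illustrative, not any vendor's pricing.

def cost_per_unit_perf(years, halving_period_years, base_cost=100.0):
    """Dollars per unit of performance after `years`, assuming cost per
    unit of performance halves every `halving_period_years` years."""
    return base_cost / (2 ** (years / halving_period_years))

# After three years on an 18-month (1.5-year) halving cadence:
on_time = cost_per_unit_perf(3.0, 1.5)
# A one-year slip leaves buyers stuck on last year's point of the curve:
delayed = cost_per_unit_perf(3.0 - 1.0, 1.5)

premium = (delayed / on_time - 1) * 100
print(f"On schedule: ${on_time:.2f} per unit of performance")
print(f"After a one-year slip: ${delayed:.2f}, a {premium:.0f}% premium")
```

Under these made-up numbers, the delayed buyer pays roughly a 60 percent premium per unit of performance, which is the effect described above: the delay keeps prices higher than they might otherwise be.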
The slip sliding of chip roadmaps makes it very difficult for customers who are hitting performance ceilings to manage their workloads. And because increases in memory capacity and I/O bandwidth often come with each new chip generation, companies that need more memory or I/O (but not necessarily more CPU oomph) are also left in the lurch. The adoption of neat features like decimal math units or hardware-assisted CPU and I/O virtualization, just to name a few, can also be halted because a chip generation slips. This is about more than how many cores a chip has and at what clock speeds they whir.
The wonder, of course, is this: that a city or a chip works at all. Delays or not, I never cease to be impressed by the desire to push architectures, despite the huge costs of change. Eventually, chips improve, and usually, cities persist. And it truly is a wonder.