Bang for the Buck: Enterprise i5 Servers Versus the Competition
September 5, 2006 Timothy Prickett Morgan
If there is a general rule in the server business, it is this: The cost of server scalability rises faster than the increase in scalability. In an ideal world, servers would scale perfectly linearly, and vendors could just keep adding processors, memory, and I/O to boxes to help their customers support ever-larger workloads. Or, because this is 2006, they could support ever-more server consolidation. But, this being the real world, which has some limits of physics, scalability comes at a cost.
In the so-called enterprise-class server space, by which I mean machines that scale from four up to perhaps 16, 24, or 32 sockets, the cost of scalability is indeed relatively high, and vendors have to charge a premium for scalable machines regardless of hardware architecture or operating system. It costs more to engineer the hardware and software stack that drives such machines, which in days gone by would have simply been called mainframes or maybe even supercomputers, based on the aggregate computing capacity and memory space they can bring to bear on one or multiple problems. Moreover, because there are relatively few vendors of the high-end components that go into such boxes, very few makers of the machines themselves, and comparatively few distributors, the entire enterprise-class ecosystem is composed of companies that need to charge a premium to cover their costs, and customers who are well used to paying such a premium.
Big iron boxes, as we will see in the next installment of this Bang for the Buck series, cost even more because they tackle even larger hardware and software engineering issues.
The good news is that an enterprise-class machine of 2006, thanks to a decade of engineering, can do a lot more work than a similar machine could do a decade ago. Back in 1997, when IBM launched the “Apache” PowerPC servers, putting up to a dozen of these 125 MHz processors into a single processor complex, the top-end machine, the 650-2243, could deliver about 2,340 CPWs of raw computing power. At the time, this was one of the most scalable and powerful machines in the world. This server was also the first machine to bear both the AS/400 and RS/6000 label.
Back in 1999, when IBM was shipping the “Northstar” PowerPC-based AS/400 and RS/6000 servers, the company more than doubled the clock speed of these 64-bit processors to 262 MHz, and doubled the performance of the top-end 12-way box to 4,550 CPWs. At the time, based on the then-current roadmaps IBM had for Power4 and Power5 processors, I projected that in 2004 or so IBM would deliver a 64-core box using Power5 processors running at 2.2 GHz capable of delivering about 107,000 CPWs of performance, or just over 1 million transactions per minute (TPM) on the TPC-C online transaction processing benchmark test. IBM has far exceeded my estimates, and in terms of performance, appears to have exceeded a lot of expectations.
With the i5 570, which packs 16 cores into four boxes that are lashed together NUMA-style using fiber optic cables, IBM can hit 58,500 CPWs with 2.2 GHz Power5+ chips. That should work out to close to 600,000 TPM running i5/OS V5R4 and DB2/400, based on my current estimates. (IBM hasn’t run a TPC-C test on the AS/400, iSeries, or i5 line in so long, it is hard to be absolutely certain. But CPW does bear a direct relationship to TPC-C, since CPW is a variant of TPC-C.) With AIX 5.3 and DB2 8.1, IBM has been able to get a p5 570 using the 2.2 GHz Power5+ processors to deliver over 1 million TPM. The 64-core i5 595 using 1.9 GHz Power5 chips can hit about 1.8 million TPM, and the p5 595 using 2.3 GHz Power5+ chips can break 4 million TPM. Basically, IBM can deliver what seven years ago would have been the expected performance of a big iron box in a smaller and cheaper enterprise-class system.
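My TPM figure for the i5 570 comes from that rough proportionality between CPW and TPC-C. As a sketch of the back-of-the-envelope conversion (the roughly 10.3 multiplier is back-derived from my own 58,500 CPW and 600,000 TPM figures for this box, not an IBM-published ratio):

```python
# Rough CPW-to-TPM conversion, leaning on the fact that CPW is a
# variant of TPC-C. The multiplier is back-derived from my own
# estimates for the i5 570, not from any IBM-published figure.

def estimate_tpm(cpw, tpm_per_cpw=600_000 / 58_500):
    """Scale raw CPW by an assumed CPW-to-TPM ratio."""
    return cpw * tpm_per_cpw

# 16-core i5 570 at 58,500 CPWs
print(round(estimate_tpm(58_500)))  # 600000
```

Take the multiplier with a grain of salt; without a recent audited TPC-C result on the i5 line, it is an educated guess.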
But that doesn't mean IBM will not charge a premium for those enterprise-class machines. It does, as the salient characteristics table I built for this story shows.
The Metrics of Comparison
The machines in this table have the hardware features shown, including a basic chassis, processors (either single-core or dual-core), memory, two disk drives, and a tape drive. I have tried to keep the configurations across server architectures and operating system platforms as similar as is practical given the nature of the product lines. I tried to put 2 GB of main memory per processor on the servers with multicore processors. In some cases, the architecture of the processor and the clock speed it runs at seem to be more of a limiting factor, and in those cases, there may be only 1 GB per core.
As I have explained before, I am aware that I am showing the estimated or actual (when test results are available) OLTP performance of a given processor complex and comparing the cost of a base configuration to this estimated top-end performance for the machine. In this way, I am trying to isolate the base cost of a server and show its potential performance on the TPC-C online transaction processing benchmark. Yes, the Transaction Processing Performance Council frowns on this sort of thing. Someone has to do like-for-like comparisons, and it is either going to be you or me–and I figure you have better things to do, like read this story after letting me do the work.
Each server has a similar stack of software. I have added an operating system and a relational database management system, and unlike in past years when I did such comparisons, this year I have thrown in virtual machine or logical partitioning hypervisors. I think many people are going to start using these hypervisors in production, and not just at the biggest data centers in the world.
The i5 has had such software embedded for years, and to make it a fair comparison, this functionality should be added to X64 servers as well. On the enterprise-class boxes running Windows and Linux, I added in VMware‘s top-of-the-line ESX Server 3 with all of the bells and whistles. While Novell‘s just-announced SUSE Linux Enterprise Server 10 has the integrated and free Xen 3 hypervisor from XenSource in it, there are no recent tests of enterprise-class Linux machines that employ SUSE Linux. On two-socket and four-socket servers, where there is more data about Linux performance, the differences between Red Hat and SUSE are not very large. But the architecture of these big boxes is very different, and on enterprise servers, where Red Hat and Novell do a lot of work with specific partners, performance could differ considerably. Without a lot of data, it is hard to be sure.
I put Windows Server 2003 Enterprise Edition or Datacenter Edition on the Windows boxes, as well as SQL Server 2005 Enterprise Edition, and I put Oracle Enterprise Edition on the Linux and Unix boxes. The Unix boxes running HP-UX use HP’s own Virtual Server Environment partitioning, the IBM p5 boxes use the Virtualization Engine hypervisor (also used with the i5), and the Sun Microsystems boxes use Solaris containers. I know that the latter is not as sophisticated as some of the other hypervisors, since containers have a shared Solaris kernel and file system underneath the virtual machines, but if you want, you could put VMware ESX Server 3 on the Opteron boxes and run Solaris 10 inside the partitions.
None of the configurations have any hardware or software support costs added in, and where vendors put these in as a base requirement–as IBM does with Software Maintenance on the i5 line–I have stripped these costs out. Pricing is just for system acquisition and basic installation support.
How the i5 570 Measures Up
Computing capacity on a base i5 570 is quite a bit more expensive than on the smaller two-socket i5 550. The good news is that most customers can get by on the even cheaper (in terms of relative bang for the buck) single-socket i5 520; very few customers in the i5 installed base need the i5 550, and even fewer need the i5 570.
How much of a premium am I talking about? The i5 520 Standard Edition machines, which do not have any 5250 green-screen processing capacity, cost between 84 cents and $1 per TPM for the two configurations I profiled several stories ago. The i5 550s, which offered twice the scalability, cost between $1.42 and $1.56 per TPM for the configurations I ginned up. With the four configurations of the i5 570 I priced out, which had 2, 4, 8, and 16 cores activated and running i5/OS Standard Edition, the cost ranged from $2.16 to $2.56 per TPM. Smaller i5 570 configurations running i5/OS Enterprise Edition cost about 2 to 2.5 times as much as Standard Edition machines when their cores were activated to fully support green-screen processing; bigger i5 machines running i5/OS Enterprise Edition cost about 1.5 times as much as the Standard Edition configurations. For these very large boxes, 5250 capacity was a lot cheaper than on i5 550 Enterprise Edition configurations and was in the same range as i5 520 Enterprise Edition machines. Clearly, IBM is positioning its largest i5 570 boxes as RPG and COBOL application consolidation boxes.
The other thing that is immediately obvious from the table is that enterprise-class Windows, Linux, and Unix boxes are still less expensive than i5 alternatives. But the gap in price/performance is a lot less egregious. In many cases, the enterprise boxes that have been tested by various vendors are as expensive as i5 570 machines, TPM for TPM.
However, the economics in this enterprise-class server space are changing, thanks to the introduction of dual-core processors from Intel and Advanced Micro Devices, and if IBM is not careful, it will very quickly fall behind.
Benchmark test results are not yet widely available on the new “Montecito” dual-core Itanium 9000 processors that were announced in July, and results have similarly not been announced for machines using the dual-core “Tulsa” Xeon MP 7100s. The Montecito chips offer about twice the performance of the single-core “Madison” 9 MB chips shown in some configurations, and the Tulsa chips offer between 60 and 70 percent more oomph than the dual-core “Paxville” Xeon MPs that are in some of the servers shown. If vendors hold prices relatively steady on their Xeon MP and Itanium boxes, customers will see a very big jump in price/performance.
HP put out some performance data on its rx6600 servers, which use the Montecito Itaniums and which are due to be launched this week. Based on HP-UX and Windows benchmarks on the rx6600, which uses its zx2 “Titan” chipset, HP is going to be able to offer very aggressively priced Windows and Unix boxes. These may not be able to scale quite as far as the i5 570 and p5 570, because they can only have a maximum of eight Montecito cores in the box. But at around 345,000 TPM for a four-socket server, this is all the box many customers will need for many years. With the Windows stack, the rx6600 can span from just under 140,000 TPM to 345,000 TPM (that’s with 2 to 8 cores) at a cost of between 58 and 62 cents per TPM. Customers who want a more scalable box can choose the rx7640, which will scale to 16 Montecito cores, or the rx8640, which will scale to 32 cores. The indications are that these machines will come in at around $1 per TPM.
Ditto for the Tulsa Xeon MP servers. Unisys and IBM have done tests on their Paxville Xeon MP servers–the ES7000/one and System x 3950, respectively. Using the 3 GHz dual-core Paxville Xeon MPs, the ES7000/one machine delivered nearly 750,000 TPM running Windows Server 2003 Datacenter Edition and SQL Server 2005 at a cost of $1.27 per TPM for a base box with 32 cores (16 processors) and 32 GB of main memory. IBM only tested the x3950 using up to eight Paxville Xeon MP chips (16 cores), but I reckon that a 32-core box could do about 650,000 TPM at a cost of around 80 cents per TPM. When you slap Tulsa chips in that box, you boost performance by around 65 percent or so, and the Tulsa chips are about half as expensive as their Paxville predecessors. Assuming all other costs remain about the same–and there is no reason why they shouldn’t–then that extra performance translates almost directly into bang for the buck. That should put the top-end ES7000/one machine with 32 Tulsa cores at around 75 cents per TPM and the IBM x3950 at around 50 cents per TPM.
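That projection is simple arithmetic: hold the system price steady and scale the throughput by the performance boost. Here is a quick sketch of the math, using my own estimates rather than audited TPC-C results:

```python
# Back-of-the-envelope projection: same system price, more throughput.
# All inputs are my estimates, not audited TPC-C results.

def projected_cost_per_tpm(base_tpm, base_cost_per_tpm, perf_boost):
    """Hold the system price steady, scale throughput by perf_boost,
    and return the new cost per TPM."""
    system_price = base_tpm * base_cost_per_tpm
    new_tpm = base_tpm * (1 + perf_boost)
    return system_price / new_tpm

# Unisys ES7000/one: 750,000 TPM at $1.27 per TPM with Paxville cores
print(round(projected_cost_per_tpm(750_000, 1.27, 0.65), 2))  # about 0.77

# IBM x3950 estimate: 650,000 TPM at about 80 cents per TPM
print(round(projected_cost_per_tpm(650_000, 0.80, 0.65), 2))  # about 0.48
```

Note that the throughput figure cancels out; the new cost per TPM is just the old one divided by 1.65, which is what puts the Tulsa boxes in the 50-cent to 75-cent range.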
Like I said last week, IBM has to really think about using the Power5+ quad-core modules (QCMs) in the i5 line to make sure the i5 line keeps riding down the price/performance curve.
The interesting thing to note is that enterprise-class machines running Linux and Oracle 10g Enterprise Edition are not all that inexpensive. Microsoft is pricing its software stack very aggressively, and Oracle basically needs to charge half of list price to compete.
On the Unix front, HP’s rx6600 boxes are, as I explained above, very aggressively priced, and in some cases, they give the p5 570 a serious run for the money–and win it. This is possible because HP is moving to dual-core Itanium chips and packing a lot of wallop into a four-socket box. Still, a similarly configured four-socket p5 570 has the same cost per TPM but does 60 percent more work. And, if you need to, you can add two more p5 570 chassis and double the performance again to over 1 million TPM.
While Sun has foolishly not provided TPC-C benchmark test results for any of its “Galaxy” Opteron-based servers, the estimates that I have done lead me to believe that if it did, it would be able to demonstrate price/performance on par with the HP rx6600 and IBM p5 570–very roughly, around $1 per TPM. Based on my guesses, I think the 16-core Sun Fire X4600 can scale from about 50,000 TPM with two cores (a single dual-core Opteron 885 running at 2.6 GHz) to about 350,000 TPM with eight of these processors. The Sun Fire E4900 server can have up to a dozen of Sun’s dual-core UltraSparc-IV+ chips in it, and the E6900 can have two dozen. The E6900 has a bit more scalability than the p5 570, and considerably more than the X4600 Galaxy box. But according to my analysis of the E6900’s pricing and my own performance estimates, the E6900 costs twice as much per unit of work as the HP Itanium, Sun Galaxy, or IBM p5 boxes.
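For what it is worth, my X4600 guess assumes throughput does not scale linearly with socket count. Here is a sketch of that estimate; the 0.875 scaling efficiency is back-derived from my own 50,000 TPM and 350,000 TPM figures, not a Sun-published number:

```python
# Rough NUMA scaling estimate: linear scaling discounted by an
# aggregate efficiency factor. The 0.875 figure is back-derived
# from my own per-socket and eight-socket guesses, not Sun data.

def estimated_tpm(per_socket_tpm, sockets, efficiency=0.875):
    """A single socket is taken at face value; multi-socket
    throughput is linear scaling times the efficiency factor."""
    if sockets == 1:
        return per_socket_tpm
    return per_socket_tpm * sockets * efficiency

# One dual-core Opteron 885 socket, then a fully populated X4600
print(round(estimated_tpm(50_000, 1)))  # 50000
print(round(estimated_tpm(50_000, 8)))  # 350000
```

Real scaling curves are not this tidy, of course, but a flat efficiency factor is good enough for a comparison at this altitude.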
You can sure tell which is the legacy Sun box, eh? And you know that customers who have applications that have been tightly written to the Sparc/Solaris architecture are now wishing maybe they hadn’t done that. Still, Sun has come a long way to close the price/performance gap with its UltraSparc-IV+ chips, and the improvements Sun has made have helped it retain its Sparc customer base (after losing a lot of it to Lintel boxes in the past five years). Moreover, on many workloads, the Power5 and Itanium chips do not do any more work than Sun’s homegrown UltraSparc-IV+ chips, and in that case, there is no penalty at all.