The Power 595 Takes the Top TPC-C Benchmark Ranking
June 16, 2008 Timothy Prickett Morgan
It is a goal of all the makers of big iron boxes to push the performance envelope, and not just because vendors have big egos–oh, they certainly have those–but because their customers are always pushing them to push the performance envelope a little further. Sometimes, a lot further. For the past six years, the cold war in the big iron space has been especially intense between IBM, the Unix upstart and proprietary system leader, and Hewlett-Packard, the Unix stalwart and Windows and Linux upstart with a smattering of proprietary big iron.
The two server makers have been leapfrogging each other pretty regularly since IBM put its first dual-core Power4 processors in the field and HP dropped its PA-RISC processor and adopted the Itanium chip from Intel in its Integrity server line. IBM has been peddling its i5/OS and AIX platforms steadily on its Power iron, and has moved Linux into the mix as the scalability of the Linux kernel has been expanded to support 16, 32, and then 64 cores. HP had some issues getting HP-UX and a cluster file system moved over to Itanium, but sorted it out by using Veritas rather than porting its own Tru64 Unix TruCluster software to HP-UX. Intel’s delays in rolling out successive generations of Itanium processors have caused the company grief in the high end of the server market, but Sun Microsystems and Fujitsu-Siemens had similar woes in the early 2000s, and even Big Blue had issues with the Power5+ and Power6 chips, which were late to market and did not deliver as much performance as many had expected. Chip happens, man. That’s how you know you are in the server racket.
With the delivery of the Power6-based Power 595 servers in May, IBM has once again been able to take the lead in the benchmark game for big iron, particularly since Intel’s quad-core “Tukwila” Itaniums, which were once expected around the middle of this year, have now been pushed out to early 2009 for customer deliveries in systems. HP, which relies on Intel for chips for its Integrity server line, is once again going to lag IBM significantly in the raw performance game, but Sun has pushed out its “Rock” UltraSparc RK chips by a year to the second half of 2009, so HP is not the only one staring down the barrel of a much more aggressive Power6 lineup. And, of course, while IBM’s Power machinery can support i5/OS V5R4, i 6.1, AIX 5.3, AIX 6.1, and Linux 2.6 from Red Hat and Novell, HP and other Itanium server makers can support Windows Server 2003 and Windows Server 2008 natively on their iron, and HP can also offer HP-UX 11i v3, Linux 2.6, OpenVMS 8.3, and the NonStop fault tolerant platform on Itanium gear. HP and NEC have also partnered with Platform Solutions to support its System64 mainframe emulation environment on their Itanium machines, and similarly, HP has partnered with Infinite Software to sell its Infinite iSeries i5/OS and RPG emulation environment on HP-UX running on Integrity servers using Oracle databases. By some measures, HP’s application support on Integrity is broader than IBM’s on Power; both offer much broader support than either Sun or Fujitsu-Siemens do on their respective Sparc iron, which is a Solaris-or-nothing proposition.
The point is, there are many different ways to brag, and IBM and HP seem to be the only two vendors that have big-time bragging rights at the high end of the server space. Fujitsu-Siemens and Sun could get back into the game, but they are going to have to take it up a few notches.
The latest proof point for performance, and one that IBM has favored since it got the first dual-core chip to market in the fall of 2001, is the Transaction Processing Performance Council’s TPC-C online transaction processing benchmark test. And last week, IBM reported the results of the test it has just completed on its 64-core Power 595 box, which give it a considerable performance margin over the biggest HP Integrity iron but not as much of a gap as IBM has held in the past. The transition from Power5+ to Power6 involved a substantial change in the instruction pipeline, such that more than doubling clock speed only resulted in about a 50 percent performance boost. This sounds like a pretty bad trade, until you realize that IBM can still goose clock speeds by another 20 percent or so. And in an increasingly multicore world with per-core software pricing, keeping the core count constant in a box, while every other vendor doubles the core count to get a 50 percent performance boost within the same thermal envelope, will give IBM a significant software pricing advantage over its peers. If software makers shift to performance-based or system-level software pricing, then this advantage will disappear, of course. But in the enterprise server space, you try to craft every advantage you can.
On the TPC-C benchmark, IBM set up the Power 595 server with 32 of its 5 GHz Power6 dual-core processors, which have 4 MB of L2 cache per core and 32 MB of L3 cache shared for each pair. That’s a total of 64 cores and 128 threads, since the Power6 chip supports simultaneous multithreading, which allows each instruction stream to be virtualized and then presented to the operating system as two virtual instruction streams that boost performance because of the increased efficiency of thread processing. The Power 595 was set up with a stunning 4 TB of main memory and 68 Fibre Channel adapters running at 4 Gb/sec to link out to 68 of IBM’s DS4800 storage arrays with an astounding 10,992 disk drives, providing a total of 805.7 TB of capacity. This is an insane number of disk drives, but the TPC-C workload requires a certain amount of capacity per user, and the Power 595 is able to support 5,184,000 simulated end users, so it takes an absurdly huge amount of capacity to run the test.
With 4 TB of main memory, it is hardly likely that I/O subsystems are being stressed as much as they were when boxes were confined to 128 GB or 256 GB of memory, which wasn’t all that long ago. In any event, this huge server was configured with AIX 5.3 (not 6.1, interestingly) and a forthcoming release of IBM’s own DB2 database, Version 9.5, which is not slated for availability until December 10, according to the TPC-C results. This box was able to handle 6,085,166 transactions per minute (TPM) when operating as a database server for the workload, which was actually running on a network of 128 System x 3550 servers running Windows.
The pricing on this benchmark test was interesting, as it tends to be. The basic server chassis (actually two racks packed to the gills with electronics) cost $12.05 million, with $630,000 of maintenance fees for two years beyond the one-year warranty period. The memory cards in the machine cost $2.6 million just to have them installed, and another $6.2 million to activate the 4 TB of memory latent on the cards; that’s $8.8 million, or $2,150 per GB. You can see where the profits are in such a machine–and it is not all in the Power6 chips, my friends. That said, it cost $484,000 to buy the eight processor books in the machine and another $1.94 million to activate those cores. Now, hold your breath. All that storage cost another $20.4 million. And then AIX cost $2,495 per core, plus $7,680 per core for DB2. When you add up the systems software (operating system plus database) and their maintenance for a three-year span, that comes to $2.78 million, which seems cheap by comparison. (Only by comparison, of course. Wink, wink.) When you add in the client hardware and software running the TPC-C workload, the whole shebang costs $37.4 million, and then IBM comes in with a big red pen and slashes $20.3 million, or 54.3 percent of the sticker price, off the configuration. And that works out to $2.81 per TPM after that, uh, generous discount.
Good luck trying to get it out of IBM, though. Unless you are talking about moving applications to HP-UX, Windows, or Linux on Itanium.
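For readers who want to check the math, here is a minimal Python sketch that reworks the pricing figures quoted above. The constant and function names are made up for illustration, the inputs are the rounded dollar amounts from this story rather than the line items in IBM's full disclosure report, and the per-GB figure assumes 1 TB equals 1,024 GB.

```python
# Rounded figures quoted in the article, in US dollars unless noted.
LIST_PRICE = 37.4e6           # full configuration at list price
DISCOUNT = 20.3e6             # discount IBM applied to the configuration
THROUGHPUT_TPM = 6_085_166    # reported TPC-C transactions per minute
MEMORY_COST = 2.6e6 + 6.2e6   # memory cards plus activation fees
MEMORY_GB = 4 * 1024          # 4 TB, assuming 1 TB = 1,024 GB


def price_per_tpm(price, tpm):
    """Return system price divided by throughput (dollars per TPM)."""
    return price / tpm


memory_per_gb = MEMORY_COST / MEMORY_GB
discount_pct = 100 * DISCOUNT / LIST_PRICE
net_price = LIST_PRICE - DISCOUNT
net_per_tpm = price_per_tpm(net_price, THROUGHPUT_TPM)

print(f"Memory: ${memory_per_gb:,.0f} per GB")
print(f"Discount: {discount_pct:.1f} percent of list")
print(f"Net price/performance: ${net_per_tpm:.2f} per TPM")
```

The results should land within rounding distance of the roughly $2,150 per GB, 54.3 percent, and $2.81 per TPM cited above.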
Now, the interesting bit will be to see how the i5/OS and i 6.1 operating systems do in terms of performance and price/performance on the TPC-C test. That is, if IBM has the courage and sense to actually tune i5/OS so it can exploit the hardware fully and let DB2 for i scale as well as DB2 9.5 does for AIX. IBM has not released Commercial Processing Workload (CPW) ratings for the Power 595 running an i operating system yet, but if the past is any gauge, the Power 595 will be rated at somewhere around 331,000 CPWs with all of the memory and disks it could want to run the CPW test. That works out to about 3.3 million TPM on the TPC-C test–a little more than half the work it can do running AIX. I am hoping IBM fixes that. Keeping the hardware price the same while halving the performance is just another way of charging twice as much.
Not that the price will necessarily stay the same for a configured TPC-C machine running the i software stack. i5/OS V5R4 or i 6.1 costs $53,000 per core, plus another $16,200 per core for three years of Software Maintenance. On a 64-core box, that software would cost $4.4 million–roughly 1.6 times the price of the AIX and DB2 stack, which cost $2.78 million, as I said above. An i edition of this Power 595 machine would therefore cost just over $39 million. Now, if this machine does twice as much work as a System i 595 box, that’s great. But to get price/performance parity with the AIX setup, DB2 for i has to be tuned the way AIX and DB2 for Unix, Windows, and Linux have been. If it isn’t, the raw price/performance of the i version of the machine will be somewhere on the order of $11.80 per TPM, more than four times that of the discounted AIX box. Even after the same 54.3 percent discount, an i box would cost in the range of $5.50 per TPM. That’s still not low enough. The main difference is that IBM is only charging what amounts to $43,393 per core for three years of maintained AIX and DB2 licensing, while i5/OS V5R4 or i 6.1 costs $69,200 per core. That’s a 59.5 percent price premium, and at those prices, i5/OS should be doing more work, not less. That premium, and the relatively low performance of i compared to AIX, is the true cost of supporting legacy applications at the high end.
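The estimate for the i variant can be reworked the same way. This is only a sketch under the assumptions stated in the text: the 3.3 million TPM throughput is my guess based on an expected CPW rating, not a measured result, the AIX per-core figure is the $43,393 quoted above, and all of the variable names are invented for illustration.

```python
# Per-core i software pricing quoted in the article (US dollars).
CORES = 64
I_LICENSE_PER_CORE = 53_000
I_MAINT_PER_CORE = 16_200        # three years of Software Maintenance
AIX_DB2_PER_CORE = 43_393        # per-core AIX plus DB2 figure quoted above

# System-level figures for the AIX configuration (rounded).
AIX_LIST_PRICE = 37.4e6
AIX_DB2_STACK = 2.78e6
DISCOUNT_RATE = 20.3 / 37.4      # same discount IBM gave on the AIX box

# Assumption: estimated throughput under i, not a published result.
EST_I_TPM = 3.3e6

i_per_core = I_LICENSE_PER_CORE + I_MAINT_PER_CORE
i_stack = CORES * i_per_core

# Swap the AIX/DB2 stack for the i stack, everything else held constant.
i_list_price = AIX_LIST_PRICE - AIX_DB2_STACK + i_stack

list_per_tpm = i_list_price / EST_I_TPM
net_per_tpm = i_list_price * (1 - DISCOUNT_RATE) / EST_I_TPM
premium_pct = 100 * (i_per_core / AIX_DB2_PER_CORE - 1)

print(f"i software stack: ${i_stack:,.0f}")
print(f"i system at list: ${i_list_price:,.0f}")
print(f"List price/performance: ${list_per_tpm:.2f} per TPM")
print(f"Discounted price/performance: ${net_per_tpm:.2f} per TPM")
print(f"Per-core software premium over AIX: {premium_pct:.1f} percent")
```

Change EST_I_TPM to see how quickly the gap closes if DB2 for i gets the same tuning attention as its AIX sibling.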
If there is any consolation, it is probably that System z mainframes are worse. (I will get to that eventually. Fear not.)
HP surely knows this, which is why it has partnered with Infinite Software and Platform Solutions. Using dual-core Itanium “Montvale” processors running at 1.6 GHz with 12 MB of L3 cache per chip, an Integrity Superdome machine with 64 processor sockets, 128 cores, and 256 threads, equipped with 2 TB of main memory, 320 TB of disk capacity, and HP-UX 11i v3 and Oracle 10g Release 2 Enterprise Edition, can handle just under 4.1 million TPM on the TPC-C benchmark and costs $21.6 million. After a 44 percent discount–considerably lower than what IBM is offering–this box yielded a price/performance of $2.93 per TPM. The HP box is in the same ballpark as the Power 595 running AIX, but it is a lot less expensive than what I think the i variant will yield if performance and prices come in where I expect when the Power 595 supports i5/OS and i this coming September.