Power6-Based System i Performance: Your Mileage Will Vary
August 13, 2007 Timothy Prickett Morgan
Two weeks ago, I walked you through some performance and price/performance comparisons for the new Power6-based System i 570 server, which was announced in July. In that story, I focused on IBM’s Commercial Performance Workload (CPW) ratings for the new box as well as on three prior generations of 16-core systems. This week, I will go over some other performance metrics that IBM is making available to help i5/OS and OS/400 shops suss out the amount of work these machines can do.
That earlier story compared 16-core machines using Power4, Power5, Power5+, and Power6 processors running the CPW benchmark. CPW is a variant of the Transaction Processing Performance Council’s TPC-C online transaction processing benchmark, and as such, it is very I/O intensive. What that means is that a big jump in processor core performance does not help overall system performance as much as you might think. This is something that System i buyers need to keep in mind as they plan the capacity of their machines and their system budgets for acquiring that capacity. To recap briefly, with eight 4.7 GHz dual-core Power6 chips, the new System i 570 is rated at 76,900 CPWs, or about 31 percent more than a machine with 16 Power5+ cores running at 2.2 GHz. This is about 72 percent more work than a 16-core i5 570 using 1.65 GHz Power5 chips could do three years ago.
In addition to the CPW rankings, which IBM provides for every processor bump for every model in every AS/400, iSeries, and System i generation, the company has provided three other benchmark test results for the new Power6 machine. And, somewhat strangely, IBM is only configuring four processor cores in the System i 570 for these comparisons–one quarter of the number of cores that the box can support. Whenever a server maker does that–and they all do it–it makes me think that the SMP scalability of the server on certain workloads is not as good as some other tests might show. If it were, you can bet they would be testing fatter configurations.
In any event, IBM says that the new Power6 machine with four 4.7 GHz cores (that’s only two processor sockets) can handle 47,500 Domino Mail and Calendar users (MCUs), which is 38 percent more than an i5 570 with 2.2 GHz Power5+ chips (rated at 34,500 MCUs) and 79 percent more than an i5 570 with 1.65 GHz Power5 processors. The relative performance of the Power6 is a little bit better than the CPW rankings suggest, and this stands to reason given that the Domino workload is more CPU intensive than the CPW workload. (Not a lot, but some.)
IBM also tested the Power6-based machine using the SAP Business Intelligence Datamart (BI-D) test. A decade ago, when IBM was trying to get customers interested in using the SAP R/3 and mySAP ERP suites atop OS/400, the company ran the SAP Sales and Distribution (SD) benchmark on OS/400 servers. But, because benchmarks are oftentimes as much about not making comparisons as they are about helping to make them, IBM eventually shifted to running the Business Information Warehouse (BW, and yes, I know that abbreviation is not correct, but tell SAP, not me) benchmark on OS/400 and i5/OS servers. This way, IBM’s AIX and Linux boxes were tested using the SAP SD test and the System i was tested using the BW test, making direct comparisons impossible.
Ditto for the TPC-C OLTP test, now that I think of it, which IBM championed on the AS/400 back in 1992 and which it has since avoided like the plague. The iSeries and System i are not even in the TPC-C database any more, and the last OS/400 server to be tested on the TPC-C benchmark was the iSeries Model 840, back in December 2000. IBM has not been pushing performance as a differentiator on these OS/400 and i5/OS boxes, and because TPC-C tests have pricing information as well as steep discounts, it sure doesn’t want to give anyone the idea that it is willing to give a 50 percent discount on Power systems–as it does when it runs AIX and DB2 on them. IBM doesn’t even want to show people the hefty premium it charges for i5/OS and DB2/400 as well as peripherals that are in no substantial way different from those it sells in AIX machines for a lot less money. This is why I make my own TPC-C estimates and gin up prices for entry, midrange, and big iron boxes every year. How else will we know how things stack up?
In any event, if you thought IBM might continue using the SAP BW test, forget it. SAP BW is out, and SAP BI-D is now in when it comes to i5/OS performance measurements using SAP software. It matters very little, apparently, that there are only a few thousand installations of SAP code running on OS/400 and i5/OS, and that there are even fewer customers who might use SAP’s data warehousing software in production on a machine with a System i label on it. So far, only three machines have been tested with the BI-D benchmark, and they are all System i machines. In January, IBM tested a four-core i5 570 with 2.2 GHz Power5+ chips and 64 GB of main memory; this machine could process 51,875 query navigation steps per hour. In April, IBM doubled up the cores and memory in this machine, and more than doubled performance to 114,687 query navigation steps per hour. Both of these machines ran i5/OS V5R4 and SAP’s NetWeaver middleware. (Not WebSphere.) With the launch of the Power6-based System i 570 in July, IBM ran the BI-D test on a four-core machine using the 4.7 GHz processors, configured with 64 GB of memory. This machine was able to handle 92,716 query navigation steps per hour, a 79 percent increase over the four-core Power5+ machine tested earlier this year.
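The BI-D figures above are easier to compare when reduced to per-core throughput and percentage gains. The short Python sketch below does that arithmetic using only the numbers quoted in this story; the labels and the helper name `pct_gain` are mine, chosen for illustration, and are not part of any IBM or SAP tool.

```python
# Query navigation steps per hour on the SAP BI-D test, as quoted above.
# Keys are informal labels; values are (cores, steps per hour).
bid_results = {
    "Power5+ 2.2 GHz, 4 cores": (4, 51_875),
    "Power5+ 2.2 GHz, 8 cores": (8, 114_687),
    "Power6 4.7 GHz, 4 cores": (4, 92_716),
}


def pct_gain(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1.0) * 100.0


# Throughput per core for each tested configuration.
for label, (cores, steps) in bid_results.items():
    print(f"{label}: {steps / cores:,.0f} steps per hour per core")

# Gains relative to the four-core Power5+ machine.
base = bid_results["Power5+ 2.2 GHz, 4 cores"][1]
print(f"Power6 vs Power5+ at four cores: {pct_gain(92_716, base):.0f} percent")
print(f"Eight vs four Power5+ cores: {pct_gain(114_687, base):.0f} percent")
```

On these numbers, doubling the Power5+ cores yields a bit better than linear scaling, and four Power6 cores deliver roughly four-fifths of the throughput of eight Power5+ cores.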
Finally, IBM gave out some performance metrics for its Trade6 benchmark, which is the company’s alternative to the new TPC-E benchmark, which simulates the data processing operations of a stock brokerage. The Trade6 test also simulates stock trading, and was formerly known as the WebSphere Performance Benchmark because the code is implemented in Java and runs on Java-based application servers in a two-tier or three-tier environment. IBM is using Trade6 internally to test the performance of its i5/OS, AIX, and Linux systems.
In a two-tier setup, the Power6-based System i 570 shines brightest across all of the tests that IBM has run. A four-core i5 570 with 1.65 GHz Power5 processors was rated at 1,517 transactions per second (TPS) on the Trade6 test, and moving to the 2.2 GHz Power5+ chips boosted the performance of this four-core box to 2,045 TPS. That’s an increase of 35 percent. Moving up to four of the 4.7 GHz cores increases performance by another 46 percent compared to the Power5+ box, to 2,980 TPS. That’s nearly double the performance of the Power5-based machine from three years ago.
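For readers who want to check the Trade6 percentages, here is a minimal Python sketch that works only from the three TPS figures quoted above. The dictionary keys and the function name `pct_gain` are illustrative choices of mine, not anything IBM publishes.

```python
# Trade6 two-tier results for four-core machines, in transactions per
# second (TPS), as quoted above.
trade6_tps = {
    "Power5 1.65 GHz": 1_517,
    "Power5+ 2.2 GHz": 2_045,
    "Power6 4.7 GHz": 2_980,
}


def pct_gain(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1.0) * 100.0


p5 = trade6_tps["Power5 1.65 GHz"]
p5plus = trade6_tps["Power5+ 2.2 GHz"]
p6 = trade6_tps["Power6 4.7 GHz"]

print(f"Power5 to Power5+: {pct_gain(p5plus, p5):.0f} percent")
print(f"Power5+ to Power6: {pct_gain(p6, p5plus):.0f} percent")
print(f"Power5 to Power6:  {p6 / p5:.2f} times the throughput")
```

The same helper can be pointed at your own before-and-after measurements to see whether your workload tracks the published gains.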
As you can see, the performance increase you get really depends a lot on the workload. And as I have said many times, vendor and consortium benchmarks are only really useful if there is a wide variety of them and if they are being tested on lots of different architectures and configurations. The most important benchmark is your own, so you should run it and then reckon how your own performance increases, as you upgrade, compare to publicly available tests. For all you know, your applications scale more like Trade6 and less like CPW–which would be a good thing.
One last thing. I have been racking my brains trying to figure out why a jump from 2.2 GHz to 4.7 GHz does not yield more performance, and if it doesn’t, then why IBM would bother architecting its machines this way? Just based on clock speeds alone and the improved efficiency that IBM is promising in terms of instructions processed per clock comparing Power5 and Power6 chips, I would have expected a lot more oomph out of the Power6 designs. Somewhere between 30 percent and 45 percent more work for a 114 percent increase in clock speed just does not seem like a smart trade–or a fair one. Particularly when all of IBM’s competitors are throwing four or eight cores on a chip and trying to lower clock speeds to save on heat and to get memory and CPU speeds closer to each other and thereby boost the efficiency of their processors.
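One rough way to put numbers on that gap is to divide each throughput gain by the clock speed gain. The Python sketch below does so using the Power6-over-Power5+ percentages quoted in this story; the ratio it computes is a crude, back-of-the-envelope measure of my own, not an IBM metric, and it mixes a 16-core CPW result with four-core results on the other tests.

```python
# Clock speeds in GHz for the Power5+ and Power6 chips compared above.
P5PLUS_GHZ = 2.2
P6_GHZ = 4.7

# Power6-over-Power5+ throughput gains in percent, as quoted in this story.
gains_pct = {
    "CPW (16 cores)": 31,
    "Domino MCU (4 cores)": 38,
    "Trade6 (4 cores)": 46,
    "SAP BI-D (4 cores)": 79,
}

clock_ratio = P6_GHZ / P5PLUS_GHZ
print(f"Clock speed increase: {(clock_ratio - 1.0) * 100.0:.0f} percent")


def relative_work_per_clock(gain_pct):
    """Throughput ratio divided by clock ratio (1.0 = scales with clock)."""
    return (1.0 + gain_pct / 100.0) / clock_ratio


for test, gain in gains_pct.items():
    print(f"{test}: {relative_work_per_clock(gain):.2f} of Power5+ work per clock")
```

A value of 1.0 would mean throughput rose in step with clock speed; on these figures the tests land between roughly 0.61 and 0.84.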
This approach, from IBM’s point of view, has two problems. It presupposes that applications are multithreaded and can spread work out over more cores. Batch jobs and single-threaded RPG, COBOL, and C++ applications can’t take as much advantage of extra cores. Moreover, when IBM is charging for software by the core, the company cannot announce a machine where each core does less work and yet it still costs the same money to pay for software per core. IBM would have to cut its per-core software prices more or less in lockstep with the decrease in clock speeds. From IBM’s point of view, it is far better to keep those software prices right where they are and charge roughly the same for the hardware, giving customers that 30 percent to 45 percent price/performance increase on the hardware side.
If I had to guess, I would say that the architecture of this rejiggered i5 570 machine, which was designed for the Power5 and Power5+ processors, is not taking full advantage of the Power6 processor, and that when the real System i 670 machines show up in early 2008, we will see a fully enabled i5/OS V6R1 operating system that can make this machine do a lot more work than this box does. But that is just a guess, remember.