The Power 795: Cheaper Performance, Expensive Software
August 23, 2010 Timothy Prickett Morgan
In last week’s issue of Four Hundred Stuff, I gave you the basic feeds and speeds of the top-end, Power7-based Power 795 server that was announced on August 17 by IBM. At the time of writing, lots of the feeds and speeds for the machine as well as its pricing were not yet available, so let’s circle back and fill in some of the blanks.
You can see our original coverage of the Power 795 system here, but here’s a brief recap. The Power 795 has eight processor books, each with four processor sockets. One variant of the machine uses six-core Power7 processors running at 3.7 GHz, and another uses eight-core chips that run at 4 GHz in MaxCore mode (with all cores on each chip turned on) and at a higher 4.25 GHz in TurboCore mode (half the cores in the system are turned off after rebooting the machine). The baby Power 795 has a maximum of 192 cores and 768 threads for applications to play in, while the big model has 256 cores and 1,024 threads. The machine comes with 32 12X remote I/O drawer attachments for a total of 3,052 “natively” attached disk drives and 640 PCI-Express peripheral slots; it can have as much as 8 TB of main memory. On AIX, with Active Memory Expansion data compression, the effective capacity of main memory is 16 TB.
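The core and thread totals in the recap follow from the book, socket, and core counts. Here is a quick sketch of that arithmetic in Python. The four threads per core is an assumption inferred from the 768 and 1,024 thread totals, and the function name is made up for illustration.

```python
BOOKS = 8              # processor books in a full Power 795
SOCKETS_PER_BOOK = 4   # processor sockets per book
THREADS_PER_CORE = 4   # assumption: inferred from 192 cores -> 768 threads

def system_totals(cores_per_chip):
    """Return (cores, threads) for a fully populated machine."""
    cores = BOOKS * SOCKETS_PER_BOOK * cores_per_chip
    return cores, cores * THREADS_PER_CORE

print("six-core chips:  ", system_totals(6))
print("eight-core chips:", system_totals(8))
```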
By any measure, in any era, this is an absolute beast of a box, and it is easily the most scalable single system image anyone has ever built. (Well, unless you count whatever mysterious machines are probably part of some black ops budget at the Central Intelligence Agency.)
The memory architecture in the Power 795 machines, says IBM, is brand new, allowing for each processor book to support 32 DDR3 memory sticks running at just over 1 GHz. My guess is that IBM is buffering memory instead of running it at lower clock speeds and interleaving it, as it has done in the past. Intel has also instituted memory buffering with its top-end Xeon 7500 processors for much the same reason: to get processor and memory speeds to align better. Main memory features are groupings of four DDR3 memory sticks, which are presumably added one per socket. Each book requires a minimum of two memory features (eight DIMMs) to be installed. Once the memory is physically installed, customers have to activate 2 GB of main memory per core, with a minimum of 32 GB being required for the Power 795 system. (You have to pick whichever number is greater. On a 16-core machine, the numbers are the same.) IBM recommends balancing the memory across the sockets and books in the system for performance reasons, but it is not required.
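The activation rule amounts to taking the larger of two numbers. A minimal sketch, assuming the 2 GB requirement applies to each activated core (the text only says "per core"); the function name is hypothetical:

```python
GB_PER_CORE = 2          # memory that must be activated for each core
SYSTEM_MINIMUM_GB = 32   # floor for the whole Power 795 system

def minimum_activated_memory_gb(active_cores):
    """Smallest memory activation allowed for a given number of cores."""
    return max(GB_PER_CORE * active_cores, SYSTEM_MINIMUM_GB)

for cores in (12, 16, 64, 256):
    print(cores, "cores ->", minimum_activated_memory_gb(cores), "GB")
```

Below 16 cores the 32 GB system floor decides the answer; above 16 cores the per-core rule does.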
IBM has three different memory features for the Power 795, which is known by the product number 9119-FHB. Feature 5600 consists of four 8 GB memory sticks, for a total of 32 GB of physical memory, for $1,960; feature 5601 consists of four 16 GB sticks, for a total of 64 GB, but these DIMMs are a lot more expensive per gigabyte, at $7,720 per feature; feature 5602 has four 32 GB sticks, for a total of 128 GB of capacity, at a cost of $15,440. That just gets you the physical memory in the system, and you need to buy memory activations at a cost of $245 per gigabyte to actually let i, AIX, or Linux see the memory. The cheapest way to add memory is obviously to use feature 5600, which would give a fully loaded Power 795 a maximum of 2 TB of main memory, the same capacity as the Power 595 when it was fully loaded. On the Power 795, using the skinniest memory sticks would cost $306.25 per GB with activation included, or $627,200 for the full 2 TB. Using the denser 16 GB sticks in feature 5601 costs $365.63 per gigabyte, or just under $1.5 million for a full 4 TB on the box. Using the densest 32 GB memory sticks costs the same $365.63 per gigabyte, or just a shade under $3 million for the full 8 TB.
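The per-gigabyte figures combine the feature price with the $245 activation charge. The sketch below reproduces them from the list prices quoted above. It assumes every activated gigabyte is also a purchased gigabyte and that all 256 DIMM slots (eight books of 32) are filled with one feature type; the dictionary and function names are illustrative only.

```python
ACTIVATION_PER_GB = 245   # dollars per activated gigabyte

# feature number -> (GB per feature, list price per feature in dollars)
MEMORY_FEATURES = {
    5600: (4 * 8, 1_960),
    5601: (4 * 16, 7_720),
    5602: (4 * 32, 15_440),
}

DIMM_SLOTS = 8 * 32       # eight books, 32 DIMM slots per book
DIMMS_PER_FEATURE = 4

def cost_per_gb(feature):
    """Physical memory plus activation, in dollars per gigabyte."""
    gb, price = MEMORY_FEATURES[feature]
    return price / gb + ACTIVATION_PER_GB

def full_system(feature):
    """Return (total GB, total dollars) with every DIMM slot filled."""
    gb, _ = MEMORY_FEATURES[feature]
    total_gb = gb * (DIMM_SLOTS // DIMMS_PER_FEATURE)
    return total_gb, total_gb * cost_per_gb(feature)

for feature in MEMORY_FEATURES:
    total_gb, dollars = full_system(feature)
    print(f"feature {feature}: ${cost_per_gb(feature):.3f}/GB, "
          f"{total_gb:,} GB for ${dollars:,.0f}")
```

The two denser features work out to exactly $365.625 per gigabyte, which the text rounds to $365.63.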
That’s a lot of memory, to be sure. But that is also a lot of money, and that is why you can expect to see AIX shops get very excited about Active Memory Expansion, which is turned on with feature 4790, which costs $13,800, and which spans the entire system. For a number of workloads, this memory compression not only will boost performance, but it can cut main memory costs in half.
And isn’t it a damned shame that i 7.1 doesn’t have this memory compression technology? Just like the i variant of the Power Systems platform doesn’t have dynamic logical partitions or subsystems that can move around from system to system like dynamic workload partitions in AIX (WPARs for short). Or like the i 6.1 and i 7.1 operating systems can only span 32 cores and 128 threads, and for i 7.1 running on Power7 iron, a special PRPQ will only let it span 64 cores and 256 threads. That is, at best, one quarter of a Power 795 for the i operating system and its integrated database. I have complained to the IBM top brass in the Power Systems division about these shortcomings on your behalf, for whatever that is worth.
AIX has some threading issues, too. On the Power 795, you are limited to 32 cores and 128 threads for dynamic logical partitions at the hardware level (that’s one processor book and that makes it a hardware partition of sorts) unless you get feature 1256, which allows dynamic LPARs to scale up to the full 256 cores of the Power 795 system. IBM is not charging for feature 1256, but the fact that it has a separate feature for this scalability implies that Big Blue is keeping its options open. It is not clear if i 7.1 needs feature 1256 or an analog to jump from 32 cores per partition up to 64 cores in that PRPQ. (I covered the short-sheeting of the i operating system thread counts back in February.)
The base Power 795 frame costs $91,000. The 24-core, 3.7 GHz processor book (feature 4702) for the machine runs $50,900, and a processor core activation for this book runs $7,550 a pop. Rather than buy a whole processor forever, you can buy 100 minutes on a 3.7 GHz core for $3, or a whole day for $23. The faster and fuller 32-core 4 GHz processor books (feature 4700) cost more, and not just because they have more oomph but because they also have that TurboCore mode that the 24-core books do not. The feature 4700 processor book costs $99,900, and each core activation costs $10,950. You can activate one of the cores on the 32-core book for 100 minutes for $27 and you can use it for a whole day for $194.
When you do the math, a base Power 795 with a dozen cores using the six-core Power7 chips would cost $232,500 with no memory, disks, or I/O features. A base Power 795 using the eight-core Power7 chips and sporting 16 active cores would run $366,100. Load up that Power 795 with the full complement of cheap cores and an appropriate amount of memory (call it 1 TB using the cheaper memory), and the base hardware system costs $2.26 million at list price. A top-end Power 795 with 256 of the faster cores and 2 TB of main memory (using the 16 GB or 32 GB sticks, it doesn’t matter), would cost $4.44 million at list price.
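Those four configuration prices can be rebuilt from the frame, book, activation, and memory prices given earlier. This is a rough sketch rather than an IBM configurator: the function and table names are invented, and memory is priced at the blended per-gigabyte rates ($306.25 for the 8 GB sticks, $365.625 for the denser ones) with every installed gigabyte activated.

```python
FRAME = 91_000   # base Power 795 frame, dollars

# label -> (cores per book, book price, activation price per core)
BOOKS = {
    "3.7GHz": (24, 50_900, 7_550),    # feature 4702
    "4.0GHz": (32, 99_900, 10_950),   # feature 4700
}

def hardware_price(book, n_books, active_cores, memory_gb=0, memory_per_gb=0.0):
    """List price of frame, books, core activations, and optional memory."""
    cores_per_book, book_price, activation = BOOKS[book]
    if active_cores > n_books * cores_per_book:
        raise ValueError("more active cores than installed cores")
    return (FRAME + n_books * book_price
            + active_cores * activation
            + memory_gb * memory_per_gb)

configs = {
    "base, 12 slow cores": hardware_price("3.7GHz", 1, 12),
    "base, 16 fast cores": hardware_price("4.0GHz", 1, 16),
    "192 slow cores, 1 TB": hardware_price("3.7GHz", 8, 192, 1024, 306.25),
    "256 fast cores, 2 TB": hardware_price("4.0GHz", 8, 256, 2048, 365.625),
}
for name, price in configs.items():
    print(f"{name}: ${price:,.0f}")
```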
IBM was charging $33,456 for a 4.2 GHz Power6 processor book for the Power 595 machine, plus $16,796 per core to activate the cores. So a full 64-core system using the slow processors would cost you $1.43 million just for the base hardware chassis and the processors. The full-on 5 GHz Power 595 processor book had a list price of $47,390 plus $23,750 per core to activate the cores, and a 64-core box would cost $1.99 million with no memory, disk, or peripherals. That 64-core machine was rated at 294,700 on the Commercial Performance Workload (CPW) scale used for OS/400 and i platforms, which makes the base top-end Power 595 cost $6.75 per CPW.
You can get 64 cores out of two Power 795 books, and the resulting base system will only cost $991,600 for the rack chassis, the books, and the activated Power7 cores using 4 GHz processors. That’s less than half the money for the same number of cores. IBM did not provide a CPW rating for the 64-core machine, but said that a 24-core box using 3.7 GHz cores was rated at 149,100 CPWs. At 64 cores running at 4 GHz, you are probably talking about something along the lines of 430,000 CPWs (although you can’t scale to more than half of that with i 7.1). If you do the math, this works out to $2.31 per CPW for the Power 795 machine. That’s a 66 percent improvement in bang for the buck, which is a lot of improvement for IBM, and it is certainly what is needed to make up for the lack of a performance bump from a Power6+ machine in 2009 and to keep pace with the 30 percent or so improvement per year that customers expect in the server racket.
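The hardware price/performance comparison reduces to two divisions and a percentage. A minimal sketch using the numbers from the text; note that the 430,000 CPW figure is the article's estimate rather than an IBM rating, and the Power 595 price is the rounded $1.99 million. The helper names are made up.

```python
def dollars_per_cpw(price, cpw):
    """List price divided by Commercial Performance Workload rating."""
    return price / cpw

def improvement(old, new):
    """Fractional reduction going from the old cost to the new one."""
    return (old - new) / old

# 64-core Power 795: frame + two 32-core books + 64 core activations
p795_price = 91_000 + 2 * 99_900 + 64 * 10_950

hw_595 = dollars_per_cpw(1_990_000, 294_700)   # 5 GHz Power 595, 64 cores
hw_795 = dollars_per_cpw(p795_price, 430_000)  # CPW value is an estimate

print(f"Power 595: ${hw_595:.2f} per CPW")
print(f"Power 795: ${hw_795:.2f} per CPW")
print(f"improvement: {improvement(hw_595, hw_795):.0%}")
```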
The Power 795 is in OS/400 and i software tier P50. IBM i 7.1 is known by product number 5761-SS1, and it costs $59,000 per core to license the operating system and cover one year of Software Maintenance (SWMA) support. The license without SWMA is $53,000, but the way it is packaged, you really can’t buy it that way. The 5250 Enablement feature costs $50,000 per processor core on the Power 795, and customers can activate it on the entire box for $400,000.
On that hypothetical base Power 795 with 64 cores, the software costs are exactly the same as on the 64-core Power 595, at $3.78 million for all 64 cores. Toss in 5250 Enablement on eight or more cores, and you have to add another $400,000. That’s $4.18 million, or about $9.72 per CPW on the Power 795. While the price per core stayed the same, the performance got better and the software cost per unit of work is 31 percent better than the $14.17 per CPW price of the Power 595.
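The software side can be checked the same way. The sketch below uses the $59,000 per-core license with SWMA and the $400,000 full-box 5250 Enablement charge from the text, along with the estimated 430,000 CPWs for the 64-core Power 795. Working from the unrounded $4,176,000 total gives a Power 795 figure about a penny lower than the $9.72 quoted, which appears to come from dividing the rounded $4.18 million. Variable names are illustrative.

```python
CORES = 64
I_LICENSE_PER_CORE = 59_000    # IBM i 7.1 with one year of SWMA, tier P50
ENABLEMENT_5250_BOX = 400_000  # full-box 5250 Enablement

software_total = CORES * I_LICENSE_PER_CORE + ENABLEMENT_5250_BOX

sw_595 = software_total / 294_700   # rated CPW of the 64-core Power 595
sw_795 = software_total / 430_000   # estimated CPW, not an IBM rating
sw_gain = (sw_595 - sw_795) / sw_595

print(f"software total: ${software_total:,}")
print(f"Power 595: ${sw_595:.2f} per CPW")
print(f"Power 795: ${sw_795:.2f} per CPW")
print(f"improvement: {sw_gain:.0%}")
```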
If you add the hardware and the software for a base system with no memory and disk together, then the combined list price of the 64-core Power 795 is a mere 16.2 percent lower than that of the 64-core Power 595. Which means one thing for sure: customers upgrading to or buying new Power 795s are going to get some killer discounts on software. If they were getting 25 to 30 percent in 2008 and 2009, they will be expecting 50 percent or more in 2010 and 2011. There’s just no way around that.
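The 16.2 percent figure matches the drop in combined hardware and software list price between the two 64-core machines. A rough check, using the rounded $1.99 million Power 595 hardware price and the same software total on both boxes; the cost-per-CPW line additionally leans on the estimated 430,000 CPWs, so treat it as indicative only.

```python
SOFTWARE = 64 * 59_000 + 400_000   # IBM i licenses plus 5250 Enablement

total_595 = 1_990_000 + SOFTWARE   # rounded Power 595 hardware price
total_795 = 991_600 + SOFTWARE     # 64-core Power 795 hardware price

price_drop = (total_595 - total_795) / total_595

# Cost per CPW on the combined totals; 430,000 CPWs is an estimate
cpw_595 = total_595 / 294_700
cpw_795 = total_795 / 430_000
cpw_gain = (cpw_595 - cpw_795) / cpw_595

print(f"combined list price: ${total_595:,} -> ${total_795:,}")
print(f"price reduction: {price_drop:.1%}")
print(f"combined cost per CPW: ${cpw_595:.2f} -> ${cpw_795:.2f} ({cpw_gain:.0%})")
```

Because the software bill is identical on both machines and dwarfs the hardware bill, the large hardware savings barely move the combined price.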