Performance Per Watt on Power6: Same Thermals, More Work
August 13, 2007 Timothy Prickett Morgan
The largest data centers in the world, particularly those that are located on the East and West coasts of the United States or in similarly heavily urbanized areas around the globe that strain their respective power grids, are facing two problems. The first problem is that their data centers are running out of power and cooling capacity to keep servers and their related storage running. The second problem, affecting mostly urban areas on the coasts so far but maybe everywhere on the grid as electricity consumption grows, is that power companies are actually telling corporations that they have to cap their electricity usage.
While the AS/400, iSeries, and System i server platforms have tended to be used by small companies, often in rural and suburban areas as well as in urban areas, and therefore are not usually deployed in large, multi-million dollar data centers, the fact remains that power and cooling issues are something that every company should be conscious of. Regardless of what you think about global warming or climate change, with the ever-increasing cost of energy and the political and personal ramifications of energy policies, we can all agree that every watt that doesn’t have to be burned should not be burned.
Luckily for i5/OS and OS/400 shops, IBM has been riding along on the Moore’s Law curve, providing ever more powerful processors in its servers, integrating more functions inside of those processors that used to be on outboard chips or peripheral cards inside the system, and adding innovative virtualization features to the box that allow more efficient usage of these components and consolidation of physical footprints inside the data center and the data closet.
IBM has been pretty cagey about the thermal design points (TDPs) of the Power4, Power5, and Power6 generations of servers, and before the Power4 chips came out in October 2001, IBM didn’t ever talk about the electricity consumption and heat dissipation ratings of its AS/400 CISC processors or the Power chips used in the AS/400 line. But with the launch of the Power6-based System i 570 in July, IBM did provide some guidance for the thermals of the central electronic complex (CEC) of the new Power6 box compared to earlier generations.
Specifically, Craig Johnson, product manager for the new Power Systems division (formerly of the System i division and still located in the Rochester, Minnesota, labs), says that a System p6 570 CEC with four cores activated using the 4.7 GHz Power6 chip consumes about 1,400 kilowatt-hours of electricity. That electricity consumption (all of which turns into heat dissipation except for an infinitesimally small amount of energy that is converted into stored information, locked into disk drives, or spent causing fatigue in components or noise in the air) is for the CEC only, and does not include the power consumed by main memory, disk drives, and other peripherals used in the system. The 1,400 kilowatt-hour rating of the four-core Power6-based machine is a little more than the 1,300 kilowatt-hours of the System i5 570 server using the 1.9 GHz Power5 processors, with four cores activated. However, on a CPW per watt basis, the Power6-based server is 67 percent more power efficient than the Power5 box. This is a very large gain, and one that is attributed mainly to Moore’s Law.
IBM also provided a comparison with the earlier iSeries Model 870 machine, equipped with 16 of IBM’s Power4 cores running at 1.3 GHz, which consumed 6,000 kilowatt-hours for the base CEC. A four-core Power6-based System i 570 does essentially the same amount of work, but burns less than one-quarter of the juice that the Power4-based machine does at the CEC level.
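The arithmetic behind these comparisons is easy to check. The sketch below uses only the CEC figures quoted above, keeping the units as given; the dictionary keys and function names are made up for illustration. The last calculation is an inference, not an IBM number: if the Power6 box is 67 percent better on CPW per watt while drawing slightly more power, its CPW rating should be roughly 1.8 times that of the Power5 box.

```python
# CEC power figures as quoted in the article (units kept as the article gives them).
# The keys are illustrative labels, not IBM product identifiers.
CEC_POWER = {
    "power4_model_870_16_core": 6000,
    "power5_i5_570_4_core": 1300,
    "power6_570_4_core": 1400,
}


def power_ratio(machine, baseline):
    """Return the CEC power of `machine` as a fraction of `baseline`."""
    return CEC_POWER[machine] / CEC_POWER[baseline]


def implied_work_ratio(efficiency_gain, machine, baseline):
    """Infer the relative work done from a performance-per-watt gain.

    work ratio = (1 + efficiency gain) * power ratio
    """
    return (1.0 + efficiency_gain) * power_ratio(machine, baseline)


p6_vs_p5 = power_ratio("power6_570_4_core", "power5_i5_570_4_core")
p6_vs_p4 = power_ratio("power6_570_4_core", "power4_model_870_16_core")
# 0.67 is the article's "67 percent more power efficient" claim.
implied_cpw = implied_work_ratio(0.67, "power6_570_4_core", "power5_i5_570_4_core")

print(f"Power6 vs Power5 CEC power: {p6_vs_p5:.3f}x")
print(f"Power6 vs Power4 CEC power: {p6_vs_p4:.3f}x")
print(f"Implied Power6 vs Power5 CPW: {implied_cpw:.2f}x")
```

The Power6 CEC draws about 8 percent more than the Power5 CEC and a little over 23 percent of what the Power4-based Model 870 did, which squares with the "less than one-quarter" claim.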
While progress is always a good thing, these comparisons do not take into account the fact that memory is getting faster and hotter and disk drives are spinning faster and also generating more heat. Moreover, they ignore some of the substantial performance-per-watt gains IBM made with the Power4+ and Power5+ processors at the chip level.
TDP ratings are not normalized across the industry, so it is hard to make comparisons across chip generations and architectures because different components are integrated in each design. Intel, for instance, does not integrate memory controllers on its X64 chips, while Advanced Micro Devices does, and IBM has as well since the Power4 generation back in 2001. So when Intel says it can do a standard dual-core or quad-core Xeon chip with an 80-watt TDP, you have to remember to add at least 20 watts for the memory controller to compare to the 95-watt TDP of the standard Opteron parts or any of IBM’s Power4, Power5, or Power6 chips.
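To make that adjustment concrete, here is a minimal sketch that puts the two quoted TDP ratings on the same footing. The 20-watt memory controller allowance is the article's rough lower bound, not a vendor specification, and the function name is hypothetical.

```python
# Rough allowance for an external memory controller, per the article's
# "at least 20 watts" estimate. This is an assumption, not a measured figure.
MEMORY_CONTROLLER_WATTS = 20


def comparable_tdp(rated_tdp_watts, has_integrated_memory_controller):
    """Return a TDP that includes a memory controller either way."""
    if has_integrated_memory_controller:
        return rated_tdp_watts
    return rated_tdp_watts + MEMORY_CONTROLLER_WATTS


xeon_watts = comparable_tdp(80, has_integrated_memory_controller=False)
opteron_watts = comparable_tdp(95, has_integrated_memory_controller=True)

print(f"Xeon, adjusted: {xeon_watts} watts")
print(f"Opteron, as rated: {opteron_watts} watts")
```

On this footing the 80-watt Xeon comes out at 100 watts or more, slightly above the 95-watt Opteron rather than 15 watts below it.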
The data is a bit sparse, but the original Power4 chips, which ran at 1.1 GHz and 1.3 GHz, had TDPs of 115 watts and 125 watts, respectively. These were very hot processors, and even though they had very impressive performance at the time (at least twice the performance that any other chip maker could deliver, thanks to being the first dual-core chip in the world), this heat explains why IBM did not put them in blade servers or entry System i servers. The Power4 chips were implemented in a 180 nanometer chip making process, which was not cutting edge at the time in terms of transistor size, but which was very sophisticated in terms of the style of processes that IBM used.
With the Power4+ generation of chips, IBM kept the design essentially the same, adding 10 million transistors, boosting the count to 184 million, increasing the L2 cache a bit and tweaking the microarchitecture some. IBM could do this because it moved to a 130 nanometer process, which allowed it to boost the clock speed to 1.9 GHz in some high-end machines, and yet the TDP dropped to around 70 watts according to some estimates. (This sounds a bit low to me.) With the Power5 design, IBM boosted the transistor count to 276 million, adding more L2 cache (1.9 MB shared cache for both cores) and delivering chips that ran at 1.6 GHz or 1.9 GHz. The top-end part is estimated to have an 80-watt TDP, which is very respectable for a dual-core chip with an integrated memory controller and lots of other features.
With the Power5+ chips, IBM basically kept the chip the same and moved to a 90 nanometer chip making process, allowing it to shrink the chip by around 40 percent, boost the clock speed to 2.2 GHz, and keep the TDP down around an estimated 70 watts. This is truly impressive, if the estimates I have seen are correct, and it is a wonder IBM doesn’t brag more about this.
With the Power6 chip, IBM has thrown thermal caution to the wind, however, and is seeking to gain performance through clock speed increases, which are engendered through a move from 90 nanometer processes to 65 nanometer manufacturing. This, of course, runs counter to the approach taken by Intel and AMD with their X64 processors, where they are dialing down clock speeds a little bit each generation and doubling up on cores with each chip fabrication jump. Then again, IBM thinks about batch windows and single-threaded application performance a lot more than Intel and AMD do, so this stands to reason.
In any event, the Power6 chip running at 4.7 GHz is estimated to have a TDP north of 100 watts, but considering its clock speed–more than double that of the Power5+ chip–and the 750 million transistor count, anything around 100 watts is pretty reasonable given the amount of work the Power6 chip does. By contrast, Intel’s dual-core “Montecito” Itanium 9000 series chip running at 1.6 GHz and doing roughly half the work of a Power6 chip has a TDP rating of 104 watts, not including a memory controller but including two 12 MB L3 caches, one for each core. Intel needs that large cache because it doesn’t have the memory controller on the chip.
As has been the case since the Power chips went 64-bit thanks to the Rochester Labs with the “Muskie” PowerPC AS designs in 1995, the Power designs beat whatever Intel can deliver in terms of power efficiency and elegance. But the fact remains that IBM sells fewer Power chips each year in its commercial servers than Intel sells Itanium and Xeon MP chips, and Itanium shipments alone might be pulling even with Power shipments at this point.
Getting back on my soapbox again, I still think that IBM should be positioning a line of System i servers with low-power main memory, small form factor SAS disks that use a lot less electricity, and maybe even solid state disks that use very little power. The Power6 line of servers should be the unquestionable leader in performance per watt, because this is what will get a salesperson in the data center door these days. An i5/OS box that can do transactions with half the energy for the same money will win deals. Period. And with a mix of Web-enabled 5250 workloads and the right hardware, the System i can win in such competitive situations, even against System p boxes running Java. If I were running IBM, that is what I would build a line of Power6-based System i machines, and their marketing message, around.