The Balance of Server Powers
Published: March 2, 2006
by Timothy Prickett Morgan
For the past couple of decades, when people talked about computing power, it was understood that what they meant was the ability of a computer to do work of some sort--number crunching, database churning, transaction processing, or, more recently, data streaming. Power is work over time, and when you invested in a server platform, processing power was by and large what you were investing in.
But computers have always had another kind of power associated with them, one that is less apparent except for those few who can still recall rooms full of white-hot vacuum tubes on very early electronic computers from the 1950s. Academic and government institutions that run the world's largest supercomputers are also all too aware of the other kind of power that is inextricably tied to electronic computing: the consumption of electricity by processors, memory, disk drives, and other server components, the creation of heat from the cycling and whirring of those parts, and the need to move all of that heat off the computers and get it out of the data center through various kinds of raised-floor, water, and air cooling systems and external chillers. All of this data processing and movement of heat takes a lot of energy.
It is ironic that very little of the energy that goes into a computing complex--as measured from the moment the juice enters the building and is used by the computers and their cooling systems--actually ends up as manipulated information. A computer, whether you are talking about a PC or Linux cluster, is a very inefficient machine. All of the peppy response time and data bandwidth that the modern, computerized world requires comes at a big price: as computer components have become smaller, and therefore use a lot less power to do the same work, we have just consumed more and more computing power--and done so very inefficiently, at that.
When you buy a server, you don't just pay once, you pay three times: once when you buy it, once when you run it, and once when you cool it. Last September, in a presentation for the Association for Computing Machinery, Luiz André Barroso, the principal engineer at Google, explained that after three different generations of server architectures for its massive search engine complex, it had increased performance by nearly a factor of two and had increased the performance per server dollar by a factor of 1.3, but performance per watt in its vast network of machines (which is rumored to be north of 100,000 units) remained perfectly flat. That was after some of the smartest nerds in the world tried everything they could think of to improve the energy efficiency of the Google network. But it gets worse. Assuming that hardware costs hold relatively static at around $3,000 for a server over the next five years, and assuming that energy costs rise by 20 percent per year, by year five the cost of running a server each year will be the same as acquiring it. If you do a worst-case scenario where energy costs skyrocket on a 50 percent growth path from current levels, it could cost $10,000 a year to power and cool a server five years from now--arguably a very powerful server with very nice price/performance at the hardware level. But who can afford that electricity bill?
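The projection above is plain compound growth, and a minimal sketch makes the arithmetic easy to rerun. The article does not state what it costs to power and cool a server today, so the $1,250 starting figure below is purely an assumption chosen for illustration, and the function and variable names are hypothetical.

```python
def projected_annual_cost(base_cost, growth_rate, years):
    """Annual cost after compounding growth_rate for the given number of years."""
    return base_cost * (1 + growth_rate) ** years

ASSUMED_BASE_COST = 1250.0  # assumption: current annual cost is not given in the article
SERVER_PRICE = 3000.0       # hardware price quoted in the article

for rate in (0.20, 0.50):
    cost = projected_annual_cost(ASSUMED_BASE_COST, rate, 5)
    print(f"{rate:.0%} growth: ${cost:,.0f} per year in year five "
          f"({cost / SERVER_PRICE:.1f}x the server price)")
```

Under that assumed baseline, 20 percent growth lands close to the $3,000 purchase price and 50 percent growth lands a little under $10,000, which is the same range as the figures quoted above. A different baseline shifts both results proportionally.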
No one in a data center can see the nuclear power plant or coal-fired plant that fuels the place where they work and where their company keeps its brains, but if they did, it might give them reason to pause. The move to distributed computing, which is very flexible, useful, and affordable (at least from an acquisition standpoint) even if it is inefficient, and the application of computing to more and more tasks in the operations of businesses, have caused a massive hot zone in the data centers of the world. Push has indeed met shove, and the desire by IT people to take the easy way out and throw hardware at their IT problems has run smack into the laws of thermodynamics. The MIPS and FLOPS have met the kilowatt, and they do not like each other very much.
Technologies are being brought to bear on a number of fronts to deal with the power consumption and heat dissipation issues that data centers are facing. It is probably only through a combination of these technologies, as well as careful planning of hardware and software systems and the exercising of extreme restraint, that data centers are going to be able to cope with power and thermal issues.
Chips: So Long GHz, Hello Multicores
It is funny to remember that in the late 1990s and early 2000s, a lot of the chip roadmaps coming out from the major chip and server makers--Intel, IBM, Hewlett-Packard, Sun Microsystems, and others--were still talking about getting clock speeds up to 4 GHz, 5 GHz, and 6 GHz with the now-current generations of 90 nanometer and 65 nanometer chip making technologies. This obviously didn't happen, but chip makers sure would have liked it to--especially those who were not planning on moving to multiple cores per chip until later in the 2000s. In hindsight, the shift away from cranking up processor clock speeds and toward multiple cores on a processor seems inevitable, but it was not. Chip makers did not want to cope with creating dual-core chip designs and adding multiple threads per core to try to get each of the hundreds of millions of transistors on their processors to do work. They wanted to ratchet up clock speed, give more performance, and make more money.
But chip makers had no choice but to abandon the gigahertz race because the tower, rack, and blade server form factors in use in the data centers of the world had hit their thermal limits. Customers told server makers they could not put hotter machines in their data centers. No matter the performance benefits of a big RISC box or a dense block of blade servers, many data centers were not designed to supply that much power or cope with that much heat in a small place.
While no one wants to talk about this so much in 2006, there will come a time when this multicore use of the transistors enabled by Moore's Law (which is slowing to a doubling of transistor count about every 24 months, after a run rate of every 18 months for decades) will also run out of gas, much as the move from CISC to RISC and the megahertz and then gigahertz crank did. How many threads does a server need? How many does a PC need? At some point, having more threads and more cores does not add to application performance, but simply crams two computers' worth of iron into a shared chassis.
The "Niagara" Sparc T1 chip from Sun already jams eight cores with four threads each onto a single chip that consumes between 72 watts and 79 watts. The T1 threads are clever in that they are active about 75 percent of the time, and they are designed with exactly enough threads to efficiently do the sorts of Web work that Sun designed the chips to do. The T1 chips have an elegant design, and have about the same Web serving performance as a two-way Intel Xeon server, which consumes somewhere between two and three times as much juice to do the same work. Sun is already working on future Niagara-II chips that will offer a lot more performance, and its future "Rock" Sparc processors, due a few years from now, will take multithreading to new heights.
Advanced Micro Devices has for a few years taken a different tack in the power game, by embedding the memory controller inside the Opteron processor and by doing a deep sort in its processor bins to find chips that run at the specified clock speeds at different voltages. AMD has Opteron EE chips that run at 35 watts for the embedded market, Opteron HE chips that run at 55 watts, a special-bid 68-watt chip that is available to tier one server makers, and an Opteron standard part that runs at 95 watts. These chips all run at rated clock speeds (although the Opteron HE variants tend to initially lag by 400 MHz or so), and deliver full performance. And AMD gets to extract some profits from customers who need to have the absolutely lowest electricity usage and heat dissipation, because it can charge a premium for these low-powered chips. By contrast, low voltage Xeon or Itanium processors from Intel tend to run at much lower clock speeds than the standard parts and therefore offer much lower performance.
The Form Factor Factor
Of course, the processor is not the only culprit. Power supplies get larger as more components are added to servers. Disk drives spin faster. Memory cycles quicker. And so on. In presentations throughout 2005, IBM, which has become the dominant supplier of blade servers in the world, outlined the power issues with blade form factors this way. In a typical 1U rack-mounted server, about 30 percent of the power use of the entire machine was from the CPUs, with memory accounting for 11 percent, PCI buses for 3 percent, the backplane for 4 percent, the disk drives for 6 percent, 2 percent for standby components, and the remaining 44 percent for power and cooling components such as power supplies and fans. In a blade server, that 44 percent is reduced to 10 percent because of the sharing of these components. This gives the blade server a tremendous advantage when it comes to electricity consumption and heat dissipation. An individual blade can cost 25 percent less than a configured 1U server, is 33 percent more efficient when it comes to power use, and takes up half the floor space. The virtualization and internalization of the network in the blade approach also cuts cabling costs by as much as 86 percent, which cleans up the spaghetti wire mess in the back of the machine and reduces complexity. If you have ever configured a rack of servers, you know that this wiring mess is a real problem.
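The IBM breakdown lends itself to a quick sanity check. The sketch below confirms that the 1U shares add up to 100 percent and then estimates the blade's total draw under two possible readings of "reduced to 10 percent." Both readings are assumptions, as is the premise that the CPUs, memory, and other components draw the same power in a blade as in a 1U box; the article does not say how IBM arrived at its 33 percent figure.

```python
# Share of total power in a typical 1U server, in percent (figures from the article).
RACK_1U = {
    "cpus": 30, "memory": 11, "pci_buses": 3, "backplane": 4,
    "disk_drives": 6, "standby": 2, "power_and_cooling": 44,
}

def non_overhead(breakdown):
    """Power used by everything except power supplies and fans."""
    return sum(v for k, v in breakdown.items() if k != "power_and_cooling")

def blade_total_same_units(breakdown, blade_overhead=10):
    """Reading A (assumption): overhead shrinks to 10 points of the 1U budget."""
    return non_overhead(breakdown) + blade_overhead

def blade_total_own_share(breakdown, blade_overhead_share=0.10):
    """Reading B (assumption): overhead is 10 percent of the blade's own total."""
    return non_overhead(breakdown) / (1 - blade_overhead_share)

assert sum(RACK_1U.values()) == 100  # the quoted shares are complete

for label, total in (("A", blade_total_same_units(RACK_1U)),
                     ("B", blade_total_own_share(RACK_1U))):
    print(f"Reading {label}: blade draws {total:.1f} units against 100 for the 1U, "
          f"or {100 - total:.1f} percent less")
```

Both readings come out in the neighborhood of the 33 percent efficiency gain quoted above, which suggests that most of the blade advantage comes from shared power supplies and fans.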
So why doesn't everyone use blade servers? Well, there is a direct relationship between computing power and electrical power consumed for computing and cooling. The more computing you pack into a space, the bigger the headache you have to supply it with power and, even if you can do this, you have to cool it. The raised-floor environments in most data centers were designed for a relatively even distribution of heat, and this is not the case any more. Just adding a rack of ultra-dense blade servers to a data center can cause a massive heat island.
No matter what form factor a server is in, once you start looking at the interplay of performance, electricity, and heat, you see problems everywhere. Here's a case in point, just to illustrate a new kind of thinking that data center managers are going to have to engage in. A compelling feature of modern servers that plays against the power struggle between MIPS and watts is capacity on demand. When vendors load up their servers with deactivated processors and memory so they can activate them for customers in an on-demand fashion, those components still burn electricity even when they are idle. You end up paying to run and cool an asset you are not using. Granted, it is not running at full speed, but running in power-down mode is not free, either. So an idea that sounds good from a capacity planning standpoint might not be quite as good as it seemed at first.
Of course, the best thing to do with the heat is to not generate it in the first place, and that is what one vendor, Rackable Systems, has focused like a laser upon. Rackable, which went public last summer, has done a lot of engineering to make rack-mounted servers that can pack a lot of processing wallop in a small space and has some interesting DC power units and low-power processors that make its machines run as much as 40 percent cooler than standard Xeon-based rack servers. For data centers busting at the seams or those with severe power and cooling issues, the designs put forth by Rackable are starting to resonate.
Rackable sells rack servers and related storage products that put the servers back-to-back in a rack that is designed to move hot air from the center of the box, like a chimney. Because of the compactness of the design, Rackable can put twice as many processors in the space taken up by a standard rack. The company likes the Opteron HE chips, but also sells Intel chips, including the new "Sossaman" dual-core variant of the Pentium M. The low-voltage Opteron processors can also significantly cut down on power use and heat dissipation. Finally, its most advanced machines do not have AC power supplies inside each server module in the rack, but rather a big DC power unit at the top of the rack that has power lines that reach down into the servers. The DC power unit can also be stored outside of the rack--say, down the hall where the HVAC unit is located--thus keeping the environment surrounding the servers cooler.
When you add all of this up, the savings can be substantial, as can the reduction in electricity usage and cooling needs (which further reduces electricity needs). Rackable did some math on a setup of 1,760 X64 servers. Using regular two-socket Xeon processors, the setup needed 65 racks. Using the low-voltage Opteron processors and the DC power, Rackable could deliver the same 1,760 two-socket servers in 20 racks at a slight premium in hardware and installation, but still deliver $276,000 per year in savings for power and cooling.
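To put Rackable's numbers on a per-rack and per-server basis, here is a small sketch using only the figures quoted above. The function names are hypothetical, and the per-server savings figure is a simple average that ignores the hardware premium, which the article does not quantify.

```python
SERVERS = 1760            # from the article
XEON_RACKS = 65           # standard two-socket Xeon setup
RACKABLE_RACKS = 20       # low-voltage Opteron plus DC power setup
ANNUAL_SAVINGS = 276_000  # dollars per year for power and cooling

def servers_per_rack(servers, racks):
    """Average number of servers housed in each rack."""
    return servers / racks

def rack_reduction(before, after):
    """Fraction of racks eliminated by the denser design."""
    return 1 - after / before

print(f"Xeon setup:     {servers_per_rack(SERVERS, XEON_RACKS):.1f} servers per rack")
print(f"Rackable setup: {servers_per_rack(SERVERS, RACKABLE_RACKS):.1f} servers per rack")
print(f"Racks eliminated: {rack_reduction(XEON_RACKS, RACKABLE_RACKS):.1%}")
print(f"Average savings: ${ANNUAL_SAVINGS / SERVERS:,.2f} per server per year")
```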
SWaP with Your Friends
Of course, everyone wants to be able to make comparisons between servers when it comes to power efficiency, but there is no agreed-upon standard for doing this. Sun has been at the forefront of this effort, and has proposed a metric called SWaP, short for Space, Watt, and Performance.
SWaP takes the performance of a server on any benchmark and divides it by the space used by the server (as measured in standard form factors, which are 1.75 inches high and are abbreviated "U" in the industry, presumably short for "unit") and then divides it again by the power consumed by the server. While SWaP is a step in the right direction, it has a few problems. First, it assumes that server form factors are constant in terms of depth and width, and they may not stay that way. Rackable already puts two servers in the same rack space that a standard rack server takes, so SWaP's space factor is not being measured correctly. To be fair, you need to measure the volume of a server, not its vertical space. Moreover, measuring vertical space in a rack when it is square feet in floor space that costs money in the real world seems insufficient. Then again, once you put one server in the rack, the floor space is for all intents and purposes used up, whether the rack has one or 21 servers. Still, measuring the volume of the server seems more logical. Also, SWaP generally uses published ratings for peak loads of servers, when what you want to do is use a power meter to measure the electricity consumed while the benchmark is actually running. Finally, if you just count the energy used, you have not counted the energy it takes to remove the heat from the data center. Servers should be penalized fully for the electricity they use and the heat they generate.
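As described here, SWaP is just performance divided by space and then by power. The sketch below shows that calculation, followed by a hypothetical variant that applies the adjustments argued for above: server volume instead of rack units, metered power instead of rated power, and a penalty for cooling energy. The variant is an illustration of the critique, not a metric that Sun or anyone else has published, and every number in the example is made up.

```python
def swap(performance, rack_units, watts):
    """SWaP as described in the text: performance / (space in U * power in watts)."""
    return performance / (rack_units * watts)

def swap_variant(performance, volume_cubic_inches, measured_watts, cooling_watts):
    """Hypothetical variant: volume replaces rack units, and the power term adds
    the energy needed to remove the heat to the metered draw of the server."""
    return performance / (volume_cubic_inches * (measured_watts + cooling_watts))

# Made-up example: two servers with the same benchmark score and power draw.
one_u = swap(performance=1000.0, rack_units=1, watts=250.0)
two_u = swap(performance=1000.0, rack_units=2, watts=250.0)
print(f"1U server SWaP: {one_u:.1f}")
print(f"2U server SWaP: {two_u:.1f}")
```

With equal performance and power, the 1U machine scores twice as high as the 2U machine, which is exactly the behavior that stops making sense once vendors start varying depth and width within the same rack unit.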
Water-Cooling: Back to the Future
This may seem funny, given all of the grief that water-cooled mainframe shops took in the 1980s and 1990s as they were being replaced by air-cooled Unix boxes, but water cooling and some variants using antifreeze are trying to come back in vogue. The heat density problem is so bad in data centers that managers are willing to go back to the future. Why? Because power consumption and heat generation have exploded by a factor of 10 in as many years.
In January, server makers HP and Egenera both announced partnerships to deliver cooling jackets (kinda the inverse of a smoking jacket) for their server racks. And last fall, IBM announced its own rear-door rack heat exchanger, also based on water, for its racks.
HP has partnered with Rittal, a German supplier of industrial enclosures, power distribution, and cooling systems, to deliver the Modular Cooling System, a system that allows racks of servers to plug into existing water-based data center cooling systems. The Modular Cooling System is an air-water heat exchanger that is half a rack wide that you bolt onto the side of an HP server rack. This exchanger sucks the heat out, where it is absorbed by the water and pumped away to an outside chiller. The HP rack is completely closed off, and it takes the hot air from the back of the rack and pumps it to the exchanger and then pumps cool air to the front of the rack, where it can keep the computers cool. The system provides 2,700 cubic feet per minute of cool air, which is distributed along the full height of a 42-server rack. And the Modular Cooling System doesn't just work on ProLiant rack servers and BladeSystem blade servers, but also on HP's Itanium-based Integrity systems and its older HP 9000 and NonStop machines, as well as its StorageWorks storage arrays. Customers that already have water-cooling in their data centers are golden, because this system only costs $28,500 per rack; all you have to do is plug it into your existing water pipes and chiller.
The Modular Cooling System is important because it will allow blade server-level power densities in a data center that was never designed for them. And, because it takes heat directly out of server racks instead of relying on inefficient air cooling systems, companies can spend less money on cooling--as much as 30 percent less. But the savings can be much larger than that. Perez calculates that an 11,000 square foot data center consumes about 3.6 megawatts of power and costs about $16 million to build in a major metropolitan area. With the Modular Cooling System as designed by HP and Rittal, you can cram the same servers into about 3,000 square feet, you don't need a raised-floor, air-cooled environment, and you can chop the cost to about $7.4 million.
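The data center comparison above can be broken down a little further. This sketch uses only the figures quoted in the text; the helper names are hypothetical, and the cost per square foot is a derived average rather than a number from HP.

```python
def cost_per_square_foot(build_cost, square_feet):
    """Average construction cost per square foot."""
    return build_cost / square_feet

def fraction_saved(before, after):
    """Fraction by which a quantity shrinks going from before to after."""
    return 1 - after / before

CONVENTIONAL = {"square_feet": 11_000, "build_cost": 16_000_000}  # from the article
WATER_COOLED = {"square_feet": 3_000, "build_cost": 7_400_000}    # from the article

for name, dc in (("Conventional", CONVENTIONAL), ("Water-cooled", WATER_COOLED)):
    rate = cost_per_square_foot(dc["build_cost"], dc["square_feet"])
    print(f"{name}: ${rate:,.0f} per square foot")

print(f"Floor space saved: "
      f"{fraction_saved(CONVENTIONAL['square_feet'], WATER_COOLED['square_feet']):.1%}")
print(f"Build cost saved:  "
      f"{fraction_saved(CONVENTIONAL['build_cost'], WATER_COOLED['build_cost']):.1%}")
```

The denser facility costs more per square foot, but it needs so much less floor space that the total build cost still falls by more than half.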
Blade server maker Egenera also announced a partnership for cooling, but with commercial and consumer heating and cooling specialist Emerson to provide sophisticated cooling technologies for its new BladeFrame EX machines. Through a partnership with Liebert, a division of Emerson Network Power, Egenera will enable very high blade server densities while actually providing customers with a net savings in electricity costs related to cooling.
This system, called the Liebert XD system, gets rid of the idea of blowing cold air through the raised floor to have it sucked through servers to cool them, and then have that heated air sucked out of the data center and pumped outside. The XD system doesn't use air, but rather consists of a pump that interfaces with the outdoor chilling systems and special radiators that you attach to the servers. These radiators attach to the top or sides of the servers, and they are filled with an automobile refrigerant called R134A, which absorbs the heat and is then swiftly pumped out of the data center through a lattice of pipes overhead. This refrigerant is not water, which really messes up computers, but rather a liquid that gasifies when it is exposed to the air. The R134A can absorb heat as it undergoes a phase change, and that absorption means that the volume of this liquid is about one-fifth of what it would be if Liebert used water to chill the servers. The radiators and pipes can be moved around the data center, so as heat islands emerge, data center managers can attack them.
Customers who buy a BladeFrame EX from Egenera can buy a variant of the server called the CoolFrame that embeds two radiators on the back of the BladeFrame EX racks, which can require 20 kilowatts of cooling when fully loaded. Each XD radiator can pull about 9.25 kilowatts directly off the BladeFrame EX racks and take it directly out of the data center, which means the normal air conditioning in the data center only has to provide about 1.5 kilowatts of cooling. The XD pump hooks into the same outdoor chiller, but because of the efficiency of the XD setup, companies will be able to get by with a smaller chiller and also cut their electricity bill--probably about 22 percent less on the BladeFrame EX.
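The CoolFrame heat budget is simple subtraction, sketched below with the figures from the text. The function name is hypothetical.

```python
def residual_cooling_kw(rack_load_kw, radiators, kw_per_radiator):
    """Heat left for room air conditioning after the radiators take their share."""
    return rack_load_kw - radiators * kw_per_radiator

RACK_LOAD_KW = 20.0     # fully loaded BladeFrame EX, from the article
RADIATORS = 2           # radiators embedded in the CoolFrame, from the article
KW_PER_RADIATOR = 9.25  # heat removed by each XD radiator, from the article

left_over = residual_cooling_kw(RACK_LOAD_KW, RADIATORS, KW_PER_RADIATOR)
print(f"Removed by radiators: {RADIATORS * KW_PER_RADIATOR} kW")
print(f"Left for room air conditioning: {left_over} kW")
```

Two radiators at 9.25 kilowatts each remove 18.5 kilowatts, which leaves the 1.5 kilowatts cited above for the room air conditioning.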
IBM's "Cool Blue" Rear Door Heat eXchanger is a 4-inch thick water-filled heat exchanger that bolts onto the back of a standard 42U server rack. The door holds about 6 gallons of water and moves between 8 and 10 gallons per minute through the exchanger and back into water chilling systems that have been used for mainframes for decades. It can remove up to 50,000 BTUs of heat from the rack, which works out to somewhere around 12 to 15 kilowatts of dissipated energy. IBM figures this is about 55 percent of the heat in a fully loaded rack, which is not too bad considering that the unit just sits there, capturing heat blown off the servers by their fans.
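The BTU figure can be checked against the kilowatt range with a standard unit conversion. The sketch assumes that "50,000 BTUs" means BTU per hour, which is the usual convention for cooling capacity but is not stated in the text. The implied total rack load is derived from IBM's 55 percent estimate and is not a number IBM supplied.

```python
WATTS_PER_BTU_PER_HOUR = 0.29307107  # standard conversion factor

def btu_per_hour_to_kw(btu_per_hour):
    """Convert a heat removal rate from BTU per hour to kilowatts."""
    return btu_per_hour * WATTS_PER_BTU_PER_HOUR / 1000.0

removed_kw = btu_per_hour_to_kw(50_000)  # assumes the article's figure is per hour
implied_rack_kw = removed_kw / 0.55      # derived from IBM's 55 percent estimate

print(f"Heat removed by the door: {removed_kw:.2f} kW")
print(f"Implied fully loaded rack: {implied_rack_kw:.1f} kW")
```

The conversion comes out at the upper end of the 12 to 15 kilowatt range quoted above.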
While these cooling systems are admittedly necessary, it would be hard to call them progress. Computers need to be more power efficient, and we need to focus less on raw computing power and more on computing power that actually gets used. Virtualizing servers and cramming many virtual servers onto a single machine is an important way to drive up utilization and efficiencies. But it is equally important to better design systems and application software--how much spaghetti code that needs to be cleaned up is actually driving all of those hardware sales?--and to better design computers so they run a lot cooler to begin with. The best way to keep a data center cool is to not generate the heat in the first place. Either that, or make data centers into combination compute facilities and fruit dryers, thereby making use of the heat generated. A hot tub outside the data center, heated by blade servers, might be cool, too. So to speak.