Servers Get Their First Power and Performance Benchmark
January 7, 2008 Timothy Prickett Morgan
For the first 40 years of the systems and server market, price/performance was basically the only metric that mattered within a given class of machines. But in the 21st century, when power and cooling issues are thwarting the attempts of companies to continue adding computing capacity at the rates they did in years gone by, there will be a new metric that people will start to pay attention to: performance/watt. To that end, the Standard Performance Evaluation Corporation has just released its first benchmark that gauges the performance and power consumption of servers.
You can expect many more such power-sensitive benchmark tests. In fact, it is fair to say that for any benchmark that is commonly used today, power consumption will eventually be part of the basic criteria of the test, much as system specifications, down to the nuts and bolts of hardware and software, are today, and as component pricing ought to be but usually is not.
The first SPEC test that includes power consumption information is called the SPECpower_ssj2008 test. Many of the top names in servers and components participated in the development of the benchmark, including Advanced Micro Devices, Dell, Fujitsu-Siemens, Hewlett-Packard, Intel, IBM, and Sun Microsystems; techies at the University of California at Berkeley, Lawrence Berkeley National Laboratory, and Virginia Polytechnic Institute and State University also had a hand in crafting the new benchmark test.
While chip makers and server makers have been jury rigging their own tests to show the performance of a chip or machine as it stacks up against power draw, they tend to pick their own tests and they do not always measure the power consumption in exactly the same way. The SPECpower_ssj2008 test not only gives us a standardized test to measure power consumption at various performance levels, but sets out a methodology by which other benchmark tests can be tweaked to give them a consistent set of power consumption metrics.
“The ability to measure power consumption in a consistent way across multiple server platforms is an important element of the Energy Star program,” explained Andrew Fanara, director of the Energy Star product specifications development team for the U.S. Environmental Protection Agency. The EPA has been spearheading a drive to get servers and other data center equipment certified with something akin to the Energy Star ratings on electronic appliances so customers have some sense of the tradeoffs they are making between device performance and power consumption. “SPEC has taken a critical first step to give server vendors and their customers a standardized benchmark tool that elevates power efficiency in the performance evaluation process.”
Here’s how the SPECpower_ssj2008 test works. The test itself includes a bunch of software, of course, and in this case it is software that runs on the system under test as well as a collection of code that runs on a benchmark control and collection system (CCS) that sits between a power analyzer–a common electronic component for measuring how much power an appliance draws from a power source–and the system under test. The CCS runs a bit of code called a power and temperature daemon, or PTD, that connects to the power analyzer as well as to a temperature sensor in the system under test. This PTD code is written in C++ while the CCS code is written in Java; it has been certified to run on Windows, Linux, and Solaris. The system under test can run any operating system that supports a Java Virtual Machine and that has serial ports and TCP/IP connectivity.
The workload running on the system under test is called SSJ, and it looks like a subset of the SPECjbb test in that it implements the transactions associated with managing a warehouse and includes new order, payment, order status, delivery, stock level, and customer report transactions. The system is loaded up with the SPECpower_ssj2008 code, and 100 percent of the CPU capacity in the box is dedicated to running the workload for a set amount of time; power consumption is gauged during that interval and operations per second (ops/sec) are counted. Then, CPU capacity is dialed back in increments of 10 percent of total CPU in the system, and the test keeps running. This test does not stress the I/O and network subsystems; it stresses CPUs and memory (main and cache), and to a certain extent it can be used to differentiate JVMs and just-in-time (JIT) compiler technology for Java applications.
The final metric in the SPECpower_ssj2008 test is called overall ssj_ops/watt, and this is derived from the average performance over the 11 increments (0 percent through 100 percent) on the test divided by the average power consumed across those 11 increments. Measuring power at different performance levels is key, since processor and system architectures throw off different amounts of heat as they are stressed or idled–some get very hot by doing a little extra work, some can’t run efficiently (in terms of electricity usage) at low CPU utilization. In any event, results are published based on the number of JVMs running on the server.
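To make the arithmetic concrete, here is a minimal Python sketch of the calculation as described above: average ops/sec across the 11 load levels divided by average watts across the same levels. The function name and every number in it are invented for illustration (a hypothetical box whose throughput and power draw both scale linearly with load); none of it comes from SPEC's own tooling or from a published result.

```python
from typing import Sequence


def overall_ssj_ops_per_watt(ops: Sequence[float], watts: Sequence[float]) -> float:
    """Average ops/sec over all load levels divided by average watts over them."""
    if len(ops) != len(watts) or len(ops) == 0:
        raise ValueError("need one ops reading and one power reading per load level")
    avg_ops = sum(ops) / len(ops)
    avg_watts = sum(watts) / len(watts)
    return avg_ops / avg_watts


# The 11 target load levels, in percent: 100, 90, ..., 10, 0 (idle).
load_levels = list(range(100, -1, -10))

# Hypothetical machine (made-up figures, not a real benchmark result):
# 200,000 ops/sec at full load, 100 watts at idle, 300 watts at full load.
PEAK_OPS = 200_000
IDLE_WATTS = 100
FULL_WATTS = 300

ops = [PEAK_OPS * pct / 100 for pct in load_levels]
watts = [IDLE_WATTS + (FULL_WATTS - IDLE_WATTS) * pct / 100 for pct in load_levels]

score = overall_ssj_ops_per_watt(ops, watts)
print(f"overall ssj_ops/watt: {score:.1f}")
```

Because the idle interval contributes watts but no operations, a machine with a high idle draw is penalized in the overall figure even if it looks efficient at full tilt.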
So far, a dozen machines have been tested by AMD (one), Dell (one), HP (three), Intel (three), and Fujitsu-Siemens on the SPECpower_ssj2008 test. The results are all over the map and there does not appear to be much of a pattern. Obviously, having the highest number possible for a given class of computer is a good thing, and it is no surprise at all that the best metrics for this are coming from streamlined two-socket machines so far. An HP ProLiant DL160 G5 with two quad-core Xeon E5450 processors running at 3 GHz with 16 GB of main memory and a single 80 GB 7.2K SATA drive was able to max out at 308,022 ops/sec on the test; the average idle power draw on this machine was 160 watts and when running full speed (99.2 percent of CPU capacity), it drew 269 watts. This machine had a 1200 watt power supply, which tells you how useless these maximum draw ratings on power supplies can be in determining power usage. This particular machine was running Windows Server 2003 X64 Enterprise Edition R2 and the JRockit JVM from BEA Systems. It was rated at 698 overall ssj_ops/watt.
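A quick back-of-the-envelope check on the HP figures quoted above shows why the overall rating sits well below what the machine does at full load, and how little of the power supply's rating is actually used. The short Python sketch below uses only the numbers cited in this article; the variable names are illustrative, and the ratios are simple derived arithmetic, not values SPEC publishes in this form.

```python
# Figures quoted above for the HP ProLiant DL160 G5.
peak_ops = 308_022        # ops/sec at full load
full_load_watts = 269     # average draw at full load
idle_watts = 160          # average draw at idle
psu_rating_watts = 1200   # nameplate rating of the power supply
overall_score = 698       # published overall ssj_ops/watt

# Efficiency at full load only, ignoring the lower load levels and idle.
peak_ops_per_watt = peak_ops / full_load_watts

# Share of the power supply's rating actually drawn at full load.
psu_utilization = full_load_watts / psu_rating_watts

# How much of the full-load draw the box still pulls while doing no work.
idle_fraction = idle_watts / full_load_watts

print(f"full-load ops/watt:     {peak_ops_per_watt:.0f}")
print(f"overall ssj_ops/watt:   {overall_score}")
print(f"PSU rating used:        {psu_utilization:.1%}")
print(f"idle draw vs full load: {idle_fraction:.1%}")
```

The gap between the full-load figure and the overall rating comes from the partial-load and idle intervals, where the machine keeps drawing a large share of its full-load power while doing less work.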
Bigger HP machines do more work, as do machines from other vendors, but they throw off more heat. And if you don’t think benchmarks like this are going to put pressure on server makers to change that as much as is physically possible, boy are you wrong. Of course, I am cynical enough to almost think that server makers might not do as much engineering on their entry boxes to make their bigger iron look better in terms of performance per watt. . . .
Anyway. A two-socket Dell PowerEdge 2950 III machine configured almost identically–except it used 2.83 GHz Xeon E5440 quad-core processors–had a maximum performance of 305,413 ops/sec on the test at 99.7 percent of CPU and came in with a 682 overall ssj_ops/watt.
The best overall ssj_ops/watt that Fujitsu-Siemens put into the field so far is with a single-socket Primergy TX150 S6 server using a 2.4 GHz Xeon X3220 processor. This machine was rated at 143,742 ops/sec on the SPECpower_ssj2008 test, and came in with 667 overall ssj_ops/watt.
Demonstrating how badly AMD needs quad-core “Barcelona” Opterons in the field, a two-socket CX2266-N2 server made by Colfax International using 2.4 GHz Opteron 2216 HE processors–these are the energy efficient Opterons–did only 95,853 ops/sec on the test running Novell’s SUSE Linux Enterprise Server 10 SP1 and the JRockit JVM; this machine had 16 GB of DDR2 main memory. This server drew roughly the same power as the Xeon boxes above, but its significantly reduced performance gave it a very low 203 overall ssj_ops/watt. A two-socket server tested by Intel and made by Super Micro using Intel’s dual-core “Woodcrest” Xeon 5160 chips–now two generations behind the new power-efficient “Penryn” Xeons–running at 3 GHz, came in at 338 overall ssj_ops/watt, while the same board equipped with low-voltage Xeon L5335 quad-core “Clovertowns” delivered 468 overall ssj_ops/watt.
IBM and Sun have not yet put out benchmark results on their systems. IBM’s previous generations of Power-based machinery are not exactly power-efficient, even if they are powerful (in terms of transaction throughput), but with the Power6 designs, IBM has done a lot of work throttling power consumption. So it will be interesting to see what happens with Power6 iron on this test. And Sun, with all of its talk about how the Sparc T1 and T2 chips are the most efficient processors in their performance class, should have been first out with tests, and it will be equally interesting to see how its machinery pans out on the SPECpower_ssj2008 test.
The SPECpower_ssj2008 test is available immediately from SPEC for $1,600. Non-profit and educational institutions can get the test source code at a discount, as always.