SPEC Creates New CPU2006 Benchmark Tests
Published: August 31, 2006
by Timothy Prickett Morgan
Like the computers that they stress and decent beer, benchmark tests have a shelf life. As desktop and server architectures evolve and the applications that we all use change with the times, a benchmark test that was once appropriate for gauging performance becomes less useful: it cannot stress a system that has more inherent capacity--clock cycles, cache and main memory, I/O bandwidth, and so forth--in the same way that it could stress a machine with a much leaner architecture. That's why the Standard Performance Evaluation Corporation, or SPEC, has just released a new series of benchmark tests.
The new SPEC CPU2006 suite of benchmarks is a kicker to the CPU2000 suite, which vendors have been using for the past six years to reckon the integer and floating-point performance of particular processors and the machines that employ them. The SPEC tests, generally speaking, have never measured a processor's performance in isolation. What the benchmarks really test is the combination of three elements: the CPU, its related memory architecture as implemented in a chipset and motherboard, and the compilers used to turn the SPEC source code into benchmark applications for a specific machine.
There is no way to separate these three elements of a machine, of course, but sometimes people think of a benchmark test as only measuring the performance of a processor, as if the other two elements didn't matter. As the past CPU series of benchmark tests have shown since they were first announced in 1989, the choice of a compiler can make a big difference in performance. This is why Intel, IBM, Sun Microsystems, and others continue to invest in their own compilers. If you know your hardware inside and out, you can create a compiler to take advantage of it.
That's why with CPU2006, SPEC is providing open source GCC C, C++, and Fortran compilers and a toolset to compile the benchmark code. Vendors don't have to use this toolset, but it would be a good thing if they did--and then ran the tests again with their own compilers--because GCC would provide a universal baseline. Vendors that also sell compilers might have been tempted in years gone by to put optimizations in their code and compilers that enhanced the performance of their systems on the SPEC and other benchmark tests. (No, really? Are we really that cynical? Yes, Virginia, we are.) Vendors also have to provide a list of the optimizations they used in their compilers when they run the test. The CPU2006 tests, like the prior CPU2000 tests, do allow vendors to compile the code in the benchmark suite with a single set of optimizations--which is called a base setup--or to apply individual optimizations to each snippet in the suite--which is called a peak setup.
As with the past CPU2000 tests, the CPU2006 tests come in two flavors, each with two variants. One flavor of test measures the integer performance of the system under test--that's CINT2006--while the other measures the floating-point math performance of the system--that's CFP2006. For each type of test--integer or floating point--there is one test that measures speed--how quickly the system can complete a single task--and another that measures throughput--how much work a system can do running multiple tasks within a given timeframe. The speed tests are SPECint2006 and SPECfp2006, while the throughput tests are SPECint_rate2006 and SPECfp_rate2006. These are the same naming conventions and style of benchmark codes that were used with the SPEC CPU2000 tests.
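The difference between the speed and throughput metrics can be sketched with a little arithmetic. The timings below are invented for illustration; only the shape of the calculation reflects how SPEC normalizes a result against its reference machine:

```python
# Hypothetical timings, for illustration only.
REF_TIME = 9770.0  # reference machine's run time for one benchmark, in seconds

# Speed metric (SPECint2006 / SPECfp2006): one copy of one task.
# The score for a benchmark is the ratio of reference time to measured time.
run_time = 840.0
speed_ratio = REF_TIME / run_time  # about 11.6 times the reference machine

# Throughput metric (SPECint_rate2006 / SPECfp_rate2006): several copies run
# at once, and the score reflects how much total work got done within the
# elapsed wall-clock time.
copies = 8
elapsed = 1100.0
rate_ratio = copies * REF_TIME / elapsed

print(round(speed_ratio, 1), round(rate_ratio, 1))  # prints: 11.6 71.1
```

A bigger box that runs more copies at once can post a huge rate score even if its single-task speed score is ordinary, which is why the two metrics are reported separately.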
However, the CPU2000 and CPU2006 benchmark tests are very different animals. They use different code, in fact, which means that you cannot compare results between the two benchmark tests, except in a very general way. (For instance, you would expect a machine that does well on the CPU2000 test to do comparatively well, gauged against other systems, on the CPU2006 tests. But that is about all you can say, since different code stresses different systems differently.)
To come up with the code behind the CPU2006 suite, SPEC called on interested parties in the IT community to submit code and data segments that would be rolled into the test. SPEC paid $5,000 for the code and datasets it used, and ensured that the code was portable across Unix, Linux, and Windows platforms before including it in the set. The CINT2006 suite includes 12 application snippets written in either C or C++, while the CFP2006 suite includes 17 application snippets written in C, C++, Fortran, and a mix of C and Fortran.
Those interested in testing their own machines using the SPEC CPU2006 benchmarks can do so. You can buy the suite of tests, including a compiler, for $800. Those who have licensed the CPU2000 test can upgrade to the new benchmark suite for $400. Educational institutions can get the tests for $200. You need a Unix, Linux, or Windows system, a DVD drive, 1 GB to 2 GB of main memory, and 8 GB of free disk space on a machine to run the CPU2006 tests. SPEC does not require a review of the benchmark tests that anyone runs, and you do not have to publish your results on the SPEC Web site.
So far, nine companies have done a total of 70 runs on the four CPU2006 tests. (You can see a listing of the SPEC CPU2006 results by clicking here.) Instead of providing a metric that counts instructions processed or some other absolute quantity, the CPU2006 tests all provide a relative rating as gauged against the performance of a Sun Ultra Enterprise 2 server using two 296 MHz UltraSparc-II processors with 2 GB of main memory and two 36 GB SCSI disks. This machine, which was first sold in 1997, was configured with the Solaris 10 3/05 update. Only one of the processors was used in the test. This machine is given a SPECint_base2006 rating of 1.00.
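How the per-benchmark ratios roll up into that single rating can be sketched as follows. SPEC combines the individual benchmark ratios with a geometric mean, which is why the reference machine scores exactly 1.00 by construction; the ratios below are invented for illustration:

```python
from math import prod  # Python 3.8+

def overall_score(ratios: list[float]) -> float:
    """Geometric mean of the per-benchmark ratios (reference time / run time)."""
    return prod(ratios) ** (1.0 / len(ratios))

# The reference machine runs every benchmark in its own reference time, so
# each ratio is 1.0 and the overall rating is 1.00 by definition.
assert overall_score([1.0] * 12) == 1.0

# A hypothetical system roughly 11 to 12 times faster on each of three
# benchmarks lands near the SPECint_base2006 scores of the systems tested
# so far.
print(round(overall_score([11.2, 12.1, 11.6]), 1))  # prints: 11.6
```

The geometric mean keeps one outlier benchmark from dominating the rating, which an arithmetic mean of ratios would not.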
By comparison, a Sun X4200 "Galaxy" server, configured with a 3 GHz, single-core Opteron 256 processor and 8 GB of main memory had a SPECint_base2006 rating of 11.6, and a SPECint_peak2006 rating of 13.3. This Sun box used the Solaris 10 1/06 Update and the Sun Studio 11 compilers. A Hewlett-Packard DL380 with two single-core 3.8 GHz Xeon DP processors and 8 GB of main memory had a SPECint_base2006 rating of 11.4. A Fujitsu-Siemens Celsius workstation with a single 3 GHz Opteron 256 and 2 GB of memory, running SUSE Linux Enterprise Server 9 and QLogic's PathScale compilers had the highest integer rating, with a SPECint_base2006 of 11.9 and a SPECint_peak2006 of 13.3.
On the SPECint_rate2006 test, a Sun Fire E25K server using 72 dual-core 1.5 GHz UltraSparc-IV+ processors and equipped with 304 GB of memory had a SPECint_rate_base2006 rating of 759, and with peak optimizations, Sun boosted the performance to 904. This machine was configured with the Solaris 10 6/06 Update and the Sun Studio 11 compilers. An HP ProLiant 585 with four single-core Opteron 854 processors running at 2.8 GHz running SUSE Linux and using the PathScale compilers had a SPECint_rate_base2006 rating of 41.4, and optimizations boosted that to 46.9.
The spread on the floating-point variant of the tests was not significantly different, since the machines tested were generally one- or two-socket machines using roughly similar X64 processors.
It will take some time before the CPU2006 tests are truly useful. HP's Itanium machines and IBM's Power machines are conspicuously absent from the rankings so far. But SPEC says it will phase out the CPU2000 tests over the next six months, so these and other machines will undoubtedly be put through the CPU2006 paces.