Cray Revamps Supercomputers with XT5 Designs
Published: November 6, 2007
by Timothy Prickett Morgan
Supercomputer maker Cray moves one step closer to its long-term goal of converging its various high performance computing machines today with the launch of the Linux-based XT5 family of machines. The XT5 and XT5h machines are the fourth generation of massively parallel machines based on the "Red Storm" design that Cray sold to the U.S. government in 2002 as part of the U.S. Department of Energy's ASCI program to use computers to manage the country's stockpile of nuclear weapons and design new ones without detonating actual warheads.
The Red Storm supercomputer, which is installed at Sandia National Laboratories, takes Opteron processors and their HyperTransport links and marries them to a high-bandwidth, low-latency interconnect called SeaStar, designed by Cray to put thousands and thousands of processors into a single complex. The XT3, which is the first commercialized product based on the Red Storm design, was in volume production in early 2005 and was followed in late 2006 by the XT4, with its upgraded Opterons and SeaStar-2 interconnect.
With the XT5 family of machines, Cray is tweaking the blade-style packaging for its compute and I/O blades; its rival in the HPC market, Silicon Graphics, has also moved to blade packaging for its Itanium and Xeon clusters with recent designs. The XT5 family is still based on Opterons, however, and has to be since Intel does not have anything resembling HyperTransport for the SeaStar interconnect to link into easily. This could, of course, change when Intel's future "Nehalem" Xeons and "Tukwila" Itaniums come to market with Intel's QuickPath interconnect and integrated memory controllers, which have an architecture that is suspiciously parallel to that of the Opteron-HyperTransport scheme.
The XT5 comes in two flavors: the XT5, which is built solely from Opteron-based blades, and the XT5h, a hybrid that has blades with either vector processors or field programmable gate arrays (FPGAs) as well as supporting Opteron blades.
The XT5 supercomputers can employ the existing XT4 blades, so customers do not have to upgrade to the latest dual-core and quad-core Opteron processors from Advanced Micro Devices to get the new chassis and its improved SeaStar-2+ interconnect. The XT4 blades, which have four Opteron sockets and four SeaStar-2+ interconnect chips, are aimed at workloads where customers need a balance of compute and interconnection bandwidth, while the new XT5 blades have eight processor sockets and four times the main memory sharing the same four SeaStar-2+ interconnects. The XT5 blades are aimed at workloads that are memory intensive or compute intensive, or both, but which do not need to communicate with other blades as much. Customers can mix and match the blades in a single XT5 chassis to get a mix of board styles that matches their particular workloads. The XT5 also supports an SIO blade, which as the name suggests handles I/O requests into and out of the SeaStar-2+ interconnect, linking disk arrays and other peripherals to the machines.
With the XT5h, Cray is throwing in a kicker to its X1 vector processor, which is itself a virtual vector processor called a MultiStreaming Processor that is made up of four vector chips that act like a bigger and more powerful single chip. The blade sporting the new X2 vector processor is the X2 vector blade, which has four vector processors that link into the SeaStar-2+ interconnect, allowing up to 1,024 vector processors to be linked into a single shared memory system. (These blades can support legacy Cray vector applications.) Each processor on the blade is rated at 25 gigaflops, yielding a vector machine that tops out at 25.6 teraflops.
Cray is also integrating FPGAs, which can be programmed to run specific applications, in its XR1 blade. This blade has four SeaStar-2+ chips, two Opteron processors, and four FPGAs from Xilinx on it. According to Jan Silverman, Cray's senior vice president of corporate strategy and business development, this XR1 blade does not implement the hybrid FPGA-Opteron technology that Cray got through its acquisition of Canadian supercomputer maker OctigaBay in March 2004 for $115 million. But you can bet some ideas were heavily borrowed from those machines, which Cray renamed XD1s.
The XT5h also supports a global address space for vector processing nodes that allows applications written in Unified Parallel C and Co-Array Fortran to run on the boxes. This supplements the Message Passing Interface (MPI) method of parallel processing, which breaks applications and data sets into small chunks and runs them in parallel and which is the only option on the Opteron blades. These new C and Fortran compilers try to mask some of the parallelism in the machine and allow programmers to code more like they would on an SMP box.
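To make the "small chunks" idea concrete, here is a minimal sketch in plain Python of the kind of block decomposition that a message-passing program has to spell out by hand. It does not use MPI or any Cray software; the function names `block_ranges` and `partial_sum`, the rank count of 8, and the sample data are all assumptions made for illustration.

```python
def block_ranges(n_items, n_ranks):
    """Split n_items into n_ranks contiguous (start, stop) ranges.

    Range sizes differ by at most one item, so the work is balanced.
    """
    base, extra = divmod(n_items, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks each take one leftover item.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def partial_sum(data, start, stop):
    # The work one rank would do on its own chunk, without seeing the others.
    return sum(data[start:stop])


data = list(range(1, 101))           # hypothetical data set
ranges = block_ranges(len(data), 8)  # hypothetical count of 8 ranks
partials = [partial_sum(data, a, b) for a, b in ranges]
total = sum(partials)                # stands in for a reduction across ranks
print(ranges[0], ranges[-1], total)
```

In a real message-passing program each rank would hold only its own chunk and the final sum would travel over the interconnect. A global address space model aims to hide much of that bookkeeping from the programmer.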
The XT5 and XT5h machines run a variant of Novell's SUSE Linux Enterprise Server 10. Cray is also peddling a variant of the open source Lustre file system to serve all of the nodes in the XT5 and XT5h machines.
An XT5 cabinet supports up to 192 Opteron sockets, or a maximum of 768 cores, and using the new "Barcelona" quad-core Opterons from AMD, that works out to about 7 teraflops per cabinet. Silverman says that a typical XT5 cabinet costs around $500,000, with a box with lots of memory and I/O having a price tag north of $1 million.
Two of the things that Cray will be pushing with the XT5 designs, aside from the various processing elements they embody, are density and power efficiency, which are the mantras of all server makers these days. The upgrade to the Red Storm machine at Sandia in 2005 took up 120 cabinets to hit its 43 teraflops performance, but the XT5 machine will be able to do the same task with only six cabinets. That's a factor of 20 reduction in floor space in under three years. None of the blades in the Cray cabinet has a fan; instead, cold air is pulled directly from ducts in the floor and blown up through the 24 compute blades by a single (and large) high-efficiency axial turbofan that sits beneath them. (This fan is a lot more reliable than the muffin fans used in servers today, and is also a lot quieter than a zillion of them humming away.) The XT5 cabinet also has a 400/480 volt power distribution unit in the base of the cabinet, which feeds into a bank of modular power supplies that in turn supply power to each blade. The PDU takes power at the same voltage that comes into the data center, so the power does not need to be stepped down first, a conversion that wastes some energy and generates heat.
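The density figures quoted in this article can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the per-core figure is derived from the "about 7 teraflops" estimate rather than quoted by Cray.

```python
# Figures quoted in the article.
SOCKETS_PER_CABINET = 192
CORES_PER_SOCKET = 4            # quad-core "Barcelona" Opteron
CABINET_TFLOPS = 7.0            # "about 7 teraflops per cabinet"
RED_STORM_CABINETS = 120        # 2005 upgrade, 43 teraflops
XT5_CABINETS = 6
VECTOR_PROCESSORS = 1024        # XT5h maximum
VECTOR_GFLOPS_EACH = 25

cores_per_cabinet = SOCKETS_PER_CABINET * CORES_PER_SOCKET
xt5_tflops = XT5_CABINETS * CABINET_TFLOPS
floor_space_factor = RED_STORM_CABINETS / XT5_CABINETS
vector_peak_tflops = VECTOR_PROCESSORS * VECTOR_GFLOPS_EACH / 1000

# Derived, not quoted: implied peak per core from the rounded cabinet figure.
gflops_per_core = CABINET_TFLOPS * 1000 / cores_per_cabinet

print(cores_per_cabinet)         # cores in a full cabinet
print(xt5_tflops)                # six cabinets at the rounded 7 TF figure
print(floor_space_factor)        # cabinet count reduction
print(vector_peak_tflops)        # XT5h vector peak
print(round(gflops_per_core, 2))
```

Six cabinets at the rounded 7 teraflops figure come to 42 teraflops, slightly under Red Storm's 43, which is consistent with "about 7" being a rounded number.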
Why File-based System Backup is your Best Bet
File-based, Full System Backups Create Advantages Over Image-based Backups
File-based backups used for system recovery have been around for years. And, until recently, file-based meant a long, painstaking, manual process capable of turning off even the most meticulous system administrator. Image-based backups, then, seemed to solve this problem by eliminating the need to deal with recreating partitions, filesystems, volume groups or other details related to the system's storage configuration. In an image-based restore, the storage configuration and data from the original system are restored as a whole to the new system. While this method produced fast recovery times, Linux administrators began to realize disk image backup was more of an alternative method with its own set of problems and limitations than an answer to the challenges of manual, file-based backup.
Limitations to Disk Image Backup
Since disk image backups make no distinction between files and instead back up the hard drive as a group of sectors, bare-metal recovery can be quick and easy: simply rewrite a duplicate image onto a new, identical disk drive. This is a fine solution as long as the old system and new system have disks of identical types, sizes, and locations, which basically means the exact same hardware. Any differences in hardware, however, could render an image backup unusable.
Many system administrators know first-hand the frustration caused by the inflexibility of image-based backup. "What I hear time and time again from clients is that they switched from image-based backup to file-based because of the limitations they encountered when trying to restore a backup onto different hardware," said Manuel Altamirano, Storix Software Director of Sales and Marketing. "Administrators assume they will have access to identical hardware after a disaster or for migration when the time comes. Unfortunately, so often this is not the case. Companies are left with unplanned, excessive downtime."
Even the more advanced disk image backup products that can alter disk partition tables still fail to understand more advanced and increasingly common storage configuration tools such as the Logical Volume Manager (LVM) or Software RAID (meta-disks), which also must be altered to match the new hard disk configuration before data can be restored. In these cases, users must manually alter and build the configuration, usually through command-line utilities and manual editing of configuration files. Users must also know how to make a system bootable. Rebuilding a system using a disk image backup requires experienced Linux administrators and could take days, weeks or longer, resulting in crippling downtime for an organization.
Advances in File-based Backup
File-based backup tools today can automate the process of separately recording every aspect of a system, such as disk, filesystem and boot loader configuration, while supporting popular Linux storage configuration tools such as LVM and Software RAID. This detailed backup information is used to greatly simplify the recovery of a failed system from scratch, even if hardware differences are detected on the new system. Furthermore, systems rebuilt from the ground up using file-based backups often operate better than the original because there is virtually no fragmentation when the restore is completed.
Flexible recovery based on file-based backup
File-based backup products have the ability to reconfigure disks, partitions, filesystems and other storage solutions to fit onto new hardware. This ability to adapt a backup to fit new hardware or alter the system's storage configuration is called "Adaptable System Recovery" or ASR. Only backup solutions that gather details about the original system have enough information and flexibility to make the ASR process of altering the configuration so simple that even novice Linux administrators can quickly perform the recovery. Once the new configuration is complete, data files from the backup are easily restored onto the new hardware. Finally, the system is made bootable based on the new hardware.
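As a rough illustration of what adapting a recorded layout to different hardware can involve, the following Python sketch resizes a saved partition list to fit a disk of another size. It is not how any particular product implements ASR; the record format (`mount`, `size_mb`, `used_mb`), the function name `fit_layout`, and the sample numbers are all hypothetical.

```python
def fit_layout(partitions, new_disk_mb):
    """Resize a recorded partition layout to fit a disk of a different size.

    Each partition keeps room for the data it actually holds, and the
    remaining space is shared out in proportion to the original free space.
    Raises ValueError if the data cannot fit on the new disk.
    """
    used = sum(p["used_mb"] for p in partitions)
    if used > new_disk_mb:
        raise ValueError("new disk is smaller than the data being restored")

    spare = new_disk_mb - used
    old_spare = sum(p["size_mb"] - p["used_mb"] for p in partitions)

    result = []
    for p in partitions:
        free = p["size_mb"] - p["used_mb"]
        if old_spare > 0:
            extra = spare * free // old_spare   # integer math, rounds down
        else:
            extra = spare // len(partitions)    # no free space recorded: split evenly
        result.append({**p, "size_mb": p["used_mb"] + extra})
    return result


# Hypothetical layout recorded at backup time on a 100,000 MB disk.
layout = [
    {"mount": "/", "size_mb": 20000, "used_mb": 8000},
    {"mount": "/home", "size_mb": 80000, "used_mb": 20000},
]
fitted = fit_layout(layout, 64000)  # restore onto a smaller 64,000 MB disk
for p in fitted:
    print(p["mount"], p["size_mb"])
```

An image-based restore has no equivalent step, since it copies sectors rather than files and so cannot shrink or rearrange the layout in this way.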
The revolutionary adaptability of ASR found in file-based backup tools creates further added value for system administrators because these products can now be used for far more than just reactive tasks such as disaster recovery.
Applications for ASR:
- Disaster recovery: restore systems in minutes after a crash, even if hardware is not the same as the original
- Provisioning/cloning: a single backup "golden image" can be used to provision different systems, even if disks, adapters or other elements are not the same
- Storage software migration: change configuration on the same system for improved performance and availability
- Hardware migration: install the same system onto newer or virtual systems
New system backup management features
Products using file-based system backup have not neglected to consider a system administrator's daily backup responsibilities. These products now incorporate functionality for backup management as well as some of the most advanced features seen in backup and recovery solutions for Linux and AIX. Some advanced features designed to simplify daily backup management for system administrators include:
- Graphical, Web and Command line interfaces
- Local and remote backups to disk or tape devices
- Sequential and random tape autoloader support
- Support for SAN storage solutions
- Tivoli Storage Manager integration
- Oracle database backup support
- Backup data encryption
- Multiple compression levels
File-based Backup Solutions Provide Most Bang for the Buck
Inexpensive products exist that combine both file-based backup management and ASR in one program. Look for a file-based system backup product with advanced features like those mentioned above. In turn, regular backup responsibilities such as automatically verifying backups and encrypting backup data will become much easier. Additionally, combined ASR capabilities greatly reduce downtime and required expertise for both reactive (even bare metal) and proactive recovery projects. File-based system backup and recovery solutions are an economical and more comprehensive option than their image-based counterparts.
About the Author
Anne Stobaugh is an independent contractor working with Storix Software to educate Linux and AIX users on the advantages of file-based backup and recovery solutions.
Editor: Timothy Prickett Morgan
Contributing Editors: Dan Burger, Joe Hertvik, Kevin Vandever,
Shannon O'Donnell, Victor Rozek, Hesh Wiener, Alex Woodie
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed