Future iSeries Servers, Part 3

by Timothy Prickett Morgan

Over the past few weeks, I've been giving you a sneak peek at some of the processor and packaging technologies that IBM is developing for the iSeries line over the next few years. This week, I want to talk about the system features that IBM is putting into the iSeries that will give it essentially the same reliability, availability, and serviceability as the S/390 and zSeries mainframes. I also want to tell you a little about the roadmap for I/O subsystems in the iSeries.

Increasing iSeries Reliability

The iSeries line has long had redundant power supplies and fans, and, like other midrange servers, supports hot plugging of disk drives and a number of other system components. If these components fail, you yank them out of the machine and add a replacement unit, and OS/400 just keeps on running. As the new component comes online, OS/400 adds it back into the system configuration, and the state of the system is the same as it was before the failure.

There is more to creating reliable servers than hot plug components, however. What a server really needs is a combination of redundant and hot plug components. Redundancy means it takes two failures, not one, to bring a function down, and hot plugging makes replacement easy when one component fails, so the server is not left exposed for a long period of time to a complete failure. When both redundant components fail, a server crashes, and it crashes hard.

The engineers at IBM's Rochester Labs have been hard at work trying to reduce what they call high-impact outages, as well as repair actions. These are distinct but related events that affect the overall availability of the iSeries. High-impact outages occur when a processor, a logical partition, or a whole system fails catastrophically. These are the equivalent of the blue screen of death on a Windows box, which most of us are familiar with on our desktop PCs.
High-impact outages can be caused by cosmic rays, glitches in software, and other aspects of the machine that are above (but affected by) the physical state of the server. High-impact outages are those that are not compensated for by other aspects of the machine. For instance, with RAID 5 data protection, a disk-drive failure does not cause a machine to crash. It rides out the failure and rebuilds lost data on the fly, using an algorithm that can recreate that lost data based on parity information stored on other disks in the array. A disk failure is not a high-impact failure; losing a processor in a machine with only one processor and no extra capacity-on-demand processors available is. While there are very few processor failures in the AS/400 and iSeries line, they do happen. And neither IBM nor its OS/400 customers like it when they do.

Repair actions are upsetting, but not as dramatic. They are required when a physical component actually fails, as IBM's iSeries disk drives were doing in late 2001 and early 2002. Repair actions usually require an IBM engineer to go into the field to swap out a bad component, or intrepid customers to replace bad components themselves.

If you plot a chart that stacks up high-impact outages and repair actions, there isn't a lot you can do about the number of repair actions. Components fail. That is why redundancy and hot swap components are important. You don't want a component failure to become a high-impact outage that kills the machine, but rather an event that triggers a repair action. To that end, IBM will, according to sources familiar with its plans, be delivering the ability to hot swap whole computing nodes, central electronic complexes, and processor multichip modules (MCMs). Main memory is also going to be hot-pluggable and field-replaceable at the DIMM level. That's the easiest way to convert what would be a high-impact outage to a repair action.
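To make the RAID 5 rebuild mentioned earlier in this section a little more concrete, here is a minimal Python sketch of parity-based recovery. It is an illustration only, not IBM's implementation: the block contents, the number of drives, and the helper name xor_blocks are all hypothetical, and a real RAID 5 array rotates parity across the drives rather than keeping it on one dedicated disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per drive (made-up contents).
data = [b"AS/400__", b"iSeries_", b"OS/400__"]

# The parity block is the XOR of all data blocks (assumed here to live on a fourth drive).
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Because XOR is its own inverse, any single missing block equals the XOR of everything that survives, which is why one failed drive leads to a repair action rather than an outage. Losing a second drive before the rebuild completes leaves too little information to recover.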
But with the Power series of servers, starting with the Apache machines six years ago, IBM has been adding RAS features to the AS/400 and iSeries server lines that pull ideas and electronics right out of the S/390 and zSeries line. These aim to eliminate high-impact outages altogether. It is not easy to quantify the improvements IBM has made, but sources say that the "Condor" S-Star line of iSeries machines had about one third of the high-impact outages of the "Blackbird" Northstar line that preceded it. The "Regatta" Power4 servers had about one third of the high-impact outages of the S-Star machines, mainly because the Power4 architecture has error-detection and correction algorithms at all the different staging points from disk to processor instruction stream inside the core, which allows it to recover from faults that would otherwise crash the system. Taken together, those two steps imply that Regatta machines see roughly one ninth of the high-impact outages of the Northstar line.

With the Power5 generation of iSeries machines, the iSeries will have dynamic thermal and power management, processor de-allocation and error recovery, alternate data paths, and other features that will once again significantly cut back on the number of high-impact outages that customers experience.

My sources tell me that, by the time the Power6 generation comes out in 2006, the RAS features of the iSeries and pSeries lines will match those of the zSeries. Depending on what you believe about IBM's eServer convergence plans and the so-called "Project ECLipz" initiative that IBM is rumored to have started in order to converge the iSeries, pSeries, and zSeries lines with the Power6 generation of servers, there may be a very good reason for that.

The I/O Roadmap

The information I have is a bit sketchy right now, but you can expect that Team Rochester will be adopting future I/O and storage technologies as they move into the mainstream in order to keep the iSeries competitive with alternative platforms.
There are lots of different I/O technologies that the iSeries can exploit, and IBM may change its mind if the market shifts. For instance, a few years ago, it looked like InfiniBand was going to be the replacement for Fibre Channel for connecting peripherals to servers and servers to servers, but peppier versions of PCI and Ethernet technology are going to take hold in the market instead, and IBM is going to exploit these for the iSeries.

In the prior generations of AS/400 and iSeries machines, IBM used its own Self-Timed Interface (STI) backplane and Remote I/O (RIO) bus technologies to attach disk and tape subsystems to the server. The STI backplane was developed for IBM's S/390 mainframes and adapted for use in the big SMP versions of the AS/400. IBM has apparently also adapted the "Hydra" adapter scheme from its S/390 and zSeries mainframes to work on iSeries buses. (I'm a little iffy on this, because no one wants to talk about it.)

Moving out from the central electronics complex, IBM employed parallel SCSI devices such as disk and tape drives (Ultra160 SCSI devices were the fastest parallel devices IBM sold in the prior iSeries generation). IBM also used to single source its disks from its own Technology Group (which was devastating when the disks were bad, and eventually forced IBM to sell its disk business off to Hitachi last year). The best disks IBM had to offer spun at 10K RPM.

On the network front, prior AS/400 and iSeries machines used Token-Ring, ATM, and 10/100 Mbit Ethernet links, and 1 Gbit Fibre Channel was available to hook AS/400 and iSeries servers to storage area networks (SANs). IBM supported SPD and PCI peripherals in its I/O racks and within the servers.

The current iSeries line is still using an improved STI backplane for linking processors to each other and to I/O, along with revamped RIO-2 storage packaging for disks and tapes. IBM has delivered PCI-X adapter cards, which offer a lot more bandwidth than PCI adapters.
IBM is apparently using some Hydra-2 adapter technology derived from the zSeries mainframes for certain I/O adapters. (I have no idea which ones, but I have a feeling it has something to do with the Integrated xSeries Adapter card and the High Speed Link.) IBM is still using parallel SCSI peripherals, but has moved up to Ultra320 devices (while maintaining backward compatibility with Ultra160, of course). IBM is now using 10K and 15K RPM disk drives, sourced from two vendors, Seagate and Hitachi. These are still 3.5-inch form factor drives, which IBM has been using since the 1990s. IBM has also rolled out 1 Gbit Ethernet networking and 2 Gbit Fibre Channel SAN connectivity. All of these I/O improvements have meant that the iSeries has remained a balanced system as processor speeds have increased.

In the future, which could be any time over the next few years, IBM has a whole new set of backplane and I/O adapter technologies coming out, probably starting with the Power5 generation and rolling into the Power6 generation. IBM has made no secret that it likes the InfiniBand switched mesh architecture, not only because it has high bandwidth (10 Gbit/sec and 30 Gbit/sec on the roadmap) and low latency, but also because it allows memory-to-memory interconnections between processor nodes in a cluster or SMP. (In effect, InfiniBand turns a cluster into an SMP, with the latencies as low as they are.) InfiniBand looks like a decent replacement for RIO-2, but it is unclear if this is what IBM will do with the iSeries.

IBM will also champion PCI-X 2.0 and PCI Express I/O in future iSeries machines. The improved PCI-X 2.0, also sometimes called PCI-DDR (for double data rate), will be a parallel architecture that runs at 266 MHz and delivers more than 2 GB/sec of I/O bandwidth. The bus speed for PCI-X 2.0 will eventually be cranked up to 533 MHz, and bandwidth will top 4 GB/sec.
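The PCI-X 2.0 bandwidth figures quoted above follow from simple arithmetic, sketched below in Python. The sketch assumes a 64-bit-wide bus and treats the 266 MHz and 533 MHz figures as effective transfer rates. The bus width is my assumption rather than something stated in this article, and the function name is hypothetical.

```python
def parallel_bus_bandwidth_gb(width_bits, transfers_mhz):
    """Peak bandwidth in decimal GB/sec for a parallel bus.

    width_bits    -- bus width in bits (assumed, not taken from the article)
    transfers_mhz -- effective transfer rate in millions of transfers per second
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * transfers_mhz * 1e6 / 1e9


BUS_WIDTH_BITS = 64  # assumption: a 64-bit PCI-X slot

for rate_mhz in (266, 533):
    bandwidth = parallel_bus_bandwidth_gb(BUS_WIDTH_BITS, rate_mhz)
    print(f"PCI-X at {rate_mhz} MHz: {bandwidth:.2f} GB/sec")
# PCI-X at 266 MHz: 2.13 GB/sec
# PCI-X at 533 MHz: 4.26 GB/sec
```

These are peak numbers for the bus itself. Sustained throughput on a real adapter would be lower once protocol overhead and bus arbitration are taken into account.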
PCI-X 1.0 devices are aimed at the 1 Gbit/sec and 2 Gbit/sec generation of networking devices, and PCI-X 2.0 devices are really shooting to balance the 10 Gbit devices, such as InfiniBand and 10G Ethernet.

IBM will also endorse the PCI Express serial alternative to PCI-X. PCI Express will offer anywhere from 250 MB/sec to 4 GB/sec of I/O bandwidth over serial links running at anywhere from 2.5 GHz to 6.25 GHz (the former is the 1.0 specification; the latter is the 2.0 specification). PCI-X is for multidrop bus or point-to-point links, while PCI Express offers a memory mapped switch fabric, with traffic classes and virtual channels over the fabric. PCI Express is a more efficient superset of PCI, but it will probably take longer to come to market.

There are also murmurings at Rochester that entry iSeries machines could support iSCSI links to I/O devices, particularly network-attached storage. It looks like IBM will be endorsing Serial ATA and Serial SCSI storage devices, and could be migrating to smaller form factor disk drives. (Hitachi and Toshiba are shipping 7.2K RPM disk drives with 40 GB to 60 GB of capacity, and it won't be long before they can hit 10K RPM and perhaps 100 GB of capacity.) At that point, given enough cache memory, you can build a clever, small, and fast disk subsystem from what are essentially laptop disk drives.

On the networking front, IBM seems poised to endorse 2.5 Gbit and 10 Gbit Ethernet for networking (with an eye toward faster 40 Gbit and 100 Gbit links), and is readying 10 Gbit Fibre Channel support for SAN and NAS storage.

As you can see, IBM has lots of things besides Power processors in store for the iSeries.
Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.