Power Systems Finally Get Solid State Disks
May 4, 2009 Timothy Prickett Morgan
As readers of The Four Hundred have been anticipating for more than a year, IBM has finally gotten around to adding flash-based solid state disks, or SSDs, to the Power Systems lineup. SSDs are all the rage because of the much higher data transfer rates they offer compared to traditional hard disk drives and the lower prices they have relative to main memory, which is still considerably faster than flash. For this reason, all the cool servers are getting flash-based SSDs.
Flash-based disk drives fill a gap between disk and main memory speeds, much as Level 1, 2, and 3 caches fill gaps between main memory and processor speeds. All of these devices are used to stage data from all the way out on the disk drives as it makes its way to the central processing units to be chewed on, and all of this memory is necessary because, as the old saying goes, the one thing that all processors do, regardless of architecture or price, and do at exactly the same speed, is wait for data. Right now in computer systems, central processor clock speeds are in the range of 3 GHz, with some lower (I am thinking of Sun Microsystems’ “Niagara” Sparc T chips) and with others higher (IBM’s Power6 and Power6+, which range from 3.5 GHz to 5 GHz). But generally speaking, processors complete a cycle in a fraction of a nanosecond, while a fetch from main memory takes on the order of 100 nanoseconds. (Hence the need for on-chip L1 and L2 caches to buffer main memory and L3 caches to buffer the L1 and L2 caches.)
Disk drives, which feed data into main memory at blazing speeds compared to decades ago, are still four to five orders of magnitude slower than main memory (1 million to 8 million nanoseconds to retrieve a bit of data or store it), even if disk controllers are themselves buffered with big gobs of read and write cache. SSDs fit very nicely into the gap between main memory and disk subsystems, being able to serve up bits of data in around 200,000 nanoseconds using the flash technologies IBM is deploying in its 69 GB flash drive for Power Systems. (There are faster flash-based SSDs, and fatter ones, too. And, there are certainly less expensive ones for the capacity.)
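The gap is easier to see with the numbers side by side. Here is a quick Python sketch using the ballpark latencies above; the per-device figures are rough approximations for illustration, not vendor specs:

```python
# Rough storage-hierarchy latencies, in nanoseconds. These are the
# article's ballpark numbers, not measured or vendor-quoted figures.
latency_ns = {
    "CPU cycle (3 GHz)": 0.33,
    "main memory": 100.0,
    "IBM 69 GB SSD": 200_000.0,     # ~200 microseconds
    "15K RPM disk": 4_000_000.0,    # middle of the 1M-8M ns range
}

base = latency_ns["main memory"]
for device, ns in latency_ns.items():
    print(f"{device:18s} {ns:>14,.2f} ns  ({ns / base:>10,.2f}x main memory)")
```

The ratios are the story: the SSD sits a bit over three orders of magnitude behind main memory, while the disk sits four to five orders behind, which is exactly the gap the SSD is meant to plug.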
Unlike magnetic media, flash-based memory can wear out as the bits are flipped inside of its memory cells. This fact, plus the relatively skinny capacities flash offered until recently, has made enterprise-class server makers such as IBM hesitant to put SSDs into their systems. That’s why IBM is taking a 128 GB SSD and formatting it down to 69 GB; the unused capacity is rotated in as other cells begin to wear, a process called “wear leveling,” thereby extending the life of the SSD. The SSD has about 220 MB/sec of sustained throughput on reads and about 122 MB/sec of sustained throughput on writes, according to IBM documentation I have seen, and can handle about 28,000 I/O operations per second (IOPS) of random transactional processing–the kind done on AS/400s and their progeny. The unit has an average access time for data that ranges from 20 to 120 microseconds, depending on where the data is physically located on the SSD. The SSD uses a 3 Gb/sec SAS interface, and consumes about one-fifth of the power of a 15K RPM disk drive of equivalent capacity–a drive, I might add, that would be lucky to handle around 320 IOPS. In one benchmark test I have seen, IBM shows that to process 135,000 IOPS, SSDs would burn about 300 watts of juice, while hard disks would consume about 8,300 watts.
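Those benchmark numbers roughly check out if you divide the target IOPS by each drive's throughput and multiply by a plausible power draw. A back-of-the-envelope sketch; the per-drive wattages here are my assumptions for illustration, not figures from IBM's chart:

```python
# Sanity check of the 135,000 IOPS benchmark comparison. Per-drive IOPS
# come from the article; per-drive wattages are assumed for illustration.
import math

target_iops = 135_000
ssd_iops, hdd_iops = 28_000, 320   # per-drive random IOPS
ssd_watts, hdd_watts = 60, 20      # assumed per-drive power draw

ssds_needed = math.ceil(target_iops / ssd_iops)   # 5 drives
hdds_needed = math.ceil(target_iops / hdd_iops)   # 422 drives

print(f"SSDs: {ssds_needed} drives, ~{ssds_needed * ssd_watts} W")
print(f"HDDs: {hdds_needed} drives, ~{hdds_needed * hdd_watts} W")
```

Five SSDs versus more than 400 spinning disks for the same random I/O load is where the 300 watt versus 8,300 watt spread comes from: most of the disk farm's capacity is wasted just to get spindle count.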
To truly make use of SSDs, the operating system has to be tweaked so it can stage data efficiently between disks, SSDs, and memory. This is not as trivial as it sounds, and in most systems it is not automatic, either. (But it will be soon enough in all major operating systems.) As it turns out, OS/400 has hierarchical storage management built in, as well as a single-level storage architecture that already treats memory and disk as one address space and that already knows how to move frequently used data and objects into main memory. So in this case, i 6.1 is better positioned to actually exploit SSDs than either AIX or Linux on Power Systems.
But the i platform can take it one step further. System administrators can run a trace on a partition or auxiliary storage pool (ASP) during a peak transaction processing period. This trace monitors reads to see which data on the box is “hot,” and then, after issuing a command, the hot data is moved to the SSDs in the background. Sources at IBM say that eventually i 6.1 will be able to perform this function automatically, always making sure hot data is on the SSDs. (I am trying to dig up some performance metrics on the SSDs now.)
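The trace-then-migrate workflow can be sketched in a few lines of Python. Everything here is hypothetical and greatly simplified (the real machinery lives inside OS/400), but it shows the basic idea: rank objects by read count and promote the hottest ones that fit on the SSD:

```python
# Sketch of the trace-then-migrate idea behind the hot-data tooling in
# i 6.1. Function, object, and size names are all hypothetical.
from collections import Counter

def find_hot_objects(read_trace, sizes_gb, ssd_capacity_gb):
    """Rank objects by read count; greedily pick the hottest that fit on SSD."""
    reads = Counter(read_trace)            # object name -> read count
    chosen, used = [], 0.0
    for obj, _count in reads.most_common():
        size = sizes_gb.get(obj, 1.0)
        if used + size <= ssd_capacity_gb:
            chosen.append(obj)
            used += size
    return chosen

# A fake trace: ORDERS is read far more often than the bulky HISTORY file.
trace = ["ORDERS"] * 900 + ["ITEMS"] * 400 + ["HISTORY"] * 50 + ["LOGS"] * 10
sizes = {"ORDERS": 12.0, "ITEMS": 30.0, "HISTORY": 50.0, "LOGS": 5.0}
print(find_hot_objects(trace, sizes, ssd_capacity_gb=69))
```

Note that the big, cold HISTORY file gets passed over even though it is read sometimes; on a 69 GB drive, capacity is too scarce to waste on lukewarm data.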
Power Systems machines can use the SSDs as boot devices if they want to, but there are rules about where the drives can go. The SSDs can plug into the integrated SAS controllers on Power Systems, and you can mix SSD and SAS drives on that same controller. However, if you have activated the internal RAID 5 data protection on that internal SAS controller (which you do by plugging a daughter card into the system board), then you cannot mix SAS drives and SSDs, for obvious reasons: RAID 5 locksteps the drives, so it requires them all to be the same in terms of capacity and speed. If you split the backplane on the Power Systems machines, you can put SAS drives on one side and SSDs on the other. In the EXP 12S drawers, which have 12 slots, you can put in up to eight units and they all have to be SSDs; four slots remain empty, and my guess is that this has nothing to do with capacity or energy consumption but rather with the amount of I/O traffic eight of these SSDs can generate, which would saturate the RIO (Fibre Channel) or 12X (InfiniBand) channel links on the systems. In addition to the integrated SAS controllers inside the Power Systems chassis, the SSDs can be attached to the feature 5904 PCI-X SAS RAID controller and the feature 5903 PCI-Express SAS RAID adapter.
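Those placement rules read like a checklist, and they can be captured as one. A small Python validator, purely illustrative (the rule set is my paraphrase of the restrictions above, and the names are mine, not IBM's):

```python
# Illustrative validator for the SSD placement rules described above.
# Rule wording and function names are mine, not from IBM documentation.
def validate_sas_config(drives, raid5_enabled, is_exp12s_drawer=False):
    """Return a list of rule violations for the drives on one controller."""
    errors = []
    kinds = set(drives)
    if raid5_enabled and kinds == {"SSD", "HDD"}:
        errors.append("RAID 5 controllers cannot mix SSDs and SAS disks")
    if is_exp12s_drawer and "SSD" in kinds:
        if "HDD" in kinds:
            errors.append("an EXP 12S holding SSDs cannot also hold disks")
        if drives.count("SSD") > 8:
            errors.append("an EXP 12S holds at most eight SSDs")
    return errors

# Mixing is fine on a plain controller, but not once RAID 5 is switched on.
print(validate_sas_config(["SSD", "HDD"], raid5_enabled=False))
print(validate_sas_config(["SSD", "HDD"], raid5_enabled=True))
```

The split-backplane option amounts to running two of these checks, one per side, which is why SAS on one half and SSDs on the other passes.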
On Power 520 and Power 550 machines, which were updated with Power6+ processors last week, the SSDs come in two form factors, and they plug into the same disk slots inside the system chassis and I/O expansion drawers as regular disk drives. If you want to put an SSD into a 3.5-inch drive slot, you get feature 1890 or 1909, and if you want to put it into a 2.5-inch slot, you get feature 3586 or 3587. The SSD will also be supported in Power 560 and 570 systems as well as in the EXP 12S storage drawers, and a free-standing drive is available that can be mounted in JS23 and JS43 Power-based blade servers. As far as I know, Power 595 servers and JS12 and JS22 blade servers do not yet have support for the SSDs.
The SSD is supported only on i 6.1–sorry, i5/OS V5R4 folks–as well as on AIX 5.3 and 6.1. Novell SUSE Linux Enterprise Server 10 SP2 or later can also use the SSDs on Power Systems, as can Red Hat Enterprise Linux 4.7 or higher and 5.2 or higher.
The SSD feature used on the blades and in the Power 520 and 550 servers costs $10,000 a pop, which is not very cheap at all compared to other flash-based SSD alternatives. A 139 GB (for i) or 146 GB (for AIX or Linux) SAS hard disk runs $498. So SSDs are something that i shops will be using sparingly until the price comes down. By comparison, Sun started shipping a rebadged version of Intel’s 32 GB X-25E SATA-style flash drive back in March for $1,199 (about twice Intel’s list price). That X-25E drive is SATA, not SAS, and while it is able to process 35,000 IOPS on reads, it can only handle 3,300 IOPS on writes. So the IBM drive is in a lot of ways better–more capacity, wear leveling, better write performance–but the question is, is it worth more than eight times the money per drive, or roughly four times as much per gigabyte? And the word on the street is that IBM will be charging $13,235 for the SSDs on Power 560, 570, and 595 machines. Why more money? I am not certain.
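One way to frame the price question is dollars per IOPS rather than dollars per gigabyte, using the list prices and throughput numbers above:

```python
# Dollars per IOPS for the drives discussed above, using list prices and
# per-drive IOPS from the article. The Sun/Intel row uses the X-25E's
# 3,300 write IOPS; on reads (35,000 IOPS) it would look far better.
drives = {
    "IBM 69 GB SSD feature":    (10_000, 28_000),
    "IBM 146 GB SAS disk":      (498, 320),
    "Sun/Intel X-25E (writes)": (1_199, 3_300),
}
for name, (price, iops) in drives.items():
    print(f"{name:26s} ${price / iops:,.3f} per IOPS")
```

Measured this way, the $498 disk is by far the most expensive option, at around $1.56 per IOPS, while the IBM SSD and the X-25E on writes both land near 36 cents per IOPS. For transaction-heavy i shops, that is the math that will eventually sell SSDs despite the sticker shock.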
In addition to the new SSDs, IBM last week also launched new disk controllers, disk drive enclosures, and a whole slew of related storage enhancements for the Power Systems i lineup. I will walk you through those in detail in next week’s issue.