Don’t Overlook Hardware-Based High Availability Alternatives
June 26, 2007 Alex Woodie
If you are a System i shop in the market for a high availability solution, you may want to consider a hardware-based solution. IBM offers several options, including solutions based on i5/OS clustering, iASPs, FlashCopy, the GlobalMirror and MetroMirror offerings, and the System i Copy Services toolkit. While an external high availability setup may cost more than a software-based replication solution, it has the potential to be very reliable, as one former user of high availability software found out.
Complexity and performance have been two of the biggest drawbacks commonly cited by users of OS/400 and i5/OS high availability solutions. While the technology has proved itself during disasters for some users, other organizations report difficulties in getting the software to work properly in their environment. Crete Carrier, a trucking company based in Lincoln, Nebraska, was one of these users.
“We tried for years to make [a high availability software product] work, and work religiously every time, and we just couldn’t do it,” says Ron Edwards, who works in Crete’s IT department.
The biggest problem had to do with keeping the files in synch between their two iSeries servers. The high availability software vendor recommended that Crete run synch checks across the systems, but due to the size of Crete’s environment, the check could only be run across 20 percent of its data. “It couldn’t get it finished in one day’s time,” Edwards says. “If on Friday evening we decided we needed to do a role swap, if we were in synch at the beginning of week, we could be out of synch by end of week. That was really a mess.”
Crete brought in experts from the high availability vendor, but they didn’t succeed in fixing the problem. When Crete upgraded to a new release of the vendor’s product that was based on remote journaling, it didn’t help. In fact, things seemed to get worse. “We tried our best. We worked for seven or eight years to try to make this work, but were just not able to,” Edwards says.
Edwards tried talking with other companies of similar size that had installed that vendor’s high availability solution. “What I found time and again with the references was, all the companies said how well [the product] was working,” he says. “I asked them if they perform regular role swaps, most of them would say, yeah, normally. Then I tried to nail it down. I asked them ‘Did you actually put users on it, did you say, swap for a week or a month?’ Without fail, with virtually 100 percent of responses we got back, they said they did a role swap, verified all the files were there, then did a role swap back,” which didn’t prove anything, Edwards says.
Edwards was going to try starting fresh with another HA vendor, when he heard about hardware-based replication from his hardware vendor, Sirius Computer Solutions. “One day we had them in, and I said, ‘This is ridiculous, there needs to be a hardware-based solution for this. I know it exists in the mainframe world.'”
The Sirius representative suggested that Edwards speak with Selwyn Dickey, an IBMer based in Rochester who had developed a utility for managing hardware-based replication from the System i, called the Copy Services for System i Toolkit. “I talked with him for two hours on the phone,” Edwards said. “He laughed and said, ‘not only do we have a solution, but it works.'”
Hardware-Based Replication Offerings
IBM’s hardware-based replication solutions fall into two main camps: those that use the System i’s internal storage, and those that use external storage, namely the TotalStorage DS8000 series of storage area networks (SANs).
On the internal storage front, IBM offers cross-site mirroring (XSM). This technology leverages i5/OS clustering and independent auxiliary storage pools (iASPs) to duplicate the contents of one iASP into a second iASP, which can be located just about anywhere.
On the external front, IBM offers the MetroMirror and GlobalMirror solutions, which are based on IBM’s peer-to-peer remote copy (PPRC) technology. GlobalMirror is an asynchronous replication technology that ensures anything written to an iASP on a source disk is replicated to a second iASP on the target disk. MetroMirror works similarly for open-systems environments, except that it’s a synchronous technology that writes to the source and target disks before the write is acknowledged to the application. Also falling into the external camp is IBM’s FlashCopy service, which uses advanced “smoke and mirrors” technology to make the contents of an iASP available in a matter of seconds. FlashCopy is used to reduce the downtime associated with backups.
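To make the difference between the two modes concrete, here is a minimal Python sketch of synchronous versus asynchronous write acknowledgment. It is only an illustration: the class and method names (Disk, SynchronousMirror, AsynchronousMirror, drain) are hypothetical and are not IBM APIs, and real PPRC replication happens inside the storage subsystem rather than in application code.

```python
from collections import deque


class Disk:
    """A toy disk: a mapping of block number to data (hypothetical model)."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class SynchronousMirror:
    """Synchronous style: acknowledge only after both disks hold the write."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def write(self, block, data):
        self.source.write(block, data)
        self.target.write(block, data)  # remote write finishes first
        return "ack"                    # only now does the application proceed


class AsynchronousMirror:
    """Asynchronous style: acknowledge after the source write, replicate later."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.pending = deque()  # writes not yet applied to the target

    def write(self, block, data):
        self.source.write(block, data)
        self.pending.append((block, data))
        return "ack"            # the target may still be behind at this point

    def drain(self):
        """Apply queued writes to the target in their original order."""
        while self.pending:
            block, data = self.pending.popleft()
            self.target.write(block, data)


# Synchronous: the target is current as soon as the write is acknowledged.
sync = SynchronousMirror(Disk(), Disk())
sync.write(1, "order-1001")

# Asynchronous: the target lags until the queued writes are shipped.
lagging = AsynchronousMirror(Disk(), Disk())
lagging.write(1, "order-1001")
lag_before = len(lagging.pending)
target_before = dict(lagging.target.blocks)
lagging.drain()
```

In this toy model, the length of the pending queue stands in for the replication lag of an asynchronous link. The synchronous mirror has no such lag, but every write waits on the remote disk before the application can continue.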
GlobalMirror is similar in many respects to XSM, with the main difference being that GlobalMirror runs on the AIX system inside the Power5-powered DS8000 SANs, while XSM is implemented in i5/OS, or, more precisely, below the operating system level in the SLIC layer of code on the System i.
The Copy Services for System i Toolkit that Dickey developed allows administrators to set up and configure GlobalMirror from the System i, and also includes a command that is used to initiate role swaps. However, some elements of a role swap, such as managing user profiles, are managed through i5/OS clustering.
Hardware-based high availability solutions such as GlobalMirror and FlashCopy are marked by their simplicity, says Tim Klubertanz, a System i and storage HA consultant with IBM in Rochester and a co-worker of Dickey. “Anything they put in the iASP is automatically replicated,” he says. “It’s automatically done for them, at a storage subsystem level. The iSeries can fail or be turned off altogether and we’re going to continue to do replication on those disk units.”
Implementation at Crete Carrier
After 10 years of struggling with a logical replication solution, Crete Carrier found that a hardware-based replication solution was the answer to their high availability needs.
The company is using GlobalMirror, MetroMirror, iASPs, FlashCopy, and i5/OS clustering to replicate their DB2/400 and Windows data between two data centers that are 60 miles apart. At each data center resides a System i Model 530, a Windows-based BladeCenter system, and a DS8100 SAN. Each SAN holds about 17 TB of data.
“The solution that IBM and Sirius came up with for us worked out really well,” Edwards says. “It’s 100 percent accurate. Not a single issue. Nothing even remotely incorrect. It just runs.”
The company regularly performs role swaps, and Edwards estimates they take between 15 and 20 minutes, after all the users have been logged off. Latency in the i5/OS environment is about one-and-a-half seconds.
Edwards used to assign one person to monitor the high availability environment, and that task usually took up the better part of his entire work day. Not anymore. “Nowadays, time spent on that is virtually nonexistent,” he says. “Files can’t be out of synch. They have to remain in synch. It’s a guaranteed thing. When I go to bed at night, that’s not something that I worry about.”
There are trade-offs to a hardware-based high availability setup, however. For starters, it requires users to learn how to configure and manage iASPs. “As far as running an iSeries with an independent ASP, that does take some getting used to,” Edwards says. “There are learning curves associated with it.”
An external hardware-based high availability option also requires the implementation of IBM TotalStorage SAN devices, which are not inexpensive. But in Crete Carrier’s case, Edwards thinks the trucking company might have been better off going the SAN route from the get-go. “If you put them side by side, the hardware-based solution probably cost more than [the software-based high availability solution],” he says. “But when you consider the amount of money spent putzing around with [the software-based high availability solution], we could have saved a lot of money.”
This article has been corrected. The latency in Crete Carrier’s hardware-based high availability solution is about one-and-a-half seconds, not one-and-a-half minutes. Also, FlashCopy only works with external disk, not internal disk. IT Jungle regrets the errors.