Great Plains Communications Finds Comfort in iTera
June 29, 2004 Alex Woodie
For two years, Great Plains Communications had trouble managing its OS/400 high availability environment. Chronic out-of-synch conditions, difficulty replicating IFS and spool files, and inaccurate system monitor displays kept the small Nebraska telecommunications company from practicing role swaps, and so from gaining confidence in its high availability setup. When the company finally replaced the software with startup iTera’s Echo2 product, it found a few rough edges, but overall things improved dramatically.
Great Plains Communications is based in the eastern Nebraska city of Blair, on the banks of the Missouri River. The third-generation, family-owned company has grown over the years. Today it employs about 200 people and provides a combination of local and long-distance telephone service, Internet access, and cable television service to about 70 communities in rural Nebraska and parts of South Dakota, Kansas, and Colorado.
Since the 1980s, Great Plains Communications has relied on the IBM midrange server (the S/38, the AS/400, and now the iSeries) to run its core billing applications. For years the company used a homegrown service order processing program that required workers to print up to six copies of each order. In 1999, the company decided to cut its paper use significantly by switching to a packaged application from Quintrex Data Systems that used electronic documents. Without a paper trail to fall back on, however, the company would be exposed to a potentially devastating loss of data and transactions if the server holding those electronic documents went offline. It was then that the company decided it needed to fortify its nightly backups by replicating data in real time to a secondary server.
The company investigated the offerings of the “big three” of OS/400 high availability software, DataMirror, Lakeview Technology, and Vision Solutions, and eventually settled on one of them. (To protect the innocent, and the guilty, Guild Companies is not disclosing the name of the vendor.)
The company installed the high availability package in March 1999, says Tim Jensen, a programmer/analyst with Great Plains Communications. “We were very naïve,” he says. “We were persuaded that this was a turnkey solution. ‘Just install the software and–voila!–it does everything for you, and data is replicated to the other system. You don’t need to know very much.’ Boy, were we wrong.”
Great Plains Communications started having problems soon after the initial install of the high availability software, including chronic out-of-synch conditions, poor replication of Integrated File System (IFS) objects and spool files, inaccurate monitoring and reporting of replication information, and an inability to test the role-swap process. The vendor tried to resolve the problems and sent technicians to reinstall the product. It was not until two years after the first implementation, and after several weeks of work by a vendor technician, that the company completed its first role swap. But there would be little smooth sailing ahead.
Jensen inherited responsibility for the high availability environment in 2000. Although some weeks went by smoothly, he spent most of his time documenting errors in the high availability setup and trying to get the vendor to work with him to solve them. “I said, ‘I can get this thing figured out.’ But the more I asked, the more I wasn’t getting answers,” Jensen says. “I’m sure they have happy customers. But we weren’t really one of them.”
In 2002, Jensen set out to find another high availability product. He went to the COMMON conference, met with representatives of the other two members of the big three, and even spoke with their customers. However, the high price tags and the complexity of the systems, along with the fact that some of their customers were unsatisfied, made him hesitant to give it another go with one of the established players.
Jensen read about a new class of high availability application, based on remote journaling, and eventually found iTera, which had just started actively promoting its Echo2 product and had only a few dozen customers at the time. After talking with iTera engineers over the phone, he decided to fly out to Salt Lake City, Utah, where the company is based, to meet with its leadership. (Jensen also wanted to make sure the startup wasn’t being run out of the founder’s garage. It wasn’t.)
Jensen liked what he saw of Echo2 during a demo. He liked the menu screen, which he says was intuitive and much easier to grasp than his company’s existing high availability product, which, he says, “you just have to learn.” Echo2 was also significantly less expensive than other high availability offerings, because it uses IBM’s remote journaling technology instead of proprietary replication methods that had to be developed at great expense. The company decided to install the software in January 2003, ahead of the end-of-March expiration of its contract with its existing high availability software supplier.
The iTera technicians installed Echo2 in one weekend and stayed through Monday and Tuesday fine-tuning the software and providing training. This was another point of departure from traditional OS/400 high availability software vendors: “iTera didn’t encourage you to take a five-day class,” Jensen says. “The learning curve was pretty quick.” The first role swap was done in February 2003, less than a month after Echo2 was installed. The old maintenance contract was allowed to expire.
ROOM TO GROW
While remote journaling has eliminated some of the complexity of OS/400 high availability products (and has reduced the cost), every user’s environment is unique. The bottom line is, high availability is still complicated stuff, and every user needs to figure out what works best for him. In Great Plains Communications’ case, it found Echo2 to be far superior to the previous product in certain areas, and a less-than-perfect solution in others.
The biggest improvement Jensen noticed with Echo2 was a dramatic decrease in out-of-synch conditions. Jensen also found the Echo2 product to be much more resilient in recovering from replication’s little hiccups. For example, with the old high availability product, serious harm would be done to the system if he abruptly ended the replication, he says. But with Echo2 he could go into a subsystem and arbitrarily end several jobs and the product would take it all in stride.
Jensen still runs into errors with distributed data management (DDM) files, and he experiences problems with the replication of IFS and spool files. This is one area where Echo2 is still an immature product, Jensen says, though he notes that the previous product was no better at replicating these files. Jensen is thrilled to see that iTera keeps developing new audit programs, but he finds that these added bells and whistles don’t always work correctly and could detract from the original simplicity of Echo2’s architecture. The product’s documentation could also be improved, he says.
Overall, Jensen says Echo2 is “a good, solid product” that has significantly reduced the amount of time he’s spent managing the high availability environment. “We’re very happy we switched,” he says. “It hasn’t been seamless or without its problems or bugs. Any large piece of software is going to have bugs. It’s not as error-free or as simple as I would like it to be. I’m on the phone or writing e-mails to Salt Lake City more than I would like. But I have a good relationship with them . . . and they’re good people.”
Where the rubber meets the road for a high availability product is its execution of role swaps. After all, this is why you buy the software in the first place. In Great Plains Communications’ case, Jensen found Echo2 to be much better at role swaps than the previous product. The company has conducted monthly role swaps since installing the software. “It’s a weekend commitment every month to do this,” Jensen says. “I run every audit in sight before doing the rollover, and other checks, to make sure the file and library counts are right. It takes a good chunk of time.”
The monthly role swaps have given Jensen and the company’s operators familiarity with, and confidence in, both the Echo2 software and the company’s own processes. That investment paid off about two months ago, when Great Plains Communications’ primary iSeries Model 820 suddenly IPL’ed for no apparent reason. When the iSeries was slow in coming back online, Jensen executed a role swap, moving users to the secondary iSeries Model 820, which is located about 10 miles away and connected by a fibre channel link. Downtime was limited to 20 minutes. The primary iSeries was offline for about four hours on what turned out to be a slow day. No transactions were lost.
Jensen still doesn’t know what caused the iSeries to crash. “This is unheard of for an AS/400 to go down,” he says. “These things just don’t happen with the ’400.” Even in the cozy world of the OS/400 server, the unthinkable sometimes happens.