Bank Says ‘HA’ to System i Hardware Failure
August 17, 2016 Alex Woodie
When the New Zealand bank Westpac experienced a rare backplane failure in its System i server last year, it took a critical wholesale trading application completely offline. IBM immediately put a replacement on a plane, but the best-case scenario called for three days of downtime. Thanks to the bank’s use of a high availability solution from Maxava, however, not a single transaction was lost.
With more than 13 million customers, Westpac is one of the biggest banks in the Asia Pacific region, and one of Australia’s “big four” banks. The Sydney-based company relies on a variety of applications and servers to help serve consumer and institutional banking customers. Among those is an IBM i-based banking application called Midas, developed by Misys. This application provides back office support for the wholesale money market and securities trading activities of Westpac’s customers.
Cliff McCauley, the technology solutions manager for Westpac in New Zealand, recalls getting the news about the failed backplane in the production System i Model 550 server running Midas that was housed in an IBM data center in the New Zealand city of Auckland. “I was on leave, which was a bit of a problem,” McCauley tells IT Jungle. “I got the call and cleared up the next couple of days.”
The failure occurred on a Thursday, April 30, which meant the system could potentially be down until Sunday, May 3, when IBM was confident it could get its hands on a new backplane, install it, and get the system fired back up. The server only handled trades for customers in New Zealand, but that still put millions of dollars in trades in jeopardy, not to mention the reputation of the company.
“It would have been a disaster,” McCauley says matter-of-factly about the prospect of not having the Midas application for three days. While a separate server houses the front-end trading application that customers see, the failure of the back-office system would have been devastating.
“I don’t know how we would have done it,” he adds. “The GL [general ledger] wouldn’t have had position information. We would have had to put some manual processes in place and hope for the best, I guess. It’s all the back-office stuff, but it would have been very difficult to manage.”
But thanks to McCauley’s foresight, Westpac didn’t have to go down that road. Soon after acquiring its current System i server nearly a decade ago, the technology manager oversaw the implementation of Maxava’s high availability software to protect this particular application.
Over the years, the company tested the Maxava software and its preparedness by conducting role swaps between the production system and the backup system, which was located in a different data center on the other side of Auckland. Those tests proved valuable when the backplane failure occurred.
“Once we knew there was a problem and nobody could sign on, and in conjunction with IBM and the incident management team, we made the decisions that we would need to flip the switch,” McCauley says. “The decision was made and the process was kicked off. And it was quite quick. We had the DR server available and running and everybody was able to connect and log on and keep going.”
There weren’t any surprises during the actual failover, which McCauley says took about 10 minutes to complete. After making a network change to redirect Westpac’s front-end trading system to the backup machine, the system was back up and running, with nary a lost trade.
“There was no real customer impact,” McCauley says. “From an application and data perspective, it was all there and up to date.”
From a failover perspective, everything went about as well as it could. There were no unknowns for Westpac during the failover. All the planning the company did paid dividends during its hour of need. The new part arrived from IBM on time, and the primary System i Model 550 server was back up and running on Sunday. After another role-swap, the production server was ready to support normal business operations on Monday morning.
“Having done the plans, where we knew what we had to do and who had to do what, made the process a lot easier and more reliable,” McCauley says. “We were confident that what we were doing was going to be correct and keep everything ticking along the way that it should be ticking along.”
The only aspect of the failover that could have been improved was out of Westpac’s hands. Part of Midas runs in S/36 emulation mode and uses S/36 files. While the application runs perfectly fine, replicating the S/36 files poses a problem for real-time data replication tools such as Maxava’s. The company has a technique for replicating those files, but it’s not hooked into Maxava’s software. “The utilization of that [S/36] software is becoming less and less every day,” McCauley says. “But unfortunately we still are” using it.
IBM i shops that implement high availability software typically anticipate using it to prevent downtime caused by natural disasters, such as earthquakes, severe storms, or tornados. That is especially true when they run IBM hardware that is considered to be bulletproof. But the truth is, most IT disasters are the result of hardware failures. While IBM’s hardware is still world-class, it’s not immune to them.
Most IBM i shops never need their high availability software. The purchase is considered an insurance premium, a down payment on a recovery that they hope they’ll never need. But in Westpac’s case, the decision to implement high availability software provided a very real return.
“It’s the thing you don’t ever want to see happen,” McCauley says. “You know in the back of your mind that things can do and do happen, but it certainly wasn’t something that we were wanting to happen, that’s for sure.”