Simulated Role Swaps: Maxava’s Secret Weapon
December 9, 2014 Alex Woodie
It may sound trite, but during a disaster you can expect the unexpected. No matter how much planning you put into your disaster preps, chances are good you overlooked something. It’s also true that, without any planning, your chances of surviving the disaster are poor. That’s why high availability software vendor Maxava puts a strong emphasis on testing using its simulated role swap (SRS) functionality.
The SRS feature, which is available only on Maxava’s high-end offering called Enterprise+, allows IBM i customers to test their failover processes to see whether the backup machine is ready to handle the load in the event of a real emergency. The company built the feature into the software to make it easy for customers to test their HA setups, which is something many HA shops fail to do in an adequate and regular manner.
It’s called a “simulated” role swap because production processing doesn’t actually shift over to the backup IBM i machine. The production machine keeps on processing transactions for most applications and users, and the Maxava software keeps replicating the changes to the backup machine using IBM’s remote journaling. But instead of applying them immediately, as Maxava’s software would normally do, the backup machine stores the changes and only applies them to the database after the test is over.
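The store-now, apply-later behavior described above can be pictured with a small sketch. This is a toy model in Python, not Maxava’s code: the class name, method names, and the dictionary standing in for the backup database are all hypothetical, and the sketch ignores whatever the real product does with changes made by test users on the backup machine.

```python
from collections import deque


class BackupApplier:
    """Toy replication target that can defer applying changes (hypothetical)."""

    def __init__(self):
        self.database = {}        # stands in for the backup database
        self.pending = deque()    # changes received while a test is running
        self.simulating = False   # True while a simulated role swap is active

    def receive(self, key, value):
        """Accept one replicated change from the production machine."""
        if self.simulating:
            # During the test, keep the change but do not apply it yet.
            self.pending.append((key, value))
        else:
            self.database[key] = value

    def start_test(self):
        self.simulating = True

    def end_test(self):
        """Apply stored changes in arrival order, then resume normal apply."""
        while self.pending:
            key, value = self.pending.popleft()
            self.database[key] = value
        self.simulating = False


backup = BackupApplier()
backup.receive("order-1", "new")        # applied immediately
backup.start_test()
backup.receive("order-1", "shipped")    # stored, not applied
during_test = backup.database["order-1"]
backup.end_test()
after_test = backup.database["order-1"]
```

In this model the backup keeps receiving every change throughout the test, so nothing is lost; the backup database simply catches up once the test ends.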
During an SRS test, an organization will typically have several users log onto the backup machine to make sure everything works. There’s a lot that can go wrong in HA: new fields can be added to the database, applications can change, and user profiles and other critical objects can be missing. That makes it imperative to run tests frequently to maintain a high level of readiness for an actual emergency.
“We encourage our customers to test their role swaps on a regular basis,” says Peter Kania, Maxava technical services and development director. “The whole mantra for high availability and disaster recovery is test and test and test again.”
Maxava encourages customers to test at least twice a year, although not all of them do. “We have customers who do it quite regularly. It’s good practice,” Kania tells IT Jungle. “Not only are you able to use the SRS process to test new code and new things that are changing . . . they know that if they have an emergency, that within the last six months, they’ve actually tested this environment, so they know if they cut over that it’s going to be right.”
All HA vendors encourage their customers to test their software. After all, it’s good business (and keeps the HA vendor in the customer’s mind). But not all IBM i shops test their HA or DR setups as much as they should.
Recently a managed service provider (MSP) called Focal Point Solutions Group launched a new offering that uses IBM’s FlashCopy technology on Storwize V3700 SAN arrays to instantaneously make copies of backup LPARs for the purpose of testing the target. While the tests aren’t conducted on the actual machine that would be used during a disaster, the offering does allow customers to test the replicated environment on a third machine, and to do so at their leisure. You can read more about it in this story: “Startup Looks To Take the Pain Out Of HA Testing”.
While Maxava has plenty of customers who use the software in an MSP model, the company has issues with Focal Point’s approach. According to Kania, the best tests are done on the actual machine that will be used during an emergency.
“If you’re testing, you really want to be testing the role swaps on the actual machine that is going to be the one that you cut over to,” he explains. “It’s all good and well to fire up a machine elsewhere and dump some data on it and see if the core bits and pieces work. But if you’re looking at the interface and the links into the machine and all that kind of good stuff, you need to be doing it on the actual machine that is going to be the one that’s going to be used.”
As the MSP and cloud models continue to gain steam, the notion of what is a “machine” is likely to change. Thanks to technologies like FlashCopy and Live Partition Mobility, IBM i server environments are increasingly fungible. In some instances, these types of virtualization technologies can be used to create higher availability for customers’ workloads. In other cases, however, the added complexity may create a weak link in the chain.
Different approaches are valid for different customer situations. But no matter how a customer is using HA, the need to test it regularly will never go away.