Anatomy of a (Successful) Cloud DR Test
March 12, 2013 Alex Woodie
You understand that it’s critical to test your disaster recovery (DR) plan on a regular basis. You also understand that a real disaster will bring surprises, and that practicing DR will increase your company’s chances of overcoming those inevitable challenges. The folks running the VAULT400 cloud backup and recovery service understand this, too. But according to VAULT400, most companies don’t realize what it actually takes to successfully complete a full DR test, let alone do it in a real disaster.
VAULT400 is an online vaulting and DR service started seven years ago by United Computer Group (UCG), a Cleveland, Ohio-based IBM business partner and IBM i server reseller. Over that time, VAULT400 has signed about 100 IBM i shops to utilize its vaulting solution, which is based on a private-label version of EVault’s IBM i backup and recovery agent.
VAULT400 offers 12-hour and 24-hour recovery services utilizing IBM i servers in UCG data centers, as well as a 48-hour “quick ship” program for customers who want to recover their systems on-site. VAULT400 also runs Maxava high availability environments for customers who have a recovery time objective (RTO) of less than one hour.
Practicing DR failovers is a big part of VAULT400’s business, says Jim Kandrac, the founder and president of UCG and VAULT400.
“It’s imperative to do DR tests, and to do them properly,” Kandrac tells IT Jungle. “It’s not OK to say, ‘You know what, the data is there, and we restored a couple of files.’ That’s how some people view DR tests. And it’s OK. You can check that box off for compliance. But do you want a hope and a prayer, or do you want something tried and true, that’s valid, and that’s worked? [If it’s the latter], there are a lot of steps to that.”
As a DR service provider, VAULT400 tries to keep it simple for its customers. “There are so many things that go on in a true DR test, but the only two things that a client needs to do in a DR test is: 1) have their encryption key and 2) get third-party license keys, because you’re going to [a] new serial number,” Kandrac says.
Shielding users from the complexity of a high-level DR test is part of the reason a customer would choose a service provider like VAULT400. And while the folks running VAULT400 tests and recoveries don’t want to burden their customers with this complexity, it’s still important for customers to realize the scope of a high-level DR test, including what can possibly go wrong and what kind of work the customer might have to do to make DR really work.
“Everybody thinks it’s real easy,” Kandrac says. “If I have a disaster, just give me an iSeries. Let me load OS/400 and PTFs. Let me load my VAULT400 agent and let me restore my data. It’s hunky dory. OK, that’s fine, we’ll let you do that. But it doesn’t work. Trust me.”
UCG distributes a one-page sheet that lists some of the other steps that must be done as part of a DR restore. User profiles and configurations need to be restored, the VAULT400 agent must be re-registered, and IBM i jobs must be synced with VAULT400. Then the USRLIB and production libraries and the IFS must be restored, which can take up to seven hours. Networking and disks must be properly set up. System values and authorities must be restored.
“The sheet I put together, which is a digest of a typical DR test or recovery, might take six to 12 hours,” Kandrac says. “But that’s actually backed up by a 40-60 page step by step document that we do in the DR test.”
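To make the idea of a step-by-step recovery document concrete, here is a minimal sketch of the restore steps named above expressed as an ordered checklist. It is illustrative only: the `Step` and `Runbook` classes and the `build_runbook` helper are hypothetical names, the step order simply follows the order in which this article lists them, and nothing here reflects VAULT400’s actual tooling or its 40-60 page document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    name: str
    done: bool = False


@dataclass
class Runbook:
    """An ordered checklist: steps must be completed in sequence."""
    steps: List[Step] = field(default_factory=list)

    def complete(self, name: str) -> None:
        # Walk the list in order; refuse to skip past an unfinished step.
        for step in self.steps:
            if step.name == name:
                step.done = True
                return
            if not step.done:
                raise RuntimeError(
                    f"cannot run {name!r} before {step.name!r}"
                )
        raise KeyError(name)

    def next_step(self) -> Optional[str]:
        # Return the first step not yet done, or None when finished.
        for step in self.steps:
            if not step.done:
                return step.name
        return None


def build_runbook() -> Runbook:
    # Step names paraphrase the article; the ordering is an assumption.
    names = [
        "Restore user profiles and configurations",
        "Re-register the VAULT400 agent",
        "Sync IBM i jobs with VAULT400",
        "Restore USRLIB, production libraries, and the IFS",
        "Set up networking and disks",
        "Restore system values and authorities",
    ]
    return Runbook([Step(n) for n in names])
```

Calling `build_runbook().next_step()` returns the first pending step, and `complete()` raises an error if someone tries to jump ahead, which mirrors Kandrac’s point that a real recovery is a sequence of dependent steps rather than a single restore.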
There are many “gotchas” that can surface in a DR test.
Due to the thoroughness of VAULT400’s DR tests, it charges customers for them. The first year’s DR test is $2,500, which drops to $1,900 the second year. “It is not just restore and go,” Kandrac says. “Every situation is unique. We roll up our sleeves and work it out. You can’t just sprinkle magic fairy dust and think you’re covered.”