Testing: The Secret Weapon Of DR Success
December 2, 2020 Alex Woodie
Just as every company should carry liability insurance, every company should have a disaster recovery (DR) plan. Without a plan, the odds of recovering from a disaster are meager, at best. But just having a plan isn’t good enough, either — the company must test the plan to gauge whether it will actually work.
Whether your software runs on a server housed at your headquarters or in a giant rack in a public cloud across the state, it’s a critical element for smooth operations. While this information technology is usually reliable (and if you’re running IBM i, it’s almost always reliable), there are occasions when the business applications are not available.
When IT stops working as it should, businesses must have a backup plan. The cause could be a catastrophic server failure (almost unheard of in the IBM i world) or a misconfigured system setting (much more likely). Networks also go down, which is a risk in our uber-connected world.
Running in the public cloud brings its own set of risks. Amazon Web Services, the world’s largest cloud, which brings in about $46 billion per year, experienced a prolonged six-hour outage last week that brought down large chunks of the Internet. The cause, reportedly, was a botched attempt to add capacity to its Kinesis data streaming service (whoops).
Disasters served up by Mother Nature are another frequent cause of IT outages. North America is blessed with abundant natural resources and amazing landscapes, but its weather is not for the faint of heart. In fact, you’d be hard pressed to find another chunk of land the size of the United States that has the same amount of extreme weather, according to meteorologists.
Having good backups of one’s data and a way to get to them, then, becomes essential. Depending on one’s tolerance for downtime, there are different setups that make sense. Performing regular tape backups, powered by BRMS, Robot/SAVE, or IBM i save commands, remains the go-to method for ensuring an IBM i shop can recover its data following a disaster. However, disk-based backups and virtual tape libraries are becoming more popular on IBM i. Some of these options also provide cloud-based storage.
For businesses with less appetite for downtime, a real-time data replication solution that can move data to another server located across the continent, on either a synchronous or asynchronous basis, may be in order. There are numerous high availability products on the market, ranging from the MIMIX and iTera HA offerings from Precisely (formerly Vision Solutions) to the PowerHA offerings from IBM.
IBM i shops with the lowest tolerance for downtime would be inclined to use the Db2 Mirror offering, which keeps the Db2 for i databases on two IBM i nodes synchronously mirrored in a campus setting. While this continuous availability setup can virtually eliminate planned and unplanned downtime, Db2 Mirror does not provide any protection from a site-wide disaster, as it’s limited to running in a campus setting (IBM is currently looking to expand Db2 Mirror’s distance limit).
No matter which DR approach an IBM i shop chooses, it must practice the recovery to be sure that it will work. Unfortunately, too many IBM i shops do not take the time to practice their DR plans. It’s understandable, given the time and resource pressures on IT shops today, but it’s short-sighted, nonetheless.
IBM i customers who sign up with Recovery Point Systems, a full-service DR provider based in Maryland that competes with Sungard AS and IBM Business Continuity and Recovery Services (BCRS), are actively encouraged by the vendor to practice their recovery plan multiple times per year.
“We’re kind of a PITA vendor,” says Recovery Point’s COO Robert Hicks, referring to the phrase “pain in the [buttocks region].” “We’re constantly calling, ‘Can we schedule your test? Can we schedule your test? Can we schedule your test?’ We’re always on it.”
While DR plans usually look sound on paper, there are myriad ways they can fail to pan out when the city is on fire or a hurricane has wiped out half of South Florida.
Server configurations change, libraries get added and deleted, and user profiles are moved around. Processes can be put in place to stay on top of these changes, but without an actual test, there’s no way to really know whether the recovery will work.
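To make the drift problem concrete, here is a minimal sketch of one such process: comparing an inventory of library names from the production system against the copy at the recovery site. It assumes the two inventories have already been exported as plain lists of names. The function name, the class, and the sample library names are all hypothetical, and nothing here is drawn from Recovery Point's tooling or any IBM i API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Drift:
    """Differences between a production inventory and its recovery copy."""
    missing_on_dr: list[str]  # on production, absent from the recovery copy
    extra_on_dr: list[str]    # on the recovery copy, no longer on production


def find_drift(production: set[str], recovery: set[str]) -> Drift:
    """Compare two name inventories (hypothetical helper, not a vendor API)."""
    return Drift(
        missing_on_dr=sorted(production - recovery),
        extra_on_dr=sorted(recovery - production),
    )


# Made-up library names, purely for illustration.
prod_libraries = {"ORDERS", "PAYROLL", "INVNEW"}
dr_libraries = {"ORDERS", "PAYROLL", "OLDTEST"}

drift = find_drift(prod_libraries, dr_libraries)
print("Missing on DR:", drift.missing_on_dr)
print("Stale on DR:", drift.extra_on_dr)
```

A check like this can flag a library that was added after the last plan update, and the same comparison would apply to user profiles. A clean report still says nothing about whether the restored system will actually run the business, which is why the test itself remains the only real proof.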
“We welcome the opportunity to show you it works,” Hicks says. “In our mind it’s the ultimate way to prove that the money they’re spending with us is worthwhile.”