Data Loss In A Disaster Exceeds Expectations
June 22, 2015 Dan Burger
Just how big of a mistake is it to do nightly backups? Depends what your definition of big is. If a disaster takes down your system at the end of a working day, but before the scheduled backup occurs, how much data will you lose? You probably don’t have an accurate answer to that question. But if you knew, it might change the way you think about how data loss would affect your business.
Most pros who think about data loss and loss prevention have implemented disaster recovery plans, even though the majority of them have little or no idea how much data their companies collect or alter each day. That doesn’t deter them from making a plan that calls for backup to tape at the end of each working day and assuming that, however much data that is, they can live with the risk of losing it.
“We’re finding that, for the most part, even the most efficient and well-resourced organizations just don’t have this information, which does make it very difficult for them to adequately plan for disaster recovery,” wrote Simon O’Sullivan in his blog last week. O’Sullivan has been in the IBM i disaster recovery business for a lot of years. He’s the co-founder and vice president of Maxava, an IBM midrange vendor specializing in business resiliency, whether that’s disaster recovery or high availability.
Putting that information in front of IT and business executives is Simon’s crusade.
It’s become somewhat of an IBM i community rule of thumb that companies set their recovery point objectives (RPOs) and recovery time objectives (RTOs) at 24 hours. This excludes banks, high volume online retailers, and companies that have absolutely no tolerance for downtime. The 24-hour rule seemed reasonable. It felt right. A big part of making that a comfort zone was not knowing how much data was at risk and what the business consequences would be if that amount of data was lost.
Determining the amount of data created each day–actually quantifying it–leads to better disaster recovery decisions and more realistic recovery time and point objectives, O’Sullivan says.
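The arithmetic behind that point is simple: the data at risk under any recovery point objective is roughly the change rate multiplied by the RPO window. A minimal sketch (the function name and the 48 GB/day figure are illustrative assumptions, not Maxava numbers):

```python
def data_at_risk_gb(avg_change_gb_per_day: float, rpo_hours: float) -> float:
    """Worst-case data loss: everything altered since the last recovery point."""
    return avg_change_gb_per_day * (rpo_hours / 24.0)

# A shop that alters 48 GB a day with a 24-hour (nightly tape) RPO
# risks losing a full day's changes; at a one-minute RPO the
# exposure shrinks to a fraction of a gigabyte.
print(data_at_risk_gb(48.0, 24.0))                   # 48.0
print(round(data_at_risk_gb(48.0, 1.0 / 60.0), 3))   # 0.033
```

The formula is only as good as the change-rate estimate, which is exactly the number O’Sullivan says most shops don’t have.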
To get a handle on data, Maxava developed a tool that identifies metrics such as the general size and shape of the data, along with specific numbers like CPWs, memory allocation, the size of the database, and the amount of change taking place on a daily basis. That change rate determines the bandwidth required to send changes “down the line” in real time.
Maxava is doing a lot more work in the cloud these days. That’s because a significant number of customers no longer want what used to be the only choice–to buy a second machine exactly like their primary machine, buy the software it takes to duplicate data on that secondary machine, and then monitor and manage the system and the data. According to O’Sullivan, customers are looking for an RTO of one hour instead of one day and an RPO of one minute. What they are asking for is HA in a cloud.
This is happening often enough to convince Maxava to provide IBM i shops with the hardware, the data center, and the management services. Maxava charges the customer a monthly fee for the package deal. IT Jungle reported on Maxava’s high availability cloud six weeks ago.
With just about any IT migration or modernization, it’s a good idea to really understand what you have before deciding to make a move. Maxava uses its own software, loaded on the IBM i server. It runs in the background and creates an output file that shows the size and scope of the data. It identifies how much change happens in a 24-hour period.
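Maxava’s tool is proprietary, but the principle of measuring change without touching production data can be sketched in a few lines: take a snapshot of object sizes and modification times, take another one later, and total up what differs. This is a hypothetical illustration only; the function names and the file-walking approach are assumptions, not how the actual product works:

```python
import os

def snapshot(root: str) -> dict:
    """Map each file path under root to its (size, mtime)."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
                result[path] = (st.st_size, st.st_mtime)
            except OSError:
                pass  # file vanished mid-scan; skip it
    return result

def changed_bytes(before: dict, after: dict) -> int:
    """Bytes in files that are new or modified since the earlier snapshot."""
    total = 0
    for path, (size, mtime) in after.items():
        if path not in before or before[path] != (size, mtime):
            total += size
    return total
```

Run the pair a day apart and `changed_bytes` approximates the daily churn–a crude stand-in for the precise, journal-level measurement described in the article.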
“The amount of change is always a surprise to the customer,” O’Sullivan says. “Most have no idea about the amount of activity on their systems. Some may have an idea, but we can tell them precisely.”
The “discovery” process is done during a pre-sales stage, so the customer gets this information whether they make a change in their disaster recovery plans or not.
The process does not affect users or their production workloads, O’Sullivan says. Typically, the software is set to run several times a week for 30 to 60 minutes. This provides a snapshot of how the data is being collected and altered.
The output–measurements of the system, not a copy of any sensitive data that is on the system–is owned by the company, not by Maxava.
From the snapshot view, the company can decide if it wants to take the next step, which is a 24-hour view of what is taking place with the data. During that phase, comparisons are made of the same day of the week on different weeks and from one day to another. This fills in the blanks and more precisely determines what is changing with the data and how much data is being collected.
O’Sullivan says the discovery process sometimes goes on for a full month and sometimes a customer asks Maxava to return and do a second full month of monitoring for comparison. That allows a company to see how fast its database is growing.
“Some companies take the information and run off saying, ‘thanks very much,’ and we never hear from them again. Others say we are going to have to do something to protect the data we collect every day.”
Gathering the metrics on data that is stored versus data that has changed is important for the planning stage. As the amount of changed data grows, it will take a bigger network pipe to replicate it across the internet. Changes need to be replicated in real time without creating a backlog.
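Sizing that pipe is straightforward back-of-the-envelope work: spread the daily change volume over 86,400 seconds and add headroom for bursts and protocol overhead. A minimal sketch, assuming a 2x headroom factor (the factor and the 50 GB/day example are assumptions, not vendor guidance):

```python
def required_mbps(daily_change_gb: float, headroom: float = 2.0) -> float:
    """Sustained link speed (Mbps) needed to replicate a day's changes
    in real time, with a multiplier for bursts and protocol overhead."""
    bits_per_day = daily_change_gb * 1024**3 * 8  # GiB -> bits
    return bits_per_day / 86_400 / 1_000_000 * headroom

# Roughly 10 Mbps sustained for 50 GB of daily change at 2x headroom.
print(round(required_mbps(50), 2))  # 9.94
```

The averaged figure understates peak demand: if most change happens during a batch window, the pipe must handle that burst rate, which is why a real sizing exercise looks at when change occurs, not just how much.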
Maxava has used the discovery process for some time. Much of it used to involve manual steps, but the software has been improved with additional automation, which speeds the process, O’Sullivan says.
“In the IT business understanding risk often does not occur,” O’Sullivan says. “There’s an attitude of ‘good enough’ that permeates without a real look at the contents or the cost of the rebuild. The decision to accept risk is often made in the dark. If it is known and understood what will be lost in a disaster, the discussion changes.”