RevSoft Leans On ‘Shadowing’ To Ensure Speedy Job Scheduler Failovers
March 24, 2021 Alex Woodie
There are several things that can prevent a high availability failover from executing quickly on an IBM i server, including data synchronization issues, communication failures, and other problems. With the “shadowing” feature in its job scheduler, RevSoft aims to ensure that job scheduler synchronization issues do not make that list.
Jobs demand special treatment in the IBM i high availability world. Care must be taken to ensure that active jobs, including jobs that are in the job queue as well as jobs that are scheduled to run in the job scheduler, are replicated from the primary server to the backup server, so the customer can resume production on the backup server as soon as the failover is complete.
There are several ways to do this. For active jobs that are already in the job queue, which are not covered by IBM journaling, a company called Shield Advanced Solutions developed a tool called JT4i (formerly JobQGenie) that tracks active jobs in the job queue and makes sure they’re ready to go on the failover server.
But when it comes to the job scheduler, customers commonly try to replicate it with their high availability software. However, that does not always work well. In fact, it’s not uncommon to hear about failovers taking an hour or longer, says John Massey, the managing director of RevSoft, which develops RevScheduler, among other products.
“What other people will do is, they have a scheduler on the primary server, and then let [a popular high availability product] replicate it to another library on the failover server,” Massey says. “When the fertilizer hits the ventilator and the primary is gone, you’ve got to stop the scheduler on the secondary server, and there’s going to be locks on tables and other things like that. You’ve got to rename that to something else. Then you’ve got to rename the one that was being replicated over back to be the live one. Then you start it up.”
RevSoft eliminates all of that extra work by handling the replication of the job schedule itself. It does this through the “shadowing” feature in RevScheduler, a cross-platform job scheduler that runs on Windows, Linux, Unix, and IBM i environments.
According to Massey, the shadowing feature works automatically to keep jobs synchronized across all installations of RevScheduler. “Every time I add, update, or change an event, it’s automatically shadowed,” he says. “We call it shadowing so it doesn’t get mixed up with MIMIX or things like that.”
The simplicity of the shadowing approach is what makes it fast and resilient, Massey explains.
“What happens is, under the covers, there’s basically a flag in the table that says ‘Am I active? Yes or no?’ and on the failover server, it’s no,” he says. “All the jobs on the failover server are sitting there in a dormant state because they’re not active. The customer says, ‘Hey, we got to cut over.’ You run the cutover command, it turns those 200 jobs into active jobs, which means they can run now.”
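RevSoft has not published how RevScheduler implements shadowing, so what follows is only a minimal Python sketch of the general pattern Massey describes, under stated assumptions: every add or change on the primary is written to the secondary as an identical but dormant copy, and a cutover flips that dormant flag to active. The `ScheduledJob` and `SchedulerNode` classes and the `shadow_upsert` and `cutover` functions are hypothetical names invented for illustration, not part of any RevSoft API.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScheduledJob:
    # Hypothetical job record; RevScheduler's real table layout is not public.
    name: str
    command: str
    schedule: str
    active: bool = False  # the "Am I active?" flag, managed by the nodes


class SchedulerNode:
    """One scheduler installation holding its own copy of the job table."""

    def __init__(self, name: str, is_primary: bool) -> None:
        self.name = name
        self.is_primary = is_primary
        self.jobs: dict[str, ScheduledJob] = {}

    def runnable_jobs(self) -> list[str]:
        # Only jobs whose active flag is set are eligible to run on this node.
        return sorted(name for name, job in self.jobs.items() if job.active)


def shadow_upsert(primary: SchedulerNode, secondary: SchedulerNode,
                  job: ScheduledJob) -> None:
    """Hypothetical: add or change a job on the primary and shadow it at once."""
    primary.jobs[job.name] = replace(job, active=True)
    # The shadow copy matches the primary's except that it stays dormant.
    secondary.jobs[job.name] = replace(job, active=False)


def cutover(secondary: SchedulerNode) -> int:
    """Hypothetical cutover: activate every dormant job and promote the node."""
    flipped = 0
    for name, job in list(secondary.jobs.items()):
        if not job.active:
            secondary.jobs[name] = replace(job, active=True)
            flipped += 1
    secondary.is_primary = True
    return flipped


if __name__ == "__main__":
    prod = SchedulerNode("PROD", is_primary=True)
    backup = SchedulerNode("BACKUP", is_primary=False)
    shadow_upsert(prod, backup, ScheduledJob("GLPOST", "CALL GLPOST", "daily 02:00"))
    shadow_upsert(prod, backup, ScheduledJob("INVSYNC", "CALL INVSYNC", "hourly"))
    print(backup.runnable_jobs())  # []
    print(cutover(backup))         # 2
    print(backup.runnable_jobs())  # ['GLPOST', 'INVSYNC']
```

In this sketch the shadow copies already exist before any disaster, so a cutover is a single pass that flips one flag per job instead of stopping a scheduler, renaming libraries, and restarting, which suggests why a design along these lines can fail over quickly.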
One of RevSoft’s large American IBM i customers runs a monthly failover to ensure that its HA setup is working and that it’s prepared for a disaster. The longest it has taken that customer’s primary job scheduler, which tracks 3,000 IBM i jobs, to fail over to the secondary job scheduler is 0.54 seconds, Massey says.
“If you’re the customer, there’s a big difference between 60 minutes and one minute in a cutover in a disaster recovery situation. And more and more companies now are having schedulers on their ERP,” Massey says. “Most of the time, we say we can do it under a minute, because if we say under a second, everyone’s going to go, that’s BS. But we haven’t seen anybody who’s got to a second yet.”
The shadowing feature is not new for RevSoft, which is based in Australia and recently hired an experienced IBM i salesperson, Alex Rodriguez, to help expand its business in North America. What is new is a monitoring tool that tells customers the state of their job scheduler replication.
“When I click the check button, it’s checking those jobs on the byte level for each of the servers,” Massey says. “We’re proving that the scheduled jobs you said should be replicated are replicated, that there is 100 percent accuracy and every job we have here is correct.”
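Massey does not say how the byte-level check works. One plausible approach, sketched below in Python as an assumption rather than RevSoft’s method, is to serialize each job definition into a canonical byte string, hash it, and compare the hashes job by job across servers. The function names, the dictionary shape of a job table, and the JSON-based encoding are all hypothetical, and the active flag is assumed to be left out of the compared definitions because it is meant to differ between primary and secondary.

```python
from __future__ import annotations

import hashlib
import json
from collections.abc import Mapping

# Hypothetical shape: job name -> job definition fields (active flag excluded).
JobTable = Mapping[str, Mapping[str, str]]


def job_digest(definition: Mapping[str, str]) -> str:
    """Hash a canonical byte encoding of one job definition."""
    # Sorted keys and fixed separators make equal definitions encode to equal bytes.
    encoded = json.dumps(dict(definition), sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def compare_schedules(primary: JobTable, secondary: JobTable) -> dict[str, list[str]]:
    """Hypothetical integrity check: classify every job name seen on either side."""
    report: dict[str, list[str]] = {
        "missing_on_secondary": [],
        "extra_on_secondary": [],
        "mismatched": [],
    }
    for name in sorted(set(primary) | set(secondary)):
        if name not in secondary:
            report["missing_on_secondary"].append(name)
        elif name not in primary:
            report["extra_on_secondary"].append(name)
        elif job_digest(primary[name]) != job_digest(secondary[name]):
            report["mismatched"].append(name)
    return report


def is_fully_replicated(report: dict[str, list[str]]) -> bool:
    # Every list empty means each job exists on both sides with identical bytes.
    return not any(report.values())


if __name__ == "__main__":
    prod = {"GLPOST": {"command": "CALL GLPOST", "schedule": "daily 02:00"}}
    backup = {"GLPOST": {"command": "CALL GLPOST", "schedule": "daily 03:00"}}
    result = compare_schedules(prod, backup)
    print(result["mismatched"])         # ['GLPOST']
    print(is_fully_replicated(result))  # False
```

Comparing digests rather than raw bytes means each server only needs to send one short hash per job; comparing the canonical bytes directly would catch practically the same differences at the cost of moving more data.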
The job checking feature requires running RevScheduler on a Windows server, not just for the prettier UI but for some of the necessary communications, Massey says. The check can be set to run every hour and to email a PDF copy of the report to recipients, letting them know the status of the job schedule replication.
The new integrity checker is about giving customers peace of mind. If something goes wrong in the replication of the job scheduler from the primary to the secondary system, it gives them a valuable heads-up to fix it before it’s needed. The other 99.9 percent of the time, it gives them assurance that at least the job scheduler is ready for a failover.
“It’s all about giving them a seatbelt-type scenario that everything is good should you need to cut over,” Massey says. “You put on a seatbelt when you’re in a car. And the reason to do that is because you want to survive a crash. You don’t have to have a crash and get maimed to realize you need to put the seatbelt on.”
Massey says this is a big problem and that the HA vendors should be doing more about it. He says he tried discussing it with the head of a well-known HA brand, but that the vendor wasn’t interested.
“He said, ‘How can I recommend a job scheduler where you don’t replicate those libraries?’ I said, ‘You’re getting stuck on the point there. We’re doing it better and faster than you guys do, so why would you replicate our library?’” Massey says. “In some respects, with our software, we’re actually showing people what they should be seeing from their HA solutions.”
For more information, see the company’s website at www.revsoft.com.