IBM i Cloud Providers React To Amazon S3 Outage
March 6, 2017 Alex Woodie
Last week’s Amazon S3 outage served as a wake-up call that cloud platforms, despite their scalability and availability advantages, aren’t infallible. But will the Internet giant’s East Coast outage have any lasting effect on cloud adoption? And will it affect cloud adoption among IBM i customers? We asked some experts to find out.
User error was blamed for the “impairment” of US East 1, one of 16 “regions” under the massive AWS tent. The region, which is based in Ashburn, Virginia, never went down entirely, despite claims that all of AWS was down or that the entire Internet was broken. AWS services in other regions kept humming along merrily. But as you’re probably aware, the impairment of US East 1 caused widespread outages among other vendors’ services that rely on AWS and its massive object storage system, Simple Storage Service, or S3 for short.
Make no mistake: this was a major event. One reason is that US East 1 is a special region in the AWS scheme of things. As Steve Chambers outlines in his informative article on the ITSM Tools website, US East 1 is special for several reasons: it is the oldest and biggest region in the AWS cloud, with five availability zones; it is the default region for S3 storage; and it is a global region that serves as the default endpoint for several other AWS services.
So when an AWS employee, attempting to debug an issue with the billing system last Tuesday, made a typo in a command and accidentally took more US East 1 servers offline than intended, it created a serious problem. The servers that were removed supported two S3 subsystems, the index subsystem and the placement subsystem, and removing that capacity “caused each of these systems to require a full restart,” Amazon said in a blog post.
While the subsystems were being restarted, S3 was unable to service requests. Other AWS services in the region that depend on S3 were affected as well, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes, and AWS Lambda.
This set off a cascade of failures across the Internet. Online and mobile services from companies like Quora, Giphy, Instagram, IMDb, American Airlines, Imgur, and Slack were affected. People were reportedly separated from their Twitter accounts for some length of time. And some people who rely on home automation systems that store data in S3 reported being stuck in their homes. (Such is the state of our modern world.)
IBM i Impact
As more IBM i workloads move to the cloud, it’s worth asking whether an outage like the one that just impacted AWS S3 could take down IBM i processing.
“Something like that, it’s hard to prevent, if it’s a user error,” says Pete Massiello, president of iTech Solutions Group, a Connecticut IBM business partner that provides Power Systems hosting and an IBM i cloud backup solution. “I’m sure Amazon has redundant power and redundant cooling – redundant everything – and that’s what you should be having.”
iTech Solutions’ backup cloud is built on a virtual tape library (VTL) solution that uses the EVault online backup software, now owned by Carbonite. iTech’s cloud replicates customers’ IBM i data across VTLs located in two data centers in the state of Michigan, Massiello says.
The company can also tap into IBM i capacity by way of a Power7 box housed in an Iron Mountain facility outside of Boston, Massachusetts, Massiello says. “The constant reoccurring theme is N-plus-one,” he says. “That’s the benefit of a true cloud solution, that you get the redundancy.”
Investments in high-availability infrastructure and adherence to accepted IT processes and standards like ISO and ITIL are important elements of LightEdge Solutions’ commitment to ensuring uninterrupted uptime of its Power Cloud, according to LightEdge Solutions Architect Roger Mellman.
“A significant value LightEdge delivers to iSeries customers is our hardened data center infrastructure and strict process adherence,” Mellman tells IT Jungle. “We employ a reference architecture, which forces elimination of single points of failure in our infrastructure, network, and systems. We are architected for high availability both within and across our multiple data centers.”
The company runs Vision Solutions’ MIMIX high availability software to replicate its customers’ production IBM i data in real time to secondary servers running in secondary data centers, which delivers the tightest recovery point and recovery time objectives. For customers who can’t justify the expense of a full HA solution, LightEdge offers backups on EMC Data Domain VTLs, which also involve multi-site replication and Data Domain’s proprietary data reduction technology.
There’s no way to prevent freak occurrences entirely, as the AWS employee who entered the wrong number of servers to take down will surely tell you. But careful planning can mitigate most of the risk associated with a cloud solution.
“It is impossible to guarantee that you will never go down,” says Ralph Wasner, CTO of First National Technology Solutions, an IBM i cloud provider that we profiled in a story last week. “At the same time you can minimize the risk of an outage by building resilience in the systems and platforms we support and by proactively monitoring the environment.”
Careful construction of the private cloud environment can reduce, but not entirely eliminate, the possibility of an extended outage, Wasner says.
“FNTS invests in hardening our systems with redundant network/systems and the best of breed hardware,” he says. “Nothing at FNTS is single threaded. We actively monitor for issues but we also monitor for the indicators of a pending issue.”
Hoping and Praying
According to Massiello, most managed service providers (MSPs) hosting private IBM i clouds take pains to build redundancy up and down the stack. But there are exceptions.
“The IBM i community is small. We all know one another. All the people who are in this have the right infrastructure. They’ve been doing it a long time,” he says. “It’s the people who are pretending to be cloud providers – those are the ones you have to be worried about.”
IBM i cloud shoppers should make sure their prospective cloud providers have invested in the redundancies necessary to overcome most problems that can crop up.
“There have been some business partners in the market who have decided they’re going to have a cloud and host their customer on it, and it’s in the same computer room that they’re running their own workloads on,” he says. “The redundancies are not there.”
Shoppers should also look at the personnel who monitor and manage the cloud operation, which is a 24/7/365 affair. “You just have a guy who works from 8-5 Monday through Friday and Friday he leaves and is going to come in Monday and check to see if the machine is up,” Massiello says. “That’s not a cloud. That’s a hope and a prayer.”