Hadoop and IBM i: Not As Far Apart As One Might Think
May 20, 2015 Alex Woodie
The worlds of IBM i and Apache Hadoop appear to be diametrically opposed. One is a proprietary, RISC-based platform used primarily to run transactional systems. The other is an open source, X86-based platform used primarily for big data analytics. But as far apart as the two platforms seem to be, at least one IBM i software vendor, mrc, is aiming to find some common ground between them.
Hadoop is a distributed data storage and processing framework that engineers and researchers at Yahoo and Google are credited with helping to create. While indexing the Internet was Hadoop’s first use case, it’s since been adapted and adopted for all sorts of other jobs, such as executing machine learning algorithms, running SQL- and NoSQL-style data warehouses, and even powering graph databases. (You can read all about Hadoop at www.datanami.com, a big data analytics publication that I write for and manage.)
Chicago-based mrc used the recent COMMON conference to highlight the work it’s doing around Hadoop. Mrc, you will remember, is the company behind m-Power, the template-based Web application development tool that generates enterprise Java code that can run on any Java-supported platform and database, including (but not limited to) the IBM i and DB2 running on IBM Power Systems servers.
Mrc is supporting Hadoop in two distinct ways: for analytical workloads and for transactional workloads. On the analytical side, the company supports m-Power-generated apps running against Hive and Impala, two SQL-based database engines that run atop Hadoop. This seems like a natural fit for mrc, considering that the majority of the applications its customers generate are business intelligence and reporting applications.
Impala and Hive are standard elements of the emerging Hadoop stack (which is mostly written in Java), and essentially replicate the analytical databases offered by the likes of Teradata, Hewlett-Packard Vertica, and IBM Netezza. The primary difference with Impala and Hive is that they operate against the Hadoop Distributed File System (HDFS), which can store hundreds of petabytes of data striped across tens of thousands of X86 servers. Traditional data warehouses, by comparison, typically max out at a few hundred terabytes, and they cost a lot more, too.
For transactional workloads, mrc has partnered with a Silicon Valley startup called Splice Machine. Founded by Monte Zweben, a veteran of the first dot-com boom (and bust), Splice has built a traditional row-oriented relational database that runs atop Hadoop. The companies have tested the combination, and any Java-based app that was initially generated to run on an IBM i-based Power Systems server can also run on the Splice RDBMS running on Hadoop.
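Because m-Power generates standard Java that talks to its database through JDBC, the portability mrc and Splice Machine describe boils down to swapping the connection URL and driver jar while the application's SQL stays put. Here is a minimal sketch of that idea; the host names, library, and database names are illustrative assumptions, not details from either vendor's documentation.

```java
// Hypothetical sketch: the same JDBC-based Java app pointed at two
// different back ends just by swapping the connection URL (and putting
// the matching driver jar on the classpath). All host/database names
// below are made up for illustration.
public class JdbcTargets {

    static String urlFor(String target) {
        switch (target) {
            case "db2i":
                // DB2 for i, reached via the JTOpen (jt400) Toolbox driver
                return "jdbc:as400://myibmi.example.com/MYLIB";
            case "splice":
                // Splice Machine's RDBMS running atop a Hadoop cluster
                return "jdbc:splice://hadoop-node.example.com:1527/splicedb";
            default:
                throw new IllegalArgumentException("unknown target: " + target);
        }
    }

    public static void main(String[] args) {
        // The application code issuing SQL never changes; only the URL does.
        System.out.println(urlFor("db2i"));
        System.out.println(urlFor("splice"));
    }
}
```

The point of the sketch is the design choice, not the strings themselves: an app that keeps all database access behind JDBC can move from DB2 for i to a SQL-on-Hadoop engine without touching its business logic.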
This gives customers more options for getting the most bang for their buck out of programming investments, says Zweben. “Our partnership with mrc gives businesses a solution that can speed real-time application deployment on Hadoop with the staff and tools they currently have, while also offering affordable scale out on commodity hardware for future growth,” he says in a press release.
Not every IBM i shop is asking for Hadoop capabilities, but there have been some inquiries, says mrc’s marketing director Steve Hansen.
“Our message to the IBM i crowd is businesses have a lot more data than they realize,” he tells IT Jungle. “There’s a lot more data out there, with sensor data and software log files. Every piece of hardware these days produces data. IBM i shops need to start storing that, and Hadoop is the easiest way to start storing that data.”
What mrc is doing shouldn’t be viewed as a threat to the IBM i way of life, but rather a way to augment what it already does for you, Hansen says. “We’re not telling people it’s time to replace IBM i. We’re saying the data is getting bigger. There’s unstructured and social data, and businesses just aren’t doing much with it yet. I think it’s overwhelming.”
Hansen, who also oversees mrc’s website, used Hadoop to store website server log files that were going to waste, and used m-Power to build some simple dashboards against it that told him what areas of the site were being used and which ones weren’t. It’s those types of simple applications that customers can start with to begin exploring how Hadoop can benefit them.
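The dashboard Hansen describes is, at bottom, an aggregation over raw log lines. As a hedged illustration of the shape of that computation, here is a small Java sketch that counts page hits from Apache-style "common log format" lines; in a real deployment the files would sit in HDFS and Hive or Impala would run the aggregation as SQL, and the sample lines below are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogHits {
    // Pulls the request path out of a common-log-format line,
    // e.g. ... "GET /m-power HTTP/1.1" 200 512
    static final Pattern REQ = Pattern.compile("\"(?:GET|POST) (\\S+)");

    // Tallies hits per path -- the per-page counts a usage dashboard
    // like Hansen's would chart.
    static Map<String, Integer> hitsPerPath(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = REQ.matcher(line);
            if (m.find()) counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for files parked in HDFS.
        List<String> sample = Arrays.asList(
            "10.0.0.1 - - [20/May/2015:10:00:00] \"GET /m-power HTTP/1.1\" 200 512",
            "10.0.0.2 - - [20/May/2015:10:00:01] \"GET /m-power HTTP/1.1\" 200 512",
            "10.0.0.3 - - [20/May/2015:10:00:02] \"GET /pricing HTTP/1.1\" 200 128");
        System.out.println(hitsPerPath(sample));
    }
}
```

The appeal of Hadoop for this job is that the same tally keeps working when "three sample lines" becomes years of logs spread across a cluster, with the SQL engine doing the counting instead of hand-rolled code.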
“Right now we’re trying to build awareness of what Hadoop is and how people who are using IBM i can take this data that they’re not taking advantage of and put it into Hadoop,” he says. “I don’t see it as a replacement for their IBM i. It’s more something that can enhance what they’re currently doing and tracking all this data they’re not tracking.”
Don’t think for a second that the smart folks at IBM (in Armonk and Rochester and Somers and Austin) aren’t watching this trend closely and looking for a way to sell the IBM i customer base on this thing called Hadoop. Of course, that’s part of the problem: Apache Hadoop is free. IBM sells something called IBM InfoSphere BigInsights, which is a distribution of open source Hadoop. But IBM i shops don’t have to pay a penny to get started with Apache Hadoop.