IBM Power Systems Can Do Big Data Analytics, Too
May 12, 2014 Alex Woodie
Server workloads are typically broken down into two broad categories: transactional or analytical. For most IBM i shops, the focus is heavily on the transactional side of the equation. But as IBM i shops peek over the wall into the rapidly expanding world of big data analytics, they will be pleased to find that IBM and its partners are bringing together a collection of big data tooling that will happily run on the system they already own.
Granted, the chances of the big data analytics software running directly on IBM i are slim to none. While IBMers like Mike Cain will stand toe to toe with DB2/400…er, DB2 for i…against any relational database out there when it comes to performance, security, and scalability, the fact is that big data analytics, as it exists today, largely lives beyond the realm of relational databases.
Most of the big data products, such as IBM’s distribution of Hadoop (called InfoSphere BigInsights), run on Linux, which is the de facto standard for the new generation of big data products. AIX still has a foothold, especially for traditional business intelligence tools like the SPSS statistics package and the Cognos reporting and data warehousing tools. But the platform focus for emerging data analytic product categories is primarily Linux, according to Linton Ward, an IBM Distinguished Engineer and Chief Engineer Power Workload Optimized Systems.
“The preponderance of analytic growth over foreseeable future is in the Linux ecosystem,” Ward tells IT Jungle. “There are many players in the analytic space that we’ll be bringing out on the Linux on Power platform over the next year or so.”
You may be wondering why IBM thinks Power is a good platform for big data analytics when the vast majority of big data analytic products today were designed to run on X86 machines. (You think of good questions!) After all, one of the main selling points of Hadoop is that it basically gives you supercomputing-like power on cheap Intel hardware but without the supercomputing-like cost. And Power, we know, is a rung up from Intel in the cost department.
The answer, not surprisingly, is money. We are at the very tip of a surge in big data analytic spending, and IBM wants a piece of the action with Power. But IBM is fighting an uphill battle in that its Power servers are typically more expensive than Intel-based servers (less so recently, but still so).
Positioning Power for platforms like Hadoop is also forcing IBM to shift from the big endian approach to a little endian one, and from a scale-up architecture to a scale-out one, Ward says.
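For readers who have never had to think about byte order, here is a minimal Python sketch of what the endian shift means. Power has traditionally stored multi-byte numbers most significant byte first (big endian), while X86 stores them least significant byte first (little endian), and software written for X86 Linux often assumes the latter. The sample value is arbitrary and chosen only for illustration; nothing here is taken from IBM material.

```python
import struct
import sys

value = 0x0A0B0C0D  # arbitrary 32-bit integer, for illustration only

# Big endian: most significant byte is stored first
big = struct.pack(">I", value)
# Little endian: least significant byte is stored first
little = struct.pack("<I", value)

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Reading big-endian bytes as if they were little endian
# silently produces a different number
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0xd0c0b0a

# Byte order of the machine running this script
print(sys.byteorder)
```

Binary data written by code that assumes one byte order will be misread by code that assumes the other, which is why matching the byte order the X86 Linux ecosystem expects makes it easier to bring existing software across.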
The limitations of clustering traditional relational databases required IBM and other big Unix server makers to focus on building massive symmetric multi-processor (SMP) servers that could house large amounts of data in a single operating system footprint. But those limitations are starting to fade away, both for transactional and analytic workloads, thanks to Hadoop, NoSQL, and other big data technologies.
Ward, for one, is bullish on this approach to tackling the big data market opportunity with a scale-out Power architecture. A big part of that plan is the OpenPower Foundation that IBM founded to bring together a group of partners to build “white box” servers based on Power8 chips. IBM has signed up the likes of Google, Nvidia, Mellanox, and Tyan to the OpenPower Foundation with the hope of giving IBM a scale-out story to go up against Intel.
IBM is hoping that the higher performance and greater efficiency of its Power servers will allow it to take some share of new big data workloads away from Intel. And Big Blue is putting its money where its mouth is by shipping smaller 2U and 4U Power8 servers that will serve as nodes in a cluster before shipping the big-daddy SMP Power8 machines later this year.
Whether or not placing Power’s chips on a distributed, scale-out bet works, IBM will be working to figure out ways to make big data easier to adopt for its established real-world customers running transactional systems on IBM i, AIX, and z/OS. “We’re not abandoning AIX. We’re not walking away from IBM i. We’re not that stupid to walk away from those,” Ward reassures us.
It’s more about enabling new workloads, like Hadoop, for systems of engagement or systems of insight, he says. “It’s the notion that, for interacting with clients, the Web 2.0 environments and–to use the IBM phrase–this new cognitive computing era, we believe there is substantial growth going forward.”
The move to a white-box, scale-out, little-endian approach with Power Linux is aimed at attracting new customers and new workloads to the platform. IBM has invested $1 billion in prepping Power Linux for its new big data and OpenStack roles. But that doesn’t mean that IBM i shops can’t ride the coattails of this investment. With a little investment in skills and software, a typical midmarket IBM i shop could position itself to start taking advantage of some of IBM’s big data investments.
The skills required to implement and run Hadoop or a column-oriented data warehouse such as PureData for Operational Analytics (formerly Netezza) are quite different from the skills required to maintain an RPG-based ERP system running on DB2 for i. In particular, it is getting tougher and more expensive to find people with the math, programming, and data science skills necessary to make the fullest and best use of big, unstructured data, which IBM says accounts for 80 percent of all new data created today.
But it would seem that there are some cost savings to be had by utilizing Power to run both transactional workloads on IBM i and analytic workloads on Linux. At a certain level, there should be some overlap in the IT skills necessary to manage and maintain the Power environments. Having one vendor to support both server environments may also be a blessing (although it might also be a curse, depending on how you look at it).
IBM has a decent selection of big data software packages for Power today. It starts with the BigInsights package, and includes InfoSphere Streams (for streaming big data), DB2 BLU for column-oriented queries within a relational database (available only on DB2 for LUW), Cognos, SPSS, and ILOG. IBM has in-memory capabilities, recently bought a cloud-based NoSQL database company, and has no shortage of master data management, encryption, deduplication, and replication tools. It has all the parts–perhaps more than it can sometimes coherently communicate without devolving into an overwhelming morass of options and part numbers and license types (ahem).
And while there is a clear distinction between transactional and analytic, let’s not forget what IBM is doing with System z and the new zDoop Hadoop package it recently enabled to run there (in the Linux subsystem). Because it takes so many MIPS to move data off z/OS, some mainframe customers are better served by actually running Hadoop in place on the System z, where unstructured data from the great big world can be blended with the structured data that runs so nicely on the mainframe. If this works on System z, there’s the possibility that there could be a market for this on IBM i, which is equally encamped in the structured-data world.
At the very least, IBM should build a connector that allows DB2 for i data to be easily pulled into BigInsights running on Power Linux. The latest release of BigInsights, version 3.0, brings a new release of the Big SQL component that allows users to work with Hadoop data via SQL. Oracle’s database is now supported, along with DB2 LUW and other sources, so why not DB2 for i?