Should Spark In-Memory Run Natively On IBM i?

    November 6, 2017 Alex Woodie

    There’s a revolution happening in the field of data analytics, and an open source computing framework called Apache Spark is right smack in the middle of it. Spark is such a powerful tool that IBM elected to create a distribution of it that runs natively on its System z mainframe. Will it do the same for its baby mainframe, the IBM i?

    So, what is Apache Spark, and why should you care? Great questions! Let’s introduce you to Spark.

    Spark came out of UC Berkeley’s AMPLab about five years ago to provide a faster and easier-to-use alternative to MapReduce, which at that point was the primary computational engine for running big data processing jobs on Apache Hadoop. While Spark has a learning curve of its own, the Scala-based framework has not only replaced Java-based MapReduce, but also eclipsed Hadoop in importance in the emerging big data ecosystem.
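The map-then-reduce pattern that both frameworks share can be sketched in a few lines of plain Python (a single-process toy, not Spark itself). The key operational difference is that MapReduce persists the shuffled intermediate data to disk between stages, while Spark keeps it in memory:

```python
from functools import reduce

lines = ["spark on ibm i", "spark on z/os", "mapreduce on hadoop"]

# "Map" stage: emit a (word, 1) pair for every word.
pairs = [(word, 1) for line in lines for word in line.split()]

# "Reduce" stage: sum the counts per word. In MapReduce the shuffled
# pairs would hit disk between the two stages; Spark keeps them in
# memory, which is the source of much of its speed advantage.
def merge(acc, pair):
    word, n = pair
    acc[word] = acc.get(word, 0) + n
    return acc

counts = reduce(merge, pairs, {})
```

On a real cluster both stages run in parallel across many nodes; the single-process version above only illustrates the programming model.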

    Spark is useful for developing and running all sorts of data-intensive applications, including familiar programs like ETL jobs and SQL analytics, as well as more advanced approaches like real-time stream processing, machine learning, and graph analytics. This versatility, along with well-documented APIs for developers working in Java, Scala, Python, and R, and its familiar DataFrame construct, has fueled Spark’s meteoric rise in the emerging field of big data analytics.
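To give a flavor of the DataFrame style mentioned above, here is a single-process analogue in plain Python of the filter/group/aggregate pipeline one would express against a Spark DataFrame (the real thing would use the pyspark.sql API, which is not assumed here; the sales records are made up for illustration):

```python
from collections import defaultdict

# Toy "rows", standing in for a tiny DataFrame of sales records.
rows = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 250},
    {"region": "east", "amount": 50},
    {"region": "west", "amount": 25},
]

# Roughly: df.filter(df.amount >= 50).groupBy("region").sum("amount")
big = [r for r in rows if r["amount"] >= 50]   # filter
totals = defaultdict(int)
for r in big:                                  # groupBy + sum
    totals[r["region"]] += r["amount"]
```

In Spark the same declarative pipeline would be planned, optimized, and executed in parallel across a cluster rather than looped over in one process.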

    IBM took notice of Spark several years ago, and has since worked on several fronts to help accelerate the maturation of Spark on the one hand, and to embed Spark within its various products on the other, including:

    • ML for z/OS, which executes Watson machine learning functions in a Spark runtime on the mainframe’s System z Integrated Information Processor (zIIP) specialty engines.
    • Integrated Analytics System, which combines Spark, Db2 Warehouse, and its Data Science Experience, a Jupyter-based data science “notebook” for data scientists to quickly iterate with Spark scripts.
    • Project DataWorks, which brings Spark and Watson analytics together on the Bluemix cloud.
    • Open Data Analytics for z/OS, a runtime that combines Spark, Python, and the Anaconda distribution of (mostly) Python-based data science libraries.
    • And Spark running directly on its Bluemix cloud.

    And considering that IBM opened a Spark Technology Center in 2015, it’s safe to say that IBM is quite bullish on Spark. (That’s a major understatement, actually.) But perhaps the most interesting data point for this discussion came in 2016, when Big Blue launched its z/OS Platform for Apache Spark, which is a native distribution of Spark for the System z mainframe.

    Native Spark On The Mainframe

    IBM received kudos for the work from various industry insiders who participated in this video on the z/OS Platform for Apache Spark webpage. Among those singing IBM’s praise was Bryan Smith, the former CTO and VP of R&D at Rocket Software.

    “IBM did a really good job in porting Apache Spark to z/OS,” Smith says. “They could have just done a very simple port. But they didn’t. They didn’t cut any corners. They really exploited the underlying hardware architecture. They’re using specialty engines. They’re using the hardware compression facilities. They’re able to leverage the 10 TB of memory that you have on a z13 machine and the . . . processors, so you can actually run those Apache Spark clusters on z/OS.”

    Another software vendor that appreciates having Spark running natively on z/OS is Jack Henry & Associates, the Missouri banking software developer that also has a fairly big IBM i business.

    “The pain point to us is getting the data out to our customers,” Todd Hill, Jack Henry’s director of card processing, says in the video. “Currently we have data on the mainframe. We have a distributed stack across many types of applications. What Apache Spark does for us is to keep your data centralized in the one location. So instead of moving all that data off from multiple platforms into other applications, I can run Apache Spark directly on the mainframe, at low cost, and get it built out, and get the data to the people that need it.”

    Mike Rohrbaugh, zSystem lead for Accenture, says having Spark on the mainframe helps by automating the generation of intelligence and reducing the complexity. “It’s just so simple to bring the analytics engine back to the data to do intelligent automation,” he says in the video.

    IBM i Versus The Mainframe

    So how does this relate to the question in the headline of this story? For starters, let’s compare the similarities and differences between the IBM i and the z/OS mainframe platforms.

    First, the similarities. Both the IBM i server and the z/OS mainframe are relied upon to run transactional applications that are core to the businesses that use them. Both of them are used to store structured data that’s arguably the most critical data for the businesses that use them. They both store data in the EBCDIC format, and are heralded for best-in-class reliability and security. They also both run proprietary operating systems as well as open OSes like Linux, mostly utilize older languages (RPG and COBOL, respectively), and sport text-based interfaces that use the 5250 and 3270 datastreams, respectively.

    Now, the differences. Mainframes have their own processor type, while IBM i runs on the more popular Power processor. The mainframe stores data in many different data stores (Db2 for z/OS, copybooks, etc.), while most IBM i data is stored in Db2 or the IFS. Demographically, mainframe customers tend to be the largest companies in the world, whereas IBM i has a bigger installed base among small and midsized businesses. There’s also a large concentration of mainframes in banking, insurance, and healthcare, whereas IBM i has a stronger foothold in manufacturing, distribution, and retail.

    IBM i and mainframes are both strong transactional systems, and are less known for their analytical prowess. However, data analytics is becoming increasingly important, especially as part of a company’s digital transformation strategy. The pundits often say that all companies will need data analytics strategies to compete effectively in the coming decades. That’s probably a bit of an exaggeration, but only in its timing.

    The question, then, becomes where this analytical processing is going to happen. Today, most mainframe and IBM i shops offload it to another system. It’s fairly common for users of both mainframes and IBM i servers to set up elaborate workflows to move data from the “big iron” transactional systems to dedicated analytical systems, including massively parallel processing (MPP) column-oriented systems like Teradata, Netezza, or Vertica. With the advent of Apache Hadoop clusters running on commodity X86 processors, many companies started experimenting with Hadoop computing, which invariably introduced them to the in-memory Spark framework.

    IBM wants to keep those analytic workloads on the mainframe if at all possible, which is why it made Spark run natively. This not only keeps costs down for its customers, but it also makes the mainframe more “sticky” and lessens the urgency to migrate data and workloads off its biggest cash cow.

    The follow-on question is whether IBM sees similar dynamics at play for the average IBM i user. Mainframe customers, owing to their size and tendency to be in financial services, are early adopters of new technologies, like Spark. They’re arguably closer to the cutting edge than the average IBM i shop, and the dollars at stake for each mainframe client are much larger.

    It’s safe to say that IBM i members of the Large User Group (LUG) probably more closely resemble their mainframe brethren, and could benefit from having a powerful, cutting-edge tool like Spark running natively on the IBM i. They’re more apt to have a bigger investment in separate analytical environments, be it a Teradata machine or a Hadoop cluster. They’re also more likely to have some data science Skunk Works project running somewhere in their shop, and are more likely to already be running Spark in Linux, which is where it was originally developed to run.

    Spark On IBM i

    While Spark may not be on the radar of the average IBM i shop yet, folks within IBM are starting to ask whether Spark will impact the IBM i installed base and, if it’s going to be important to them, how it ought to be introduced. If the company is planning to support Spark natively on IBM i, it isn’t saying publicly, which is not surprising.

    What we do know, however, is that IBM executives are at least talking about the prospect of bringing Spark to IBM i in some way, shape, or form. “It’s part of some discussions,” IBM’s product development manager for Db2 Web Query, Robert Bestgen, recently told IT Jungle.

    There are two general options for bringing Spark to the platform: porting it to run natively on IBM i, or running it in a Linux partition on Power Systems. Spark was written in Scala, and therefore runs within a Java virtual machine (JVM), which the IBM i platform obviously supports. It may not be a stretch to get it running there, but other factors could come into play, such as IBM i’s single level storage architecture, and how that maps to the way Spark tries to keep everything in RAM (but will spill to disk if needed).
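That keep-it-in-RAM, spill-to-disk-when-needed behavior is governed by standard Spark configuration keys, and it is exactly the layer a native port would have to reconcile with single level storage. A minimal spark-defaults.conf sketch (the values are illustrative defaults, not IBM i recommendations):

```
# Heap for each executor JVM
spark.executor.memory        8g
# Fraction of the heap shared by execution and cached data
spark.memory.fraction        0.6
# Portion of that fraction protected for cached ("storage") data
spark.memory.storageFraction 0.5
# Scratch directory where partitions spill when RAM runs out
spark.local.dir              /tmp/spark-scratch
```

On IBM i, where single level storage already blurs the line between memory and disk, it’s an open question how settings like a spill directory would map onto the platform.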

    Should the Spark port be native? “Depends on who you talk to,” Bestgen said.

    The widely held thinking within IBM is that the Linux route makes more practical sense – if Spark is to come to IBM i at all (which, as far as we know, hasn’t been decided). “If you back up [and look at it] from an IBM i perspective, IBM would say that IBM i is part of the Power Systems portfolio, or what we call Cognitive Systems now,” Bestgen says. “For Power Systems, those platforms [like Hadoop and Spark] tend to run best on a… Linux kind of environment. That’s what folks think about it.”

    Few IBM i shops today are even running Linux partitions. According to HelpSystems’ 2017 IBM i Marketplace study, fewer than 8 percent of organizations are running Linux next to IBM i on a Power Systems box, while about 9 percent are running Linux on other Power boxes. AIX’s penetration is about 50 percent higher, for what it’s worth.

    There’s a case to be made that IBM i shops are lousy at figuring out how to leverage the wealth of available tools for Linux, even after IBM went through the trouble of supporting little endian, X86-style Linux to go along with its existing support for big endian Linux within Power. “One of the areas that IBM could do a better job selling is saying, you seem to be willing to run Linux on a different platform. Why not run it on the platform that you have in your system now?” Bestgen says.

    At the end of the day, there are a lot of unanswered questions, including whether the IBM i installed base needs or wants a tool as powerful as Spark, let alone how it should run. So the answer to the question in the headline, for now, is no. “I don’t think we’re there yet in terms of running those things natively on i,” Bestgen says.

    RELATED STORIES

    Visual Data Exploration Comes To Db2 Web Query

    What Does IBM’s Embrace Of Apache Spark Mean To IBM i?

    Hadoop and IBM i: Not As Far Apart As One Might Think

    IBM Power Systems Can Do Big Data Analytics, Too

    What Does ‘Big Data’ Mean for IBM i?

    Big Data Gets Easier to Handle With IBM i TR7

    Inside IBM ML: Real-Time Analytics On the Mainframe (Datanami)


    Tags: Apache Spark, API, COBOL, DB2, IBM i, IFS, Linux, RPG, System z



TFH Volume: 27 Issue: 73
