• The Four Hundred
  • Apache Kafka And Zookeeper Now Supported On IBM i

    September 9, 2020 Alex Woodie

    IBM quietly added support for two open source technologies, Apache Kafka and Apache Zookeeper, on IBM i. Both projects are critical elements of emerging distributed computing frameworks in the big data space, although they have very different uses.

    Apache Zookeeper is a core underlying technology for enabling distributed computing and ensuring that applications running atop Zookeeper are highly resilient. The software functions as a centralized service for keeping track of nodes in a cluster, and for synchronizing data and services among the different nodes.

    All of the nodes in a Zookeeper cluster have access to a shared hierarchical namespace. If any one of the nodes in a cluster becomes unavailable, Zookeeper detects the failure, and the applications running atop it can use that signal to fail over, so that the services and data on the failed node are migrated to an available node.
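
    The shared namespace and the failover behavior described above can be pictured with a toy in-memory model. This is only a sketch and not the Zookeeper client API: the ToyNamespace class, the /workers path, and the session names are all invented for illustration. It assumes that each cluster member registers a node tied to its own session, which is how Zookeeper's ephemeral nodes are typically used, and that the application (not the namespace itself) decides where a service should run.

```python
class ToyNamespace:
    """In-memory stand-in for a shared hierarchical namespace.

    Hypothetical illustration only; this is not the Zookeeper API.
    """

    def __init__(self):
        # Maps a path such as "/workers/node-a" to (data, owning session or None).
        self._nodes = {"/": (b"", None)}

    def create(self, path, data=b"", owner=None):
        # A node can only be created under a parent that already exists.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = (data, owner)

    def children(self, path):
        # Return the names of the direct children of a path, sorted.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def expire_session(self, owner):
        # Drop every node owned by a session, as when a cluster member dies.
        for p in [p for p, (_, o) in self._nodes.items() if o == owner]:
            del self._nodes[p]


def place_service(ns):
    """Pick the first live worker. The application makes this choice."""
    live = ns.children("/workers")
    if not live:
        raise RuntimeError("no live workers")
    return live[0]


ns = ToyNamespace()
ns.create("/workers")
ns.create("/workers/node-a", owner="session-a")
ns.create("/workers/node-b", owner="session-b")

before = place_service(ns)      # node-a is alive, so it is chosen
ns.expire_session("session-a")  # node-a becomes unavailable
after = place_service(ns)       # the service moves to a surviving node

print(before, "->", after)
```

    In this toy version the failover is just a second call to place_service() after the failed member's node disappears. A real deployment would react to a change notification rather than polling.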

    Rely on Zookeeper to ride herd on unruly distributed apps.

    Zookeeper was originally developed to help coordinate nodes in Apache Hadoop clusters. Hadoop, of course, is the distributed data storage and processing system that emerged from Yahoo in the mid-2000s to handle the huge amount of data that made up the search engine’s index of the Web.

    Zookeeper was a sub-project of Hadoop at the Apache Software Foundation. But in recent years, Zookeeper has been adopted as the core underlying clustering technology by a number of distributed computing projects at the ASF, including HBase, Hive, Solr, NiFi, Druid, and Kafka, which are commonly considered part of the “Hadoop family” of products. It has since become a top-level project at ASF.

    Apache Kafka, meanwhile, can best be thought of as a next-generation message bus for event data. While it’s commonly linked to Hadoop and is included in Hadoop distributions, it really lives outside of the Hadoop family and institutes an entirely new way of storing and processing event data.

    Kafka was developed about 10 years ago by engineers at LinkedIn to handle the social media company’s fast-growing data collection system. Every time a LinkedIn user does something on the website or mobile app, such as click on a post or accept an invitation to connect, it generates an event in LinkedIn’s system.

    The company, which is now owned by Microsoft, had been using a traditional batch-oriented message bus to handle this data, but the engineers found that it was unable to keep up with data volumes. The LinkedIn engineers envisioned an entirely new type of platform that would treat event data as a first-class citizen, as opposed to the second-tier status that events get in a traditional relational database.

    Sitting atop Zookeeper, Kafka functions as a distributed system for storing and processing event data. The event data is sourced from components called “producers,” which write data into categories called “topics.” Users (or applications) can subscribe to these topics and receive the stream of data through a component called a “consumer.”
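
    The relationship between producers, topics, and consumers can be sketched with a small in-memory model. This is a toy under stated assumptions and not the Kafka client API: the ToyBroker, ToyProducer, and ToyConsumer classes, the topic names, and the event fields are all made up for illustration. The one idea it tries to capture is that a topic behaves like an append-only log, and each consumer keeps its own read position in it.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a topic-based event log (not the Kafka API)."""

    def __init__(self):
        # Each topic is an append-only list of events.
        self._topics = defaultdict(list)

    def append(self, topic, event):
        self._topics[topic].append(event)

    def read(self, topic, offset):
        # Return every event at or after the given position in the topic.
        return self._topics[topic][offset:]


class ToyProducer:
    """Writes events into named topics."""

    def __init__(self, broker):
        self._broker = broker

    def send(self, topic, event):
        self._broker.append(topic, event)


class ToyConsumer:
    """Subscribes to one topic and remembers how far it has read."""

    def __init__(self, broker, topic):
        self._broker = broker
        self._topic = topic
        self._offset = 0  # position of the next unread event

    def poll(self):
        events = self._broker.read(self._topic, self._offset)
        self._offset += len(events)
        return events


broker = ToyBroker()
producer = ToyProducer(broker)
feed = ToyConsumer(broker, "clicks")
audit = ToyConsumer(broker, "clicks")

producer.send("clicks", {"user": "u1", "action": "click_post"})
producer.send("invites", {"user": "u2", "action": "accept_invite"})
producer.send("clicks", {"user": "u3", "action": "click_post"})

first = feed.poll()      # the two "clicks" events written so far
producer.send("clicks", {"user": "u1", "action": "click_post"})
second = feed.poll()     # only the event written since the last poll
audit_all = audit.poll() # an independent consumer sees the full topic

print(len(first), len(second), len(audit_all))
```

    Because the events stay in the log after they are read, the second consumer can replay the whole topic without affecting the first one.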

    As a publish/subscribe system for high-volume data, Kafka clusters can be used for extract, transform, and load (ETL) workloads, as well as for real-time analytics systems. Kafka clusters are composed of multiple servers, or “brokers,” and they can scale well into the petabyte range.

    Apache Kafka is a distributed pub/sub system used for real-time ETL and streaming analytics.

    Kafka’s scalability was put to use at LinkedIn. By 2011, when the company first implemented Kafka, LinkedIn users were generating about 1 billion messages per day. By 2015, the company was generating 1 trillion events per day.
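
    To put those daily totals in perspective, they can be converted into average per-second rates. The short calculation below is a back-of-the-envelope sketch that assumes traffic is spread evenly across the day (real traffic is not) and treats the 2011 "messages" and 2015 "events" figures as comparable units.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

per_day_2011 = 1_000_000_000        # about 1 billion per day
per_day_2015 = 1_000_000_000_000    # about 1 trillion per day

# Average rate if the load were spread evenly over the day.
rate_2011 = per_day_2011 / SECONDS_PER_DAY
rate_2015 = per_day_2015 / SECONDS_PER_DAY

# Overall growth, and the equivalent constant yearly multiplier.
growth = per_day_2015 / per_day_2011
yearly = growth ** (1 / (2015 - 2011))

print(f"2011: about {rate_2011:,.0f} per second")
print(f"2015: about {rate_2015:,.0f} per second")
print(f"growth: {growth:,.0f}x overall, about {yearly:.1f}x per year")
```

    That works out to an average of roughly 11,600 per second in 2011 and about 11.6 million per second in 2015, a thousandfold increase in four years.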

    Most of the Silicon Valley Web giants have adopted the open source Kafka software to manage large flows of event data, including Netflix, Uber, Pinterest, and Airbnb. To service this emerging ecosystem, the original developers of Kafka at LinkedIn formed a spin-off called Confluent, which continues to lead development of Kafka and host its own cloud-based version.

    While Kafka may be new to IBM i, the application has certainly seen its share of IBM i data over the years. Companies like Precisely (formerly Syncsort) and Attunity (now owned by Qlik) have developed connectors to pump data from IBM i and mainframe sources into the Kafka bus.

    Because of the power of Kafka to create extensible data pipelines and to perform real-time transformations on the data flowing through those pipelines, Kafka has emerged as a key architecture element in next-generation big data analytics systems. Many organizations use Kafka to pump data from source systems into data warehouses and cloud-based data lakes, such as Google Cloud’s BigQuery, Snowflake, Microsoft Azure Synapse Analytics (formerly SQL Data Warehouse), and Amazon Web Services Redshift. AWS, for its part, offers its own pub/sub system, called Kinesis.

    It’s unclear exactly why IBM added support for Kafka and Zookeeper, or exactly how these technologies will run on IBM i. (We hope to connect with IBM in the near future to get answers to these questions.) In any event, these are two of the most impactful projects in the open source big data community, along with Apache Hadoop and Apache Spark, so it’s good to see IBM taking steps to keep up with the wider data world, as it has recently done with open source databases like MongoDB, Apache Cassandra, and PostgreSQL.

    It’s also good to see that IBM is providing professional support for Kafka and Zookeeper on IBM i. IBM recently added these projects to the list of open source projects it supports on IBM i through its Technology Support Services (TSS) program. Check out the website at www.ibm.com/support/pages/open-source-support-ibm-i to see for yourself.

    RELATED STORIES

    What’s New In Open Source With The Latest TRs

    More Open Source Databases Coming To IBM i

    Open Source Is the Future, So Where Does IBM i Fit In?


    Tags: Apache Cassandra, Apache Hadoop, Apache Spark, Confluent, IBM i, Kafka, MongoDB, PostgreSQL, Zookeeper


    One thought on “Apache Kafka And Zookeeper Now Supported On IBM i”

    • Brian Randle says:
      September 10, 2020 at 9:05 am

      The link to Technology Support Services lists many packages that include support but states “This offering is available for multiple platforms, so not everything in the list runs on IBM i”. I don’t see either package available to install on the ACS Open Source package management.


TFH Volume: 30 Issue: 54


Copyright © 2025 IT Jungle