• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Data Is Streaming Into The Cloud, Including From IBM i

    November 14, 2022 Alex Woodie

    We are in the midst of a vast migration of data into public clouds, as companies seek to leverage technological innovation occurring in the cloud for competitive advantage. While there are no IBM i runtimes in the public cloud yet, IBM i shops are still getting value by moving their valuable Db2 for i data into the cloud, often using streaming technologies and tools like Apache Kafka and Apache Camel, as well as an updated ETL offering from IBM.

    It is tough to understate the magnitude of the transition to the public cloud is having on the $4 trillion IT industry. The analyst firm Gartner expects the cloud to account for half of enterprise IT spending by 2025, as we have previously written. Much of the action is occurring on the operational front, but even more progress has been made in migrating analytical data and analytical (OLAP) workloads into the cloud.

    In the past, where companies might have used a Netezza or a Teradata machine to process and analyze data – or even an Apache Hadoop cluster if they were brave enough to go the open source route – today they’re embracing cheap and massive data lakes running on AWS, Google Cloud, and Microsoft Azure as the platform for their data analytics efforts. Thanks to the separation of storage and compute afforded by native cloud architectures, they can quickly spin up Azure Synapse, Databricks, or Snowflake instances to crunch their data and build machine learning models, and spin them down when they are done.

    Need to load 100 PB of data into the cloud? AWS Snowmobile is probably your best bet.

    Not surprisingly, IBM i shops would like to get in on this digital gold rush to the cloud, too. While IBM i shops can’t yet provision CPWs in the major public clouds to crunch their data in a native IBM i environment, they can still make use of the array of cloud services, ranging from traditional OLAP and advanced analytics to streaming analytics and even machine learning and AI.

    The extract, transform, and load (ETL) industry has existed for as long as OLAP has been a thing, and there are established ETL vendors out there that have experience tapping into IBM i data. IBM, Informatica, Talend, Fortra (previously HelpSytsems), and Matillion all have ETL offerings that work with IBM i data. But there’s one more name that IBM i shops might want to add to their lists: DataMigrator.

    DataMigrator Another ETL Option

    The optional add-on to Db2 Web Query was originally unveiled back in 2015. But as part of the fall 2022 IBM i Tech Refresh and the release of Db2 Web Query version 2.4 last month, DataMigrator received a major upgrade in the area of cloud data warehouse connectivity.

    John Westcott, a Db2 for i analytics consultant for IBM Technology Services (formerly IBM Lab Services) provided a compelling overview of DataMigrator’s low-code and no-code data transformation capabilities, as well as its new cloud streaming features, during the COMMON NAViGATE conference last month.

    According to Westcott, the fact that DataMigrator is a native IBM i solution plays heavily in its favor.

    DataMigrator provides a GUI for creating ETL data pipelines. (Image courtesy John Westcott, IBM)

    “It’s an IBM i product. It’s not some open-source product that we’re using,” he says. “And because it’s an IBM i product, it understands Db2 for i. So it understands all the weirdness of Db2 – the multiple members [and] that we have bad data.”

    As a SQL-based Db2 product, DataMigrator processes data sets, as opposed to processing rows of data. That’s important, Westcott says, because other ETL tools process data row by row, which can cause additional latency. It’s also significantly less expensive than competing ETL products, including the mainstream ETL product from IBM, he says.

    “We’ve had a number of people come to us because they’ve gotten prices from either other IBM products or vendor products and the cost is extremely high to where DataMigrator is,” Westcott says. “Did I mention it’s low cost?” he emphasized again. “It is really low cost.”

    There’s no point beating around the bush when it comes to IBM i’s bad data problem. Whether it’s dates that are characters, a single file that’s been blown up into a dozen separate files, files with multiple members, or flat files from a System/36 environment, it’s best to just admit that Db2 for i data isn’t always the easiest to work with, Westcott says.

    “It’s a fact. We all talk about it. We all know it,” he says. “We’re all over the place. So maybe we need to pre-process that data before we even try to get it down to some kind of data science or BI system.”

    DataMigrator sports an array of data transformation capabilities. Some of the simpler transformations can be configured in a no-code, drag-and-drop fashion using the product’s graphical interface. More advanced transformations, such as aggregations of sales or transformations of date data, may need a bit of coding.

    And while DataMigrator is an IBM i product, it can also be used to extract and transform data from non-IBM i data sources, such as a SQL Server database, a JDBC connection, or an Excel spreadsheet. Westcott demonstrated how DataMigrator can be used to grab SQL Server data and join it with Db2 for i data and load it into another IBM i system during his NAViGATE session.

    “If you had to develop this application writing an RPG application, how long would it take to write this and test this. Days? Weeks? Months?” he asks. “How long did it take me to write this flow? Maybe five minutes.”

    Real-Time Data Migration

    Westcott also demonstrated how DataMigrator can be used in combination with other tools to create data connections to cloud data warehouses, which is an increasingly popular use case.

    Depending on the size and type of the data, as well as the latency requirement, users may choose to deploy DataMigrator in combination with Apache Kafka, a popular open source publish and subscribe system that runs on IBM i via PASE, or used in conjunction with Kafka and Apache Camel.

    DataMigrator can be used with Kafka to stream data to cloud data warehouses. (Image courtesy John Westcott, IBM)

    DataMigrator can be used to extract Db2 for i data, transform it or aggregate if it desired into JSON, then and directly load it into a Kafka topic. Once it’s on the Kafka topic, it is available for anybody who has access to that Kafka topic, whether it’s IBM Watson Studio or Azure data lake. There’s no need for the IBM i developer to know anything about the targets, which is one of the advantages of this approach, Westcott says.

    “I don’t need to know how to talk to Watson. I don’t need to know how to talk to Azure,” he says. “Those services will pull from Kafka for what they want, when they want, if they want. We’re just making the data available, which in today’s world is a huge thing. It’s all about data availability.”

    In the above scenario, the connection between DataMigrator and Kafka is handled through HTTP-based REST calls, including PUTS and GETS. There are certain advantages to that approach, including the ability to move just about any type of data of any size. But there is another way to integrate DataMigrator with Kafka (and thus downstream systems like Azure or Watson) by using data queues and Apache Camel, which has its own advantage, Westcott says.

    Camel is an open-source routing tool that, like Kafka, runs on IBM i via the PASE runtime. The software allows users to configure rules that control what happens when certain data or workloads are triggered or completed. When DataMigrator is configured to send data to Camel via data queues, Camel essentially takes over the process of keeping the Kafka system full of data.

    Each approach has its advantages, Westcott says. “We’ll be somewhat limited in the data that can go this way [data queue], because it has to meet whatever requirements of a data queue,” he says. There are also limitations in what data types can be sent via data queue.

    However, performance can be better using the data queue approach, with the Camel-to-Kafka connection, “because once you pump the information into the data queue, DataMigrator is then free to go on to do the next task,” Westcott says. “So it’s a consideration for how close to near real time do you need it.”

    The folks at IBM Technology Services have been getting queried recently about another open-source product called Debezium that simplifies this whole change data capture (CDC) stack. IBM typically points them to some combination of DataMigrator and Kafka or DataMigrator and Camel-Kafka.

    “There is no Debezium IBM i connector,” Westcott says. “But that doesn’t mean we can’t emulate what that does.”

    RELATED STORIES

    Db2 Web Query Gets ML-Powered Insights, Cloud DW Connection

    Inside IBM i’s New Geospatial Functions For Db2

    IBM Unveils Fall 2022 Tech Refreshes for IBM i

    The Cloud’s Future Is So Bright, So Why Are You So Glum?

    Analytics Moves To The Cloud, And IBM i Data Goes With It

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: DataMigrator, DB2 for i, Db2 Web Query, Debezium, ETL, IBM i, JDBC, Kafka, OLAP, PASE, REST, RPG, System/36

    Sponsored by
    WorksRight Software

    Do you need area code information?
    Do you need ZIP Code information?
    Do you need ZIP+4 information?
    Do you need city name information?
    Do you need county information?
    Do you need a nearest dealer locator system?

    We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

    The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

    PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

    Just call us and we’ll arrange for 30 days FREE use of either ZIP/CITY or PER/ZIP4.

    WorksRight Software, Inc.
    Phone: 601-856-8337
    Fax: 601-856-9432
    Email: software@worksright.com
    Website: www.worksright.com

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Court Throws Out Lawsuit From Chinese Power Systems Partner Against Big Blue IBM Sets Maintenance Withdrawal On Power8 Iron For 2024

    One thought on “Data Is Streaming Into The Cloud, Including From IBM i”

    • Hunza says:
      November 14, 2022 at 3:01 am

      Hura

      Reply

    Leave a Reply Cancel reply

TFH Volume: 32 Issue: 77

This Issue Sponsored By

  • Fresche Solutions
  • DRV Technologies
  • WorksRight Software
  • ARCAD Software
  • TL Ashford

Table of Contents

  • Data Is Streaming Into The Cloud, Including From IBM i
  • Court Throws Out Lawsuit From Chinese Power Systems Partner Against Big Blue
  • A Few More Power Systems Updates Before 2022 Ends
  • IBM Gives Temporary Reprieve For Power8 Core And Memory Activations
  • IBM i PTF Guide, Volume 24, Number 46

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Big Blue Raises IBM i License Transfer Fees, Other Prices
  • Keep The IBM i Youth Movement Going With More Training, Better Tools
  • Remain Begins Migrating DevOps Tools To VS Code
  • IBM Readies LTO-10 Tape Drives And Libraries
  • IBM i PTF Guide, Volume 27, Number 23
  • SEU’s Fate, An IBM i V8, And The Odds Of A Power13
  • Tandberg Bankruptcy Leaves A Hole In IBM Power Storage
  • RPG Code Generation And The Agentic Future Of IBM i
  • A Bunch Of IBM i-Power Systems Things To Be Aware Of
  • IBM i PTF Guide, Volume 27, Numbers 21 And 22

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle