Finding IBM i’s Place In Data Fabrics And Data Meshes
December 5, 2022 Alex Woodie
The commercialization of the Internet kicked off an unpredictable series of developments in the computing world. One of those is the proliferation of Web applications, each with its own data silo, bringing chaos to data management and data governance. Application architects are now seeking to tame that data chaos through novel infrastructure architectures, including the data mesh and the data fabric.
While the lure of digital transformation gives companies hope that they, too, can benefit from breakthroughs in analytics, machine learning, and IoT, the data at most companies is in such a state of disarray that those benefits are closer to a pipe dream than a real possibility.
Data fabrics and data meshes are two new ways companies are seeking to find a path through the data management and data governance problems afflicting the modern data-driven enterprise.
There are some similarities between the data mesh and the data fabric, but there are also important differences. Let’s explore each concept, and then see how IBM i might fit into each.
The concept of a data fabric was first conceived in the mid-2000s as a way to unify the disparate collection of tools that database engineers and others use to manage data across an enterprise. These tools handle data access, discovery, security, integration, governance, lineage, and orchestration.
Instead of letting each application team set their own rules governing things like data access and security, an enterprise adopting a data fabric (sometimes called a data plane) will have a single tool or suite of common tools to enforce access and security across its systems. It’s a federated technique that seeks uniformity in the rules and enforcement actions, which are applied against transactional systems (like the IBM i) as well as operational and analytic systems running on prem and in the cloud.
A key element of the data fabric is that users can interact with it in a self-service manner. Once the tools involved in the data fabric are configured (usually in a low-code or no-code way), then they can be enforced at a logical level via metadata.
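The idea of configuring rules once and enforcing them logically via metadata can be sketched in a few lines. This is a hypothetical illustration of the pattern, not any vendor's actual API: columns across systems are tagged in a catalog, and a single policy check runs against those tags regardless of where the data physically lives.

```python
# Minimal sketch of metadata-driven policy enforcement, the idea behind a
# data fabric's control plane. All names here are hypothetical illustrations.
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """A single access rule, defined once and applied to every system."""
    classification: str        # e.g. "pii", "public"
    allowed_roles: frozenset

# The fabric's catalog holds column-level metadata tags, whether the column
# lives in Db2 for i, a cloud warehouse, or a data lake.
CATALOG = {
    ("orders", "customer_email"): "pii",
    ("orders", "order_total"): "public",
}

POLICIES = {
    "pii": Policy("pii", frozenset({"data_steward"})),
    "public": Policy("public", frozenset({"data_steward", "analyst"})),
}

def can_read(role: str, table: str, column: str) -> bool:
    """Enforce the same rule everywhere: the check runs on metadata only,
    never on system-specific access logic."""
    tag = CATALOG[(table, column)]
    return role in POLICIES[tag].allowed_roles

print(can_read("analyst", "orders", "order_total"))      # True
print(can_read("analyst", "orders", "customer_email"))   # False
```

The point of the sketch is that adding a new system means tagging its columns in the catalog, not writing new enforcement code, which is what makes the self-service model workable.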
Companies can cobble together their own data fabric using disparate tools, or they can buy a shrink-wrapped data fabric suite from vendors like Informatica and Talend, both of which have supported IBM mainframe and midrange systems with their ETL solutions (data quality and a common set of data semantics are other big aspects of the data fabric).
IBM is bullish on data fabrics and sells a data fabric offering dubbed Cloud Pak for Data that can be deployed in the cloud or on prem. The OpenShift-based offering includes a variety of tools to handle the various requirements of a data fabric, including a data catalog, data replication, and connections to databases, including Db2 and third-party databases. The company just released Cloud Pak for Data 4.6; you can check out the “what’s new” document here.
In its white paper Data Fabric Architecture Delivers Instant Benefits, IBM discusses how data fabrics address challenges in hybrid data management by “striking a balance between decentralization and globalization by acting as the virtual connective tissue between data endpoints.”
Inderpal Bhandari, IBM’s global chief data officer, is a big backer of data fabrics too. “At IBM, I’ve seen the value first-hand that a data fabric architecture provides when it comes to simplifying data access,” he wrote recently in an article in CDO Magazine.
“A data fabric can, for example, leverage AI to continuously learn patterns in how data is transformed to automate data pipelines, which makes finding data easier and automatically enforces governance and compliance,” Bhandari continues. “It significantly improves productivity, accelerates time-to-value for a business, and simplifies compliance reporting.”
The data mesh is another popular new concept taking hold in data circles. The data mesh shares some similarities with the data fabric concept, including the desire to put an end to the endless headaches associated with having wildly different standards around data access, quality, and security – especially when moving data between data warehouses and data lakes.
But there are important differences too. For starters, the data mesh concept is primarily focused on enabling independent teams of developers to work in a decentralized manner, albeit unified by some overarching principles. The data fabric, by contrast, is more technology-centric.
The data mesh concept was spearheaded by Zhamak Dehghani, who laid out many of her ideas in a May 2019 paper titled How To Move Beyond A Monolithic Data Lake To A Distributed Data Mesh.
Dehghani’s key insight was that data engineers cannot hardwire data transformation into technology. Instead, she thought data transformation should be a type of filter that’s applied on a common set of data that’s available to all users.
So instead of building ETL data pipelines that are brittle and will invariably break, in a data mesh the data is retained in roughly its original form, while a series of domain-specific teams take ownership of their data as they develop data products for use by the enterprise.
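The contrast between a hardwired ETL pipeline and a domain-owned data product can be made concrete. The following is a hypothetical sketch, with invented record shapes and function names: the domain team keeps its records roughly in their original form and publishes a read interface, so the transformation is a view applied at consumption time rather than a copy baked into a pipeline.

```python
# Hypothetical sketch of a data mesh "data product". Raw records stay close
# to their original form; consumers call a documented interface that applies
# the transformation on read, instead of a brittle copy-and-reshape pipeline.
from datetime import date

# Raw order records, kept roughly as the source system produced them.
RAW_ORDERS = [
    {"id": 1, "amount": "120.50", "ship_date": "2022-11-03", "status": "SHIPPED"},
    {"id": 2, "amount": "75.00",  "ship_date": "2022-11-04", "status": "CANCELLED"},
]

def shipped_orders():
    """The data product's contract: typed, filtered records for consumers.
    The domain team owns this interface and can evolve the raw storage
    underneath it without breaking downstream users."""
    for rec in RAW_ORDERS:
        if rec["status"] != "SHIPPED":
            continue
        yield {
            "id": rec["id"],
            "amount": float(rec["amount"]),
            "ship_date": date.fromisoformat(rec["ship_date"]),
        }

for order in shipped_orders():
    print(order)
```

Because the filter lives behind the interface rather than inside a pipeline, a schema change in the raw data means updating one function the domain team owns, not hunting down every downstream ETL job that copied the old shape.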
IBM i’s Role In Data Meshes and Fabrics
Companies that run IBM i often keep some of the most valuable and sensitive data on their IBM i server. While companies don’t usually run analytics or machine learning on the platform, the system is no stranger to ETL and the need for enterprise-strength data integration.
To that end, the IBM i server definitely has a role to play in the new generation of data fabrics and data meshes being devised. The data fabric’s roots in ETL and data security make the connection with IBM i more apparent.
In other words, by being just a bit more decisive and intentional about creating control points into all of the systems holding data silos – including extremely important silos like the Db2 for i database – IBM i shops can give the data fabric and IBM i a clear future together.
The IBM i opportunities around data mesh are less obvious at this point. IBM i professionals are not, for the most part, struggling to find their place amid the clutter of data lakes. Their role is clear: run the transactional systems as efficiently, safely, and securely as possible.
There’s not a lot of evidence that IBM i professionals are experimenting widely with the creation of data products at this point. The time may come for IBM i shops to open up their Db2 for i data and explore its potential across a myriad of creative digital use cases, but that’s far from the minds of most CIOs at this point, making the data mesh a solution to a problem that does not exist – not yet, anyway.