In Search of Multi-Modal Data Integration
May 5, 2021 Alex Woodie
As the number of data sources that companies use goes up, so too does the complexity of the data pipelines that bring all that data together. Instead of using a mix of different tools and technologies for ETL, ELT, and batch integration, some users are seeking a single multi-modal data integration tool that can deliver multiple integration approaches.
One company providing multi-modal data integration is Equalum. The company’s DataOps offering supports various integration methodologies, including capturing real-time database updates using change data capture (CDC) technology; batch-based extract, transform, and load (ETL) and ELT; and REST- and SOAP-based data integration using web services and microservices.
“You have to be able to support batch integration,” Equalum founder and CEO Nir Livneh says. “You have to support streaming ELT, which is what’s happening these days and is popular with data warehouses and data lakes. Then you also have to support streaming ETL, which means you’re doing the data massaging and transformation within the pipe itself.
“That’s called a multi-modal approach,” he continues, “because if you can come up with a technology that can handle all three use cases and support that in one architecture, you’re saving costs, you’re creating a single point of management and maintenance. And the entire project becomes something that’s more future-proof on top of it.”
Much of what the company does starts with CDC. It developed its own CDC technology to capture binary data from relational databases, which enables it to get the freshest possible data out of the database. It supports Db2 for i and Db2 for Linux, Unix, and Windows (LUW), in addition to other popular databases, like Oracle, SQL Server, MySQL, and Postgres.
Other CDC tools take shortcuts to get to relational data, but Equalum decided to invest the time to build its CDC technology the right way, Livneh says. For example, Oracle’s LogMiner can be used to extract data from Oracle’s database, but it’s limited to about 1,000 transactions per second, which is not sufficient for today’s environments.
“There are other ways of doing it, like network sniffing,” says Livneh, who led data integration product development at Quest Software before founding Equalum in Tel Aviv, Israel, in 2015. “But the best one is binary data extraction. We make sure we do that with our sources.”
Once Equalum’s software has captured data, it enables customers to do something with it. In many instances, the customer will want to perform some type of transformation on the data before loading it into a destination, which could be an on-prem data warehouse or one running in the cloud. This is the traditional ETL-style workflow, which most IBM i shops will be familiar with.
“You can take your real time data and you can start to do real time transformations on the data while it’s in flight — correlations, joins, aggregations, manipulating the data, masking the data — anything you want with the data,” Livneh says.
The other option is ELT, where the data transformations occur in the destination. This has become a common data integration technique for customers who are amassing their data in cloud data lakes and cloud data warehouses. Because these cloud providers have separated the compute and storage tiers, it has become more cost-effective for customers to process the data in the cloud rather than prepping it beforehand.
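To make the ELT pattern concrete, here is a minimal sketch in Python: raw records are landed in a staging table first, and the transformation then runs inside the destination engine itself. This is a generic illustration, not Equalum’s implementation; sqlite3 stands in for a cloud warehouse, and the table and column names are hypothetical.

```python
# Minimal ELT sketch: load raw data first, then transform inside the
# destination engine. sqlite3 stands in for a cloud warehouse here;
# table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")

# "Load" step: land raw records in a staging table with no transformation.
conn.execute("CREATE TABLE stg_orders (order_id INTEGER, amount REAL, region TEXT)")
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [(1, 19.99, "EMEA"), (2, 5.00, "AMER"), (3, 42.50, "EMEA")],
)

# "Transform" step: the work runs where the data already lives, using the
# destination's own compute (here, a simple SQL aggregation).
conn.execute("""
    CREATE TABLE orders_by_region AS
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM stg_orders
    GROUP BY region
""")

for row in conn.execute("SELECT * FROM orders_by_region ORDER BY region"):
    print(row)
```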
Equalum supports both use cases with its software, which is based on the open source infrastructure components Apache Spark and Apache Kafka. The company uses Kafka as a real-time message bus to move data as it flows from sources to destinations; it is not used for batch integration. Equalum uses Spark to run transformation routines on the data (primarily for ETL-style workflows, since in the ELT workflow the transformation step is performed inside the destination).
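For the streaming-ETL side, the sketch below shows what in-flight transformation on that open source stack can look like: Spark Structured Streaming reads change events from a Kafka topic, masks a sensitive column, and aggregates the records before they are written to a sink. Again, this is a generic example rather than Equalum’s API; the broker address, topic name, and schema are assumptions, and it requires the Spark Kafka connector package.

```python
# Generic streaming-ETL sketch on the Kafka + Spark stack the article
# describes. Not Equalum's API; broker, topic, and schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, sha2, window
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Source: change events arriving on a Kafka topic (Kafka as the message bus).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "orders_cdc")
       .load())

events = (raw
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# In-flight transformations: mask the sensitive column, then aggregate
# over one-minute windows before the data ever lands in the target.
transformed = (events
               .withColumn("account_id", sha2(col("account_id"), 256))
               .withWatermark("event_time", "5 minutes")
               .groupBy(window(col("event_time"), "1 minute"))
               .sum("amount"))

# Sink: write results out (console here; a warehouse or lake in practice).
query = transformed.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```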
Customers could build their own ETL and ELT data integration workflows using Spark and Kafka, as many companies have done. However, while Spark and Kafka are powerful and modern technologies, it’s not necessarily simple to run the computing clusters they require.
With Equalum, customers get the data integration benefits of Spark and Kafka, but without the management headaches that they bring with them, Livneh says.
“You don’t have to worry about a Kafka broker going down. You don’t have to worry about Spark out of memory executors or any type of problems that you would run into,” he says. “That’s the beauty. By the way, you can access the Kafka or Spark system if you want. We give you that ability. Sometimes people want to use Spark or Kafka for their own needs. We give you that ability. But this entire thing is managed by Equalum. You don’t have to manage that if you don’t want.”
The software runs atop Linux and has been tested on Linux distributions like CentOS and Red Hat. Companies can deploy it on-prem or in the cloud. The software doesn’t deploy by default on Kubernetes and Docker, but Equalum has that capability, Livneh says.
Equalum is one of a handful of young companies seeking to make a name for themselves in the burgeoning big data world. The company has attracted $25 million in venture capital, along with customers like Warner Bros. and Siemens.