QlikTech Adapts In-Memory Analytics for External Big Data

October 23, 2012 Alex Woodie

QlikTech made its mark in the business intelligence field by simplifying the BI experience and delivering results quickly from an in-memory associative database. But with the advent of “big data,” the company’s total reliance on in-memory technology was challenged. Last week, the company unveiled a new Direct Discovery mode that allows customers to process large data sets stored externally on disk, while keeping the associative data model intact.

QlikTech has ridden the in-memory wave quite successfully over the last decade. While its big BI competitors like Oracle, SAP, and IBM have developed or acquired in-memory database technologies to complement their heavy online analytical processing (OLAP) products, QlikTech has focused on enhancing and selling QlikView, its in-memory reporting tool that was rolled out in 1993. QlikTech went public in 2010, and today counts more than 26,000 customers and more than $300 million in annual sales. The Pennsylvania company isn’t the largest BI vendor, but it has influenced the direction toward simpler, smaller, and more nimble BI systems.

Then came Big Data, and everything changed. Even midsize companies, like the ones that QlikView catered to, now want to analyze massive stockpiles of data to gain insights and use them for competitive advantage.

This presented a challenge for QlikTech. It wasn’t so much that big data sets couldn’t be loaded into QlikView. You can get a Windows server equipped with multiple terabytes of RAM, and use QlikView’s compression algorithms to squeeze all that big data into its database.

The big problem was that customers’ big data sets were often of questionable value, changed frequently, and weren’t accessed regularly. One of QlikTech’s customers, for example, has billions of insurance claims from which to pull information. But loading all of the claims into memory simply wasn’t cost effective, since it would require a big hardware upgrade. That approach is also anathema to QlikTech’s smaller, nimbler mantra.

So the QlikView developers went out and created Direct Discovery, a new feature that allows QlikView users to see external data sources on their screens, and to query those data sources for answers, just as they are used to with in-memory data. While the data is not in memory, the associative data model remains intact with the external data. This means that users continue to benefit from the “green, gray, white” color-coding of query results that shows them which categories of data fell outside of their query, thereby giving them the context to ask more intelligent questions the next time.
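The color-coding idea can be illustrated with a minimal Python sketch. It is not QlikView code: the function name `classify_values`, the sample rows, and the mapping of green to selected values, white to values still associated with the selection, and gray to excluded values are assumptions made for illustration.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def classify_values(rows: List[Row],
                    selections: Dict[str, Set[str]],
                    field: str) -> Dict[str, str]:
    """Label each value of `field` as green, white, or gray.

    Assumed meaning (illustrative only):
      green = the user selected this value
      white = not selected, but still associated with the selection
      gray  = excluded by the current selection
    """
    # Rows that satisfy every selection the user has made.
    matching = [
        r for r in rows
        if all(r[f] in vals for f, vals in selections.items())
    ]
    possible = {r[field] for r in matching}
    selected = selections.get(field, set())

    states: Dict[str, str] = {}
    for value in {r[field] for r in rows}:
        if value in selected:
            states[value] = "green"
        elif value in possible:
            states[value] = "white"
        else:
            states[value] = "gray"
    return states


# Hypothetical insurance-claim rows.
claims = [
    {"region": "East", "product": "Auto"},
    {"region": "East", "product": "Home"},
    {"region": "West", "product": "Life"},
]
by_product = classify_values(claims, {"region": {"East"}}, "product")
by_region = classify_values(claims, {"region": {"East"}}, "region")
```

In this sketch, selecting the East region leaves Auto and Home white and turns Life gray, which is the kind of context the article says prompts a user's next question.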

Elif Tutek, technical product marketing manager for QlikTech, last week briefed IT Jungle on how Direct Discovery works. “You may have SAP data or some Facebook information, or maybe Teradata or Google BigQuery. You don’t want to bring that data into memory as a part of the in-memory data model, but you still want to make it available to users,” she says. “With Direct Discovery, you can merge big data with other data sources. And as a user, I can still leverage the associative experience that allows me to ask the next question on the external data sets as well.”

Direct Discovery, which is enabled as a keyword during the ETL process, uses standard ODBC connections to load external data sets as they are needed. For some big data sources, such as Teradata, QlikTech has developed a custom connector (also announced last week). Standard ODBC should work for loading data from standard relational data stores, like DB2, Oracle 11g, MySQL, and SQL Server, as well as newer big data stores, like Hadoop, Cassandra, Google BigQuery, and others. Even big data sets stored in DB2/400 can be accessed.

Performance will not be as snappy when a user is perusing data in the Direct Discovery dashboard, since the data is being accessed directly from disk. In a QlikTech test, Direct Discovery was able to return a query of more than 3 billion rows from a Teradata data warehouse in about three to four minutes. This is obviously much slower than the sub-second response that’s typical with QlikView’s in-memory technology. But it would take hours for competitors to return the same query, Tutek says.

“I think this will help truly to solve the problem with big data,” she says. “The performance with in memory will always be much faster. That’s why we truly would like to position this as a hybrid approach where people will be leveraging the in-memory power of QlikView, and also access external data sources as well.”

Direct Discovery also includes a data caching mechanism that customers can use to force QlikView to refresh the external data at regular intervals. The data caching enables any data already loaded to be used for further analysis within QlikView. But if customers want the latest real-time information, they can set the threshold very low and force QlikView to continually refresh the data.
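The refresh threshold described above behaves like a time-based cache. The following Python sketch shows that general technique only; it is not QlikView’s implementation, and the class `TimedQueryCache`, its `max_age_seconds` parameter, and the `fetch` callback are hypothetical names introduced for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TimedQueryCache:
    """Hypothetical time-based cache for results from an external source."""

    def __init__(self,
                 fetch: Callable[[str], Any],
                 max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch              # runs the query against the external store
        self._max_age = max_age_seconds  # the refresh threshold
        self._clock = clock              # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, query: str) -> Any:
        now = self._clock()
        entry = self._entries.get(query)
        # Reuse the cached result only while it is younger than the threshold.
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]
        result = self._fetch(query)
        self._entries[query] = (now, result)
        return result
```

With a threshold of zero, every call in this sketch goes back to the external source, which mirrors setting the threshold very low to get the latest data. A larger threshold trades freshness for fewer slow external queries.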

Tutek expects Direct Discovery, which ships in December as part of QlikView version 11.2, to be very complementary to the new data governance dashboard that QlikTech launched several weeks ago. The data governance dashboard shows an administrator which data is being used, what selections users are making, and what metrics they’re using.

Together, these tools will allow customers to make decisions about their big data usage. “Maybe, as a BI team leader, I just want to see how people will be using that data,” Tutek says. “Then by using our dashboard, I can see what type of data people are using, and then maybe I can decide that if they’re very frequently using Direct Discovery against a piece of data, to put it in memory, because the performance will be better.”


