Informatica Aims to Virtualize Data with PowerCenter 8
by Alex Woodie
There has been a lot of talk lately about virtualizing hardware and running multiple operating system images on the same machine to make it easier to manage applications. Informatica is now applying that general idea to data transformations with a new version of its flagship PowerCenter product unveiled this week. With PowerCenter 8, the company is looking to break down the barriers to building pipelines that connect physical repositories of data and applications accessing the data.
Informatica's PowerCenter product began life as an extract, transform, and load (ETL) tool for building data warehouses from data stored in the transactional systems of large corporations. While business intelligence is a booming area and continues to drive demand for ETL tools, Informatica is also seeing customers use PowerCenter in other ways, like migrating to new ERP systems and building "competency centers" that set the standard in a company for how data is managed, often with regulatory compliance as the driving factor.
But whatever the need for a data transformation tool like PowerCenter, the amount and variety of data continue to explode, which means that PowerCenter needs to grow with them. This is what Informatica has sought to do with PowerCenter version 8, which has been in development for more than a year under the "Zeus" codename. The highlights of PowerCenter 8 can be broken down into three major new features: data virtualization, support for unstructured data, and grid enablement.
The new data virtualization capability is expected to help companies complete tough IT tasks, like quickly integrating new data following an acquisition. PowerCenter 7, like many other data transformation products on the market, enabled companies to physically move data from one place to another, like from a DB2/400 production system to a SQL Server-based data warehouse.
With the new data virtualization capability in PowerCenter 8, data can remain on the physical production system, while also appearing to reside on other systems, says Ivan Chong, product marketing manager with the Redwood City, California, company.
For example, one of the early testers of PowerCenter 8 was a bank that has a growth strategy that involves making acquisitions. "The challenge they had is the call center had to immediately provide an aggregate view of the new customers across all systems," Chong says. "With PowerCenter 8, the developer can specify the application once, and then change the mode by which it's accessed, physical or virtual."
Data virtualization provides customers with greater flexibility, Chong says. While having the data housed locally may provide better performance, physically moving the data from one system to another takes time and effort from IT professionals. By virtualizing the data across all supported data sources and targets, customers enjoy more choices in terms of how they access it.
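The "specify once, choose the access mode later" idea Chong describes can be illustrated with a small sketch. This is not PowerCenter's interface; the `Mapping` class, the `customer_view` function, and the field names are all hypothetical, and plain Python lists stand in for the source systems. The mapping logic is written once, and the caller decides whether to read through it on demand (virtual) or materialize a copy (physical).

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def customer_view(row: Row) -> Row:
    # The mapping logic, written once. Field names are hypothetical.
    return {"name": row["cust_name"].title(), "source": row["system"]}


class Mapping:
    """Hypothetical mapping over several sources; not a PowerCenter API."""

    def __init__(self, sources: List[List[Row]], transform: Callable[[Row], Row]):
        self.sources = sources
        self.transform = transform

    def virtual(self) -> Iterator[Row]:
        # Virtual mode: rows stay in the sources and are transformed on read,
        # so each call sees whatever the sources hold at that moment.
        for source in self.sources:
            for row in source:
                yield self.transform(row)

    def physical(self) -> List[Row]:
        # Physical mode: the same logic, but the result is copied out once
        # and does not change when the sources change afterward.
        return list(self.virtual())


bank = [{"cust_name": "ann lee", "system": "core"}]
acquired = [{"cust_name": "bob ray", "system": "acq"}]
mapping = Mapping([bank, acquired], customer_view)

snapshot = mapping.physical()
acquired.append({"cust_name": "cy dow", "system": "acq"})
live_names = [row["name"] for row in mapping.virtual()]
```

In this toy version the physical snapshot is faster to query repeatedly but goes stale, while the virtual view always reflects the sources at the cost of reading them on every access, which mirrors the trade-off described above.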
Not all pertinent data is stored in relational databases or other structured forms of information, such as XML. In fact, up to 90 percent of the information that companies use resides in unstructured or semi-structured formats, such as Word documents, Excel spreadsheets, Web pages, e-mails, HIPAA documents, and PDF documents, according to Informatica.
With the amount of unstructured and semi-structured data increasing dramatically, Informatica felt the need to increase this area of support within PowerCenter 8. Users can bring their unstructured data into the PowerCenter fold by using the product's parsing designer, which basically lets them do one-off mappings of certain specified documents.
For example, say a given company's business process requires it to access a credit report before clearing an order, Chong says. Building the pipelines that enable a user to quickly assemble the documents necessary to form a decision, including the Dun & Bradstreet credit report, is easier with PowerCenter 8, he says.
PowerCenter 8 also includes new grid computing capabilities that will benefit users by making high availability a built-in feature of the product. While grid features were supported in the previous release, PowerCenter 8 includes more intelligence that enables the product to more effectively handle the grid resources users have allotted to their PowerCenter implementation.
Grid infrastructures make it easy for users to scale up on the hardware side by simply plugging another blade into the system, Chong says. "The problem is software hasn't scaled with the grid. A customer has to specify in applications where processing is to be applied. Machines come and go, data volumes spike, and every time, a developer must decide how to rework it," he says. "With PowerCenter 8, we allow developers to specify logic, but PowerCenter takes responsibility for detecting and exploiting where processing takes place." In short, this means that PowerCenter 8 now has built-in failover capabilities.
PowerCenter 8's new grid awareness has been put to the test at LinkShare, a company that connects e-commerce Web sites with advertising, and maintains a multi-terabyte data warehouse on a Linux DB2 cluster. "They did a beta test of our grid capability and verified the automated parallelization of grid infrastructure is in line with their requirements," Chong says.
In addition to these new features, PowerCenter 8 received several other enhancements, including the new "push-down" capability that enables data transformation processing to be sent to the target relational database, instead of being performed by the PowerCenter server. This release also brings new support for importing, creating, compiling, and debugging transformation routines that have been written in Java (previous releases relied entirely on visual templates, Chong says), as well as a new Web-based console designed to automate repetitive administrative and performance-tuning tasks.
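The general idea behind push-down processing can be shown with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the target database; the table names, sample rows, and function names are assumptions made for illustration and say nothing about how PowerCenter generates or dispatches its SQL. The same aggregation is performed two ways: in the engine, by pulling every row out of the database, and pushed down, as a single SQL statement that the database runs itself.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with hypothetical source and target tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("acme", 10.0), ("acme", 5.0), ("globex", 7.5)],
    )
    conn.execute("CREATE TABLE totals (customer TEXT, total REAL)")
    return conn


def transform_in_engine(conn: sqlite3.Connection) -> None:
    # Engine-side: fetch every row, aggregate in application code,
    # then write the results back to the target table.
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    conn.executemany("INSERT INTO totals VALUES (?, ?)", sorted(totals.items()))


def transform_pushed_down(conn: sqlite3.Connection) -> None:
    # Push-down: express the same transformation as SQL so the database
    # does the work and no rows travel to the application.
    conn.execute(
        "INSERT INTO totals "
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )


def read_totals(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT customer, total FROM totals ORDER BY customer"
    ).fetchall()


engine_db = make_db()
transform_in_engine(engine_db)

pushdown_db = make_db()
transform_pushed_down(pushdown_db)
```

Both paths leave the same rows in `totals`; the difference is where the computation happens and how much data crosses the connection, which is the motivation for sending the work to the database when it sits next to the data.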
While PowerCenter agents support many different targets and sources of data, including OS/400 applications and DB2/400 data, the PowerCenter engine installs on Windows, AIX, HP-UX, Solaris, Linux, and z/OS servers. The company has considered a native OS/400 port but currently has no plans to finalize such a product.
PowerCenter 8 should be available on a limited First Customer Ship (FCS) basis in December, with general availability planned for April 2006. The company goes through an FCS stage to ensure that real-world guidelines and reference implementations are in place before widespread adoption. The standard edition of PowerCenter 8 will start at $140,000, while customers will pay $40,000 more for the Advanced Edition, which is required for grid enablement and brings additional reporting, metadata analysis, and team development functions.
PowerCenter 8 marks the completion of the second stage of a three-part roadmap Informatica unveiled in February (see "Informatica Unveils 18-Month Roadmap for Enterprise ETL"). The final stage of that plan involves the development of "Hercules," the codename for the next release of PowerCenter. Initially slated for a fall 2006 release, Hercules, which is focused on completing the product's SOA story, has been pushed back to 2007, Chong says.