Volume 18, Number 4 -- January 26, 2009

Data Warehouses: Know One When You See One?

Published: January 26, 2009

by Dan Burger

People may have problems defining a data warehouse. They may have problems designing and building a data warehouse, too. But that hasn't slowed down the desire to get a better grasp on data and use it more effectively to drive the business and preserve revenues. "What many people describe as a data warehouse is actually a data mart or something else," says Bill O'Connell, chief technology officer of data warehousing within IBM's Information Management division.

What distinguishes a data warehouse is an enterprise design and a definition of the business problem to be solved, O'Connell says. The emphasis is on consolidating data in an environment that can grow in a linear, scalable way from the low terabytes up to the hundreds of terabytes. It means being able to do everything from heavy-duty data mining to operational analytics within the same system.

Although data warehouse projects are often associated with large enterprises and big budgets, this is a notion disproved by many companies in the SMB arena. "There is a cost involved in building an enterprise data warehouse system," O'Connell says. "It's not just IT infrastructure or the cost of the warehouse. You have to consider data governance, stewardship, business processes, and your organization structure. All those factor into the maturity of the system and the business."

Large enterprises were leading the data warehouse adoption early on because they could afford it, but the motivation came from understanding how these projects could ultimately be used to drive business.

From where O'Connell sits, he sees data warehouses reaching the SMB shops "where they have always done analytics. I wouldn't call it warehousing, but that's changing. They are building small warehouses now."

Putting an accurate gauge on the current data warehouse market isn't particularly easy. The latest IDC report on data warehouse software is based on numbers from 2007, which seems like eons ago. The figures speak loudly, however, showing a 15 percent gain in software revenues compared to 2006. Sales of data warehousing software also delivered double-digit increases in the previous two years.

Based on the revenue numbers relating to 2007, IDC ranked IBM number one in the category of data warehouse generation tools. Big Blue was followed by SAS Institute, Informatica, Microsoft, and Oracle. In the data warehouse management tools category, again based on revenue, the leader was Oracle. Following were IBM, Microsoft, Teradata, and SAS.

The complete list of data warehouse software vendors is indicative of the thriving nature of this market. These revenue leaders are the top of the pyramid, and it's no coincidence that they design, market, and sell databases in almost every case.

"The future of the data warehouse platform software market remains bright," says Dan Vesset, vice president of business analytics research at IDC. "As various business intelligence and analytics projects remain high on the priority lists of organizations of all sizes, the demand for data warehouse platform software to support these business intelligence and analytics projects is likely to continue to grow."

Among System i users, data warehouse projects probably lag behind the market as a whole, but all the same reasons exist for implementing a project.

"From its beginnings as the AS/400, the box has enabled users to do a lot of things in terms of reports and queries that other platforms and databases were unable to do or could only do with great difficulty," says Alan Jordan, vice president at Coglin Mill. "For a long time, the AS/400 was ahead of its time and maybe people came to rely on that too much and for too long. Data warehousing and business intelligence technology has advanced and left many of the iSeries and System i users behind."

"A lot of organizations think all they need is a reporting tool," Jordan continues. "They lived with Query/400 for years. And now there are all sorts of reporting tools with great capabilities. But there is a reason for having a well-designed business intelligence architecture, which is what a data warehouse is."

Bill Langston, the director of marketing at New Generation Software, has a slightly different perspective on what the System i user is looking for. His view is more toward the line of business user or the departmental user, an area where data marts are popular.

"We find companies are very focused on a combination of real-time analysis and reporting based on live, production data in areas like shipping, logistics, inventory, and customer service and historical performance trends, in areas like finance and sales," Langston says. "Data marts are the preferred way to obtain the historical information. Mid-market companies, especially today, don't have many dedicated business analysts and even senior managers in these companies are often very hands-on when it comes to day-to-day operations. As a result, they tend to be more tactical in their perspective and much more cautious about data warehousing."

The terms data warehouse and data mart are often used interchangeably, which is fine with some people and argued about by others. Without going down that road too far, let's just say that you should always make certain you are on the same page with whomever you are talking with when these terms come up.

"The whole issue with the System i is you want to exploit that operating system," O'Connell says. "Because it's now running on Power hardware, users can put AIX and Linux on it. The benefit is that we can integrate databases to the operating system and it becomes a 'black box' approach."

The types of data warehouse implementations that O'Connell typically encounters involve multiple platforms, but the System i is no stranger. He calls IBM's low-end SMB customers the System i "sweet spot," particularly in Europe. Most often the data warehouse handles less than 10 terabytes of information in a departmental situation rather than an enterprise-wide system.

"A lot of companies rely on the i because it is very simple and it is a hands-off box," O'Connell says. "If you are going to take data off an i and do analytics off of that, and decide what applications will be brought into a single warehouse, you will put the data warehouse on the i. The people have the skills and are used to it. In an enterprise-class warehouse, I'm bringing together many different sources, and many different lines of business and different departments--some running on i, some on x, some on p, and some mainframe--that's a different game."

"We wouldn't use a System i to build a warehouse that scales up into the tens of terabytes of raw data," O'Connell says. "Those are very complex and are usually done on System p or System x. They use a lot of small servers and they grow the warehouses linearly by adding servers."

Of course, there is no good reason why you could not use DB2 clustering technology to cluster a bunch of low-end Power Systems i boxes together to make a big database engine to run a data warehouse on. DB2 Multisystem exists, and has for 13 years.

O'Connell says the task of implementing a data warehouse is no more difficult on a System i than any other platform.

"I can do anything [in terms of data analytics] on a System i that I could do on a System x or System p," he says. "Some of the tools may not run on every server, but in a client server relationship, I can run the tools somewhere else. The tools can be accessing data on the i no matter where they are running."

There has been some feedback from System i users that they would prefer tools that run native, however.

Building data warehouses can be fairly simple or very complex. It depends on what O'Connell calls the maturity of a warehouse. A first phase might be a system that does ad hoc and batch reports. This differs from what most organizations are doing now, because building a data warehouse requires scrubbing the data--eliminating the "garbage in, garbage out" factor. As the data warehouse "matures," it gains the capability to analyze data. It explains why something happened in addition to telling you what happened. As the data warehouse advances and more functionality is added, it can do predictive and discovery analysis as well as matching and mirroring data, which means taking past data and comparing it to on-the-fly, current data.
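
To make the last step concrete, here is a minimal Python sketch of comparing a current reading against past data. It is an illustration only, not anything O'Connell or IBM describes: the function name, the threshold of two standard deviations, and the sample figures are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def deviates_from_history(history, current, threshold=2.0):
    """Return True when `current` sits more than `threshold` standard
    deviations away from the mean of `history`.

    `history` is a list of past values (for example, daily order counts
    already stored in the warehouse); `current` is the on-the-fly value.
    The threshold is a hypothetical default, not a recommended setting.
    """
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # No variation in the past data: any difference is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > threshold


# Hypothetical sample data: five past daily order counts.
past_orders = [100, 102, 98, 101, 99]
unusual_day = deviates_from_history(past_orders, 130)
ordinary_day = deviates_from_history(past_orders, 101)
```

A real warehouse would run this kind of comparison inside the database or an analytics tool; the point is only that the historical data supplies the baseline against which live data is judged.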

Often overlooked in the excitement of building a data warehouse and gaining valuable business intelligence capabilities is the taking-care-of-business side.

"You can't just build an enterprise warehouse right away," O'Connell warns. "To do this right requires changes in business processes, changes in the business itself, and in organizational structure. All these things must happen in parallel. Helping customers move along as fast as possible is what we do. Learn to deal with their data, sunset old applications, consolidating environment, reconciling the data, understand the data in relation to data governance, publishing that data around metadata representation so the business can see it. All that must happen as well. The technology is the easy part. But what it takes to exploit this is much more complex."

Among the misconceptions about what a data warehouse truly is, the big-picture concept is often what is missing.

Data warehousing is not business intelligence or a fancy derivative of queries and reports. In Jordan's words, a data warehouse "stores all the relevant information that can be used for business intelligence. It is detail-level information without the irrelevant data such as control fields and other meaningless codes that are useless from a business analysis perspective. The data becomes understandable and error free. It's quality control on data."
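
The kind of quality control Jordan describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Coglin Mill's method: the field names (`record_status`, `batch_id`, `last_job`, `customer`, `amount`) and the validation rules are hypothetical and stand in for whatever a real production file contains.

```python
# Hypothetical control fields that mean nothing to a business analyst.
CONTROL_FIELDS = {"record_status", "batch_id", "last_job"}


def scrub(rows, control_fields=CONTROL_FIELDS, required=("customer", "amount")):
    """Split raw rows into (clean, rejected).

    Clean rows have control fields removed, text trimmed, and `amount`
    converted to a number. Rows with a missing required field or an
    unparseable amount are returned unchanged in `rejected`.
    """
    clean, rejected = [], []
    for row in rows:
        # Drop control fields and trim stray whitespace from text values.
        kept = {
            key: (value.strip() if isinstance(value, str) else value)
            for key, value in row.items()
            if key not in control_fields
        }
        try:
            if any(kept.get(name) in (None, "") for name in required):
                raise ValueError("missing required field")
            kept["amount"] = float(kept["amount"])
        except (TypeError, ValueError):
            rejected.append(row)
            continue
        clean.append(kept)
    return clean, rejected


# Hypothetical raw rows as they might arrive from a production file.
raw_rows = [
    {"customer": " Acme ", "amount": "19.50", "record_status": "A"},
    {"customer": "", "amount": "5"},
    {"customer": "Bolt", "amount": "n/a", "batch_id": 7},
]
clean_rows, rejected_rows = scrub(raw_rows)
```

Keeping the rejected rows rather than silently discarding them lets someone trace bad data back to its source, which is one reasonable reading of "quality control on data."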

Correcting misconceptions and fine-tuning definitions so that everyone is speaking the same language is part of the process O'Connell goes through with just about every customer. Beyond that, he has a few questions and some all-purpose advice ready.

"When I work with a customer, I look at two things," he says. "Let's determine where you want to be in five years. Then start building toward that. We always build a warehouse one application at a time. That way we get value back quickly. The cost of putting up the initial stages of a warehouse should be short. It should be running in three months. And you get value right away. Then you add another application or another function and then that's up and running in another three months. And value follows that. When it stops growing, it becomes a cost center. But it should always be growing and applications should always be added to increase value. We have warehouses that have dozens of applications going live every week."







Editor: Timothy Prickett Morgan
Contributing Editors: Dan Burger, Joe Hertvik, Brian Kelly, Shannon O'Donnell,
Mary Lou Roberts, Victor Rozek, Kevin Vandever, Hesh Wiener, Alex Woodie
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed

Copyright © 1996-2009 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
