tfh
Volume 19, Number 13 -- April 5, 2010

Data Warehouse Mistakes Begin with i Avoidance

Published: April 5, 2010

by Dan Burger

If you ask Greg Veal what platform is best for creating a data warehouse, he'll tell you it's whatever platform you are using for your operational data. For readers of The Four Hundred, that platform is the IBM AS/400, which is officially referred to as the Power System running IBM i these days. Veal is intimately familiar with the IBM i. His experience dates back to the System/32, but his ideas are thoroughly modern.

From the viewpoint of a consultant who specializes in data warehousing projects and works on those that are big enough to warrant the hiring of an outside expert, Veal says the number of shops building data warehouses is increasing. However the number of shops building them on the i platform is dwarfed by those that choose to build them on Microsoft SQL Server.

"The most popular route is to port data to SQL Server," is Veal's observation.

"The three choices are: leaving the data on the i, which I believe is the best option; moving it to Oracle, which is the second best option except that it is expensive; and moving it to SQL Server. I used to think of SQL Server as a third-rate database, but it has come quite a ways and is pretty capable now."

Of the data warehouse projects he sees, all of which have an IBM i as the main operational system, close to 100 percent involve moving data from DB2 for i, the integrated database in i5/OS and i, to SQL Server.

The problem with that strategy is that companies have a staff that knows IBM i and that knowledge is being wasted in many cases. There is a lack of understanding on the part of IT directors and business executives regarding the platform, the skills of the staff, and the benefit of using familiar tools and languages.

"Unfortunately, they don't consider building the warehouse on i. They consider Oracle and SQL Server and they believe that Oracle is a little more robust, but they believe SQL Server will do the job for a lot less money," he says.

Cost is certainly a consideration, but it can be misleading. IBM, as it often does with projects that it controls, helps propagate the belief that a data warehouse is an expensive proposition. Big Blue prefers to build its data warehouse projects using a framework that allows many plug-in features and automation for large-scale data transformations. The big enterprise shops will sign up for these types of packages and hire IBM services to come in and do the project.

Much can be accomplished without the highest levels of complexity and expense, Veal says. And the project manager in him offers some advice on how projects should be funded internally.

"All too often, IT funds these projects. That is a huge mistake," he says. "The business or the department that benefits from the project should take responsibility for the project and pay for it. For instance, who knows how valuable adding another column of data is? The user does. If it's on the IT budget, the added column idea might get cut. If it's on the beneficiary department's budget, it might get added."

A better chance for success with a data warehousing project comes from involving business users. Staff is often working overtime doing two jobs while the project is in progress because the end users get involved in recommendation committees and helping with analyses. When the project isn't owned by the executive in charge of the department that benefits, problems arise and IT will take the heat and get hit with the expenses.

"Politics factor into these things, and it works much better when IT is a paid assistant in the effort," Veal says. "IT should be responsible for infrastructure and delivering the data accurately, but should not be the stewards of the data or the owners of the data. The business users validate whether the answers they are getting are right. It needs to be their project."

It's not uncommon for companies to ask for cost estimates and completion schedules prior to launching into a data warehouse project. That's good business sense, right? In reality, Veal says, those estimates are almost always faulty. The person providing the estimate will typically tell the customer what they want to hear. No one should be surprised when projects don't come in on budget and on schedule. If estimates were accurate, they would not often be approved. As it is, jobs are frequently awarded to the best liar.

Good project management can draw a box around one data mart and certain requirements, but projects seldom follow the original blueprint. "Scope creep," those factors that expand the requirements and the cost of the project and the timeline for getting it done, can be counted for changing the final project enough to allow low-ball cost estimates to recover expenses that may have been left out of original estimates.

Pitfalls? Sure, there will always be mistakes made and toes will be stubbed. And all projects have some natural fluidity. Getting a handle on the variables, however, is important. Wading into the deep end of the pool before learning to swim is always more risky. You might want to consider a lifeguard.

"I have a bias because I am a consultant," Veal admits, "but at the same time, you need to find someone who has done it successfully and has a track record. You want to make sure that knowledge transfers to your staff. And you bring in extra bodies, if necessary, to do the maintenance work while you get some of your internal people working on this project. When you own this project, it is a living thing. It doesn't end because once people realize the power and the potential benefits, the project grows. If you get into this in a big way, understand that you will need someone for eternity--not just for the duration of the project. This person needs to be knowledgeable about building and managing data warehouses and how to optimize performance. You need a database administrator who knows data warehouses. Data warehouses are not built the same way as databases."

This is a fairly typical scenario that Veal sees: Someone or some group within an organization says, "Hey, we could do some really cool reporting things and get away from printed reports or PDF conversions of those reports. We could actually have a system that allows the user to put in criteria. It can be done with SQL Reports and Reporting Services. So let's put our data on SQL Server. We can do these reports that are more flexible for the user."

That starts a project rolling. The company has staff with some SQL Server skills and they start putting ideas together without an in-depth look at the options. There's an assumption that data is simply moved from one place to another and then users get a reporting tool. After the wheels are put in motion, the realization comes that data transformation is necessary and that someone needs to understand DB2 and how to use the SQL Query Engine.

Here's another truism about data warehouse projects. When it comes to data cleansing, Veal says, expect to spend three to four times the effort than you first think you need to spend.

"This depends on what you are coming from," he says, "but if you are coming from homegrown systems, you often have big-time troubles. People have data that will not go into a data warehouse." As Veal points out, data warehouses require hierarchical structures. If-then logic can be done with code, but a data mart doesn't have code. It has data. For data to be transferred into a data mart, it needs to be accessible without business logic stuck in the middle.

One form of data "dirtiness" is data that no longer has the relations that allow it to be asked any question and come back with an accurate answer.

"I am a huge advocate of writing your own programs to do data transformation," Veal says. "Why? The vendors will say power users should do the definitions for the transformation. But you don't want power users to be rule-based programmers every time a rule gets changed. The users will have to know a language. Each time they do it, they have to remember how to do it. Programmers, on the other hand, do it everyday, so it's easy for them to make rule-based changes. Almost nobody's data is so simplistic that the tools can do a good job with rules-based transformations. The vendors provide hooks for writing programs to handle the exceptions, but if you have to write RPG programs to handle the exceptions, and have users do the rest, why not let people do it in the language they know? I believe in custom ETL. It is cost efficient and it gets done better."

There probably should be a lot of similarities from one data warehouse project to another, but there isn't. Different companies have different ideas about what they want to do, how much they want to spend, what they think they need.

It bothers Veal that regardless of the size and scope of the project, most shops are unaware of the capabilities and the power of IBM i. Although it's considered a workhorse, it's known as a business machine that does transaction processing. Many of its most ardent proponents have not kept up with the advances the operating system and the hardware. This is symptomatic of companies that are not planning for the future.

"I believe we are in an information age--an information economy--and companies need to strongly informationalize, which means building data warehouses and taking advantage of business intelligence. If companies want to thrive, they need to invest in becoming knowledge organizations. Employees should get accurate and timely information quickly and easily. The thinking that all that's needed are dashboards for the top-level managers is shortsighted. They are not seeing how powerful information in the hands of all the workers."

Veal regularly presents sessions on data warehousing topics at the COMMONannual meetings. He's also a subject matter expert for COMMON's Business Intelligence/Database course of study, and an instructor at the RPG World conferences. For more information on him see: www.gregveal.com.


RELATED STORY

Data Warehouses: Know One When You See One?



                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot


Sponsored By
PROFOUND LOGIC SOFTWARE

It has arrived! A native GUI for RPG.

Register to attend our April 21st educational
webinar and learn how to visually build Web 2.0
rich user interfaces with RPG.

See a live demonstration of our native
graphical user interface platform for
RPG developers - Profound UI.

Register for the FREE webinar today.


Editor: Timothy Prickett Morgan
Contributing Editors: Dan Burger, Joe Hertvik, Brian Kelly, Shannon O'Donnell,
Mary Lou Roberts, Victor Rozek, Kevin Vandever, Hesh Wiener, Alex Woodie
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.

Sponsored Links

Help/Systems:  Robot/CONSOLE monitors System i resources automatically
Patrick Townsend Security Solutions:  Find the encryption solution that's right for your enterprise
COMMON:  Join us at the annual 2010 conference, May 3 - 6, in Orlando, Florida

 

 

IT Jungle Store Top Book Picks

Easy Steps to Internet Programming for AS/400, iSeries, and System i: List Price, $49.95
The iSeries Express Web Implementer's Guide: List Price, $49.95
The System i RPG & RPG IV Tutorial and Lab Exercises: List Price, $59.95
The System i Pocket RPG & RPG IV Guide: List Price, $69.95
The iSeries Pocket Database Guide: List Price, $59.00
The iSeries Pocket SQL Guide: List Price, $59.00
The iSeries Pocket Query Guide: List Price, $49.00
The iSeries Pocket WebFacing Primer: List Price, $39.00
Migrating to WebSphere Express for iSeries: List Price, $49.00
Getting Started With WebSphere Development Studio Client for iSeries: List Price, $89.00
Getting Started with WebSphere Express for iSeries: List Price, $49.00
Can the AS/400 Survive IBM?: List Price, $49.00
Chip Wars: List Price, $29.95


 
Four Hundred Stuff
CoralTree Toolkit Streamlines CGIDEV2 Web App Development

Info Builders Updates Dashboarding Solution

Unitrends Adds Cloud-Based Storage to Backup Offering

HiT Bolsters Data Synchronization Tool

LTO Tape Drives: More than 3 Million Served

Four Hundred Guru
Variable Procedure Calls in Free-Format RPG

Global Temporary Tables and Host Variables

Implementing 128-Character Passphrases in i/OS

Four Hundred Monitor
Four Hundred Monitor's
Full iSeries Events Calendar

System i PTF Guide
March 27, 2010: Volume 12, Number 13

March 20, 2010: Volume 12, Number 12

March 13, 2010: Volume 12, Number 11

March 6, 2010: Volume 12, Number 10

February 27, 2010: Volume 12, Number 09

February 20, 2010: Volume 12, Number 08

TPM at The Register
IT losses despite US economic thaw

Cray nabs $45m nuke lab petaflops super deal

SGI peddles cut-down Altix UV supers

PCs, servers push Micron to profits

IBM goes elephant with Nehalem-EX iron

Red Hat injects RHEL with new iron love

Blue Coat virtualizes WAN appliance

Intel (finally) uncages Nehalem-EX beast

Chip makers slammed in 2009

Red Hat previews Terabyte-bustin' next virtual machine

Ex-contender for top IBM job pleads guilty on securities charges

AMD claims no premium for four-way chips

THIS ISSUE SPONSORED BY:

Help/Systems
CCSS
Profound Logic Software
inFORM Decisions
WorksRight Software


Printer Friendly Version


TABLE OF CONTENTS
IBM Holds i 6.1 Prices Steady, Slashes Application Server Fees

The Power7 Chip Gets Some Stiff X64 Competition

Data Warehouse Mistakes Begin with i Avoidance

Mad Dog 21/21: When Price/Performance Outruns Elasticity

Rimini Street Counter Sues Oracle

But Wait, There's More:

Reader Feedback on Madoff's RPG Coders Indicted in Ponzi Scam . . . iManifest Regroups, Plans to Meet at COMMON . . . IBM Promotion Cuts PowerVM Hypervisor Upgrade Fees . . . Oracle Squeaks Out Growth, Promises Revenues from Sun . . . Gabriel X64 Server Survey: Brother, Can You Spare Some Time? . . .

The Four Hundred

BACK ISSUES




 
Subscription Information:
You can unsubscribe, change your email address, or sign up for any of IT Jungle's free e-newsletters through our Web site at http://www.itjungle.com/sub/subscribe.html.

Copyright © 1996-2010 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034

Privacy Statement