TFH
OS/400 Edition
Volume 11, Number 45 -- October 28, 2002

Coglin Mill Connects RODIN to Other Databases, Shows iSeries Performance


by Timothy Prickett Morgan

One of the hardest things about building a data warehouse is finding the right tool to extract the information from various databases, transform it, and pump it into that data warehouse. On the iSeries platform, Coglin Mill has long since taken advantage of parallel database technologies built into OS/400 to provide a fast ETL (extraction, transformation, and loading) tool for data warehouses. With RODIN V4R1, the ETL tool can now interface with non-OS/400 databases, which should broaden the appeal of this tool among companies with heterogeneous database platforms.


The biggest change with RODIN V4R1 is the cross-platform data sourcing module. Until now, RODIN could extract information only from DB2/400 databases on AS/400 and iSeries servers and then load it into a DB2/400-based data warehouse. The cross-platform data sourcing module allows the RODIN ETL tool to reach into other relational and non-relational databases and extract information from them for OS/400-based data warehouses. Specifically, the new module supports extractions from IBM's DB2 UDB for Unix, Linux, and Windows; IBM's DB2 for its mainframe operating systems; Software AG's Adabas; IBM's Informix databases for Unix and Windows; Oracle's Oracle8i and Oracle9i; Microsoft's SQL Server; and Sybase's Adaptive Server. The cross-platform extract module of RODIN V4R1 can also extract information from pre-relational and flat-file data stores, such as IBM's IMS hierarchical database, Computer Associates' IDMS network database, and IBM's VSAM files. Coglin Mill uses native SQL to extract information from flat files and pull it into DB2/400. The company says that the extraction method it uses to talk to non-DB2/400 databases is faster than the ODBC interface that most databases offer and that many ETL tools and cross-platform applications use to let incompatible databases share information. Information can be extracted from all of these data sources in real time as it changes in production databases, or it can be extracted in batch mode and staged for loading into data warehouses.
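For readers new to ETL, the batch flow described above can be sketched in a few lines of Python. This is only an illustration of the extract, transform, and load steps, not RODIN's interface: the two in-memory SQLite databases stand in for a production source and a DB2/400 warehouse, and the table names, column names, and transformation rules are all hypothetical.

```python
import sqlite3

def extract(source):
    """Pull raw order rows from the source database with plain SQL."""
    return source.execute(
        "SELECT order_id, region, amount_cents FROM orders"
    ).fetchall()

def transform(rows):
    """Normalize region codes and convert cents to whole currency units."""
    return [(oid, region.strip().upper(), cents / 100.0)
            for oid, region, cents in rows]

def load(warehouse, rows):
    """Insert the transformed rows into the warehouse table in one batch."""
    warehouse.executemany(
        "INSERT INTO sales_fact (order_id, region, amount) VALUES (?, ?, ?)",
        rows,
    )
    warehouse.commit()

def run_batch_etl():
    # Hypothetical source system (stand-in for any production database).
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " east ", 1250), (2, "West", 9900)],
    )
    # Hypothetical warehouse (stand-in for a DB2/400 data warehouse).
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE sales_fact (order_id INTEGER, region TEXT, amount REAL)"
    )
    load(warehouse, transform(extract(source)))
    return warehouse.execute(
        "SELECT order_id, region, amount FROM sales_fact ORDER BY order_id"
    ).fetchall()

if __name__ == "__main__":
    print(run_batch_etl())
```

A real-time feed would replace the single `extract` call with a loop that picks up changes as they occur in the source; the sketch covers only the batch case.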

The cross-platform data sourcing module for RODIN will be available in December. Pricing for the module starts at $10,000, plus $5,000 for each database source connector. Each database--DB2 UDB, SQL Server, Oracle, and so forth--has a different connector that interfaces between those databases and the RODIN tool, and customers have to buy them separately.
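As a rough guide to what that pricing means for a mixed shop, here is a small Python sketch of the arithmetic. It assumes one connector per distinct database type and treats the quoted figures as a floor, since the article says only that pricing "starts at" $10,000; the function name is made up for illustration.

```python
BASE_PRICE = 10_000       # starting price of the module, per the article
CONNECTOR_PRICE = 5_000   # price of each database source connector, per the article

def entry_price(connectors: int) -> int:
    """Return the minimum list price for the module plus N connectors."""
    if connectors < 0:
        raise ValueError("connector count cannot be negative")
    return BASE_PRICE + CONNECTOR_PRICE * connectors

if __name__ == "__main__":
    # Example: a shop sourcing from Oracle, SQL Server, and DB2 UDB.
    print(f"${entry_price(3):,}")
```

Under these assumptions, a shop pulling from three different database types would start at $25,000.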

The rest of RODIN V4R1 will be available in November, and it includes default support for SQL naming standards for users who are more familiar with SQL than they are with native OS/400 naming conventions. SQL naming conventions have always been supported in RODIN, but now it can be set up from the get-go to support SQL conventions. To Unix, Windows, and mainframe shops, OS/400 looks very unfamiliar, and SQL is an Esperanto that all of these platforms understand and, more significant, something that data warehouse administrators are already familiar with or are eager to learn because it is a useful skill. Coglin Mill says further that RODIN V4R1 has been enhanced with improved versioning and change control, so data warehouse administrators can keep better track of the data they have moved and the ETL definitions they have created. The company says that it has enhanced the performance of the RODIN tool and its client program, as well as beefing up its security. New data types are supported in V4R1, and the data dictionary in the latest releases of J.D. Edwards' OneWorld suite is also supported.
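To make the naming difference concrete: under OS/400 system naming, a qualified name puts a slash between the library and the file, while under SQL naming a period separates the schema and the table. The Python helper below is a hypothetical illustration of that one difference; it is not part of RODIN, and the library and file names are invented.

```python
def to_sql_naming(qualified_name: str) -> str:
    """Convert a system-naming qualified name (LIBRARY/FILE)
    to its SQL-naming form (SCHEMA.TABLE)."""
    library, sep, obj = qualified_name.partition("/")
    if not sep or not library or not obj or "/" in obj:
        raise ValueError(f"expected LIBRARY/FILE, got {qualified_name!r}")
    return f"{library}.{obj}"

if __name__ == "__main__":
    print(to_sql_naming("SALESLIB/ORDERS"))
```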

At the same time it announced RODIN V4R1, Coglin Mill released a detailed ETL performance white paper based on benchmark tests that the company ran at IBM's Teraplex Center in Rochester, Minnesota. Coglin Mill has run ETL tests on the top-end machine of every generation of AS/400 and iSeries machine since 1997. Back in 1997, a four-way AS/400 53S-2157 rated at 650 CPW was able to load about 7.6 million rows per hour, loading a single table; with the latest RODIN release running on a 32-way iSeries Model 890-2488 rated at 37,500 CPW, RODIN could load 520 million rows per hour using the same data set. On a separate benchmark run where RODIN was used to load two tables--one for detailed information, one for summary information--the same 32-way iSeries 890 server was able to load up to 625 million rows per hour.
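It is worth putting those two single-table results side by side. The Python sketch below uses only the figures quoted above and compares the growth in load rate with the growth in CPW rating. Keep in mind that the 1997 rate is given as "about" 7.6 million rows per hour, so the ratios are approximate.

```python
# Figures quoted in the article (the 1997 rate is approximate).
ROWS_1997 = 7.6e6     # rows/hour, four-way AS/400 53S-2157
CPW_1997 = 650
ROWS_2002 = 520e6     # rows/hour, 32-way iSeries 890-2488
CPW_2002 = 37_500

def gain(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old

throughput_gain = gain(ROWS_2002, ROWS_1997)
cpw_gain = gain(CPW_2002, CPW_1997)
rows_per_cpw_1997 = ROWS_1997 / CPW_1997
rows_per_cpw_2002 = ROWS_2002 / CPW_2002

if __name__ == "__main__":
    print(f"Load rate grew {throughput_gain:.1f}x; CPW grew {cpw_gain:.1f}x")
    print(f"Rows/hour per CPW: {rows_per_cpw_1997:,.0f} (1997) "
          f"vs {rows_per_cpw_2002:,.0f} (2002)")
```

By this arithmetic, the load rate grew roughly 68-fold while the CPW rating grew roughly 58-fold, so rows loaded per CPW went up rather than down as the machines got bigger.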

Alan Jordan, vice president of development at Coglin Mill, attributes these results in large part to the parallel database technologies that IBM has woven into OS/400 and the AS/400-iSeries platform since 1995. RODIN, which is programmed in ILE RPG, has some clever tricks to take advantage of the parallelism in the box and other tricks as well, to be sure. But that native parallel database support in OS/400 makes the iSeries a screamer for data warehousing. Incidentally, this parallel database support would allow an iSeries cluster of 32 of IBM's 32-way Model 890s to be linked together to handle a theoretical load rate of 15 billion rows per hour using RODIN; that's a load rate of about 5 TB per hour. A few iSeries customers have two- or three-node machines in a distributed database environment, says Jordan. But because IBM has added faster processors and extended the symmetric multiprocessing capabilities of the AS/400 and iSeries, such gargantuan OS/400 server clusters have not been necessary. It's nice to have the headroom, though. You can download the full benchmark report at www.coglinmill.com/benchmark.
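The theoretical cluster figures can be sanity-checked with a little arithmetic. The Python sketch below multiplies the measured single-node rate by 32 for a perfectly linear upper bound, then compares that with the 15 billion rows per hour quoted above. The article does not say how the 15 billion figure was derived, so the gap may reflect clustering overhead or simple rounding. The 344-byte row width is an assumption, backed out of the article's statement that 500 million rows per hour is about 172 GB per hour, and terabytes are taken as decimal.

```python
NODES = 32
ROWS_PER_NODE = 520e6         # rows/hour measured on one 32-way Model 890
ARTICLE_CLUSTER_ROWS = 15e9   # theoretical cluster rate quoted in the article
BYTES_PER_ROW = 344           # assumption: implied by 500M rows ~ 172 GB

# Upper bound if 32 nodes scaled perfectly linearly.
linear_rows = NODES * ROWS_PER_NODE

# How much of that upper bound the quoted figure represents.
fraction_of_linear = ARTICLE_CLUSTER_ROWS / linear_rows

# Convert the quoted row rate to decimal terabytes per hour.
cluster_tb_per_hour = ARTICLE_CLUSTER_ROWS * BYTES_PER_ROW / 1e12

if __name__ == "__main__":
    print(f"Linear upper bound: {linear_rows / 1e9:.2f} billion rows/hour")
    print(f"Quoted figure is {fraction_of_linear:.0%} of the linear bound")
    print(f"Quoted figure in bytes: {cluster_tb_per_hour:.2f} TB/hour")
```

Under these assumptions, the quoted 15 billion rows per hour is about 90 percent of the linear bound and works out to a little over 5 TB per hour, in line with the figure above.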

In a further testament to the iSeries, RODIN's test on the iSeries platform smoked a test on essentially the same iron bearing the pSeries label and running DB2 UDB and Ascential Software's DataStage XE tool. (Ascential is the company created out of the data transformation portion of Informix, which IBM didn't buy.) Ascential tested a 24-way pSeries machine and attained a load rate of about 177 GB/hour, and it reckoned that a 64-way pSeries configuration (presumably two clustered 32-way Regatta servers) would be able to handle load rates of over 300 GB/hour. Coglin Mill's Jordan says that Ascential had to partition the data in its test database by hand into 192 separate tables to parallelize the load for DataStage XE. RODIN running on OS/400 did its run without tuning, and on a single source database file, which is how real companies work. Coglin Mill says that load rates of 500 million rows per hour, or about 172 GB/hour, are attainable on 32-way iSeries machines using RODIN without any database partitioning or tuning. This would presumably translate to around 344 GB/hour on a clustered iSeries Regatta configuration similar to the theoretical pSeries 690 configuration that Ascential cited in its benchmark, which could handle about 312 GB/hour--and only then after mucking around in the database tables.
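The conversion between rows and gigabytes in that comparison is easy to reproduce. The Python sketch below backs out the average row width implied by Coglin Mill's figures and doubles the single-machine rate for the two-node case, assuming decimal gigabytes and perfectly linear scaling across two nodes. Neither assumption is stated in the article.

```python
def bytes_per_row(gb_per_hour: float, rows_per_hour: float) -> float:
    """Average row width implied by a GB/hour and rows/hour pair (decimal GB)."""
    return gb_per_hour * 1e9 / rows_per_hour

RODIN_GB_PER_HOUR = 172       # one 32-way iSeries, per Coglin Mill
RODIN_ROWS_PER_HOUR = 500e6
ASCENTIAL_64WAY_GB = 312      # projected 64-way pSeries rate cited in the article

row_width = bytes_per_row(RODIN_GB_PER_HOUR, RODIN_ROWS_PER_HOUR)
two_node_gb = 2 * RODIN_GB_PER_HOUR           # assumes linear scaling
advantage = two_node_gb / ASCENTIAL_64WAY_GB  # ratio of the two projections

if __name__ == "__main__":
    print(f"Implied row width: {row_width:.0f} bytes")
    print(f"Two-node iSeries projection: {two_node_gb} GB/hour "
          f"({advantage:.2f}x the pSeries projection)")
```

On those assumptions the rows average 344 bytes each, and the two-node iSeries projection comes out about 10 percent ahead of the projected pSeries rate.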



Editor
Timothy Prickett Morgan

Managing Editor
Shannon Pastore

Contributing Editors:
Dan Burger
Joe Hertvik
Kevin Vandever
Shannon O'Donnell
Victor Rozek
Hesh Wiener
Alex Woodie

Publisher and
Advertising Director:

Jenny Thomas

Advertising Sales Representative
Kim Reed

Contact the Editors
Do you have a gripe, inside dope or an opinion?
Email the editors:
editors@itjungle.com



Last Updated: 10/28/02
Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.