Talend Adds i5/OS Support to Open Source ETL Tool
October 23, 2007 Alex Woodie
A software company called Talend is trying to do for extract, transform, and load (ETL) data integration tools what MySQL and SugarCRM have done for databases and CRM applications: namely, to use open source development techniques, free downloads, and a procession of chargeable add-ons to build a successful business. Talend came one step closer to that goal yesterday when it announced support for DB2/400 databases in Talend Open Studio version 2.2.
Talend Open Studio is a free download that helps organizations move and integrate disparate sources of data. Uses of the tool are as varied as real-world application and server topologies. It can be used to move data from DB2-based ERP systems used for sales and accounting to Oracle enterprise data warehouses. It could also be used to replicate that Oracle data to smaller SQL Server data marts used for business intelligence reporting, or to transform that data for e-commerce applications residing on a MySQL database. With version 2.2, Talend has added support for reading data from and writing data to DB2/400 tables using SQL.
The basic Talend Open Studio download is designed to help individual users at small and medium-sized businesses move and integrate their disparate sources of data, while more sophisticated capabilities cost a fee. The basic product includes about 150 connectors, providing support for all major databases, enterprise applications like SugarCRM and Salesforce.com, file formats like XML and Excel, and transport mechanisms like FTP and HTTP. Just about everything in the tool is done through drag-and-drop, including workflow modeling, data mapping and transformations, and work in the job modeler and the metadata manager. At runtime, the product generates either Java or Perl executable code, which provides the SQL connectivity to the sources and targets of data. (Obviously, for the System i, a user would choose the Java executable, while a Unix or Linux shop might choose Perl.)
Talend Open Studio supports both ETL and ELT (extract, load, and transform) functionality, making it the only product that supports both forms of data integration, claims Yves de Montcheuil, the company’s vice president of worldwide marketing. ELT is particularly useful for customers that prefer to execute the data transformations on the source of the data, de Montcheuil says.
There are few restrictions on the use of Talend Open Studio, as it is licensed under the GPL open source license. For an annual subscription of several thousand dollars, customers can purchase add-ons that enhance the basic tool. Talend Integration Suite is one such add-on. Geared toward larger organizations that need to coordinate the movement of data across several sites, it brings elements such as a shared repository, a job conductor, an activity monitoring console, and a CPU balancer. Users can also purchase a subscription to Talend On Demand, which delivers access to the software as a service. Talend also charges for technical support, following the model set by successful purveyors of open source software such as Linux operating system vendor Red Hat and JBoss, the developer of open source Java application servers, which Red Hat recently bought.
Talend was founded in 2003 by two veterans of the enterprise data integration market, Bertrand Diard and Fabrice Bonan. The two Frenchmen worked on Talend Open Studio, which was developed in the Eclipse Java IDE, for about three years, and posted the first public beta of the product in the second half of 2006. Financing for the startup was provided in early 2006 by AGF Private Equity and Galileo Partners, two respected European venture capital firms.
Since getting the product off the ground, the company has expanded by opening an office in Los Altos, California, and, more recently, a sales office in Germany. So far, Talend Open Studio has been downloaded 150,000 times, and nearly half of those downloads have come from North America. The company has also attracted a community of about 500 individuals who are active on the Talendforge.com Web site. Talend relies on these community members primarily to test beta versions of new releases and root out bugs. Most core development work is done by a team of 30 Talend employees at the company’s headquarters in France and at an offshore development shop in China.
While it can be tough to gauge how many of these 150,000 free downloads make it into real-world production use, Talend estimates, based on the fact that 30 percent of users receive automatic updates, that there are anywhere from 30,000 to 50,000 users of the tool around the world. “That means they’re opening the product on a regular basis,” de Montcheuil says. “It might be that they like the look of the UI, but most likely it’s because they are doing some kind of work inside it.”
Yesterday Talend announced the general availability of Talend Open Studio version 2.2. In addition to DB2/400 support, the release brings more than 90 other enhancements across both the Perl and Java versions, including support for stored procedures on DB2 UDB, MySQL, SQL Server, Oracle, and Sybase databases; support for reading and writing entries on an LDAP server; support for Windows Vista; support for vtiger (an India-based developer of open source CRM software); and support for JDBC data access.
With version 2.2, Talend is also releasing Activity Monitoring Console/Personal Edition. This chargeable add-on lets users monitor the distributed execution of all data integration jobs from one centralized tool, receive error notifications, and detect bottlenecks before they occur. The same capabilities are available in Talend Integration Suite, but with the new Personal Edition, users don’t have to pay for all of that product’s other features to get activity monitoring.
Talend’s support for ELT processing (as opposed to ETL) makes the product attractive to iSeries and System i customers, de Montcheuil says. “We have large deployments in banks and insurance companies, and those people have a problem. The data has to stay on the AS/400s; they need to process it locally, and the ELT approach that Talend Open Studio uses allows you to do that,” he says. “You can generate a program that will process the data inside the AS/400 without asking it to leave to go through a transformation engine to get back. So that’s a critical piece of our AS/400 strategy: to allow local processing of the data inside the AS/400.”
One Talend customer making good use of Open Studio’s System i support is ETAI, a €60-million French company that serves as a clearinghouse of information about automobile parts. Because there is no standard data format used by automobile makers and third-party part manufacturers, ETAI needed a high degree of extensibility as it worked to consolidate information about 50,000 different types of vehicles, half a million parts, 300,000 images, and 30 million links between parts and vehicles, according to Philippe Bobo, director of software and information systems at ETAI.
This data resided in MySQL, DB2/400, Access, SQL Server, Oracle, Excel, XML, and other formats, while data from manufacturers usually arrived in complex flat files. ETAI’s technical database is hosted on Oracle and deployed via MySQL, according to the company, which runs its own back-end applications on the System i platform. “Talend Open Studio’s proven data integration capabilities enable us to prevent inaccuracies and ensure our clients and their customers are receiving the most up-to-date and accurate information,” Bobo says.
Talend Open Studio version 2.2 is available now. For more information and free downloads, visit the company at www.talend.com.