Does IBM i Need More Databases?
August 14, 2017 Alex Woodie
There are many things that differentiate IBM i from other platforms, including its storage, security, and programming models. But one of the most unique aspects of the IBM i platform is its integrated DB2 database, which is used exclusively by practically all IBM i customers. You just don’t find this on other platforms. But perhaps it’s time for IBM i to diversify its database support?
In many respects, DB2 for i is the platform’s greatest strength. IBM i is renowned for its transaction processing prowess that drives ERP and other line of business applications, and DB2 for i (formerly DB2/400, but do not call it that around Mike Cain) is the sturdy engine that powers those transactions.
It is theoretically possible to run an IBM i server without DB2 for i. Perhaps you want to serve HTTP requests in concert with a separate Oracle or SQL Server database running on an X86 server. But doing so would seem to go against the strengths of the platforms, let alone the laws of computing nature. In a heterogeneous setups, which are common in Oracle JD Edwards and SAP Business Suite worlds, the IBM i server nearly always powers the database, while mainstream X86 servers and their Windows and Linux OSes power the application and HTTP servers.
This database-centric identity has served the IBM midrange platform well for decades. The DB2 for i database is very mature and powerful, supports both SQL and DDS programming constructs, and is more ANSI SQL compliant that other relational databases (just as how, ironically, IBM i is more POSIX compliant than all Unix operating systems, even though it’s not Unix). What’s more, DB2 for i just keeps getting better and better with each new release of IBM i. So why in the world would an IBM i shop ever need something else?
It’s The Data, Stupid
The most answer to that question may be that the nature of data itself is changing, and therefore the ways that we store it need to change, too.
When IBM first developed the AS/400, storage (both main and disk) was expensive relative to today, and so companies wanted to minimize the amount they stored. IBM’s storage architecture reflected this reality, and therefore the data written to disk was highly refined, or structured.
This focus on structured data served the industry well, and a relational database management system (RDBMS) evolved to support the well-defined business processes defined and executed within the ERP systems that blossomed during the 1990s.
But as the years wore on and storage costs decreased, there was less of a need to optimize data to fit in small spaces, and data became less and less structured. Around the year 2000, we started to hear about this thing called extensible markup language (XML) that would dramatically open up the things we could do with our data and our applications by making data self-referenceable – that is, each piece of data, or XML document, would define itself through the accompanying metadata that came with it and was accessible through translators written with extensible stylesheet (XLS) language.
XML was supposed to dramatically simplify data integration challenges. But it didn’t work, for a variety of reasons, the most obvious of which was the slowness of parsing XML documents and the processor drain of running XLS programs into a format understood by DB2 for i. It was all a bit complicated, particularly for the traditional RPG programmer who valued simplicity and ease of use.
Rise Of NoSQL
JSON has largely overtaken XML as the favored data format used on the Web today. And it has also spurred the creation of a new class of non-relational database management systems, commonly dubbed NoSQL. The most popular NoSQL database is a document store called MongoDB that stores data in a JSON-like format called BSON. Couchbase is another NoSQL database that stores data in a JSON-like format and is very popular among developers.
To be sure, RDBMSes still dominate the industry. According to DB-engines, Oracle’s eponymous database, MySQL, SQL Server, and PostgreSQL – all relational systems – are the four most popular databases in the world. There is a long tail of installed RDBMSes around the world powering applications that cost trillions of dollars to develop. Those aren’t going away anytime soon.
But NoSQL databases are rapidly gaining market share for new applications, particularly for Web and mobile applications that are the hot ticket in IT these days. The big strides of the NoSQL movement can be traced squarely back to the fact that developers, given a choice, would much rather program their applications to read and write to a NoSQL database than to a relational database. That’s because NoSQL databases, with their focus on storing self-defining data types, lack a fixed schema, which makes it much easier to modify and adapt a database over time.
RDBMSes like DB2 are evolving to support some NoSQL tricks, including horizontal scalability and support for JSON. IBM is also in the early stages of adding support for JSON to DB2 for i, which may be proof that old dogs can learn new tricks (or a sign that DB2 for i isn’t such an old dog, after all). IBM’s JSON story is not complete when it comes to DB2 for i, and more features are expected with upcoming Technology Refreshes.
Why Other Databases?
While DB2 for i dominates the database story on IBM i, it’s not the only officially supported database on the platform.
Over a decade ago, IBM officially welcomed MySQL, an open source relational database created by Monty Widenius that’s used to power the majority of PHP applications, to its proprietary platform, largely as a way to goose the adoption of PHP. Today, owing to Oracle’s selfish behavior when it comes to MySQL – not to mention it’s ill-advised 2011 decision to drop support for MySQL on IBM i – another Widenius data store, dubbed MariaDB, is supplanting MySQL everywhere, including on IBM i.
However, MariaDB and MySQL are both relational databases. While they have the backing of the PHP, Perl, and Python development communities and will therefore streamline the adoption of applications written in those languages on IBM i, they don’t offer data storage capabilities that are dramatically different than what DB2 for i offers.
MariaDB and MySQL are the best examples of alternatives databases on the platform, but there are a few other projects you may not have heard of. One of them is the Inuendo project spearheaded by Christopher Burns, an IBM i developer who lives in New York state.
Inuendo is an associative database for IBM i that’s designed to dramatically simplify how data is stored, and therefore speed up transaction processing by a significant amount. Instead of building a database schema with dozens of inter-related tables, indexes, and access paths, the associative database flattens out the schema considerably, while relying on a unique set of identifiers to provide access to business entities stored in the database. While SQL is supported, access is primarily through a set number of APIs.
Another example of an alternative database on IBM i is the ERROS Connectionist Database, which was patented decades ago by Rob Dixon, an IBM i developer in the UK.
Dixon originally developed ERROS in support of a S/38 application he was developing called STIPPLE (System for Tabulating and Indexing People, their Possessions, Life and Everything), which was originally designed to provide a recorded history of fine arts. ERROS would need to catalog data about these entities in a manner that allowed future flexibility while respecting authorities and maintaining high performance.
As a “connectionist” database, it is particularly suitable for defining bi-directional relationships, where “navigation of these [relationships] in either direction without requiring a query language, the ability to store hierarchies of almost unlimited depth, with ‘children’ able to have multiple ‘parents,'” Dixon tells IT Jungle.
Most notably, instead of replacing DB2, as other databases do, ERROS can sit on top of DB2. The “Universal Data Structure” at the heart of ERROS stores all data definitions, application definitions, and authorities, as well as all user data, Dixon says.
“This allows incremental development, without detailed user specification, without physical file design or normalization, no SQL, and mostly without program coding,” Dixon writes. “It happens that I created ERROS to create an advanced system for the humanities, but ERROS is equally suitable for creating major internet ready systems in a commercial environment, including transaction processing.”
Databases As Tools
As the nature of the data we generate changes, so too do the database we use to store it. IBM i may have its strengths in transaction processing, and that is not going away anytime soon. But much of the data that organizations want to store today isn’t relational in nature, and doesn’t fit neatly within a RDBMS.
IBM is doing its best to adapt to these changes with DB2 for i, while others, like MariaDB, provide plug-in compatibility for programs developed in open source languages. Just as the spectrum of development tools available to the programmer has proliferated in recent years, so has the number and variety of databases.
In many ways, developers are picking and choosing databases – relational or NoSQL here, an associative or connectionist database there – in the same manner that they used to pick up and use Java, .NET, or PHP developer tools. IBM has done an admirable job of incorporating the latest development technologies into IBM i. To stay relevant in the changing IT space, it should make it easier for programmers to plug in different types of data processing engines into IBM i, too.