Forget Oracle 10g. Let's Talk About i5/OS V5g
by Timothy Prickett Morgan
I haven't been around the computer business as long as many of you, but I have been around long enough to see a lot of technologies come around on the guitar a time or two. As I have said in the past when talking about new technologies, show me a new idea in IT and I will show you a twist on a very old idea. A television ad by Oracle, which has just such a twist, has been getting under my skin lately.
And that twist has given me an idea of how to take the i5 System for SMB I proposed in last week's issue up another notch and make a scalable, rock-solid, affordable computing platform that goes right for the jugular vein at Oracle and Microsoft as they try to peddle what some have called diagonally scalable systems.
Being an advanced platform, the AS/400-iSeries-i5 line has had more than its fair share of technology breakthroughs, and as such, you would think that it would be the dominant business application platform in the world. Back in June 1995, just as Big Blue was rolling out its first PowerPC-based servers--and the first 64-bit server platforms IBM ever delivered, I might add, and I would add further the PowerPC "Cobra" chip was designed by the Rochester Labs and is the great-great-granddaddy of the Power5--it also introduced two new cool technologies with OS/400 V3R6/V3R7 (the first RISC versions, if you will remember) called DB2 Multisystem and DB2 SMP. This story is not going to get into the guts of these features, which have been around for a decade now, but rather suggest how they can be tweaked and used in a new way. I also want to thumb my nose a bit at Oracle, which didn't even invent Oracle 9i Real Application Clusters or the Oracle 10g kicker to that.
First, let's talk about that Oracle commercial. Because Oracle has lots of money to blow on buying up software companies as well as advertising, you have probably seen the same 15-second spot I have seen like a zillion times:
"The Oracle Grid. A group of low-cost servers connected by Oracle software. Now, if a server fails, the grid just keeps running. The Oracle Grid. Runs faster. Costs less. And never breaks."
Throughout this little speech, there are a bunch of pizza-box servers connected by a giant light beam, which symbolizes the clustery-gridiness of the Oracle 10g setup. "Wow," you are supposed to think, "isn't that Oracle just so clever?"
Here's the deal on Oracle 10g. After Compaq bought Tandem Computers back in 1997 (a provider of clustered Unix servers in a fault-tolerant configuration with a cluster-aware database integrated into that Unix) to take a run at the enterprise server market, it didn't get very far, so it bought Digital Equipment, which had a similar thing in its VMS and Tru64 Unix. Compaq needed the cash and was seeing a huge number of Windows-Intel server sales supporting Oracle and other applications, and it had some goodies in Tru64 and VMS that it could sell to Oracle: its cluster file system and the clustering technologies that made the Tru64 and VMS platforms the envy of a lot of server providers. Compaq sold its jewels to Oracle, and the clustering technology was embedded at the database layer instead of at the system layer. That code was eventually commercialized as Oracle 9i RAC, and it has been tweaked for a second rev with Oracle 10g. Because of the goodies Compaq supplied, Oracle 9i RAC and Oracle 10g are light years ahead of the older Oracle Parallel Server editions, which were a nightmare to administer and, back in the 1990s, were rarely used except in the largest installations in the world.
We're all familiar with vertical scaling of servers, which means making a bigger and bigger symmetric multiprocessing (SMP) machine that tightly couples a bunch of processors and their cache and main memories into a giant, virtual single system image. And we are also familiar with horizontal scaling, which means replicating copies of servers with the same software and allowing them each to take a piece of the work, such as on a Web server farm. With diagonal scaling, which is a relatively new term even though the idea is not so new, vendors cluster databases running on many machines for the sake of performance as well as availability. That's what Oracle 10g does, and it is also what IBM's "Stinger" release of DB2 Universal Database, which is version 8.2, does with the Integrated Cluster Environment (ICE). And it is also what an i5 server could do with some tweaking of DB2 Multisystem. IBM has another feature, called DB2 SMP for DB2/400, that would be very cool to throw into the mix. While DB2 Multisystem allows database tables for applications to be spread across multiple physical machines that have been clustered (using OptiConnect or some other fabric, like InfiniBand, which is due in the future Power6 servers), DB2 SMP allows an SQL query to be parallelized across the processors in a single SMP machine. DB2 SMP is a tightly coupled, vertical scaling feature, while DB2 Multisystem is a loosely coupled, horizontal scaling technique. DB2 Multisystem is, as far as applications are concerned, utterly transparent. This is important.
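To make the idea of spreading a table across machines a little more concrete, here is a toy Python sketch. It is not how DB2 Multisystem is implemented; the node count, the modulo placement rule, the `order_id` and `amount` fields, and every function name are assumptions invented for illustration. It only shows the general pattern: rows land on different nodes according to a key, each node works on its own slice, and the caller sees a single answer.

```python
from concurrent.futures import ThreadPoolExecutor

NODES = 4  # hypothetical cluster size, not a figure from the article


def node_for(key, nodes=NODES):
    """Toy placement rule: integer key modulo the number of nodes."""
    return key % nodes


def partition(rows, nodes=NODES):
    """Spread rows across nodes according to their order_id."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[node_for(row["order_id"], nodes)].append(row)
    return parts


def local_total(rows):
    """The work a single node does on its own slice of the table."""
    return sum(row["amount"] for row in rows)


def cluster_total(parts):
    """Fan the query out to every node, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return sum(pool.map(local_total, parts))


# Made-up sample data: 100 orders with amounts 10, 20, ..., 1000.
orders = [{"order_id": i, "amount": i * 10} for i in range(1, 101)]
parts = partition(orders)
total = cluster_total(parts)
print([len(p) for p in parts], total)
```

The caller of `cluster_total` never needs to know which node holds which row, which is the sense in which this kind of partitioning can be transparent to applications. A real implementation would also have to handle node failures, joins across partitions, and the cost of the interconnect, none of which this sketch attempts.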
While we are all well aware that the iSeries has clustering for high availability, the third-party solutions in the OS/400 market that do clustering are not clustering for performance. They surely can offer this functionality in conjunction with IBM, and they sure can offer an easy-to-use front end for it as they do for the HA clustering technologies that are embedded inside OS/400 today. And I think that this is exactly what IBM should do. I want i5/OS 5g, and I want to put it on machines very much like the i5 System for SMB boxes I offered as a potential addition to the iSeries lineup last week.
I know, I know. IBM already offers big, scalable, wonking Power5 servers that offer more bang than any servers ever created. This is great, but let me tell you this: Dell doesn't care about big SMP servers for a reason, and the reason is that SMB customers don't buy them. They buy smaller machines, and they buy lots of them. They don't do this for love of Dell; they do it for the love of their own money. Those SMP boxes that I get all excited about when I discuss them are awfully expensive to develop and sell. Pizza box servers offer reasonable pep individually, but clustered diagonally, they can offer approximately the same performance at about a third to half the cost for a given sized OLTP workload.
You heard that right. There's a reason Dell is in love with Oracle. They need each other to make this whole Oracle Grid idea fly. Dell doesn't have a mainframe, OS/400, or Unix server business; it doesn't have scalable SMP boxes; and it doesn't develop much of its own server technologies. So it needs Oracle. And Oracle, which wants to continue growing and beat Microsoft in the SMB market, knows full well that there is no way it is going to talk SMB shops into buying SMP Unix servers and putting Oracle on them. SMB shops will do everything in their power to avoid that moment, no matter how much server makers correctly point out that logical partitioning on a fairly small SMP box is a better option.
Now, here is how IBM can beat Oracle at what it claims to be its own game. Imagine, if you will, that DB2 Multisystem were used with an extremely fast interconnection fabric (again, think of the fastest InfiniBand today or the better stuff coming next year). Imagine that this interconnection is so fast that it can be used to link many pizza-box style i5 Systems for SMBs into a cluster for diagonal scaling. Imagine that IBM stops charging $25,000 per server for DB2 Multisystem and instead gives it away for free on the hypothetical i5 System for SMB boxes.
By doing so, you could have an unclustered, standalone product line that spans from about 1,600 to 6,000 CPWs. If IBM allowed customers to cut the servers in half and do high availability clustering across OS/400 partitions (as I suggested last week), that would be a lot of computing and a lot of reliability. But the single system is still a single point of failure, and that is not good. Even so, it would probably only take about 40 of the two-core i5 System for SMB model 800 servers (rated at 6,000 CPWs each) to match the raw OLTP performance of the 64-way i5 595 server, provided the interconnection fabric was fast enough. It might cost $2 million for such a 40-server configuration at the prices I suggested, which would be for 320 GB of main memory and about 80 TB of mirrored disk capacity, or about $50,000 per server with interconnection fabric thrown in. A 64-way i5 595 with 512 GB of main memory with 5250 features activated costs $8.7 million at list price, and that is without storage. What I am suggesting might be adding complexity, but it sure does cut the price tag, now doesn't it?
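For anyone who wants to check the back-of-the-envelope math, here is a short Python sketch that uses only the figures quoted above. The variable names are illustrative. The aggregate CPW number is a raw sum that assumes perfect scaling across the cluster, which no real interconnect will deliver, and the comparison is not quite apples to apples because the cluster price includes disk while the 595 list price does not.

```python
# Figures quoted in the article; the variable names are illustrative only.
servers = 40
cpw_per_server = 6_000          # two-core i5 System for SMB model 800
cluster_price = 2_000_000       # dollars, includes disk and fabric
cluster_memory_gb = 320
cluster_disk_tb = 80            # mirrored capacity
i5_595_price = 8_700_000        # dollars, 64-way, 512 GB, no storage

# Raw sum of per-server ratings; assumes perfect scaling, which is optimistic.
aggregate_cpw = servers * cpw_per_server

# What each pizza box works out to.
price_per_server = cluster_price / servers
memory_per_server_gb = cluster_memory_gb / servers
disk_per_server_tb = cluster_disk_tb / servers

# Cluster price as a fraction of the big SMP box's list price.
price_ratio = cluster_price / i5_595_price

print(f"Aggregate CPW:        {aggregate_cpw:,}")
print(f"Price per server:     ${price_per_server:,.0f}")
print(f"Memory per server:    {memory_per_server_gb:.0f} GB")
print(f"Disk per server:      {disk_per_server_tb:.0f} TB")
print(f"Cluster vs. 595 cost: {price_ratio:.0%}")
```

On these numbers each box works out to $50,000 with 8 GB of memory and 2 TB of mirrored disk, and the whole cluster comes in at a bit under a quarter of the 595's list price before the 595 gets any storage.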
If diagonal clustering were transparent--like relational database programming was originally in the System/38 and the AS/400 and still is today--then IBM would have another weapon in its quiver with which to take out the iSeries competition. It is hard to say what the theoretical scaling capability of a potential i5/OS 5g might be, but raw performance is not the point. Transparent, easier to use clustering for performance and high availability is. And IBM absolutely would have to get all of the current iSeries HA vendors on board with this diagonal clustering and let them compete to offer the best management tools and add-on features for it--much as they do for the integrated HA clustering in OS/400 and i5/OS today. What I do know is that such diagonal clustering of the current i5 520 Express machines is not feasible, since they are too limited in their performance and too expensive in their price. To do this, the iron, the operating system and database, and the prices all have to change.
The important thing is to meet Oracle 10g head-on, and do it better. And then, for the love of all that is holy, spend some big money on some 15-second television ads that people actually see. Here's one:
"The i5 Grid. Take the most reliable server on the planet. Extend its integrated relational database to the grid. Make it as inexpensive as a Wintel box, and if you don't know what that is, we will make it so you never care. The i5 Grid. Now thousands of applications can run faster and more reliably than is possible on any other platform, bar none. Don't just get one. Get a bunch, and then take that long lunch you never got when you used Windows servers."