A Peek Inside IBM's Smart Analytics System
Published: August 10, 2009
by Timothy Prickett Morgan
While The Four Hundred was on holiday at the end of July, IBM hosted a shindig at one of its adjunct T.J. Watson Research Center campuses in Hawthorne, New York, which is just a 30-minute drive from my house in Upstate Manhattan. And even though I was busy with a bunch of non-work work--you know what I am talking about here, people--I decided to put on some clean clothes and go listen to some of the top brass at Big Blue talk some about this business analytics and optimization (BAO) opportunity that IBM is chasing and the systems it is deploying to chase it.
The fact that IBM decided to shell out $1.2 billion to buy statistical and predictive analytics software vendor SPSS that very morning I was visiting IBM was actually a coincidence. But just the same, the SPSS code is going to play a big part in this BAO opportunity and the systems that the company is going to build to do predictive analysis on real-time business data. I was amused no end that no one, and I mean no one, at the event had any clue that IBM might even be interested in SPSS, much less that it was in the process of buying it, which only goes to show either how much we need predictive analytics or how much this stuff is actually snake oil.
But seriously, I don't think the BAO market that IBM has defined as distinct from the business automation segment (meaning enterprise resource planning, supply chain management, customer relationship management, and that mission-critical, back-end stuff) is some kind of mirage, and IBM definitely needed some heavy-duty predictive analytics to accomplish its goals for building BAO boxes, as I have come to call them since first hearing about IBM's plans back in May at its annual IT and Wall Street analyst day.
As we go to press on Friday (August 7), it looks like IBM might have to fight to keep SPSS now that it is in play. The rumors are already swirling around that Oracle, Hewlett-Packard, SAP, and Microsoft might offer SPSS shareholders a better deal, setting off a bidding war. The kind of capabilities that IBM wants to put into the Smart Analytics System is something all the big players want--and need--to sell. And that, by the way, puts privately held SAS Institute, which had $2.26 billion in sales in 2008, into play as well. Whether SAS wants to be acquired is another matter--the company has resisted that temptation for more than three decades.
So what is in this future-predicting machine that IBM wants to sell you? A lot of familiar technology, all highly integrated and optimized for the particular job at hand, and more importantly, sold under a single product number with six different configurations, making the process of buying it easier than buying piecemeal parts, and having a single means of supporting the entire stack, including twice-yearly tunings by Big Blue's engineers to ensure that the data warehouse and analytics code is running at its most efficient level.
Let's start with the hardware and software. The underlying iron in the Smart Analytics System is familiar enough: a Power 550 server with half its complement of processor cores--two dual-core Power6+ chips running at 5 GHz--and 32 GB of main memory. Each server node in the shared-nothing database cluster has two dual-port Gigabit Ethernet adapters, with two ports being used for managing the server nodes or extracting or loading data into the data warehouse and two being used to cluster the boxes so they can talk to each other. Each server also has two dual-port 4 Gb/sec Fibre Channel adapters to link out to a DS5300 disk array, which is cross-coupled to four Power 550 nodes and which is in turn back-ended by eight EXP5000 disk drawers for active data and another EXP5000 that has hot spare drives. You build up the Smart Analytics System by cookie-cutting multiple Power 550, DS5300, and EXP5000 boxes together. The servers are cross-linked to multiple DS5300 arrays through redundant SAN40B switches, and the servers are linked to each other and to the outside world through EX4200-48T Ethernet switches from Juniper Networks. Each server has a bunch of 146 GB 15K RPM disks, and the DS5300s and EXP5000s use the same drives, such that each Power 550 server has 32 disk drives of its own to play with for data warehousing and others for supporting operating systems and applications. The disks are protected with RAID 5 algorithms and also have hot spares, and the data warehouse has a 4 TB user space to play with.
Now, I know what you are thinking. Because this BAO box is a Power 550, that means it could use i 6.1 and its integrated DB2 for i database as the foundation of the data warehouse. It could also use Linux and the DB2 variant for that operating system, too. But, alas, the machine is based on AIX 6.1 at Technology Level 2 and Service Pack 3 and DB2 9.5 at the Feature Pack 4 level. IBM is using its General Parallel File System (GPFS) V3.2.1, the 64-bit implementation of its parallel file system, to support the data warehouse underlying the Smart Analytics System. On the data warehouse, each Power 550 supports four logical data partitions, or LDPs, each of which has one processor core, 8 GB of memory, and eight disk drives (1.17 TB of disk capacity) associated with it. (The LDPs are database partitions, not logical partitions carved up with the PowerVM server virtualization hypervisor that comes with Power Systems iron.) The basic system also has IBM's Tivoli System Automation V184.108.40.206 software, and then adds on IBM's InfoSphere Warehouse 9.5.1 and Cognos 8 BI Server, BI Samples, and Go Dashboard 8.4 FP2 software.
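For those who like to check the math, here is a minimal Python sketch of the per-node arithmetic using only the figures quoted above. The constant and function names are made up for illustration, and the capacities are raw decimal numbers before RAID 5 parity, hot spares, and file system overhead, so they are not the same thing as user space.

```python
# Figures quoted in the article; all names here are illustrative, not IBM's.
DRIVE_GB = 146          # capacity of one 15K RPM disk
DRIVES_PER_LDP = 8      # disks assigned to each logical data partition
LDPS_PER_NODE = 4       # LDPs hosted on each Power 550 data node
CORES_PER_LDP = 1
MEMORY_GB_PER_LDP = 8


def ldp_raw_capacity_tb() -> float:
    """Raw (pre-RAID) disk capacity behind one LDP, in decimal TB."""
    return DRIVE_GB * DRIVES_PER_LDP / 1000


def node_totals() -> dict:
    """Resources consumed by all LDPs on one Power 550 data node."""
    return {
        "cores": CORES_PER_LDP * LDPS_PER_NODE,
        "memory_gb": MEMORY_GB_PER_LDP * LDPS_PER_NODE,
        "drives": DRIVES_PER_LDP * LDPS_PER_NODE,
        "raw_tb": ldp_raw_capacity_tb() * LDPS_PER_NODE,
    }


if __name__ == "__main__":
    print(f"Raw capacity per LDP: {ldp_raw_capacity_tb():.2f} TB")
    print(node_totals())
```

The totals line up with the hardware description: four LDPs account for the node's four cores, its 32 GB of memory, and its 32 data warehouse drives.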
In the base configuration of the Smart Analytics System, customers have two Power 550s supporting data for the warehouse (one active and one a high availability backup) plus a management node running the Tivoli software and another being used as an administrative node for the software. I am no expert, but this seems to be an incomplete configuration, even if it does take up two racks of space with the servers, storage, and switches. There are six different configurations of the Smart Analytics System, which range up to a hefty box with 19 racks of iron, including 53 data nodes and a slew of standby gear, that gives the analytics applications a 200 TB user space. (This is the XXL size, and the entry configuration is known as the XS size.) Each successive size above the S "T-shirt" sized BAO box basically doubles the size of the user space, from 12 TB to 25 TB (M), to 50 TB (L), to 100 TB (XL), to 200 TB (XXL). These configurations support between 100 and 5,000 named users, but only 50 concurrent users except for the largest XXL setup, which is rated at 100 concurrent users.
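The T-shirt sizing can be captured in a small lookup table. The Python sketch below is only an illustration of the figures in this story, not an IBM sizing tool. The function names are hypothetical, and the 4 TB value for the XS size is an assumption taken from the 4 TB user space mentioned in the hardware description rather than something IBM spelled out per size.

```python
from typing import Optional

# User space in TB per configuration, smallest to largest.
# S through XXL are quoted in the article; XS = 4 TB is an assumption
# based on the "4 TB user space" mentioned for the base hardware.
USER_SPACE_TB = {
    "XS": 4,
    "S": 12,
    "M": 25,
    "L": 50,
    "XL": 100,
    "XXL": 200,
}


def smallest_size_for(required_tb: float) -> Optional[str]:
    """Return the smallest configuration whose user space fits, or None."""
    for size, capacity_tb in USER_SPACE_TB.items():
        if capacity_tb >= required_tb:
            return size
    return None


def concurrent_users(size: str) -> int:
    """Rated concurrent users: 100 for XXL, 50 for every other size."""
    if size not in USER_SPACE_TB:
        raise ValueError(f"unknown configuration: {size}")
    return 100 if size == "XXL" else 50


if __name__ == "__main__":
    for need_tb in (3, 30, 150, 500):
        print(need_tb, "TB ->", smallest_size_for(need_tb))
```

A warehouse needing 30 TB of user space lands on the L configuration, for example, while anything beyond 200 TB falls off the top of the chart.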
In terms of performance, the BAO boxes are showing scale as well as better performance compared to prior setups running the InfoSphere Warehouse and Cognos software. How's this for scale: IBM is working with Northrop Grumman on a version of the BAO box for some spook agency of the U.S. government that has 200 TB of active data and 20 PB (that's petabytes) of archived data and that is capable of handling 20,000 queries per day. Most companies don't need that kind of scale for their data warehouses and analytics, of course. But they definitely want a setup that comes in the door ready to suck data out of their production systems and start serving up some answers to complex questions. On the Cognos Mixed Marketing benchmark test, the Smart Analytics System, with all of its tunings and optimizations as well as database compression and storage tweaks, was able to handle three times as much work and do so with 50 percent less floor space than a "best practice" Power Systems cluster running the same Cognos tools.
IBM is not talking about price, but I get the feeling that all of the dough that customers might have paid consultants to design, configure, and integrate the servers, storage, and software in a data warehouse with analytics extensions is not going to be passed down to customers as a discount when they buy a BAO box. No, my friends, this Smart Analytics System is going to be an AS/400 in terms of how it is sold (even if it doesn't support i 6.1 and its version of DB2). And that means, I am guessing, that IBM will charge more than the sum of the parts because of the lower cost of running the Smart Analytics System and the optimizations it has put in there to make it sit up and bark.
Still, it would be nice to see a blade version of this running i 6.1 and DB2 for i as the data warehouse, aimed at midrange i shops who really don't want to use AIX unless they have to. It could yet happen, if you all start yammering in IBM's ear and stamping your feet a little.