Where Is DB2 BLU Accelerator For IBM i?
April 15, 2013 Timothy Prickett Morgan
IBM has created a neat new database feature for its DB2 database for Linux, Unix, and Windows operating systems that will hopefully make its way into the integrated DB2 for i database that resides inside the IBM i operating system. For now, this BLU Accelerator feature, which can radically speed up the sifting through data, is only available for DB2 10.5 and only for reporting and analytics, but there is every reason to believe Big Blue will put it on the IBM i and mainframe versions of its DB2 database and use it to help goose transaction processing.
Like other IT vendors, IBM wants companies to think that every bit of data that they generate or collect from their systems or buy from third parties in the course of running their business is valuable, and the reason is simple. This sells storage arrays, and if you can make CEOs think this data is potentially valuable, then they will fork out the money to keep it inside of various kinds of data warehouses or Hadoop clusters for data at rest or in InfoSphere Streams systems for data and telemetry in motion. There is big money in them there big data hills. Server virtualization has pulled the rug out from underneath the server business in the past decade, hindering revenue growth, and the funny thing about these big data jobs is that none of them are virtualized. And based on the massive amounts of data they need to absorb every day, they keep swelling like a batch of yeast.
IBM is not making any promises about bringing BLU Accelerator to other databases. The feature can goose analytics queries by a factor of between eight and 25 while at the same time reducing storage capacity needs for data sets, thanks to columnar data compression. But Tim Vincent, who is chief architect for DB2 on the Linux, Unix, and Windows platforms, an IBM Fellow, and chief technology officer for IBM’s Information Management division, hinted pretty strongly. “We do plan on extending this,” Vincent said at the BLU Accelerator launch in early April, “and we are going to bring the technology into new products going forward.”
So what exactly is BLU Accelerator? Well, it is a lot of things. First, BLU implements a new runtime that is embedded inside of the DB2 database and a new table type that is used by that runtime. These BLU tables coexist with the traditional row tables in DB2, and have the same schema and use storage and memory the same way. The BLU tables orient data in columns instead of the classic row-structured tables used in relational databases. This data is encoded (using what Vincent called an approximate Huffman encoding algorithm) in a way that keeps it in order, so it can be searched even while it is compressed. The Actionable Compression algorithm, as IBM calls it, is patented and allows for data to be used without decompressing it, which is a neat trick.
The BLU Accelerator has a memory paging architecture so that an entire database table does not have to reside in main memory to be processed, but the goal is to use the columnar format to compress the database enough that it can reside in main memory and be searched much more quickly. But again, unlike with some in-memory database management systems, this is not required, and you can move chunks of a BLU database into main memory as you need to query it. The BLU Accelerator knows about multiple core processors and SIMD engines and vector coprocessors on chips, and it can take advantage of these units to compress and search data. The accelerator feature also can do something called data skipping, which means it can avoid processing data in a table that is irrelevant to a query.
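To make the two tricks above a little more concrete, here is a minimal Python sketch of searching encoded data without decoding it, and of skipping blocks that cannot match. It is not IBM's algorithm: a plain sorted dictionary stands in for the approximate Huffman encoding Vincent described, and the per-block minimum and maximum, the function names, and the sample data are all assumptions made for illustration.

```python
from bisect import bisect_left

def build_dictionary(values):
    """Give each distinct value an integer code that preserves sort order.
    (A stand-in for BLU's encoding, which the article does not detail.)"""
    return {v: code for code, v in enumerate(sorted(set(values)))}

def encode(values, dictionary):
    """Replace every value in the column with its code."""
    return [dictionary[v] for v in values]

def encode_bound(dictionary, bound):
    """Code of the smallest dictionary value >= bound."""
    return bisect_left(sorted(dictionary), bound)

def block_synopsis(codes, block_size):
    """Record (start, min code, max code) for each block of the column."""
    return [
        (i, min(codes[i:i + block_size]), max(codes[i:i + block_size]))
        for i in range(0, len(codes), block_size)
    ]

def count_at_least(codes, synopsis, block_size, lo_code):
    """Count codes >= lo_code, comparing codes only (no decoding) and
    skipping any block whose maximum code is below lo_code."""
    matched = scanned = 0
    for start, _block_min, block_max in synopsis:
        if block_max < lo_code:
            continue  # data skipping: nothing in this block can match
        scanned += 1
        matched += sum(1 for c in codes[start:start + block_size] if c >= lo_code)
    return matched, scanned

# Hypothetical "year" column, stored in two blocks of four values.
years = [2008, 2008, 2009, 2009, 2009, 2010, 2010, 2010]
dictionary = build_dictionary(years)
codes = encode(years, dictionary)
synopsis = block_synopsis(codes, block_size=4)
lo_code = encode_bound(dictionary, 2010)
matched, scanned = count_at_least(codes, synopsis, 4, lo_code)
print(matched, scanned)
```

Because the codes sort the same way as the values they replace, the comparison against 2010 happens entirely on the small integers, and the first block is never touched.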
Here’s the compare and contrast between the way DB2 works now, with all of the snazzy features to improve its performance that have been added over the years, and the way the BLU Accelerator feature works:
OK, I am not a database expert or a comedian, but that is funny. The freaky thing about BLU Accelerator is that it does not have database indexes. You don’t have to do aggregates on the tables, you don’t have to tune your queries or the database, and you don’t have to make any changes to SQL or database schemas. “You just load the data and query it,” as Vincent said at the launch of the product.
The reason that you don’t need a database index is that data is compressed so a BLU table can, generally speaking, reside in memory. Vincent said that 80 percent of the data warehouses in the world had 10 TB of capacity, so if you can use the Actionable Compression and get a 10X compression ratio, then you can fit the typical data warehouse in a 1 TB memory footprint. But there are more tricks that speed up those database queries, as you can see here:
Once you have compressed the data so it all fits into main memory, you take advantage of the fact that you have organized the data in columnar format instead of row format. So, in this case, say you have 10 years of data with 10 columns for each year, for a total of 100 columns. When you want to search only 2010 for a subset of the data, as the query above does (find the number of sales deals that the company did in 2010), you cut the query down to 10 GB of the 1 TB compressed data set. The data skipping feature in this case knows to look for sales data, not other kinds of data, so that reduces the data set down to around 1 GB. The machine you are using to run this BLU Accelerator feature not only has 1 TB of main memory but 32 cores, so you parallelize the query and break it up so that chunks of roughly 32 MB are parceled out to each of the 32 cores and their memory segments. Now, use the vector processing capability in an X86 or Power processor, and you get around a factor of four speedup in scanning the data for the sales data. And the result is that you can query a 10 TB table in a second or less.
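The arithmetic behind that walkthrough is easy to check for yourself. The short Python sketch below just chains the divisions from the example. The function and parameter names are made up for illustration, the tenfold reduction from data skipping is inferred from the 10 GB and 1 GB figures above, and decimal units are used, which is why the per-core chunk comes out a touch under the 32 MB quoted.

```python
def query_funnel(raw_tb=10, compression=10, columns=100,
                 skip_factor=10, cores=32, simd_speedup=4):
    """Chain the reductions from the example, using decimal units."""
    compressed_gb = raw_tb * 1000 / compression      # 10 TB squeezed to 1 TB
    after_columns_gb = compressed_gb / columns       # touch 1 column in 100
    after_skipping_gb = after_columns_gb / skip_factor  # data skipping
    per_core_mb = after_skipping_gb * 1000 / cores   # split across the cores
    # SIMD scans about four times faster, so each core's scan costs
    # about as much as a scalar scan over a quarter of its chunk.
    scalar_equivalent_mb = per_core_mb / simd_speedup
    return {
        "compressed_gb": compressed_gb,
        "after_columns_gb": after_columns_gb,
        "after_skipping_gb": after_skipping_gb,
        "per_core_mb": per_core_mb,
        "scalar_equivalent_mb": scalar_equivalent_mb,
    }

funnel = query_funnel()
for step, size in funnel.items():
    print(f"{step}: {size}")
```

Each core ends up with about as much work as a plain scan over roughly 8 MB, which is why an answer in a second or less is plausible. This says nothing about how DB2 actually schedules the work.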
Sounds pretty useful, right? So when do the other DB2s get it? We’ll try to find out.