• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
What Does ‘Big Data’ Mean for IBM i?

    November 12, 2013 Alex Woodie

    When IBM i 7.1 Technology Refresh 7 (TR7) ships on Friday, it will contain several updates to the DB2/400 database designed to help it handle big data, including an expansion of SQL indexes, easier movement to SSDs, and tools to track the growth of tables over time. But what exactly does big data mean on the IBM i? We set out to find some answers.

    The traditional definition for big data has to do with the three “Vs,” which refer to the volume and velocity of data and the variety of data types. IBM sometimes likes to add a fourth “V” to the mix to represent veracity, or the lack thereof.

    Data volumes are no doubt increasing, but that’s been true since the first 8-bit processors started to make their way into businesses. What’s different today is that data volumes are starting to get really, really big. The numbers are actually mind-boggling. According to IBM, every day the world generates 2.5 quintillion bytes (the equivalent of 2.5 exabytes, or 2,500,000,000,000,000,000 bytes) of data. Data volumes are roughly doubling every year.
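    To put IBM’s figures in perspective, here is a quick back-of-the-envelope check in Python. It assumes decimal (SI) units, where an exabyte is 10^18 bytes, and takes the “roughly doubling every year” rate at face value; the `project_volume` helper is purely illustrative.

```python
# Back-of-the-envelope check of the data-volume figures quoted above.
# Assumption: decimal (SI) units, so 1 exabyte (EB) = 10**18 bytes.
EXABYTE = 10**18
EXBIBYTE = 2**60  # binary unit, shown only for comparison

daily_bytes = 2_500_000_000_000_000_000  # 2.5 quintillion bytes per day, per IBM

daily_exabytes = daily_bytes / EXABYTE     # exactly 2.5 EB
daily_exbibytes = daily_bytes / EXBIBYTE   # roughly 2.17 EiB in binary units
yearly_exabytes = daily_exabytes * 365     # 912.5 EB per year at that daily rate


def project_volume(start: float, years: int, annual_factor: float = 2.0) -> float:
    """Illustrative helper: volume after `years` if it multiplies by `annual_factor` each year."""
    return start * annual_factor ** years


if __name__ == "__main__":
    print(f"{daily_exabytes} EB/day, ~{daily_exbibytes:.2f} EiB/day")
    print(f"{yearly_exabytes} EB/year")
    # Doubling every year compounds: five years of doubling is a 2**5 = 32x increase.
    print(f"{project_volume(yearly_exabytes, 5)} EB/year after five years of doubling")
```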

    The variety of data is getting wider and more disparate, too. Structured data, such as transactions logged into DB2/400 or other relational database systems, are growing, but at a slower rate than less-structured data types, such as HTML Web pages, pictures taken with smartphones, social media posts, and PDFs. IBM estimates that, by 2020, more than 40 percent of all data will be machine-generated data coming from Web servers, RFID and GPS sensors, financial transactions, medical devices, HVAC systems, and other machines that will encompass the so-called “Internet of Things.”

    As people try to capture these growing volumes and varieties of data, the importance of velocity becomes apparent. These pieces of data, such as Web clickstreams, call detail records, and transaction information, are valuable, but that value can diminish as the data ages. Hence, it’s important to act on data as it arrives, or soon thereafter.

    New data processing paradigms have emerged to help people store and process these big new data sets. The most popular is Apache Hadoop, an open source framework that enables users to turn ordinary x86 Linux servers into huge distributed clusters that can apply supercomputing-like capabilities against petabytes’ worth of unstructured data.
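    Hadoop’s original processing model is MapReduce: a map step runs in parallel over each chunk (split) of the input, a shuffle groups the intermediate results by key, and a reduce step combines each group. The single-process Python sketch below only mimics that model with the classic word-count example; it does not use Hadoop’s actual Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key; on a real cluster this step moves data between nodes.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    # Each inner list stands in for an input split that a cluster would hand to a separate node.
    mapped = (pair for split in splits for record in split for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    splits = [["big data on IBM i", "big iron"], ["data data everywhere"]]
    print(word_count(splits))
```

    Because each map call looks at only one record and each reduce call at only one key, both steps can be spread across many cheap servers, which is where the scale-out economics come from.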

    Then there are new NoSQL and NewSQL databases, such as MongoDB, that can easily handle semi-structured data and also scale out horizontally in a fault-tolerant manner more easily than their relational cousins. Hadoop and the NoSQL/NewSQL databases are changing the economics of data storage and processing, and have become the building blocks of a new paradigm of big data-driven applications.

    Big Data on IBM i

    So where does the IBM i server fit into this new big data landscape? As you might imagine, you’re not going to run Hadoop or NewSQL on IBM i; those products run primarily on Linux. The proprietary nature of IBM i means it’s shielded somewhat from the big data goings-on of the wider IT universe. The fact that the IBM i server is primarily used by brick-and-mortar companies, as opposed to companies that make their money from the Web, also helps to keep the platform grounded in a firmer reality.

    But on the other hand, there’s no doubt that IBM i is being impacted by the explosion of information. While the general IT world goes ga-ga for anything with “Hadoop” in the name, and NewSQL database companies continue to sprout up like mushrooms after a spring rain, organizations are counting on their IBM i servers to quietly deal with steadily increasing data volumes, if not necessarily varieties or velocities.

    The biggest big data issue facing IBM i shops is growth of structured data stored in the DB2/400 relational database, according to IBM i experts with IBM and SEQUEL Software who talked with IT Jungle for this story.

    It used to be fairly rare for IBM i customers to have super-massive databases, but now it’s become quite common, according to Mike Stegeman of SEQUEL Software, the Help/Systems subsidiary. “It seemed at one time to be gradual growth, but then all of a sudden it exploded,” he said.

    One SEQUEL Software customer had a requirement to access a single database file that had a billion records in it. The file supported a critical transactional system that, due to the structure of the file and the database tables, could not be purged, he said.

    “With the IBM i, a lot of it is the transactional data,” Stegeman said. “We’re getting these customers who have these extremely large files on the i, and maybe some other databases that we can access. That’s kind of what their pain points are, and they want to have a tool that’s easy to use and can access the information without breaking the bank.”

    Another common big data-related pain point has to do with partitioned tables. There’s a limit to the number of records that can be stored in a single table, which leads some IBM i shops to use table partitioning. However, some business intelligence tools can’t query a partitioned table as a whole and must run separate queries against each partition, according to SEQUEL Software, which touts its ability to run a single query against a partitioned table as a competitive advantage.
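    The difference between one query and many can be illustrated with a toy model. In the Python sketch below, which is a hypothetical stand-in and not how DB2 for i actually implements partitioning, rows are routed to one of N partitions by a simple hash of the key (`order_id % N`), and `query` fans out to every partition and merges the matches, so the caller sees one logical table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Order:
    order_id: int
    customer: str
    amount: float


class HashPartitionedTable:
    """Toy model of a partitioned table: rows are spread across N partitions by key."""

    def __init__(self, partition_count: int) -> None:
        if partition_count < 1:
            raise ValueError("need at least one partition")
        self.partitions: list[list[Order]] = [[] for _ in range(partition_count)]

    def insert(self, row: Order) -> None:
        # Route each row to exactly one partition based on its key.
        index = row.order_id % len(self.partitions)
        self.partitions[index].append(row)

    def query(self, predicate: Callable[[Order], bool]) -> list[Order]:
        # A "single query": scan every partition and merge the matches,
        # so the caller never has to know how many partitions exist.
        return [row for part in self.partitions for row in part if predicate(row)]


if __name__ == "__main__":
    table = HashPartitionedTable(partition_count=4)
    for i in range(1, 11):
        table.insert(Order(order_id=i, customer=f"C{i % 3}", amount=i * 10.0))
    big_orders = table.query(lambda r: r.amount >= 50)
    print(sorted(r.order_id for r in big_orders))  # prints [5, 6, 7, 8, 9, 10]
```

    A tool without this fan-out has to issue one query per partition and stitch the results together itself, which is the limitation SEQUEL Software describes.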

    The IBM i server excels as a database machine, and since that database is relational in nature, people aren’t going to try to squeeze all the different data types into it. There is some growth in storage of Binary Large Objects (BLOBs) and Character Large Objects (CLOBs) on IBM i, but it appears to be minimal outside of specific industries (such as healthcare, with its requirement to store diagnostic images) and ERP systems (such as SAP’s Business Suite running on IBM i, which is apparently weird). However, many customers are starting to store lots of PDFs in the IFS (Integrated File System), which is worth noting.

    Big Data Causes on IBM i

    In the wider IT world, the big data phenomenon is being driven by the desire (and the new capability) to detect and exploit business opportunities in much shorter timeframes. Companies like Facebook, Google, and Twitter use big data technologies to serve ads based on all sorts of things they know about their users, while Netflix and Amazon use it to make product recommendations based on their collected intelligence.

    Things are a little different on IBM i. In the IBM i world, the big growth of mostly relational data appears to be driven by two things: regulation and forecasting.

    Purging unused data from DB2/400 used to be a standard part of good housekeeping on the platform. But today less than 20 percent of IBM i shops purge their data on a regular basis, according to informal polls taken by Help/Systems’ vice president of technical services, Tom Huntington.

    “You have these various regulations and people aren’t able to purge their data,” he says. “[Through PowerTech] we see more and more people who are struggling with how to keep audit data around it.”
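    The tension between housekeeping and regulation boils down to a simple rule: a record may be purged only if it is older than the retention window and is not being held for audit or legal reasons. The Python sketch below encodes that rule; the `Record` fields, the hold flag, and the retention period are all hypothetical, since real retention requirements vary by regulation and industry.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Record:
    key: int
    last_used: date   # last time the record was read or updated
    legal_hold: bool  # hypothetical flag: must be kept for audit/regulatory reasons


def purge_candidates(records: Iterable[Record], today: date, retention_days: int) -> list[int]:
    """Return keys of records older than the retention window that are not on hold."""
    cutoff = today - timedelta(days=retention_days)
    return [r.key for r in records if r.last_used < cutoff and not r.legal_hold]


if __name__ == "__main__":
    rows = [
        Record(1, date(2010, 1, 1), legal_hold=False),   # old, no hold: purgeable
        Record(2, date(2010, 1, 1), legal_hold=True),    # old, but held for auditors
        Record(3, date(2013, 11, 1), legal_hold=False),  # recent: still in use
    ]
    print(purge_candidates(rows, today=date(2013, 11, 12), retention_days=730))  # prints [1]
```

    As regulations put more records on hold, the candidate list shrinks, and the database grows even though nothing new is being used.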

    The combination of the declining cost of storage and the availability of new data warehousing technology like Hadoop is affecting IBM i shops and what data they decide to keep, Stegeman says.

    “You don’t need a whole floor on a gigantic skyscraper just to hold your hard drives to handle all your data,” he says. “They’re keeping their history around longer, either for auditing purposes or to find out how the company is doing overall. Or they may say, ‘Hey, we’re not using it now, but maybe we will two or five years down the road.’”




Volume 13, Number 33 -- November 12, 2013



Copyright © 2025 IT Jungle