R Comes To i
May 1, 2019 Alex Woodie
IBM is now officially supporting the R language on its IBM i operating system, the company announced last week. The open source language, which has long been a favorite of data scientists, statisticians, and others involved with scientific computing, gives IBM i an immediate boost when it comes to data mining and machine learning.
The R language was originally developed 25 years ago by a pair of statistics professors, Ross Ihaka and Robert Gentleman, at the University of Auckland, New Zealand. The software, which was written mostly in C and Fortran, was based largely on S, a programming language for fast program prototyping developed by John Chambers at Bell Labs in the 1970s.
Today, R is among the most widely used languages for machine learning and artificial intelligence. Most data scientists working with R today get it through an open source GNU package that bundles an array of pre-built programs for linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, and clustering.
The R language is widely taught in universities, where it’s one of the go-to languages students learn for scientific and statistical endeavors. Data scientists who come from one of the “hard sciences” like physics and biology will typically have learned R in college and feel comfortable using it for manipulating big data sets and building machine learning models in scientific and non-scientific settings alike.
R usage has grown over the years, and today there are thousands of R packages that allow users to interpret, interact with, and visualize large data sets in a variety of specialty use cases. The interpreted language, which currently ranks 16th in the TIOBE Index, is used every day by large companies for purposes as varied as analyzing social media data for marketing trends, building financial models to assess risk, and developing weather models to predict climate change.
In the data science world, R is often positioned against or alongside Python, which is a more general scripting language that is taught in university-level computer science courses (and a growing number of high school classes too). Together, R and Python are responsible for a big chunk of the machine learning modeling work that’s being done today, especially compared to more proprietary analytical languages and environments, such as SAS, MATLAB, and IBM’s own SPSS tools (although a resurgent MATLAB recently cruised by R to number 12 on the TIOBE Index).
Microsoft has been one of the biggest backers of getting enterprises to use R to mine big data sets for competitive advantage. The software company – which recently overtook Apple as the world’s largest company with a $1 trillion market cap – is one of the founding members of the R Consortium. Several years ago, Microsoft acquired a firm called Revolution Analytics that provides a parallelized version of R’s typically single-threaded runtime, giving it an advantage for running R on enterprise-level data mining workloads. It also supports large-scale R-based data analysis on its Azure cloud.
It’s unclear at this point if IBM’s R runtime will be parallelized. (Since it’s based on the open source GNU package, and it’s running in PASE, chances are good that it has not been parallelized for IBM i.) But what matters more at this point is that R is supported on the platform in the first place, which shows that IBM is serious about giving IBM i customers access to the same tools that the wider IT world is using to analyze big data for business advantage. IBM says machine learning and AI are important workloads for its IBM i business, and adding support for tools like R shows that it believes it.
IBM is delivering R to IBM i shops through RPM (Red Hat Package Manager), the new open source delivery method that it rolled out last summer. IBM has already delivered more than 300 packages through the RPM method, and that number is constantly growing. IBM i 7.4 and IBM i 7.3 TR6 will both support R via RPM, according to IBM’s announcement letters (and while IBM doesn’t say it, there doesn’t seem to be any reason why R couldn’t be supported on older versions of the OS, too).
IBM i shops will be able to use R to access data residing on Db2 for i using RODBC, an R package that provides an ODBC interface to databases. They can also use the newly delivered ODBC driver, IBM says.
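For readers curious what that looks like in practice, here is a minimal sketch of querying Db2 for i from R through RODBC. It is an illustration under stated assumptions, not IBM's documented procedure: the helper name `query_db2` is hypothetical, the DSN name `*LOCAL` is an assumption about how the ODBC data source is configured on the system, and the example query against the `QSYS2.SYSTABLES` catalog view is only a placeholder. The RODBC calls themselves (`odbcConnect`, `sqlQuery`, `odbcClose`) are the package's standard functions.

```r
# Hypothetical helper: run one SQL statement against Db2 for i via RODBC.
# Assumes the RODBC package is installed and that an ODBC data source
# named "*LOCAL" (an assumed DSN name) is configured on the system.
query_db2 <- function(sql, dsn = "*LOCAL") {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("the RODBC package is not installed")
  }

  # odbcConnect() returns an object of class "RODBC" on success, -1 on failure
  con <- RODBC::odbcConnect(dsn)
  if (!inherits(con, "RODBC")) {
    stop("could not connect to DSN ", dsn)
  }

  # Close the connection when the function exits, even on error
  on.exit(RODBC::odbcClose(con), add = TRUE)

  # Returns a data frame of results (or error messages as a character vector)
  RODBC::sqlQuery(con, sql, stringsAsFactors = FALSE)
}

# Example call (placeholder query; requires a live connection, so left commented):
# tables <- query_db2("SELECT TABLE_NAME FROM QSYS2.SYSTABLES FETCH FIRST 5 ROWS ONLY")
# head(tables)
```

Wrapping the connection in a function with `on.exit` keeps the ODBC handle from leaking if the query fails, which matters more on a shared production system than on an analyst's laptop.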
Besides R, the new operating systems bring other open source goodies, delivered via RPM, including:
- Apache ActiveMQ, a message broker
- Apache Ant and Maven, build automation tools
- vim, a terminal-based editor
- yum-utils, a collection of tools and programs for managing yum repositories and installed software for more advanced users
- Midnight Commander, a terminal-based utility for exploring the filesystem and performing various tasks, which we previously covered
And we would be remiss if we didn’t at least mention that the addition of R gives i fans yet another reason to celebrate September 19, which of course is International Talk Like a Pirate Day. Can we get an “argh!” and an “aye!”?