How Long Before Big Blue Brings Code Assist To IBM i?
August 23, 2023 Timothy Prickett Morgan
IBM Research and Big Blue’s Software group have been collaborating to bring generative AI capabilities to market through the Watsonx stack of large language models and related tools. Watsonx is a basis for customers to create customized LLMs based on data from their own businesses and to integrate the quasi-cognitive capabilities of LLMs into their applications.
Watsonx is also being used by IBM to augment some tools of its own. The most recent one, which will be in tech preview in September and generally available sometime in the fourth quarter, is called Watsonx Code Assistant for Z. As the name suggests, it is a programming assistant, and it will eventually be built into the open source VS Code integrated development environment that was created by Microsoft and that is becoming increasingly popular. In this case, the Watsonx Code Assistant is being explicitly trained to help programmers take applications coded in COBOL and convert them to Java under an effort code-named “Project Hopper.”
Watsonx comprises over 20 different AI large language models – sometimes called foundation models – that scale from tens to hundreds of billions of parameters and that exhibit some of the emergent behaviors that are embodied in much larger models like the GPT-4 model underpinning the ChatGPT service created by OpenAI and commercialized by Microsoft or the Bard service from Google that is powered by its DeepMind Gemini model.
In this case, IBM Research has created an LLM called Code that has been trained on IBM’s own COBOL data and continues to undergo training. To see how well it converts COBOL code snippets to Java code snippets, it has been tested on the Project CodeNet dataset, which was created and open sourced by IBM and which includes over 14 million code samples. CodeNet, which was opened up more than two years ago, is intended to be the analog of the ImageNet labeled image library that was used to train the ResNet-50 model, which of course was the first AI neural network that was capable of identifying objects in images at an accuracy level that surpassed humans. The code samples are written in over 50 different programming languages, but C, C++, Python, and Java dominate the database. The code snippets have a huge amount of metadata describing what the code does, which informs the AI models that can be used to create code raw or convert from one programming language to another, much as GPT-4 can translate from English to French or converse on all manner of topics through a prompt interface.
The Code LLM, which has 20 billion parameters, is the key component of the Ansible Lightspeed Code Assistant that was previewed by IBM’s Red Hat division earlier this year to create playbooks for its Ansible system automation tool. That was a particular tuning of the Code LLM, and so is the COBOL-to-Java tuning that will be delivered within the Code Assistant for Z plug-in for VS Code.
In a prebriefing ahead of the launch this week, IBM executives were at pains to point out that the company has not created a tool that will take COBOL applications running on its System z mainframes and port them wholesale to Java running on any platform that supports a Java application server. Rather, the Code Assistant for Z has the precise job of letting programmers who are refactoring COBOL code into Java get suggestions for specific functions within a COBOL application stack. Working in conjunction with IBM’s Application Discovery and Delivery Intelligence (ADDI) inventory and analysis tool, it can figure out the web of dependencies within the COBOL application and help create links with code chunks as they are moved into Java. The result is an application that is transformed piecemeal in a careful, “de-risked” fashion, as IBM put it.
Here is the important bit: These COBOL applications can be converted to Java, but the resulting Java is tuned for the System z mainframe and for its data sources and data models, without changing schemas or anything else. The IMS, CICS, Db2, and other software remains unchanged and the Java code speaks to it through its APIs.
Such a tool is necessary for a lot of reasons, not the least of which is that COBOL programmers are getting harder and harder to come by. IBM says that over 84 percent of its System z clients use COBOL – many have been programming in Java and SQL for decades now, too – and a lot of the oldest and most entrenched code at IBM mainframe shops is COBOL code that was created decades ago and changed over the ensuing years by many different programmers, each with their own logic and what might be called different dialects of COBOL. IBM reckons that there are more than 230 billion lines of COBOL code in the mainframe base, and that the typical customer has 10 million to 50 million lines of code. That is just too much code to mess with and possibly introduce risk through errors in logic in a running business. And more importantly, according to Skyla Loomis, vice president of IBM z software at Big Blue, taking into account both coding and testing together in the complete toolset from IBM, programmers who were part of early testing for Code Assistant for Z were an order of magnitude more productive than those who did not have the tool.
Hence, IBM is creating Code Assistant for Z rather than have someone else do it and perhaps do the whole shebang and try to refactor COBOL applications off the mainframe entirely.
Now, everything that we just said about COBOL on mainframes applies to COBOL on the IBM i platform and, importantly, RPG in its various guises on the IBM i platform.
Imagine, if you will, a Code Assistant for i, which among other things might have LLMs trained specifically on converting RPG III, RPG IV, and ILE RPG code to free format RPG. Or perhaps it would be trained differently to do RPG conversions to Java, Node.js, Python, or PHP as needed. Again, as IBM has done on the mainframe, there would be very tight bindings to the Db2 for i database as part of these conversions, keeping the applications on the IBM i platform but allowing customers to have a modicum of code portability for performance or political or economic reasons. And importantly, human programmers are still in the mix, optimizing the resulting code as it is transformed and creating code that would be recognizable to other human Java programmers.
When we asked IBM about this possibility, we got this predictable and formal answer:
“COBOL to Java is the first conversion use case for watsonx Code Assistant. Other transformation use cases will be considered in the future. Java was selected first because of its popularity for over 25 years, enabling customers and partners to modernize their applications. The maturity of Java APIs, libraries and frameworks increases developer velocity and provides a consistent hybrid cloud development and deployment experience. Java can dynamically detect and optimize for IBM Z hardware where business-critical applications are deployed. It enables hardware and software stack optimizations to maximize the performance and benefits of IBM Z hardware and middleware, including WebSphere Liberty, CICS, IMS, MQ and Db2. This level of integration enables numerous architectural options for building business solutions.”
IBM may create a Code Assistant for i, or it may work with the several developers of IBM i programming tools to help them create their own. We think the latter is a more beneficial answer, much as IBM created a single remote journaling protocol and embedded it into the OS/400 and IBM i operating systems, and all of the high availability clustering tool makers eventually adopted this rather than create their own.
We shall see what Big Blue and its partners do. But the opportunity is clearly much the same as it is on the System z platform, and with orders of magnitude more customers who would be affected.