• The Four Hundred
  • Counting The Cost Of AI Inference – And Projecting It Far Out

    March 30, 2026 Timothy Prickett Morgan

    It is probably a good thing that most IBM i shops did not spend a lot of money trying to figure out AI in the past decade. It was enormously expensive to develop first-generation machine learning algorithms, and they had limited applicability. With large language models and their generative capabilities, the use cases for AI have skyrocketed, but training has been crazy expensive since the end of 2022, when the chattybot eureka moment – some might say emergent behavior – happened.

    The cost of training what are called foundation models – very large models with hundreds of billions of parameters, with the parameter count being akin to the number of neurons in our brains and the weights in the AI model being akin to the strength of synaptic signals in that collection of neurons – continues to go up as parameter counts and dataset sizes rise. Even mixture of experts models – which talk amongst themselves and mull over information and try to come to a logical conclusion when a query is posed – still have a lot of parameters collectively. They give better answers, though. It is the difference between the thoughts of a few very experienced people and those of a blurty drunk or a five-year-old.

    In any event, whatever metaphor you want to use, the cost of actually running AI in production comes down to what it costs to generate a snippet of text called a token, which in practice is not a word (oddly enough) but around four characters on average. (It depends on the language.) Data is tokenized and then turned into numerical vectors, which are then used to create the weights that drive what sure looks like the thought process of a GenAI model. So, right now and for the foreseeable future, what matters is the cost per token.
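    To make the four-characters-per-token rule of thumb concrete, here is a minimal sketch that estimates a token count from a character count and turns it into a dollar figure. The function names and the `CHARS_PER_TOKEN` constant are assumptions made for illustration, and the price passed in is a placeholder, not a quote from any vendor. Real tokenizers vary by model and by language, so treat the result as a rough estimate only.

```python
import math

# Rule of thumb from the text: one token is roughly four characters.
# This is an assumption; actual tokenizers differ by model and language.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str) -> int:
    """Rough token count derived from the character count, rounded up."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost_usd(num_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of a given number of tokens at a given price per million tokens."""
    return num_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    prompt = "What is the status of order 12345 in the warehouse system?"
    tokens = estimate_tokens(prompt)
    # 0.10 is a placeholder price per million tokens, for illustration only.
    print(f"{tokens} tokens, about ${estimate_cost_usd(tokens, 0.10):.8f}")
```

    Where exact counts matter, a model-specific tokenizer would replace the character-based estimate.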

    This week, both IDC and Gartner provided some insight into how rapidly this price has come down and how quickly they expect it to continue to fall.

    Here is the historical recap that Matt Eastwood, senior vice president at IDC, put out on X:

    Eastwood is looking at what it cost to generate tokens with the GPT-3 model when its API came out in beta in June 2020, which is a few years before the GenAI boom hit, and comparing it to what it costs to generate tokens today through APIs from OpenAI, the creator of the GPT models, of course. The cost per million tokens was a whopping $32 back in 2020, and the good news is that most queries were short, most contexts were short, and most answers were short back then. They had to be. Today, almost six years later, the cost of 1 million tokens is under 10 cents. That is a reduction in cost of more than 320X in just under six years. Moore’s Law improvements alone would lead us to expect only a factor of 6X improvement, and if you go from 16-bit data down to 4-bit data, that gets you another 4X, for a combined 24X. The remaining 13.3X improvement in the cost per token is coming from other hardware and software advances.
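    The arithmetic in that paragraph can be checked with a short sketch. The prices and the 6X and 4X factors are taken from the text above; the function name `residual_gain` and the constant names are assumptions made for this example, and 10 cents is used as the upper bound for the current price.

```python
import math

# Figures from the article (prices are per million tokens).
PRICE_2020_USD = 32.00   # GPT-3 API beta, June 2020
PRICE_2026_USD = 0.10    # "under 10 cents", so this is an upper bound

MOORES_LAW_GAIN = 6.0    # the article's estimate for Moore's Law alone
PRECISION_GAIN = 4.0     # the article's estimate for dropping to 4-bit data


def residual_gain(old_price: float, new_price: float, *known_gains: float):
    """Return (total, explained, residual) improvement factors.

    total     = old_price / new_price
    explained = product of the known gains
    residual  = what is left after dividing out the known gains
    """
    total = old_price / new_price
    explained = math.prod(known_gains)
    return total, explained, total / explained


if __name__ == "__main__":
    total, explained, residual = residual_gain(
        PRICE_2020_USD, PRICE_2026_USD, MOORES_LAW_GAIN, PRECISION_GAIN
    )
    print(f"total {total:.1f}X, explained {explained:.1f}X, residual {residual:.1f}X")
```

    Because the current price is an upper bound, the total and the residual are lower bounds: any price below 10 cents pushes both higher.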

    This is, actually, an amazing improvement in bang for the buck.

    But it doesn’t stop there, according to the researchers over at rival Gartner:

    It looks like IDC is pretty much at the base (black line) in the curve on the left side of the chart above in 2026, and the cost per million tokens is going to continue down its exponential curve, dropping by another factor of 9X between 2026 and 2030. So, call it a penny per million tokens, and yes, the rate of change is slowing because there are limits of physics to contend with.
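    One way to see that the rate of change is slowing is to convert both multi-year factors into a constant yearly factor. This is a sketch under stated assumptions: it treats the historical period as a round six years and the Gartner projection as a round four years, and the helper name `annual_factor` is made up for this example.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Constant yearly improvement factor that compounds to total_factor."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_factor ** (1.0 / years)


# Figures from the article; the year counts are rounded assumptions.
PAST_FACTOR, PAST_YEARS = 320.0, 6.0      # June 2020 to early 2026
FUTURE_FACTOR, FUTURE_YEARS = 9.0, 4.0    # projection for 2026 to 2030
PRICE_2026_USD = 0.10                     # upper bound, per million tokens

past_rate = annual_factor(PAST_FACTOR, PAST_YEARS)
future_rate = annual_factor(FUTURE_FACTOR, FUTURE_YEARS)
price_2030_usd = PRICE_2026_USD / FUTURE_FACTOR

if __name__ == "__main__":
    print(f"historical: {past_rate:.2f}X cheaper per year")
    print(f"projected:  {future_rate:.2f}X cheaper per year")
    print(f"2030 price: ${price_2030_usd:.4f} per million tokens")
```

    Under those assumptions the historical decline works out to roughly 2.6X per year against about 1.7X per year going forward, and the 2030 price lands at just over a penny per million tokens.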

    Wait, wasn’t everyone going to get rich selling tokens? Well, as it turns out, mixture of experts (MoE) models will probably use somewhere between 100X and 1,000X more tokens to do their reasoning. So even as the price per token falls, the price to get an answer will go up, and the hope is that the quality of the answer will rise faster than the price.
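    To put rough numbers on that, here is a sketch that combines the 9X projected drop in price per token with the 100X to 1,000X growth in tokens per answer. It assumes the two effects simply multiply and that nothing else changes; the function name is hypothetical.

```python
def cost_per_answer_multiplier(token_growth: float, price_decline: float) -> float:
    """How much the cost of one answer changes.

    token_growth:  factor by which tokens per answer increase
    price_decline: factor by which the price per token falls
    A result above 1.0 means answers get more expensive.
    """
    if price_decline <= 0:
        raise ValueError("price_decline must be positive")
    return token_growth / price_decline


PRICE_DECLINE = 9.0               # projected 2026 to 2030 drop, from the article
TOKEN_GROWTH_LOW = 100.0          # low end of the token growth range
TOKEN_GROWTH_HIGH = 1_000.0       # high end of the token growth range

if __name__ == "__main__":
    low = cost_per_answer_multiplier(TOKEN_GROWTH_LOW, PRICE_DECLINE)
    high = cost_per_answer_multiplier(TOKEN_GROWTH_HIGH, PRICE_DECLINE)
    print(f"cost per answer rises between {low:.1f}X and {high:.1f}X")
```

    Under those assumptions, an answer costs somewhere between about 11X and 111X more even after the cheaper tokens are taken into account.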

    For those of us who like our thinking jobs, maybe that is not something to be desired.

    Tags: AI, IBM i

TFH Volume: 36 Issue: 12

Copyright © 2025 IT Jungle