Building A More Perfect IBM i Cloud On Power10 Iron
January 24, 2022 Timothy Prickett Morgan
As we get closer to the launch of the entry and midrange Power10 machines, we can’t help but think about the innovative uses that these machines might be put to. We think, for instance, that these machines could be the foundation of a new generation – and a new kind – of IBM i cloud based on a mix of entry one-socket Power S1021, two-socket Power S1022, and Power S1024 machines, augmented in a very special way with four-socket Power E1050s.
To one way of thinking, the easiest way to build a big cloud capable of supporting thousands of customers on logical partition slices is to just buy multiple “Denali” Power E1080 machines. If you wanted to get clever about it, you would create Enterprise Pools across a network of Power E1080s to allow IBM i partitions and their workloads to move from machine to machine. You could put Db2 Mirror on the Power E1080 cluster to keep the data replicated and secure across the logical partitions. This Power E1080 cluster would also have network-attached NVM-Express storage pooled and shared over Ethernet – kind of like a flash SAN that behaves like local storage to the nodes, without having to pay for an actual SAN. And if you wanted to get innovative, you could invoke the “memory inception” memory area network to pool memory across the Power E1080 nodes so that at any given time any particular machine could grab a very large chunk of memory to throw at any number of cores, up to the 240-core maximum of a single Power E1080 machine.
With memory being so expensive, it might even make more sense to put a cheaper Power E1050 machine at the center of a cloud compute network composed of Power E1080s and use it as a memory server for the big machines. On the memory inception network, the delay jumping out over the system interconnects between nodes is only between 50 nanoseconds and 100 nanoseconds, which is the kind of delay you see hopping between far nodes in a NUMA configuration. A Power E1080 with 240 cores and 2 TB of main memory has on the order of 5.27 million CPWs of aggregate raw IBM i OLTP throughput and would cost $2.8 million, as we explained in our bang for the buck analysis for big iron back in November 2021. That works out to a mere 50 cents or so per CPW, and since the machine tops out at a mere 1,000 logical partitions, you have to have 1,000 customers share that compute. Depending on the software stacked up on top of that, you might be talking about $6,000 to $9,000 per partition rated at 5,266 CPWs if you allocated the capacity equally across the partitions, and over 60 months, that would work out to a cost of a mere $150 per month per partition.
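That slice-pricing arithmetic can be sketched in a few lines of Python; all of the figures here are our estimates from the analysis above, not IBM list prices:

```python
# A sketch of the partition slice-pricing arithmetic, using the article's
# estimated figures (not IBM list prices).
E1080_PRICE = 2_800_000        # base hardware: 240 cores, 2 TB memory
E1080_CPWS = 5_270_000         # aggregate raw IBM i OLTP throughput
PARTITIONS = 1_000             # PowerVM logical partition ceiling
MONTHS = 60                    # five-year amortization window

price_per_cpw = E1080_PRICE / E1080_CPWS       # roughly 53 cents per CPW
cpws_per_partition = E1080_CPWS / PARTITIONS   # about 5,270 CPWs per slice
# With a software stack layered on, call it $9,000 per partition over 60 months:
monthly_per_partition = 9_000 / MONTHS         # $150 per month per partition
```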
But workloads on a cloud are not supposed to be static, and the whole point of the cloud is not just to have utility pricing for compute, storage, and networking, but to have access to vast slices of these for a short period of time to get work done faster. Imagine, for example, that for a brief ten minutes you had 80 percent of that Power E1080 to run and complete a batch job – say, a set of complex queries on your transaction data and the reports generated from them – that takes eight hours, all night, on a Power9-based Power S924 in your own shop. How does that change your business? It only costs $3.20 per minute to have the whole Power E1080 – $8.4 million with hardware and software divided by five years’ worth of minutes, which is 2.63 million minutes – and so if you had 80 percent of it for ten minutes, that query and report run would cost $25.60. That’s it. And to snap you back into perspective, even if you ran that report every day of the year, it is only $9,344 per year. Over five years, now you are talking about $46,720 for that daily report. That seems like a lot until you consider paying $8.4 million for a Power E1080 or even something on the order of $150,000 for a Power S924 and using one third of its time each day running that report. That will cost you $50,000, too. If you have a heavy software stack, that daily report could be costing you $75,000 over five years.
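The burst-pricing math above reduces to a few lines, again using our estimated $8.4 million five-year figure for hardware plus software on a Power E1080:

```python
# Burst pricing on a shared Power E1080, assuming the article's estimated
# $8.4 million five-year cost for hardware plus software.
TOTAL_COST = 8_400_000
MINUTES_5Y = 5 * 365.25 * 24 * 60       # about 2.63 million minutes

per_minute = TOTAL_COST / MINUTES_5Y    # about $3.20 for the whole machine
burst = per_minute * 10 * 0.80          # 80 percent of it for ten minutes
per_year = burst * 365                  # that nightly report, run every day
per_five_years = per_year * 5           # the five-year tally
```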
This is the way you have to start thinking about what you do in your datacenter. And you need to count the cost of the capacity that you are not using, too. If you are only utilizing a machine at 30 percent over five years, then on that Power S924, you are literally throwing $100,000 out the window. But you don’t think about it that way because it is yours.
So, I would argue that what I am talking about above with the network of shared Power E1080s is the kind of shared cloud that needs to be architected, and it may require the Db2 for i database to be partitioned, or at least to be spun up from a small image on a small logical partition to a large partition very, very quickly.
Now, while this is nice, I think we can probably make it even cheaper and more resilient. The Power E950 based on Power9 processors was considerably cheaper than a single node of the Power E980. A Power E950 with 48 cores running at 3.15 GHz would have delivered around 610,000 CPWs if it could have supported IBM i, which works out to around 46 cents per CPW at a price of $282,636 for a machine configured with 2 TB of main memory. You could boost the memory to 4 TB, but it would have been a lot more expensive using 128 GB CDIMMs.
If you want to have a 2 TB memory footprint (or more than that) and 240 cores with support for 1,000 partitions, then the Power E1050 is going to be considerably cheaper than the Power E1080. Let’s do some math. Using the Power10 dual chip module (DCM), IBM will be able to get 120 SMT8 or 240 SMT4 cores into a single four-socket machine, and SMT4 threading is going to be fine for a lot of people. Depending on the clock speeds, that 240-core Power E1050 machine using SMT4 threading will deliver somewhere around 3.74 million CPWs and will scale up to 16 TB of main memory, just like the Power E950 did, using 256 GB memory sticks. This machine might cost $600,000 to $700,000 for the base processor and memory instead of $2.8 million like the Power E1080. Now we are talking 16 cents to 19 cents per CPW to buy it. Call it a third as expensive as a CPW on a Power E1080.
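Here is that price-per-CPW comparison worked out, using our estimated Power E1050 pricing (the machine is not announced yet, so these are guesses) against the Power E1080 figures from earlier:

```python
# Cost per CPW: the article's estimated 240-core Power E1050 versus the
# Power E1080 figures from earlier in the piece. All prices are estimates.
E1080_PRICE, E1080_CPWS = 2_800_000, 5_270_000
E1050_CPWS = 3_740_000
E1050_PRICE_LOW, E1050_PRICE_HIGH = 600_000, 700_000

e1080_rate = E1080_PRICE / E1080_CPWS        # about 53 cents per CPW
e1050_low = E1050_PRICE_LOW / E1050_CPWS     # about 16 cents per CPW
e1050_high = E1050_PRICE_HIGH / E1050_CPWS   # about 19 cents per CPW
ratio = e1050_low / e1080_rate               # roughly a third the cost per CPW
```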
Here is where it gets interesting, and where we assume the economics work out in favor of hybrid memory clusters of Power10 machinery.
Imagine taking a bunch of two-socket Power S1022 machines that have 60 Power10 cores per socket using the DCMs, or 120 cores per machine. Equip them with no memory and no flash. Plunk a Power E1050 in the middle of a hub and spoke memory network, with spokes going out to, I dunno, ten, twenty, or thirty Power S1022s. Call it three dozen for fun, but let’s do the math with ten. Each of those Power S1022s has 120 SMT4 cores and can support 1,000 logical partitions (I sincerely hope), which is the upper limit of PowerVM. (It would be nice if the upper limit really was 20 LPARs per core, so that these machines could have as many as 2,400 partitions per box and the Power E1080 could do 4,800 partitions because it has 240 cores; ditto for the DCM, Power10 SMT4 version of the Power E1050.) Anyway, given the current PowerVM limits, ten of these Power S1022s would have a total of 1,200 cores with a maximum of 10,000 partitions, with an aggregate of 15.55 million CPWs, and would probably cost maybe $1.3 million for all the processing in all of these boxes. Add in the Power E1050 as the memory hub and the big iron machine to do large (but short run) batch work, and you add another 3.74 million CPWs and another $600,000 in cost. The combined memory area network of systems has 19.3 million CPWs and costs $1.9 million for the base hardware; even if storage and networking costs another $1 million, you are under $3 million to deliver a system that can support 11,000 total partitions and do a lot of work. It is 32 percent less expensive to buy the base systems in the memory network cluster than to buy a single Power E1080, and it has 3.7X more aggregate performance and can still handle the occasional big load and a lot of little loads.
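The hub-and-spoke tally runs like this; every figure is one of our estimates from the paragraph above, since none of these machines have announced prices yet:

```python
# The hub-and-spoke cluster tally: ten diskless, memoryless Power S1022
# spokes plus one Power E1050 memory hub, using the article's estimates.
S1022_COUNT = 10
S1022_CPWS = 1_555_000           # per 120-core, two-socket box
S1022_FLEET_PRICE = 1_300_000    # all ten spokes together
HUB_CPWS = 3_740_000             # the Power E1050 hub
HUB_PRICE = 600_000
E1080_PRICE, E1080_CPWS = 2_800_000, 5_270_000

cluster_cpws = S1022_COUNT * S1022_CPWS + HUB_CPWS   # 19.29 million CPWs
cluster_price = S1022_FLEET_PRICE + HUB_PRICE        # $1.9 million base hardware
savings = 1 - cluster_price / E1080_PRICE            # about 32 percent cheaper
speedup = cluster_cpws / E1080_CPWS                  # about 3.7X the throughput
```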
And because it is a shared utility, you can interleave the work of multiple customers across a lot of different time zones across all of this infrastructure.
This is what I would do if I were building an IBM i cloud.