Big Blue To Bring Live Migration To IBM i 7.1
February 13, 2012 Timothy Prickett Morgan
While talking about the sunsetting of support for the i5/OS V5R4, also known as IBM i 5.4, operating system last week, which you can read about elsewhere in this issue of The Four Hundred, Ian Jarman, manager of Power Systems software at IBM, said that Big Blue was going to deliver something for IBM i that all virtualized operating systems in the 21st century need: live migration of virtual machines or logical partitions from one physical server to another.
Jarman was talking about how well-received the Technology Refresh process was for updating the IBM i 7.1 operating system, mainly because it allows IBM to slip new functionality and support for new hardware into the platform without requiring the company to do a complete rebuild of the system as a dot release. And just for fun, and presumably because he knows I have been bugging IBM for a long time to bring live migration to OS/400, i5/OS, and then IBM i partitions, Jarman said that the Technology Refresh process was going to be used to bring live partition mobility, Big Blue's term for one kind of live migration, to Power Systems machines.
This live migration support will only be available for IBM i 7.1, not IBM i 6.1 or earlier releases that are no longer even sold. This stands to reason considering that IBM's official policy right now is that customers can and should upgrade directly from i5/OS V5R4 to IBM i 7.1, skipping IBM i 6.1 entirely. It is hard to argue with this. For one thing, IBM i 6.1 does not have Technology Refreshes and, for another, it is not as tuned up for the new Power7 servers as IBM i 7.1 is. Perhaps more importantly, IBM i 6.1 is older, and while it is still available and supported, IBM i 7.1 will be supported for longer from this point forward than IBM i 6.1 will. If you are going to move, you may as well move to the more recent release, especially considering that they both carry exactly the same price.
Live migration is one of the two killer apps for server virtualization hypervisors, the other being the ability to consolidate the workloads from many physical servers running at poor utilization onto one bigger box running at higher utilization. Having put more application eggs into a single server basket, you have created a big potential single point of failure for a lot of applications, and live migration was created to help mitigate that risk.
While the approach differs depending on server processor and hypervisor architectures, the concept of live migration is the same. If servers run their operating systems and applications from a centralized storage area network, most of a server's image running in a logical partition or virtual machine is stored in a file on that SAN. Some of what is going on inside the LPAR or VM resides in main and cache memory, but this is a relative handful of megabytes to gigabytes compared to the hundreds of gigabytes to multiple terabytes of total data on the SAN that completely encompasses that LPAR or VM. With live migration, the hypervisor gets some warning that an LPAR is going to make a jump. It temporarily halts the LPAR, captures all of the data in main and cache memory that describes that LPAR, passes it over the network to another physical machine running its own hypervisor, loads it into memory inside a partition on the target machine, switches the pointers for that LPAR's disk file on the SAN over to the new hypervisor and partition, and hits the unpause button. The apps that were running in a partition on this server are now running in a different partition on that server.
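The handoff just described can be sketched in a few lines of Python. Everything here is hypothetical (the class names, the one-shot memory copy); real hypervisors iteratively pre-copy dirty memory pages over the network so the actual pause lasts only milliseconds, but the order of operations is the same:

```python
from dataclasses import dataclass, field

@dataclass
class Lpar:
    name: str
    memory_state: bytes = b""   # stand-in for registers plus RAM/cache contents
    paused: bool = False

@dataclass
class Host:
    name: str
    lpars: dict = field(default_factory=dict)

@dataclass
class San:
    owners: dict = field(default_factory=dict)  # LPAR name -> host owning its disk image

def live_migrate(lpar_name: str, src: Host, dst: Host, san: San) -> Lpar:
    """Sketch of the live partition mobility steps described above."""
    lpar = src.lpars.pop(lpar_name)
    lpar.paused = True                       # 1. briefly halt the LPAR
    state = lpar.memory_state                # 2. capture main/cache memory state
    moved = Lpar(lpar_name, state)           # 3. ship that state over the network
    dst.lpars[lpar_name] = moved             # 4. load it into a partition on the target
    san.owners[lpar_name] = dst.name         # 5. repoint the SAN disk image to the new host
    moved.paused = False                     # 6. unpause; the apps resume on the target
    return moved
```

Note that the bulk of the LPAR, its disk image, never moves at all; only the relatively small in-memory state crosses the wire, which is why shared SAN storage is a prerequisite.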
This is a lot harder than it sounds, and considering how different the guts of the OS/400 family of operating systems are compared to Unix, Linux, and Windows, I am not surprised that it took Big Blue some time to figure out how to do it. I am a bit disappointed that it has taken this long, and it is not even here yet, so we will have to wait a while longer. I think it was a question of need and economics, considering the long practice of using application-level clustering for high availability. IBM didn't want to hurt its high availability software partners or its own iCluster business, which it got through acquiring DataMirror a few years back. IBM sold off iCluster to Rocket Software at the beginning of this year, so that is not an issue for Big Blue any more.
When AIX 6.1 was launched in November 2007, it had two different kinds of live migration. One, now called live partition mobility, was used to beam whole running LPARs on Power Systems running AIX or Linux from machine to machine. The other used a technology called a workload partition, which is akin to a virtual private server, sometimes called a container, that doesn't put a full operating system inside of a partition but instead takes an AIX kernel and file system and puts a virtualization layer on top of it that makes applications think they each have their own AIX when, in fact, they do not. To some ways of thinking, workload partitions, or WPARs, are akin to OS/400 subsystems.
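The distinction between the two approaches can be sketched as follows; all of the names here are made up for illustration, and a real WPAR also isolates processes, networking, and resource limits, not just the file tree:

```python
class Kernel:
    """Stand-in for a running operating system kernel."""
    def __init__(self, version: str):
        self.version = version

class FullPartition:
    """An LPAR/VM carries its own complete operating system kernel."""
    def __init__(self, os_version: str):
        self.kernel = Kernel(os_version)

class WorkloadPartition:
    """A WPAR shares the host's single kernel and gets only a private file tree."""
    def __init__(self, shared_kernel: Kernel, root: str):
        self.kernel = shared_kernel
        self.root = root

host_kernel = Kernel("AIX 6.1")
wpars = [WorkloadPartition(host_kernel, f"/wpars/app{i}") for i in range(3)]

# Every WPAR "sees its own AIX," but they are literally the same kernel object,
# which is why a WPAR is far lighter than a full partition:
assert all(w.kernel is host_kernel for w in wpars)
```

This shared-kernel design is what makes WPAR mobility cheaper than full partition mobility: there is far less per-partition state to move.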
IBM first started talking about WPARs in early 2006, when VMware was making a lot of noise about its VMotion live migration feature for its ESX Server hypervisor. The company was expected to put out a patch for AIX 5.3 that year to bring WPARs to market, but it took until November 2007 and AIX 6.1 to deliver it. (That’s not a surprise. This is complicated stuff.)
I have been a gadfly about the lack of live migration for OS/400, i5/OS, and IBM i workloads for the past six years, because this is such a useful feature and, more importantly, is considered base functionality for all hypervisors these days. It is a tick mark, plain and simple, and something that people want to have even if they never use it.
Just after live partition mobility was announced for AIX back in the fall of 2007, I asked Jim Herring, formerly director of product management for the iSeries line and at the time director of high-end Power Systems, about the plan for live migration.
“We have not announced this capability yet, but it is on our roadmap,” Herring told me in an interview. “As you might imagine, i5/OS is a more complex operating system than AIX, and the fact that we have this single-level storage paradigm going on makes it a bit tricky, too. It is a little harder to do. But I will tell you this: I want to put this functionality into i5/OS as soon as I possibly can.”
It never quite got here during Herring's tenure. And last June, in an interview I did with IBM i chief architect Steve Will, I brought up the issue of the lack of live migration for the IBM i operating system once again. He said that the issue was that live migration required the virtualization of system I/O, and the largest IBM i shops told IBM that it was going to take years for them to virtualize their I/O and therefore be ready for live migration.
As I said then, I believe that this is a chicken-and-egg problem. If IBM had live migration for IBM i, then IBM i shops could think about building cloudy infrastructure to run RPG, COBOL, and Java applications smacking DB2 for i databases. I understand that the advent of live migration puts pressure on the HA software vendors, who cluster at a much higher level in the system, but there is no reason why IBM cannot bring the HA tool makers in the IBM i space into the tent, just as it did years ago when it internalized remote journaling in OS/400 with the help of Vision Solutions and then made that feature available to all HA vendors, who sell add-ons that make use of it.
Anyway, all I know is that I think it is a good thing that IBM i 7.1 will get this feature at some point in the near term. Any later than mid-2013 to early 2014 and we are probably talking about IBM i 8.1.
In the meantime, if you want to read up on live partition mobility and how it works with AIX and Linux, check out this Redbook.