Welcome, everybody. My name is Phil Estes. This is Sean Murakami. We're both from IBM Cloud, Open Technologies. We're going to talk to you today and hopefully present something that gets you thinking about these topics. The capabilities we're going to demonstrate are not integrated into OpenStack yet in any way; in fact, they're just now appearing in the underlying engine that we're going to show you. I think it's an interesting topic, hopefully one you can take something away from, and we'd love to hear feedback on what we put together. A little bit of background. If you've been around the OpenStack community and you've looked at the Nova support documentation, you know that live migration in a hypervisor scenario is an accepted capability, and many of the supported hypervisors give you that today. You can look on the OpenStack documentation site, but obviously VMs are just one class of virtualization now. If you were here for the last talk, you know containers and their use across the ecosystem continue to grow: Magnum, Murano, Kolla, Kuryr, Kubernetes. There's an obvious expectation that as container use grows, there could be interest in the same capability: live migrate my container, not my VM, between compute hosts. That's the background for what we'd like to talk to you about. Why don't we have this today? Why does it already exist in the hypervisors you saw on the last slide? One obvious reason is that a hypervisor is controlling a completely emulated machine. If you know anything about BIOS and bare metal hardware, suspend and resume is a solved problem, and therefore hypervisor-based live migration, glossing over some details, is a solved problem.
It's a known quantity, and so hypervisors that support live migration can take advantage of that solved problem. Containers, as many of you know, are simply processes on your Linux system. There are a lot of interesting things going on there related to isolation, but at their core, they're just processes. So we have to ask: can the Linux kernel migrate a process? The answer is, well, not very easily, until in recent years the CRIU project; Checkpoint/Restore In Userspace is what that stands for. This work began around 2011 or 2012 in the kernel, and by the time you got to the 3.11 kernel and the CRIU code base in late 2013, you could actually use CRIU to checkpoint and restore a process on your Linux system. Earlier this year, CRIU 2.0 was released. This brought about, again, the capability to start to have this ability to migrate a process. CRIU wasn't necessarily about migration, but more about checkpointing: freezing a process and capturing all of its state. Let's talk about that just a little to give you a better picture. CRIU was proposed as a project led by Pavel Emelyanov; many of you know him from the OpenVZ team. Key difference: I sort of jumped in with that history and started with CRIU, but that wasn't the first time people in the Linux kernel community attempted a checkpoint/restore capability. The key difference here was that Pavel's proposed implementation would do much of that work in user space, so the kernel work was done early on and, as you see here, completed by the 3.11 release. What CRIU does, based on these capabilities in the kernel, is handle freezing all of this process state: the process and thread info, capabilities, the UID and GID, all the memory, the open files, Unix sockets, network sockets. And there's some very interesting magic they got into the Linux kernel: a new TCP state (the TCP repair mode) that allows migration of even a live socket.
IPC, timers, signals: all these things are collected by CRIU when you ask it to checkpoint a process, and this becomes a set of metadata that I can now take somewhere else and restore the process. That's a very quick and brief overview; again, I'm glossing over many of the complex details. The criu.org wiki has a lot more detail on how that works and some of the history behind it. So we've talked about CRIU: some Linux kernel capabilities plus a binary; you can build CRIU from source and run it on your Linux system. What needed to happen next for a container engine like Docker was to marry these two worlds: Docker is running a process, so can CRIU now do the same checkpoint and restore inside the Docker engine? Ross Boucher has been working on this for a very long time. As some of you may know, the Docker engine relies on libcontainer, which it has for a couple of years, but libcontainer became part of the OCI, and runc is the wrapper around libcontainer. So CRIU support came to runc first; I should have put that in the charts. I think by summer of 2015 it was already there. If you went to DockerCon in San Francisco that year, you would have seen a popular demo of Quake, the game backend, being migrated using that support. But to actually bring that to the Docker engine required a design: what will the capabilities be, what part of the CRIU interface will we expose, and what will that look like? That work was finally finished and merged into the Docker engine just last month, so it's not even in a Docker release yet; what we're running today in our demos is a binary built from master with this PR merged. But when the Docker 1.13 release comes out in about a month, you'll be able to use the checkpoint capabilities. So, a very quick overview. The full documentation on docker checkpoint and docker start from a checkpoint will be in the Docker 1.13 release.
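As a reference point before we get to Docker, standalone CRIU is driven with roughly the following commands; the PID and image directory here are hypothetical examples, not values from the demos:

```shell
# Checkpoint a running process tree (PID 1234 is hypothetical) into a
# directory of CRIU image files; --shell-job is required for processes
# that were started from an interactive shell.
sudo criu dump --tree 1234 --images-dir /tmp/ckpt --shell-job

# Later (possibly on another host, after copying /tmp/ckpt over),
# restore the process from those image files.
sudo criu restore --images-dir /tmp/ckpt --shell-job
```

The image directory is the set of metadata the talk refers to: memory pages, file descriptors, socket state, and so on, serialized to files that can be moved between hosts.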
You can look at the PR that was on the last slide and get a lot more detail, but the simplest view is this: when I've got a running container, I can say docker checkpoint with the name of my container (or its ID) and the name of a checkpoint. That will basically pause the container and run CRIU to generate the set of metadata. Then I'm able to docker start from a named checkpoint and resume that container. For Docker 1.13, this will be part of what's known as the experimental build, to allow some time to shake out use cases and make sure the way this is exposed to users is valuable. This model really associates a checkpoint with a very specific container, and we'll get into why that makes sense: once I try to do it on another system or with a different container, I have to consider all the metadata concerns about how to get the exact configuration into another image or another container. So in essence, live migration is not an intended feature of this initial release. In fact, the CRIU wiki has a lot more information on the challenges of doing live migration with CRIU, and some of what we put together for today is those same challenges and concepts brought to Docker: how we've overcome them and how they may be solved in the future. So that gives you a good background on where CRIU came from, what its capabilities are, and how it came to be in Docker. Now I'm going to turn it over to Sean, and he's going to talk through the challenges in using the CRIU support in Docker to actually support migration to another host. Thanks, Phil. So I'll be talking to you about some of the challenges and the things we've done as part of the demos going forward, and this is all going to be running on some OpenStack VMs. When we look at container migration, there are really two paths that a process or container takes. One is an in-memory path.
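The single-host flow Phil just described maps onto the experimental 1.13 CLI roughly as follows; the container and checkpoint names are placeholders:

```shell
# Checkpoint a running container; CRIU dumps its state into checkpoint
# metadata stored alongside the container (experimental in Docker 1.13).
docker checkpoint create mycontainer checkpoint1

# The container is now stopped; its checkpoints can be listed.
docker checkpoint ls mycontainer

# Resume the same container from the named checkpoint.
docker start --checkpoint checkpoint1 mycontainer
```

Note that the checkpoint belongs to that specific container, which is exactly why cross-host migration needs the extra steps covered next.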
So if you have a process or an application that does everything in memory, without any file system changes, that's one path, and we'll go through those details. The second is when the application or service running inside the container makes changes within the file system to keep track of state or other metadata it requires to run. For the first, in-memory-only sequence, these are the steps we've used to demonstrate how a container migrates from one host to the other. First, as Phil mentioned, there's the checkpoint that has been integrated into the experimental release of Docker, and that freezes the container. Second, we need to take that checkpoint metadata, which is stored as part of the container information, over to the remote host. On the remote host, we can pull or prefetch the base Docker image that was used for the container on the initial host, which speeds up the migration, because then all we're doing is moving the checkpoint metadata over to the new host and dropping it into the new container we're starting up there, specifying which particular checkpoint we're restoring. The second path, again, involves file system changes. For example, if an application is writing state or index information to the file system, this process is a little more complicated, because that information is now within the container's file system; it's no longer just in memory. Now we've got to migrate both the in-memory context and the file system changes, and that requires us to take the running container and migrate that entire thing over to the remote host. The first two steps are effectively the same: we checkpoint the container, and when the container is checkpointed we take that metadata, which is tarred up. And then we also need to make sure we inspect the container.
The inspect of the container's metadata tells us things like what's running, what commands were executed, environment variables, and so on. As we'll mention later in the challenges, a lot of this information you would typically get as part of the base image metadata, but the metadata of a running container does not get carried over when you export it, so we need to make sure we recreate it when we create the container on the new host. Once we create a container on the new host from the exported container, we go through the same steps of taking the checkpoint metadata, dropping it into the right place in the file system, and starting it up. Other challenges: first, as I was mentioning, handling metadata. There are certain things that are deficient right now in the current implementation, where we don't get all of the running state of the container when we migrate it, so we have to handle this as another step in the migration path. The second key point is network information. The underlying premise when you checkpoint or restore a container is that all the network information persists in the checkpoint metadata, so it assumes it's going to come back up with the same IP and MAC address. When you're migrating the container to the next host, if you already have a container running with the IP the restored container expects, you run into issues: you can't restore the container at that point, because the Docker engine complains that this IP cannot be allocated to it. And finally, volume support. This is the more traditional challenge of migrating persistent data from one host to the other; there could be other data sources attached to the container.
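Pulling those steps together, the file-system migration path Sean describes can be sketched as a shell sequence. The host name, container name, checkpoint name, and especially the checkpoint storage path under /var/lib/docker are assumptions for illustration (the path varies by Docker version), and a real script would re-apply the command, environment, and port configuration recovered from docker inspect rather than the literal shown here:

```shell
#!/bin/sh
# Sketch only: names and paths below are assumptions, not from the talk.
set -e
TARGET=hostc        # remote Docker host we migrate to
CT=redisdemo        # container name on both hosts
CKPT=ckpt0          # checkpoint name

# 1. Checkpoint the running container (this stops it, freezing the
#    in-memory state; the file system stops changing too).
docker checkpoint create "$CT" "$CKPT"
ID=$(docker inspect -f '{{.Id}}' "$CT")

# 2. Capture the file system, the config (command, env, ports), and
#    the CRIU image files written by the checkpoint.
docker export "$CT" > rootfs.tar
docker inspect "$CT" > config.json
sudo tar -C "/var/lib/docker/containers/$ID" -cf checkpoints.tar checkpoints

# 3. Ship everything to the target host.
scp rootfs.tar checkpoints.tar config.json "$TARGET":

# 4. On the target: recreate the container from the exported file
#    system (command simplified to a literal; a real script would read
#    it from config.json), unpack the checkpoint metadata into the new
#    container's state directory, and start from the checkpoint.
ssh "$TARGET" '
  docker import - migrated:latest < rootfs.tar
  docker create --name redisdemo migrated:latest redis-server
  NEWID=$(docker inspect -f "{{.Id}}" redisdemo)
  sudo tar -C "/var/lib/docker/containers/$NEWID" -xf checkpoints.tar
  docker start --checkpoint ckpt0 redisdemo
'
```

For the in-memory-only path, steps 2 through 4 shrink considerably: the base image is simply pulled on the target and only the checkpoint tarball moves across.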
So these, I think, are some of the things being looked at within the Docker community as part of these use cases. And in the OpenStack community, the next talk in this room considers not the runtime migration but the whole idea of persistent data migration in the OpenStack world, so that may be interesting for those interested in that piece as well. Okay. So we want to spend the rest of the time demonstrating a few different use cases for this checkpoint/restore feature. The first is a very simple checkpoint and restore on a single host, just to give you an idea of the command sequence we run through for checkpointing and starting back up. The second is more applicable to real-world scenarios: an in-memory database that we want to migrate, maybe because the host it's running on needs maintenance, or it just crashes. The third, if we have time, is video streaming: if you have a video streaming service with transcoder containers streaming out, we're going to connect to it, migrate it, and resume from the checkpointed state. We'll go over some overviews and then jump directly into a shell. So the first one, again: on one host, we're going to run a BusyBox container with a counter, and all it's going to do is start counting. Once we checkpoint, it'll freeze; we'll then restore it, and it should resume, because all of this is really happening within the container that we see. (Do you want to make that larger? Is that good? All right.) Okay. So what you see here is that we did a docker run of a container called simple, using a BusyBox image, and we're just iterating over a loop and counting every second.
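The command behind that first demo would look roughly like this; the exact loop body is a reconstruction, not taken from the session:

```shell
# Hypothetical reconstruction of the "simple" counter container: a
# BusyBox shell loop printing an incrementing number every second.
docker run -d --name simple busybox \
  /bin/sh -c 'i=0; while true; do echo $i; i=$((i+1)); sleep 1; done'

# Follow the counter output.
docker logs -f simple
```

Because the counter value lives only in the shell process's memory, it is exactly the kind of state that checkpoint/restore has to preserve.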
Here we can see docker logs just outputting the numeric sequence. When I hit the space bar, that invoked the docker checkpoint command. So we see here docker checkpoint creating a checkpoint, named first, of the running container named simple. We can see that the simple container has exited, so it's no longer running, and what we're doing now is restoring: we start the container called simple using the checkpoint flag with the checkpoint name first, and the counter just keeps going. It's probably worth noting that the counter didn't restart at zero, obviously. Right, the logs -f replayed that earlier part, if you noticed; if we didn't actually restore, it would start from zero. Obviously, the whole point of checkpointing is that the in-memory state of that process was preserved at that moment, and we didn't have to start a new process; that's the magic of CRIU, restoring the process to its exact state using the metadata. All right, time for a more interesting example. This is going to be an in-memory database migration example. We created a front end which is going to show us information coming in from a tweet feed. We're going to start up two containers, both searching a Twitter stream for a particular string, feeding into a Redis service that we're going to start as a container running on host B. Then we're going to checkpoint that container while it's getting the feeds, migrate it over to the second host, C, and resume it, and what we should see is that in the viewer the count and the uptime stay consistent and continue on without resetting. Again, this is all happening inside Redis running in an in-memory mode. Phil, do you want to talk while I set this up? Yeah, I'll talk so Sean can type.
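A minimal sketch of how the Redis container in this demo could be started, with persistence disabled so its state is purely in memory (the image flags and name here are assumptions, not the talk's actual script):

```shell
# Run Redis with snapshotting and AOF persistence disabled, so all
# state lives in memory; -P publishes the exposed port to an
# auto-assigned host port.
docker run -d --name redisdemo -P redis redis-server --save '' --appendonly no

# Show which host port was auto-assigned to Redis's port 6379; this is
# the port that would be registered with the load balancer.
docker port redisdemo 6379
```

The auto-assigned port is why the demo needs service registration: after migration the container comes up on a different host and port, and the load balancer hides that from the front end.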
So as Sean mentioned, there will be a couple of containers involved. We have a simple Go application using the Twitter streaming API, searching for strings we can configure, that simply writes to a Redis queue. The tweet viewer is a simple Node.js app that connects to Redis, asks what hashtags it's searching on, and just starts displaying them; it has two different views, an admin view and a viewer. As that picture showed, we need a load balancer, because as we migrate this container we don't want the front end to have to reconnect to a different host; the load balancer provides the place to register the new instance on host C. So what I'm showing here is a container called lb; it's an nginx container. What happens is that these services, as they get started, register themselves into the load balancer; we do a kind of service discovery here, and what you'll see is that the services config should start showing the new upstream path to those containers. So that's what I have running there. Now we're going to start Redis. All right, so our Redis store is now running; I can see it's running on port 3288, mapped to the internal Redis port, and on this side it registered itself: we registered host B with that same port. So now when we connect through the load balancer, it will feed all the information over to that particular Redis server. Now we're going to run the tweet demo. What run-tweet-demo did is start up two tweet2redis containers, which are searching the Twitter stream for particular strings; we're going to be looking for the keywords "openstack" and "docker". The third container is the tweet view container, our admin view that will show us what's currently running, so I'll switch over to that; the viewer is running on its mapped port. All right. So the
admin view is showing that the tweet2redis containers have started collecting tweets: there are six OpenStack tweets and three Docker tweets. The uptime here is highlighted in purple, so our Redis container has been up for almost a couple of minutes, and tweets should continue to appear. If you want to help the demo out, you can keep tweeting about OpenStack and Docker, and this count should go up faster. Now what we're going to do is migrate the Redis container over to another host, and what you'll see is that the upstream server changes to the new host, so that the viewer, going through the load balancer, automatically redirects all the traffic to the same container now running on the new host. So, like those steps shown toward the end of the slides, Sean scripted this up: all those steps of exporting the file system, exporting the metadata, getting it onto the new host, and creating the container with the same data. This is the new host; there are no Docker containers running. We'll just do a migration: I'm going to migrate to host 10333, and this is where the docker checkpoint command is running; now it's SSHing to the target and setting up the container. If we look at this host, there's no longer a Redis container running here; if we look at the new host, the Redis container is now running there. So this is 10333 on port 32774, and that got updated automatically; the load balancer has been updated. So if things worked like they should, just refresh this, or you can leave it running; either way it's going to reconnect through the load balancer. And you notice our uptime didn't go back to zero, it's still climbing, and the number of tweets is still climbing. If we look at our viewer, it'll start reading from that Redis feed and displaying the tweets, and if we switch back and forth between the admin view and the viewer, you can see the counts
either going down or up: as the viewer displays them, it subtracts from the queue, and effectively Redis is acting as a queue between these two applications. Do you want to migrate it back to the first host? So, some of the interesting challenges here: we've already talked through some of them, but one thing we found is that even though we fronted this with a load balancer, the Node.js Redis client implementation seems to get really frustrated with socket connection refused while trying to reconnect, so we had to add more code than we would have liked in order to handle the fact that the Redis server goes away, even if only for a second or two. The interesting thing is that with the Go library, which the tweet2redis containers are using, we had to do nothing. So again, migration of socket-based services really depends on the library you're using and its ability to handle those conditions. So the container was migrated, the load balancer updated back to 332, and in the UI, oh, and there's Sean. The admin count is still going, right? Our uptime is still climbing, the tweets are still changing, and we've migrated from host B to C and back to B. All right, we are six minutes from closing time, so do you want to try the video streaming? Sure. So the final demo again deals with trying to handle migration of a TCP-based service using a load balancer front end, and I think what Sean found, after quite a few days of pain, is that it's actually very difficult to keep that stream alive for a client, in our case using VLC; maybe there are other clients that handle that differently, but we found it to be a bit challenging. So what I did here was start up a container streaming the Big Buck Bunny video running on this host, and I'm routing
it through the load balancer, so you can see here we now have an HTTP stream running on this particular host. I'm going to connect through that load balancer as well, and here we have the video running. The reason it's kind of stuttering is that we're running this in a small VM created in OpenStack, and I forgot to bump up the resources, so it's actually doing the transcoding within that small VM; that's why it's slow right now. What I'll do is simulate a migration. We ran into some networking issues with the checkpoint/restore in Docker, so what we'll do is checkpoint it on this same host. Yes, in essence, trying to keep that stream alive, at least for where we are at the moment, this one is just doing a checkpoint and restore back to the same host. And even with that, I think you still have to basically ask the VLC client to continue. So the container was checkpointed and restored, and the load balancer was updated. Right now it's probably buffered, but what's going to happen is that since the socket information changed, because of the port change and so forth, we would have to manually reconnect the client at this point. Let's go back to the charts. All right, given there are only a few minutes left, maybe someone has a burning question. You want to go back to the slides? Sorry, I forgot to go through this; anyway, this is what happened for the video stream. So, obviously, for integration with OpenStack there are a couple of options here. All these migration steps that we're handling manually in scripts: if this were a capability that a project like Magnum wanted to offer, then these capabilities would have to be integrated at some level in a project like that, to handle all the things we talked about, the container metadata migration, the file system migration. And hopefully some of the
networking challenges could be mitigated with overlay network support for this kind of thing. And the last bullet reflects that an orchestrator that can handle service discovery would remove our requirement to run our own load balancer and do the registration ourselves as containers come and go. So those are a couple of our ideas. Any quick questions before we close out? Can you use the mic, please, the mic right there. "Can you show the origin of Docker, which version you used?" Oh, this is 1.13 experimental; well, a 1.13 pre-release, because 1.13 is just about to go to code freeze, so the pull request that adds the checkpoint/restore capabilities is about to be available in 1.13. Any other questions? "After migration, if the other node has the same port or the same process running, do they conflict, since you're sending the metadata?" I'm not sure I fully understand the question. "On one node, my Docker container is a process with some process ID, and maybe the application is using some port. If we migrate to the other node and it has the same port or a process running, will the process ID or the port change dynamically, or what?" Yes. In the Docker world, you can get auto-assignment: what was happening is that you get auto-assignment of a free port on the target, and then we were registering that with the load balancer as the port we want fronted for the application. But yes, you could get conflicts without that kind of setup, and that's what we talked about: if the IP is already allocated, the restore fails, and those are things that can be corrected with other capabilities that exist. "Did you measure the downtime of the migration, especially for a container with big memory?" The downtime, yeah. So obviously during that time CRIU is handling the collection of that metadata
so as memory gets larger, that can affect the time. What we found is that the bigger issue is going to be if you actually have to migrate file system content, so maybe a complete solution would be something with a clustered file system, so you're not actually copying data; obviously you wouldn't want to copy the movies in this movie-player demo as part of the migration, because that copy time could be significant. The checkpoint metadata is relatively small compared to the image size. All right, the question behind you there. "As a sort of follow-on to that: have you looked at what a migration model would be, what the migration time would be depending on what your application looks like, how much of the file system is used? How would you predict how long the migration would take, in order to have a migration policy, to decide which containers you move?" So we haven't spent time on that. Obviously, as I said at the beginning, this is early work, and I think a lot of that would be the next step. "I know; I've been waiting for your work to mature." Okay, so these are the questions I'm looking at myself, right. All right, to honor the next talk, I think we need to end here. We'll be around over here on the side if you have more questions, but thanks for coming. Thank you.