Good morning, good afternoon, good evening, everyone. Hello, and thank you for joining us on the OpenShift live streaming networks. We are live streaming on Twitch, Facebook Live and YouTube Live. Today I am joined by Adrian Reber from Red Hat, and I will let him introduce himself. The topic of today, as we'll discuss in a second, is the correct pronunciation of CRIU, and a deep dive into some of the things it can show you. So yeah, Adrian, please introduce yourself for the audience and tell us more about what we're diving into. Thanks. So my name was already said, and I have been working on process migration for at least 10 years now. CRIU came up around 2012, and since then I have been involved in CRIU in different parts, at first more in high performance computing. When I joined Red Hat in 2015, I started to look more into container migration. And that's also what I want to talk about today: container live migration, and especially how it's connected with CRIU, and how CRIU enables you to migrate a process and, in the end, a container. I'll give an overview of what's currently doable and show demos of how to do it. And yeah, I'm going to start sharing my slides now. Fire away, buddy. Let's do this. These should be my slides. Okay, live migrations. Yeah, so that's my introduction. A lot of the things I'm mentioning today are also available in a blog post at redhat.com. That post is based on the RHEL 8.1 beta, as is written there, and it has a few more rough edges than today. So what you're going to see today is actually easier than what you see in the blog, but all the steps are written down there, and if you follow those, you can do the same things I'm doing today. So I want to show a few use cases, how you could use container migration and what it's good for, then give a lot of details about CRIU and how it enables you to do checkpoint/restore.
And then a few demos for the use cases I presented, and a few things I'm looking at that could be done in the future with container live migration, and how it is working today. So I found out a good thing to do is to give a definition of what container live migration is, because this is often a thing people ask me: what is it? And for me it's pretty simple, it's basically the same as what a virtual machine migration would be. So you just transfer a running container from one system to another. It could be stateful migration, it could be live migration; the thing just keeps on running from the same point in time at which it stopped on one system, and then it continues to run on the other system. And the steps to do it are pretty easy: you somehow serialize your running container on one system, you transfer it to the destination system, and on the destination system you restore it and then it keeps on running. So that's the basic idea of how this could work or should work. And everything I'm talking about today is based on CRIU. CRIU is short for Checkpoint/Restore In Userspace. Awesome. There are multiple integrations of CRIU in different container runtimes and container engines, and I will mainly focus on Podman today because that's what I have been working on for the last two years or so. And before continuing with the use cases, with how to use live migration, I want to show a short demo of how to migrate a single process with CRIU. No containers, nothing, just a single process. I will stop sharing my slides. I will start sharing my demo window. Beautiful. Can you see it? Yes. Okay. So I have a really simple program; it's called minimal, and that's my minimal test case if I do changes to CRIU.
And what it does: it uses 100 megabytes of memory, writes to the memory, sleeps for a second, and then it prints out the host name and continues doing this. The reason I'm printing out the host name is so that I can see, if I migrate it to another host, that it's actually another host name. So, let's switch to another screen here. And now I will run criu and then dump, to checkpoint the process, to dump the process. I have to specify the PID I want to dump. The option is called -t because CRIU always dumps a complete process tree, so you cannot just dump one process without its child processes; you always have to dump all the child processes of a process. And this is because it's pretty unlikely that it will work if you lose one of your child processes, so it will dump the whole thing. And, oh, let me show you that it's actually running here. Yeah, so it's running here, using 100 megabytes of memory. And now I also need to know the PID; what I usually do is I just say pidof minimal. And then, because the process is using the console, this is actually a bit more tricky to dump correctly, but CRIU has made it possible, so I say -j for shell job. And then I say a destination directory where I want to dump it. I just say dump, and then I also say leave running, so the process continues to run on my source system of the migration. And now I actually have to create the directory, I think, before running the dump. That helps. So let's hope it works. It sounds good. It says it interrupted the system call, because I actually sleep for four seconds, so it has to interrupt the system call there. But now I should have a dump here, a checkpoint in my dump directory, and it's 100 megabytes; this is the same as the memory it used when running. If I switch back to my other screen, I see the process is still running, so now I have a copy running and I have a copy stopped.
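The dump command described above can be written down as a small Python sketch that only assembles the argument list. The helper name `criu_dump_argv`, the example PID and the `/tmp/dump` directory are assumptions for illustration; the flags themselves (`-t`, `-j`, `-D`, `--leave-running`) are the ones named in the demo. Actually running it would need root, an installed `criu`, and an existing images directory.

```python
import shlex


def criu_dump_argv(pid, images_dir, shell_job=True, leave_running=True):
    """Assemble the `criu dump` command line used in the demo (hypothetical helper)."""
    # -t selects the root of the process tree, -D the directory for the image files.
    argv = ["criu", "dump", "-t", str(pid), "-D", str(images_dir)]
    if shell_job:
        # -j: the process is attached to a terminal (started from a shell).
        argv.append("-j")
    if leave_running:
        # Keep the original process running after the checkpoint is written.
        argv.append("--leave-running")
    return argv


# Example values only; 1234 and /tmp/dump are placeholders.
print(shlex.join(criu_dump_argv(1234, "/tmp/dump")))
```

The list could be handed to `subprocess.run(argv, check=True)` on a machine where CRIU is installed; it is kept as plain data here so the sketch stays runnable anywhere.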
What I will do now: I will transfer the checkpoint from one system to the other. I'm running here on two virtual machines. So my laptop is running Fedora and I have two RHEL 8 VMs which I'm using here. So now I transfer it to the other one. Let's see. So it's transferred to the other one. Let's SSH into that machine. And now, the process tree, the stats, the whole nine yards, that's cool. Okay. Cool. Yeah. The whole process has been dumped and we can actually look inside. There's a tool called CRIT, which is the CRIU image tool, and it should be able to tell me information about my dumped process. Maybe I'll try to have a look at files; that should be interesting. Maybe with, I don't know, jq or something. Yeah. So at least here you can see it. Yeah. The root. The mode. The whole nine yards. The shell terminal it uses and which libraries it is using. This ties in so well with the stream we ended with yesterday with Scott McCarty, basically showing us that containers are just stuff on a file system and a process gluing it all together, kind of thing. So yeah, this is a very nice dovetail into that. Beautiful. Yeah. And one important thing is that CRIU actually checks if a library has changed, and you cannot restore a process if the library has changed. Because if the library has changed, then the functions the process is using from the library are probably at another address, and because the libraries are not loaded again, they are just mapped into the process again, it would just jump somewhere into the library where the wrong code is. So CRIU actually checks and makes sure that the library is the same before and after migration. And this is the reason why containers are especially well suited for CRIU, because the libraries will not change in a container if you reboot your system. If you update your container, you cannot checkpoint it. Right. So quick note here: no one can see your terminal.
Everyone sees the "multiple integrations exist" slide. I can see your terminal in Zoom. So that's interesting. Okay, do me a favor: stop sharing your screen and then start sharing your terminal again. Okay, I stopped sharing the screen. And I share the application window again. Better now? Yes. It came through; it took a second, but yes. All right. Okay, so I'll continue. So basically he did a criu dump, looked up a PID and... Yeah, people are asking to just restart the entire thing. That makes total sense to me. Yeah. Yeah. Sorry, folks. So I will also restart the process here. The process I had is minimal. I already said this is my minimal test case. It basically just prints out a host name and uses 100 megabytes of memory. And I will remove the checkpoint; it would work even if I don't remove it, but this way I really have to transfer the checkpoint from one system to another. So I create a directory where I can dump the process into. And then I say criu dump; that's the subcommand for checkpointing. Then I say -t for the process tree with this PID. And then, I'm lazy, I'm not looking up the PID; I just tell pidof to get me the PID of my test process. Then I'm saying -j to make it work with the terminal. Then I say -D for the destination directory where to write the checkpoint. And then I say leave running: do not stop the process, keep it running while I'm doing the checkpoint. So the same again, an interrupted system call, because the process is not doing much more than sleeping. And if I look at the other end of my screen, I see that the process is still running. I will transfer the checkpoint once again to the destination system and go to my destination system. And then if I go into the temp dump directory and do crit show files like we did before, I can now see information about my checkpoint. I see the binary which is used is called minimal, and I see which libraries are used, which size they have, and so on.
And so all the information CRIU collects is in these files. And I say criu restore to restore the process from the checkpoint. I say from this directory, the temp dump directory. And I also say -j because it's again running on the console and I want to connect it to this console here on the other host. So let's look where the other process is. It's currently saying step 32 on rhel05. And what should happen now? It should now show the step from the point in time when we checkpointed the process. And because of how my test program is set up, it will first print out the old host name, because what my test does is: it does a sleep, it prints out the host name, then it reads the host name again. Yeah. So it'll have that old host name still, and then it'll pick up the new one. Yeah. Correct. Cool. Let's hope it works that way. Okay. Yeah. Oh look. And now it should say step 16 on rhel08. So it has reread the host name, and so checkpointing, restoring and migration worked: dumping it on one system, transferring it to another, restoring it. Okay. So that's the basic CRIU functionality. Yeah. That's really, really awesome, to be honest with you. Like I was saying on the stream yesterday with Scott McCarty, please check it out. Go to our video archive or look on YouTube. You can see it. The things he was showing us yesterday were basically that the container is a process that has a bunch of files on disk. Right. And it's just looking at those files and changing things as necessary and processing information and then writing to files, and so forth and so on. So CRIU scoops it all up and gives you the capability to say: all right, it's running on this box, move it over to that box, and off it goes. Right. Like I have a Podman bot running for a Telegram channel. If I needed to move systems, this immediately comes in handy. Right. And you see my use case for this. Right. So what else is going on inside?
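Why the restored process first prints the old host name follows from the order of operations in the test program: sleep, print the name it already holds, then read the name again. A minimal Python sketch of such a loop is shown below. It is not the speaker's actual program; the function name `run_minimal`, the output format and the injectable `get_hostname` and `sleep` parameters are assumptions made so the behaviour can be tried without CRIU.

```python
import socket
import time


def run_minimal(steps, get_hostname=socket.gethostname, sleep=time.sleep,
                size=100 * 1024 * 1024):
    """Toy stand-in for the 'minimal' test case: use memory, sleep, print host name."""
    buf = bytearray(size)              # reserve roughly 100 MB
    for i in range(0, size, 4096):     # write to it so the memory is really in use
        buf[i] = 1
    hostname = get_hostname()          # name is read once before the loop
    lines = []
    for step in range(steps):
        sleep(1)                       # a checkpoint most likely lands in this sleep
        lines.append(f"step {step} on {hostname}")
        print(lines[-1])               # still the name that was read before the sleep
        hostname = get_hostname()      # only now is the (possibly new) name picked up
    return lines
```

If the process is checkpointed during the sleep and restored on another host, the first line after the restore still carries the old name, and every later line carries the new one, which matches what the demo shows.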
Keep going, please continue down your demonstration rabbit hole. I like where this is going. I'll just switch back to my slides for a bit, and then I will quickly be back at demo time, with Podman then. So I said multiple integrations exist, and I will focus on Podman here. You see my slides now again, right? Checking. Nope. So I'll unshare and then reshare. Okay. I have to do it twice every time. Okay. That's fun. Okay. All right. I see it in Zoom. Do I see it over here in Zoom? No. Wow. Okay. So this is fun, folks. I'm looking at the Zoom that's going out on the broadcast. I don't see it. I see it on my screen over here, though, the one I'm watching and talking through. So fun times with Zoom screen sharing, folks. Okay. I'll try it once more. Yeah. Why not? See. It happened over here and over here. Yay. Cool. So now it should be going out to the stream. Okay. Perfect. So now let's talk about use cases when talking about containers. One of the use cases is: I have a container running, and the container takes forever to initialize, and all its caches are hot and it's running really fast, and it would take me half an hour to get back to this point. But you need a new kernel because of some security vulnerability you want to fix. So one thing you can do is save the state of the container using checkpoint/restore, reboot quickly, if your system reboots quickly, of course, and then continue running the container from that same point in time. So I actually made a few pictures there. Here the colors are basically the memory of the process. So I have all the memory of my host, and my container is running on the host, my container host. And so what I want to do now: I'll checkpoint the container. So I kind of take out the memory of the running process and write it to disk. And then I reboot my host.
All the memory there is gone, and I restore it, and it's now a different memory because it's a new kernel or whatever, but I can still take my old memory-filled container and use it just as it was before. And so I'll go again to my... All right, let's see how all this works now. Yes, how well this works. So, share terminal. I've got it in one place. I've got it in two places. We've got this figured out, I think, hopefully. So let's see, are there any Podman containers? So there's no container currently running. So to demonstrate this, I actually have... This is all wrong. Let's try it again. WildFly... It's WildFly you're after in your search. Yeah, so... Okay, this is too long a month. So the goal for this demonstration is to have a stateful container, basically. So I think in pure container theory, containers are stateless. You can start as many as you want and stop them, and you don't care about the state in the container. But the whole container migration idea kind of depends on there being a state in the container that you actually want to preserve. So for this demonstration, I created a container. It's WildFly-based, so this is a Java application server. And I rewrote the Hello World application to be even simpler than it was. So what it basically does: it returns a number, increments it and waits for the next request. And the next request will get a number one larger. So it's as simple as it can get for a stateful container. So let's start it there. So let's hope it's running, and that I typed correctly. So the container is running there. And let's see if I can talk to the container. I had too many tests running here before. And that's it. So this basically just makes... I know, this is also wrong. This is better. So what I'm doing now: I'm saying curl to talk to the container. And I need to know the IP address of the container. So I just use podman inspect on the last container.
I ask it for the IP address and use that in curl to talk to my WildFly stateful application. And if I do this command now, I should get a zero. And it's a zero. And if I do it again, it's a one. So it's beautiful and really simple. So now let's say podman... And Java. Yeah. And Java. So yeah. All right. I actually did this also for the documentation; there I rewrote it in Python, so it's much simpler. But I don't know, at the time I wrote it, I somehow felt it would be interesting in Java. The interesting thing about Java here, which is actually my second use case, is that it takes like 10 seconds until this container is up and ready to answer requests. And if I restore it from a checkpoint, it only needs five seconds. So already on a really small scale I can gain 50% in startup time by starting it from a checkpoint with the Java libraries initialized. Wow. But anyway, so let's go: podman container checkpoint. And then I just say latest, because I just want to checkpoint my last container. So now it writes the image to disk just like before; Podman adds a bit of additional container information. But now the checkpoint is written. If I now try to talk to my container, I just get a bad URL, because podman inspect doesn't return an address. So let's reboot my VM. This should be fast, usually, I don't know. Hopefully fast on your end, or whatever. Yeah. Yeah. So let's see, is it back again? Yeah, there it is. And if I now say podman ps, there's no container running. If I say podman ps -a, I see there's a stopped container. And this actually is the container with a checkpoint. What doesn't exist in Podman: you cannot see if the container has been checkpointed or if it has just been stopped. It would look the same. But if it had just been stopped and I did a restore, the restore would fail and say this is not a checkpoint. So let's do podman container restore, again -l for latest, and now the container is running again. Nice. Okay.
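The stateful behaviour of the demo application (answer with a number, then increment it) is small enough to sketch in Python. This is not the WildFly application from the demo and not the Python version mentioned for the documentation; `CounterHandler`, `make_server` and the default port 8080 are names and values assumed here for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CounterHandler(BaseHTTPRequestHandler):
    """Return the current counter value and increment it for the next request."""

    counter = 0                     # the in-memory state worth preserving
    lock = threading.Lock()

    def do_GET(self):
        with CounterHandler.lock:
            value = CounterHandler.counter
            CounterHandler.counter += 1
        body = f"{value}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass                        # keep the console quiet


def make_server(host="127.0.0.1", port=8080):
    """Create the server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), CounterHandler)
```

Because the counter lives only in process memory, a plain restart would begin at zero again, while a checkpoint and restore would continue from the last value, which is the property the reboot demo relies on.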
And now it's again up for three minutes. Interesting. Yeah. It remembers the time it had from the previous time. So now I get a three. So it doesn't start at zero. I was able to keep my state, and it continues to run where it was. So this is the reboot use case. The other use case is the quick startup use case. I'm just going to do the demo of the quick startup use case right now, so that we don't have to switch too often between slides and terminal. Cool. So let's do that. So I have this container running now and I can talk to it, and my previous command was podman checkpoint -l. And what I can also do now, and this is also needed for container migration: I can tell Podman to export the checkpoint to a file, and taking this file, I can just copy it to another system and then do a podman container restore based on this file. And the file contains everything. If the image is missing on the destination system, Podman will actually pull it from the registry and then restore the container. But this is also what I'm using now to create additional copies of the same container. So what I'm going to do now: I say podman container checkpoint. And I'm looking at the help quickly. So I say podman container checkpoint. And I say again -l for latest, then I say --export to export it to an external archive, which can later be imported by Podman. So I say export and I just call it checkpoint.tar. And then I say -R, to leave the container running. In the other test, when I did a reboot, it was the normal way CRIU, and then also Podman, works: if you checkpoint something, the process is stopped, it's basically killed. But you can always tell it to keep the process running. So that's what I'm going to do now. So I now say: checkpoint this thing, export it to a file, but keep the container running. Oh, this is interesting. This is actually fixed upstream.
There was a change in runc at some point which was not tested correctly with checkpointing. So runc kind of expects that when you run checkpoint, the container should be gone. It's not gone. The error looks bad, but it doesn't do anything. Yeah, I think we saw some errors like that yesterday, same kind of thing, right? Where it was like, oh yeah, it's not quite fully baked in upstream yet, kind of deal. Yeah. So the container still runs. I can still talk to it. I get back a four or five. So let's restore a copy of that container. I say podman container restore. Then I say import and then I give it my checkpoint image. And then I have to give it a new name, because what Podman would do now is try to restore the container with the same container ID and the same name it had originally. And this will fail. We can actually try this, I guess. And this fails because the container exists. It says that ID is already in use. So what I can do now is say --name. I just give it a new name. And because my application is called hello, I just call it hello one. Let's restore a few copies here. So this now copies all the memory back into the process. So it takes like five seconds to restore the whole Java thing as it was before. And now if I do a podman ps, I can see, okay, I have hello one and hello two. And the original name, which I don't know how to pronounce. Xenoduck Yolomar. Yeah, that one. So now I can say podman inspect. And now I'm not using -l; now I'm just going to give it a name. So now if I say podman inspect hello one, I should get back... I forgot where it was before, I guess. I think you hit three. So I get a four here. And if I go to hello two, I also get a four. And now it's five and six. And if I do hello one again, it's five and six. And if I go to the original container, which is called... podman ps. So let's see what I get there. It's also six.
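The export and import steps above map onto two Podman command lines. The Python sketch below only assembles the argument lists; the helper names and the `/tmp/checkpoint.tar.gz` path are assumptions for illustration, while `--latest`, `--export`, `--leave-running`, `--import` and `--name` are the options named in the demo. Running the commands would need root and a Podman installation with CRIU support.

```python
def podman_checkpoint_argv(container=None, export=None, leave_running=False):
    """Assemble `podman container checkpoint` (hypothetical helper)."""
    argv = ["podman", "container", "checkpoint"]
    if export:
        argv += ["--export", export]          # write the checkpoint to an archive
    if leave_running:
        argv.append("--leave-running")        # do not stop the container afterwards
    argv.append(container if container else "--latest")
    return argv


def podman_restore_argv(import_path=None, name=None, container=None):
    """Assemble `podman container restore` (hypothetical helper)."""
    argv = ["podman", "container", "restore"]
    if import_path:
        argv += ["--import", import_path]     # restore from an exported archive
        if name:
            argv += ["--name", name]          # new name avoids the ID/name clash
    else:
        argv.append(container if container else "--latest")
    return argv


# Placeholder archive path; three independent copies as in the demo.
archive = "/tmp/checkpoint.tar.gz"
plan = [podman_checkpoint_argv(export=archive, leave_running=True)]
plan += [podman_restore_argv(archive, name=f"hello{i}") for i in (1, 2, 3)]
for argv in plan:
    print(" ".join(argv))
```

Giving each restored copy its own `--name` is what lets several containers come out of one archive; without it the restore tries to reuse the original ID and name and fails while the original still exists.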
Now I have three almost identical containers running. I think they're all at six now, but the states are basically independent of each other from here on. So they can each answer requests from whichever client they get them from. So, my measurements here... maybe I can, how did I do that? I just want to show that it takes longer to start a new WildFly than to restore it from a checkpoint. Let's try podman run. Yeah. That's all wrong. That's for SQL Server. Yeah. Actually, yesterday I tried to migrate SQL Server and it also works. Oh my gosh. Okay. It took some time? No, no. Really? Yeah. It just worked. I was amazed. Okay. I wasn't expecting it to work. So what I'm going to do now: I'm going to start a container and then I will print out the date to see when podman run finished. And then I will do a few curls with the date command to see how long it takes. So my measurement here is not very scientific, but let's see how it goes. And so, I still cannot talk to the container. It's still connection refused. So now WildFly is ready and needs to load the application. And so I'm now at second 36. And when I started... Oh no. There is no scrollback here. The screen command somehow broke my terminal, so I cannot scroll. Let's say it took about 10 seconds. And if I try to restore it from a checkpoint, let's see. I just print out a date, restore it as hello three and print out a date again. And it usually takes like five seconds here. Six seconds. Six seconds. Oh, and the other one takes like 10 seconds. So it's a bit faster, and I guess if you have a really big WildFly container, the gain could be really large, but yeah. That's significant, considering that some Java apps I've seen could take several minutes to get everything loaded up and going. Right.
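The hand measurement above (print the date, then keep trying curl until the application answers) can be made a little more repeatable with a polling helper. This is a sketch, not what was run in the demo; `time_until_ready` and its parameters are hypothetical, and `probe` stands for any callable that returns True once the service answers, for example a wrapper around an HTTP request.

```python
import time


def time_until_ready(probe, timeout=60.0, interval=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns True and return the elapsed seconds."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError(f"service not ready after {timeout} seconds")
        sleep(interval)             # wait a little before asking again
```

Starting the timer right before `podman run` in one trial and right before `podman container restore` in another would give the two numbers compared in the demo, with the same caveat the speaker gives: it is a rough measurement, not a benchmark.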
Like if you take your Java app as a dev and you check it in as a running container, in some kind of archive or something, and you just say, take this tarball and then run it as a container somewhere, it'll just spin up quickly. That's an interesting way to do development; all of a sudden your stuff's running instead of starting in production. Right. You cut the startup time right out. Right. The startup time is merely getting everything lined up on the new system and starting it. Yeah. That's very, very interesting. Okay. Cool. We're actually talking with a few JVM developers who want to use CRIU inside of the JVM to reduce startup time. They already showed that it can be integrated, and one of the problems they are dealing with right now, I think, is that CRIU currently requires you to run as root, because it needs a few things. Yeah. And that's uninteresting for the JVM, if you always have to run it as root. But just a few weeks ago, we started discussions in the upstream Linux kernel, hoping to get a non-root CRIU working there. And then it could also be interesting for Podman, if we actually get the necessary interfaces into the kernel. Yeah. I could definitely see some serious gains to be had there, in the JVM, in the kernel itself. Yeah. There are a lot of interesting points and pieces here that could actually speed this up too. Right. Like tighter integration with the kernel and the JVM and the whole nine yards. Right. If I'm a Java developer, I am now looking at this technology and thinking, I need to keep this in the back of my head, because this is something that under the hood is going to save me some time. Okay. So I'll stop sharing the demo screen and go back to my slides. Hope that works.
If you see slides back here, you're good. There was my container. So the other use case was quick startup, which I just showed: from a checkpoint, you can create multiple copies of an already initialized container. And again, my nice diagrams: I have the container with its state, I take it out of the host and I put it back into the host multiple times. And the other use case is container live migration, which is kind of the main topic of the talk. And in my diagrams, it would look something like this. I have my source and destination host. I take a container out of one host, transfer it to another host and restore it once or multiple times on the other host. And this way I have migrated my container from one system to another. So there is a question in chat, and I don't know if you're going to touch on this: how does this work with Kubernetes or OpenShift or any of the orchestration platforms? Okay. So one of my last slides, I think, is about what is planned for the future. And I can talk about it now. Yeah. We'll get to the question from elastic or as one elastic, however you say that wonderful screen name. One thing we're working on for the future is non-root checkpoint/restore. That's an interesting thing which I think is important to work on. And the other one, which was kind of always one of my goals, is how to get this into Kubernetes. And I've been thinking about this for a couple of years already. And in the beginning I was very nervous about it, because it seemed like I'm touching almost a philosophical part of containers: they are stateless, they do not need to be migrated. Right. So I was always afraid of bringing this up at the Kubernetes level, because they would say: you're crazy, nobody needs this. So there is an interesting development from Google's side, which I was going to mention later, but that's okay. I can mention it now.
So internally they use a container runtime; I think it's called Borg. And one of the central conferences where the CRIU developers and the container developers meet is usually the Linux Plumbers Conference. And two years ago at the Linux Plumbers Conference, Google presented how they use CRIU in production in Borg. So Google actually uses this to live migrate containers from one system to another in production in Borg. So this is the point where I think: if Google is involved in Kubernetes and if they use it internally, even if those are different groups, this might make it easier to get migration into Kubernetes at some point. And this is definitely something I am interested in and working towards getting done. A few months ago I actually started coding it, and I hit a problem pretty fast. It's not a bad problem. It's just that in Kubernetes you always have pods, I think. You have multiple containers in a pod and they all share the PID namespace, and then what CRIU... if you checkpoint a process which is in a PID namespace, but... I forgot it. But the problem is that CRIU cannot checkpoint a process, I think, out of one PID namespace and restore it into another PID namespace. I opened a pull request at CRIU; it's currently in discussion. So I guess we will pretty soon have the possibility in CRIU to get a process out of a PID namespace and into a PID namespace. And once that is in CRIU, I can continue my work trying it in one of the Kubernetes runtimes. I'm actually trying it in CRI-O, to get container checkpointing and restoring working. And once that's done, I can think about migration, and once that's done, I can think about Kubernetes. So it's a lot to do. I think about layers, the many layers of the onion of container orchestration, runtimes and so forth and so on. Yeah, I get it. Okay. That's about Kubernetes. Okay. So now we already are in 14 minutes. I see.
So maybe I'll make it faster now. So I want to talk about CRIU a bit. There's something at, what, 11? There's another broadcast, right? Yes, at 11 there is another one. Okay. I'll have to finish at some point. Yes. Okay. So, about CRIU. To migrate a container, the first step is that you have to checkpoint the process, and CRIU does this using ptrace. So it uses ptrace to stop the process. And then it starts collecting all the information about the process and writes it to disk. It takes information from /proc/PID and writes it to disk. And this is also a reason why CRIU is called Checkpoint/Restore In Userspace. Because before CRIU, there were other implementations of checkpoint/restore on Linux, and they were partly working entirely in the kernel, or intercepted system calls in some other way. CRIU is kind of the result of the last 10, 20 years of implementing checkpoint/restore, and CRIU is the one which most of the communities seem to have accepted as the way to go. So from the beginning, CRIU has used existing interfaces like /proc/PID to get information out of the process. And one other interesting thing about CRIU is what's called the parasite code. The parasite code is kind of my favorite part of CRIU because it's also the craziest part of CRIU, if you know what it's doing. The parasite code, like the name says, is injected into the running process, and then it runs as a daemon in the address space of the process. Using this technique, the main CRIU process can talk to the parasite code and get information about the process from within the address space of the process. So we're inside of the process, getting all the information out and writing it to disk. And at the end, the parasite code is removed again from the process. The process never knows it was under control of CRIU or the parasite code. And then it just continues to run, or the process is stopped or killed, whichever you want. Yeah. So that's interesting. Yeah.
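The inject, collect, remove cycle of the parasite code can be pictured with a toy model in Python. This is only a model on a `bytearray` standing in for the target's address space, not real code injection (CRIU does that via ptrace on a stopped process); `run_with_parasite` and its parameters are hypothetical names.

```python
def run_with_parasite(memory, offset, parasite, collect):
    """Toy model: swap in parasite bytes, collect data, put the original back.

    memory   -- bytearray standing in for the target's address space
    offset   -- where the parasite is placed
    parasite -- bytes that temporarily replace the original content
    collect  -- callable run while the parasite is in place
    """
    end = offset + len(parasite)
    saved = bytes(memory[offset:end])      # keep the original content
    memory[offset:end] = parasite          # inject the parasite
    try:
        return collect(memory)             # gather information "from inside"
    finally:
        memory[offset:end] = saved         # remove the parasite again


# Example: the region looks untouched once the call returns.
space = bytearray(b"original-code")
seen = run_with_parasite(space, 0, b"PARA", lambda m: bytes(m))
print(seen, bytes(space))
```

The `finally` block mirrors the property stressed in the talk: whatever happens during collection, the original content goes back, so the process cannot tell afterwards that anything was ever injected.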
In a simple diagram, it would look something like this here. We have the original process code. We move out one part of the code and replace it with the parasite code. We keep the old code available so we can put it back in later. And so the parasite is running. And once we have finished, we just put the old code back into the process, into the address space there. And then checkpointing is finished and all the information is written to disk. And then, I think I mentioned this, the target process is killed or continues to run, whatever you want. So yeah. I could definitely see why that doesn't quite jive with the whole namespacing paradigm and everything else. Yeah. But that's very, very cool in the sense that everything is a process, essentially, right? And you can manage processes in any number of ways. Please continue. Yeah. So I have a few slides about container live migration and SELinux. I'll skip those, but to get CRIU working with Podman on Fedora and RHEL, I of course needed complete SELinux support there, so that the container is running with the same labels after restore as it was before. There was some support in CRIU since 2015, but it was only focused on AppArmor at that point, for LXC, for the LXC integration. And the SELinux support was basically this: if you're running under some kind of context which is different from unconfined, CRIU just stops and says, I don't know how to handle it. So there were a couple of places I had to change in CRIU to get it working correctly, especially if you think about the parasite code. The parasite code is running inside of the container process, and suddenly something from inside of the container wants to talk to a process on the outside. So these are things SELinux will not allow out of the box. So we had to get additional policies and do the labeling of the sockets and processes correctly.
And then restore is especially interesting too, because CRIU does a lot of things you do not expect a process to do. So we also have additional policies there to let CRIU run under SELinux. The basic thing we did is set the SELinux labels as late as possible. We run with the outside labels as long as possible during restore, and then, as one of the last few things we do before giving control back to the process, we switch to the SELinux label, and we managed to do it in a way that actually works. So SELinux is finished. Now we have checkpointed the process and we have dealt with SELinux. The second-to-last step of container migration is to restore the process. Basically, we read all the checkpoint images into memory, like the ones we've seen before, and try to recreate the process as it was before checkpointing. In my initial test program, minimal, there was just a single process, so CRIU only has to create one process. But if you have a large process tree, like a Java program with a lot of processes and threads, CRIU will create a child process for each process and a new thread for each thread it has to restore, using clone or clone3. This was also something I was planning to mention: we're doing something called the PID dance. One of the problems with CRIU, if you're not running in a container, is that a restore requires the restored process to have the same PID as the checkpointed process. This can lead to PID collisions: if the PID is already in use, CRIU will just stop restoring. If you do it in a PID namespace you of course do not have this problem, because the PID namespace is empty when you do a container restore.
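The PID collision problem described above can be sketched in a few lines of Python. This is not how CRIU performs the check; it only shows the idea of testing whether the PIDs recorded in a checkpoint are already taken, using signal 0 with `os.kill`. The function names are made up for this example.

```python
import os


def pid_in_use(pid: int) -> bool:
    """Return True if a process with this PID exists in our PID namespace."""
    if pid <= 0:
        raise ValueError("pid must be positive")
    try:
        # Signal 0 delivers nothing; the kernel only checks existence and permission.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to someone else.
        return True
    return True


def find_collisions(checkpointed_pids):
    """List the checkpointed PIDs that are already taken on this system."""
    return [pid for pid in checkpointed_pids if pid_in_use(pid)]


if __name__ == "__main__":
    # Our own PID is certainly in use, so restoring "onto" it would collide.
    print(find_collisions([os.getpid()]))
```

A check like this is inherently racy outside a container, since a PID can be taken between the check and the restore. Inside a fresh PID namespace the list of collisions is empty by construction, which is the advantage mentioned in the talk.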
So this means CRIU has to recreate a process with the same PID, and that's what we used to call the PID dance, because it's a bit complicated. Or it used to be a bit complicated on Linux; now it's easier, since we introduced setting the PID with a new system call, clone3, which was introduced last year. Just going over here, this is what I meant: CRIU creates all the child processes, and then the processes are morphed into the destination processes. One easy example, at least I think it's a good one to understand, is file descriptors. When CRIU checkpoints a process, it records the file descriptor ID and the position of the file descriptor. During restore it just recreates the file descriptor with the same ID, pointing to the same file name and to the same position in the file. When control is then given back to the original code, the file descriptor will be the same, pointing to the same file at the same location, and if the process reads something from or writes something to the file, it will be at the right location. That's basically what CRIU does with a lot of the resources it tries to restore. It maps all the memory pages back to the right place, applies security settings as late as possible, as mentioned, and then CRIU just jumps into the old code in the restored process and the process continues to run. And that's how CRIU does its magic. Magic, right here. And now to container live migration a bit more. The whole CRIU thing came from the container runtime OpenVZ, because they invented CRIU to provide container migration for their users. From the beginning they worked on it, they invented CRIU, they wrote a lot of the code for CRIU, so that's why I'm mentioning OpenVZ here first. I have never personally used it myself, so I'm not familiar with it, but that's where CRIU comes from and it's integrated there. Then, like I mentioned, Borg from Google: they use it in production to live migrate
containers from one host to another. They use it especially if the load on one host is critical and they have to free up resources. I think what they said is, especially if you have interactive jobs like, I don't know, mail or something, the users want it to be fast. So if there's something running in the background which takes away your resources, like video transcoding or something like this, that will be migrated off. They cannot migrate things like interactive mail reading or search, because CRIU is too slow for that; you would see in your web browser that something was not working as expected. But for background jobs which take a few hours, they use live migration. Wow, that's cool. I mean, that actually could explain why some of my crazy search filters take some time in Gmail, right? It might actually be migrating that container to another system. It's entirely possible. I mean, you're filtering through thousands or tens of thousands of emails, but I've noticed that it does take some time to create complex filters, and this might be part of it. Interesting. Who knows what's going on back there in Borgland. Yeah. And then there's an integration of LXC with CRIU; they have had a pretty nice integration there for some time already. Then there's a Docker integration, which I would say at this point is basically unmaintained, so nobody's actively working on the Docker part of CRIU as far as I know. And then there's the Podman integration, which I did over the last two years, and I'm pretty happy about the results. It works pretty well. I'm especially happy about the possibility to export the whole checkpoint into a file, then transfer just that one single file to the destination system and have container migration working. Exactly. It was that use case that I was telling you about earlier: if I've got a bot and I need to move it, and there's only one container on one box, I can kick it off to another box anywhere I want. Yeah, that's cool. And then Podman, I don't have to mention that here, I
guess. So yeah, the first discussion I had with the Podman developers was, I think, in 2018. I was talking to them at DevConf and asking what they thought about checkpoint/restore, and it was really: if you provide patches, sure, we will merge them. I think the first code was there around May 2018, and in October it was merged. This was just the checkpoint/restore support which we saw in my first demo, where I rebooted the system, and it required changes to runc, CRIU and Podman. Then I continued working on it to get live migration working with the export of the checkpoint. This was finished in June 2019, again with changes to runc, CRIU, SELinux and Podman, at all levels. So this makes it really interesting, because you have to wait until all the different patches are applied across the whole stack. Wow. Right, complex problems. And this is actually what I mentioned: the checkpoint includes file system changes. This is important because it makes things easy for users, so you don't have to do a, I don't know, podman commit and then podman export of the file system and import it on the other side. This all needs to be in the checkpoint, and it was my goal to make container migration easy for users of Podman. So now I have a few slides showing my demo. I will do it live now; the slides are just a backup in case it fails. So I will unshare. Okay, I will share my terminal again. In the reflection behind me it looks like it popped right up. There we go. All right, look at you. Yeah. So, podman ps. I actually have it already prepared. I think I already did an export there, but let's do an export once more: podman container checkpoint, -l for the last container, -R to keep it running, and then -e to the checkpoint file. Now it's there. We saw this message before; ignore the error, that's expected. Yeah. And now let's transfer this container checkpoint file to my other host. 79 megabytes, so it's not that big. So let's go to the
other host, and I have no container running here. Now, podman container restore with import, and now it's doing the restore; it should take about five seconds, something like this. There we are. Let's see if I can talk to my container again with my magic long command: I use podman inspect to get the IP address and then I talk to the container. We should now get back a seven or eight or something like this. What do we get? Four. Okay. But if we now do a restore a second time under a different name, hello five, we should get back four again, I guess, from the same checkpoint. And now I say again: last container, IP address, and there it is. So I now have hello three and hello five restored here and running, migrated from the other system. Live migrated, probably; it depends on your definition of live migration, but at least a stateful migration. Right, stateful migration. Whether it was automated or not, I think the fact you can demonstrate a stateful migration is pretty cool. Yeah. So CRIU actually also supports what virtual machine migration does, as far as I know: they do pre-copy, so they copy the memory beforehand and then do the migration, or they do post-copy to fetch the memory afterwards. You can do this with CRIU, and I've done a demo migrating a container once around the world. I think it was at OSS Europe 2017. I was migrating a runc container once around the world and I was using pre-copy exactly for that, so that the client connection would not abort because of a long container downtime. Pre-copy basically does a dump of the container while the container keeps on running, and I transfer the data to the destination. Once I have the initial checkpoint transferred, only the delta is transferred, and the downtime is much shorter. So this was already my last demo, and I'm almost at the end of my slides. I think this should be the right slide. Can I see it in your background? No. Okay. I think if I do not actively cancel the sharing, if I just select a new
window, fine. That's good to go. Okay. So the slides now only show the commands I was using for container migration. This is still migrating the same container; this is also the same container here. And this is the thing, I have no idea what it will be called, but something to migrate a container under Kubernetes, whatever that might look like. The other thing I mentioned is non-root checkpointing, for rootless Podman containers or for JVMs or whatever use case. People in high-performance computing are also interested in checkpoint/restore. And yeah, I would imagine the crun folks are super interested in this. I'm already working with crun. CRIU support in crun is almost finished; there's one small pull request which I need to get into CRIU, and once the next CRIU is released I can update crun. So locally I have crun all working with CRIU, but there are again a few layers I have to go through. Yes, of course, there's always layers. Yes. Awesome. Well, Adrian, this was wonderful. Please finish your slides. Okay, this is just a summary and a few links, and that's it. So yeah, if you want to give me a link where folks can find your slides at some point, I can share that out. Okay, I will do that. Yes, I can share that on social media, and folks, just look to the OpenShift Twitter stream for those links, and we can go from there. And Adrian, thank you so much for joining. This is super cool stuff. I've got to get my hands dirty with this; I can already see some use cases for myself. So thank you so much for coming on today. Stick around, folks. I will be joined here momentarily by the one and only Jimmy Alvarez, and we're going to install Red Hat's advanced cluster manager and get crazy with it. So cool. Adrian, thank you so much for coming on and showing us CRIU. I appreciate all your work on the project and all the other various projects in between to get to where we are today, because I think this is genuinely helpful for folks, so
thank you so much. Okay, thanks for having me. All right, folks, stick around, we'll be right back. Yeah, bye.