Okay, so next up are Adrian from Red Hat and me, and to wrap up the day I'm only going to be talking about optimized container live migration. As Stefan said, Adrian and I worked together on bringing optimized container live migration to LXD and LXC, and we'll see how far we get in the time we have.

So, first slide: these are the results. We wanted to make sure that what we did actually helps. The red bars are the optimized container live migration, the blue bars are the unoptimized live migration, the scale is seconds, and you can see that for the different test cases the optimized case was better than the unoptimized case every time. I will go into detail about our test cases and the whole setup at the end, but I wanted to start with the results, to show that it actually made sense to try this.

Right, this is just a recap; I'm not going to go into it in any depth because you've heard about this a bunch today. We did this using LXD, the system container manager that sits on top of LXC, which is just a shared library, and the shared library actually does all of the heavy lifting for live migration. It has an API that's essentially based on CRIU. What LXD allows you to do by default — and this is all work by Tycho, who gave a talk before — is move containers, or live-migrate containers, between different LXD instances. We have this command, `lxc move`, and if your container is running and you send it to a remote LXD running on a different host, you can perform a live migration. Who is familiar with the concept of live migration? Okay, so it's basically this: you dump all of the volatile state to disk, you sync over the file system, then you sync over the volatile state, you restore the task on the receiving system, and you're good.

As I said before, this is based on LXC's checkpoint support — that's the implementation in the low-level library — and it's based on CRIU. CRIU is checkpoint/restore in user space: you snapshot a process, you can imagine it like dumping its memory state to disk, and then you restore it to the exact state of execution it had at the snapshot. That's the idea, and it's a great use case for, for example, databases — any kind of scenario where you're really interested in the volatile state of a process or a process tree.

The migration steps for a container that you move with `lxc move` are: you sync the file system while the container is running; you dump all the processes using CRIU, which stops the container — this is crucial; you transfer all of the volatile state, so everything you dumped with CRIU; you do a final file system sync; and then you restart the container on the destination, restoring all this volatile state using CRIU again. And the thing is, during three of these steps — dumping the processes with CRIU, transferring the CRIU dump, and the final file system sync — the container is stopped. So the migration time depends on the memory size of the process, that is, how long it takes CRIU to take the snapshot of the given process, and on the file system change rate: if you have a lot of I/O on the source, the final file system sync has a lot to transfer.
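To make the unoptimized flow concrete: underneath, LXD drives liblxc's migrate API. Here is a minimal sketch of the dump side, assuming liblxc 2.x, a running container named "c1", and a made-up checkpoint directory; on the destination, the same call with MIGRATE_RESTORE over the synced directory brings the container back.

```c
/* Minimal sketch of the dump side, assuming liblxc 2.x, a running
 * container named "c1", and an illustrative checkpoint directory.
 * Build roughly with: gcc dump.c -o dump -llxc */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <lxc/lxccontainer.h>

int main(void)
{
	struct lxc_container *c;
	struct migrate_opts opts;

	c = lxc_container_new("c1", NULL);
	if (!c || !c->is_running(c)) {
		fprintf(stderr, "container not found or not running\n");
		return 1;
	}

	memset(&opts, 0, sizeof(opts));
	opts.directory = "/tmp/checkpoint"; /* where CRIU writes its images */
	opts.stop = true;                   /* the dump stops the container */
	opts.verbose = true;

	/* MIGRATE_DUMP lets CRIU snapshot the whole process tree; after the
	 * file system (and this directory) are synced to the destination,
	 * the same call with MIGRATE_RESTORE brings the container back. */
	if (c->migrate(c, MIGRATE_DUMP, &opts, sizeof(opts)) != 0) {
		fprintf(stderr, "dump failed\n");
		lxc_container_put(c);
		return 1;
	}

	lxc_container_put(c);
	return 0;
}
```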
Pre-copy is essentially a way to optimize this process, at least for the volatile state of the given process. The implementation basically uses the soft-dirty bit in the page table entry: every time a process writes to virtual memory, the kernel sets the soft-dirty bit in the corresponding page table entry, at which point you know the memory behind that page table entry has changed. So what you do is: you clear the soft-dirty bits, you wait for the process to do anything interesting, and then you check the page table entries for the soft-dirty bit again — and now you know which pages have been written to. Using this, you can dump the memory, let the process continue to run, and transfer the memory state to the destination, and the trick is that afterwards you only dump the pages which have changed. You can do this iteratively: every time the memory changes, you dump it again and transfer it again. You can also use it to calculate the delta, which basically means you set a given threshold, and when this threshold is reached you do a final dump, transfer the rest of the memory state over, and then you can, for example, do the final file system sync and finish. (There's a small code sketch of the soft-dirty mechanism below.)

And for completeness, I want to mention the other optimization CRIU offers — CRIU does not only support pre-copy migration, you can also do post-copy migration. This is based on userfaultfd and has been developed over the last few years by the CRIU people and me. What you basically do is: you freeze the process and transfer only a really minimal state from one system to the other. If you have a 100-megabyte process you want to live-migrate, the part that is not memory is maybe 200 kilobytes, so you transfer those 200 kilobytes and restart the process on the destination system. Once the process is running and hits a page which is not there yet, the page fault is forwarded to user space, and user space can request the page from the source system, transfer it to the destination, and the process continues running until it hits another page which is missing. This is just for completeness — we didn't do it for this first round of optimizing container migration. The optimal thing would be the combination of both: first do multiple pre-dumps, until you're satisfied with the amount of memory which has been transferred to the destination system, and at the end pull the missing pages over using lazy migration. Then you have the best of both approaches and can hopefully decrease the container downtime even further. Like I said, we did pre-copy migration for this approach.

So what we actually did: we first had to look at the low-level library, LXC. Pre-copy migration support for LXC actually already existed, but as far as I know it wasn't really used by anybody, and it was broken — not the whole thing, but some small parts around it — so we had to fix it. This was a quick fix and not really difficult.
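Here is the sketch of the soft-dirty mechanism I mentioned — a toy version of what CRIU's memory tracking builds on, assuming a kernel with CONFIG_MEM_SOFT_DIRTY and enough privileges to read another process's /proc files. The pid and address come from the command line; in a real pre-dump, the wait is where pages would be dumped and transferred.

```c
/* Toy soft-dirty tracking via /proc: clear the bits, wait, then check
 * which pages were written in the meantime. Usage: ./softdirty <pid> <vaddr> */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define SOFT_DIRTY (1ULL << 55) /* bit 55 of a pagemap entry */

static void clear_soft_dirty(pid_t pid)
{
	char path[64];
	int fd;

	snprintf(path, sizeof(path), "/proc/%d/clear_refs", pid);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return;
	/* writing "4" resets the soft-dirty bit on every PTE of the task */
	write(fd, "4", 1);
	close(fd);
}

static int page_was_written(pid_t pid, uint64_t vaddr)
{
	char path[64];
	uint64_t entry = 0;
	int fd;

	snprintf(path, sizeof(path), "/proc/%d/pagemap", pid);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* pagemap holds one 64-bit entry per virtual page */
	pread(fd, &entry, sizeof(entry), (vaddr / PAGE_SIZE) * sizeof(entry));
	close(fd);
	return !!(entry & SOFT_DIRTY);
}

int main(int argc, char *argv[])
{
	pid_t pid;
	uint64_t vaddr;

	if (argc != 3)
		return 1;
	pid = atoi(argv[1]);
	vaddr = strtoull(argv[2], NULL, 0);

	clear_soft_dirty(pid);
	sleep(1); /* a pre-dump would dump and transfer pages here */
	printf("page at %#llx %s\n", (unsigned long long)vaddr,
	       page_was_written(pid, vaddr) ? "was written" : "is clean");
	return 0;
}
```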
To go a step further, we implemented a check in the low-level library for whether the current platform actually supports pre-copy migration, that is, pre-dumping. The reason is that this depends on many things. It doesn't really depend on the architecture as such, but on the kernel for that architecture, because not all kernel architectures implement the soft-dirty bit in the page table; then it depends on the kernel version, whether it knows about the soft-dirty bit at all; then the support has to be enabled in the kernel; and then the CRIU version needs to be new enough to support pre-copy migration. Luckily, all these things can easily be checked by just calling CRIU and asking for a certain feature, and we added this whole checking machinery to LXC so that we can use it from LXD to detect whether the currently running system actually supports pre-copy migration.

When this was done, we started to look at LXD, and this is what we currently do to pre-copy-migrate a container from one system to another. The first step is the check I just described: do the underlying systems, on both sides, support pre-copy migration? If they do, we check that the user didn't disable it — because right now, if the platform supports pre-copy migration, LXD will always do pre-copy migration, and if the user dislikes it, too slow or too much overhead or whatever, they can still disable it. Then LXD does the first pre-dump. It still does all the steps it did before — it syncs the file system a first time — but instead of doing the one dump of the container, it does a first pre-dump. Then we check whether the percentage of pages which were unchanged during the pre-dump iteration is above a certain threshold; the default is 70%. So if 70% of the memory pages did not change, we do one final dump and the actual migration to the destination system. In addition to the threshold check, we have another check on the maximum number of pre-copy cycles, so that we don't get into an endless loop and do pre-copy cycles forever just because the memory keeps changing fast. Once either check triggers, we move the container to the destination side.

These are the settings you can use to configure whether and how LXD does pre-copy migration — migration.incremental.memory, migration.incremental.memory.goal and migration.incremental.memory.iterations. The first one disables and enables pre-copy migration; the second is the threshold, in percent, of how much memory needs to stay the same before we leave the pre-dump cycle; and the third is the maximum number of cycles. The defaults are a 70% threshold and a maximum of 10 cycles, if you don't set them to higher or lower numbers.
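Put together, the flow looks roughly like this against liblxc's migrate API. This is a sketch, not LXD's actual code: pages_unchanged_ratio() is a hypothetical stand-in for parsing the statistics CRIU writes after a pre-dump (which is how the percentage is really computed), and the directory layout is invented.

```c
/* Sketch of the pre-copy flow against liblxc's migrate API (LXC 2.x). */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <lxc/lxccontainer.h>

#define GOAL 70     /* percent of pages unchanged before the final dump */
#define MAX_ITER 10 /* cap on pre-dump cycles, avoids an endless loop */

/* hypothetical helper: would read CRIU's stats from the last pre-dump */
static int pages_unchanged_ratio(const char *dir) { (void)dir; return 100; }

int main(void)
{
	struct lxc_container *c = lxc_container_new("c1", NULL);
	struct migrate_opts opts;
	char dir[64], prev[64];
	int i;

	if (!c)
		return 1;

	memset(&opts, 0, sizeof(opts));
	opts.directory = "/tmp/migrate";
	opts.features_to_check = FEATURE_MEM_TRACK;

	/* step 1: do the kernel and CRIU here support memory tracking? */
	if (c->migrate(c, MIGRATE_FEATURE_CHECK, &opts, sizeof(opts)) != 0) {
		fprintf(stderr, "no pre-copy support, doing a plain dump\n");
		opts.features_to_check = 0;
		opts.stop = true;
		return c->migrate(c, MIGRATE_DUMP, &opts, sizeof(opts));
	}

	/* step 2: pre-dump while the container keeps running, until enough
	 * pages stayed unchanged between iterations or we hit the cap */
	for (i = 0; i < MAX_ITER; i++) {
		snprintf(dir, sizeof(dir), "/tmp/migrate/pre-%d", i);
		opts.directory = dir;
		if (c->migrate(c, MIGRATE_PRE_DUMP, &opts, sizeof(opts)) != 0)
			return 1;
		/* the next (pre-)dump is incremental relative to this one */
		snprintf(prev, sizeof(prev), "../pre-%d", i);
		opts.predump_dir = prev;
		if (pages_unchanged_ratio(dir) >= GOAL)
			break;
	}

	/* step 3: the final dump is the only step stopping the container */
	opts.directory = "/tmp/migrate/final";
	opts.stop = true;
	return c->migrate(c, MIGRATE_DUMP, &opts, sizeof(opts));
}
```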
We were really, really happy when it worked — it was really cool — but we wanted to be sure that it actually helps and doesn't make things worse. So we were looking for a test case, and the test case was a bit complicated to define in the beginning, because we wanted to see whether the container downtime during migration actually decreases, and we wanted to see it from the viewpoint of the container. So what we did: we created a process in the container which allocates one gigabyte of memory, and then it reads the clock, sleeps for 100 microseconds, reads the clock again, and compares those two values. If the difference is lower than 400 microseconds, we assume we didn't miss one of our sleep cycles; if it's higher than 400 microseconds, something interfered and we could have been migrated. If we do an actual migration, the number of missed sleep cycles is large — something like 20,000 — so we can read that number out of the logs and calculate backwards how much time the process was actually frozen during the migration. In addition to sleeping, we also write one byte into memory pages of that gigabyte of memory — either we write no memory at all and just do the sleep cycle, or we touch every memory page, every second memory page, or every fourth memory page (there's a reconstruction of this workload further below). Our testing was done end to end — I have a short demo which I can show — using the latest LXD from Git. And that's where I come back to our results.

In the leftmost test case the memory doesn't change at all, so you see pre-copy migration is really good: we can do one dump while the container keeps running and transfer the dump while the container keeps running, and the actual downtime is really small because the delta we have to transfer at the end is tiny — the memory doesn't change. This is a completely unrealistic test case — we never have a process which doesn't change its memory — but it's good as a baseline to know how fast it can be, and to check that our test case is written correctly and that we calculated the sleep time correctly backwards from the missed sleep cycles. The actual downtime goes down from about 22 seconds to two or three seconds, which is pretty nice. In the second test case we are changing every fourth memory page, in the third we are changing every second memory page, and in the last we are touching every memory page. For the last one I actually don't know why pre-copy migration is still better — it should actually be worse; maybe my test case was not perfect. So we don't understand the last one, but the previous ones we can explain really well, so we're happy there.

Now, just an overview of the container runtimes I've been looking at, to understand how far they support container migration and optimized container migration: LXD now does pre-copy-supported container migration; runc has support for pre-copy and post-copy and the combination of both optimizations; and for Docker, I only know they have some support via CRIU for container migration without any optimization, but I never actually used it.
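For reference, here is a rough reconstruction of that measurement workload: the 1 GiB allocation, the 100-microsecond sleep, and the 400-microsecond threshold are taken straight from the description above, and the stride argument selects which fraction of pages gets dirtied each cycle.

```c
/* Rough reconstruction of the measurement workload. Stride picks how
 * many pages to dirty: 0 = none, 1 = every page, 2 = every second page,
 * 4 = every fourth page. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define MEM_SIZE (1024UL * 1024 * 1024) /* 1 GiB */

static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

int main(int argc, char *argv[])
{
	unsigned long stride = argc > 1 ? strtoul(argv[1], NULL, 0) : 0;
	char *mem = malloc(MEM_SIZE);
	uint64_t before, after, missed_us = 0;
	unsigned long i;

	if (!mem)
		return 1;

	for (;;) {
		/* dirty one byte in every stride-th page of the gigabyte */
		if (stride)
			for (i = 0; i < MEM_SIZE; i += stride * PAGE_SIZE)
				mem[i]++;

		before = now_us();
		usleep(100);
		after = now_us();

		/* well above the 100 us sleep: we were frozen or migrated */
		if (after - before > 400) {
			missed_us += after - before;
			printf("missed cycle: %llu us, total %llu us\n",
			       (unsigned long long)(after - before),
			       (unsigned long long)missed_us);
		}
	}
}
```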
And now for the demo. What I have here is a really simple container, and I say `lxc move` to move it over to the other system — I should have put a `time` in front of it; you don't see anything, it just moves the container and it's gone. If I do `lxc list`, it's now over there, and I can move it back again the same way. I can also enable pre-copy support, but unfortunately this test container is so small that you don't see any difference — but that's what it's about. Yeah, any questions?

Q: Can you do this with an open TCP session?
A: So the test cases we made here were for containers without networking, but I've done container migration using runc with open TCP connections, and it really works — with the caveat, of course, that the IP address has to stay the same; if it changes, it doesn't work.
Q: Does it work with LXD?
A: Yeah, it also works, as long as you make sure that both hosts are connected to the same physical network — and for the network to do the right thing, you might just have to wait for the ARP cache to time out and point at the new machine.
Q: I think that's true, but you can also send an ARP packet — something has to technically tell your switch "hey, I've moved" so that you don't have to wait for it.
A: LXD doesn't do that for you, because you don't want it running code like that on your behalf — the two hosts might be on two different networks — but yes, the move itself works. Other questions?

Q: I want to ask what happens if one machine has a time shift, in seconds or minutes. I mean, the servers have to be in sync in time, and you restore a container and the time is in the past or in the future.
A: I would say this is only a problem for the applications running inside of the container, if they are time-sensitive. I never thought about it, but I know there are applications which just stop because the time changes; I've never seen any problems with the process or container migration itself, so it looks like the systems don't care. I don't know if that's really how it is or if they do care — that's a really good question, but I don't know. Whenever I use it, I never think about whether my machines are in sync time-wise, so I actually don't know how the host time relates to the container time.

Q: I'm very interested in using this kind of stuff, but one thing: you mentioned databases as an example of something with a lot of state. Another thing which is true about databases is that they tend to have critical sections — places where a page fault is already pretty bad — and unfortunately what you propose involves page faults which could take a ridiculous amount of time. Is there any way to prioritize the page faults which might end up occurring in critical sections?
A: The question was about lazy migration, and what happens if a page is missing that's really critical for the process.
Q: Right — there are some things which are fine to lazily migrate, and there are definitely some pages I don't want lazily migrated: if this page is not here, we should not be running. I saw you do have controls like "this percentage should already be there", but can the application specify, through some kind of advisory interface, "if this is not here, I cannot run"?
A: Currently not — and we didn't do lazy migration here; for the LXD integration it's only the pre-copy migration. Lazy migration is just something CRIU can do.
Q: As a second part to that, I'm curious if you thought about using working-set metrics to work out which pages might be worth prioritizing — how does CRIU select the pages which are preferable to transfer immediately?
A: Currently it doesn't. This is a place where CRIU needs development, to be smarter about its lazy migration algorithms. Right now it only does: you request a page, we send it to you, and whatever is never requested just gets pushed to the destination at some point.

Cool, all right — we could talk more about that, but I think we're out of time, so thank you very much.
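As an addendum to the lazy-migration questions: this is a toy sketch of the userfaultfd mechanism that CRIU's post-copy mode builds on. One thread faults on a registered page and the main thread resolves the fault with UFFDIO_COPY — roughly the point where lazy migration would fetch the page from the source host instead. It assumes a kernel with userfaultfd support (and possibly privileges, depending on sysctl settings); build with -lpthread.

```c
/* Toy userfaultfd demo: one thread faults on a registered page, the
 * main thread serves it with UFFDIO_COPY. */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define PAGE 4096

static char *region;
static char page[PAGE] __attribute__((aligned(PAGE)));

static void *toucher(void *arg)
{
	(void)arg;
	/* this read faults and blocks until the page is served below */
	printf("first byte: %d\n", region[0]);
	return NULL;
}

int main(void)
{
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg;
	struct uffdio_copy copy;
	struct uffd_msg msg;
	pthread_t t;
	long uffd;

	uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
	if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) < 0)
		return 1;

	region = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* hand missing-page faults in this range to user space */
	reg.range.start = (unsigned long)region;
	reg.range.len = PAGE;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		return 1;

	pthread_create(&t, NULL, toucher, NULL);

	/* blocks until the toucher thread hits the missing page */
	read(uffd, &msg, sizeof(msg));
	if (msg.event == UFFD_EVENT_PAGEFAULT) {
		/* in real lazy migration, this page would now be requested
		 * from the page server on the source machine */
		memset(page, 42, sizeof(page));
		copy.dst = msg.arg.pagefault.address & ~(PAGE - 1UL);
		copy.src = (unsigned long)page;
		copy.len = PAGE;
		copy.mode = 0;
		ioctl(uffd, UFFDIO_COPY, &copy);
	}

	pthread_join(t, NULL);
	return 0;
}
```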